| Message ID | 20260424200255.356798-5-mzamazal@redhat.com |
|---|---|
| State | Superseded |
On 24/04/2026 21:02, Milan Zamazal wrote:
> +	/*
> +	 * The constructed table is compressed by converting from floats to bytes.
> +	 * This makes the texture uploaded to a GPU smaller and we don't have to
> +	 * deal with textures containing float values.
> +	 * The byte range 0..255 represents floating point values 1.0..4.0. Values
> +	 * outside this range are clamped. When accessed in the shader, the byte
> +	 * range is represented by 0.0..1.0 range. Then the resulting pixel value
> +	 * can be computed as
> +	 *   rgb + rgb * 3.0 * LUT_VALUE
> +	 */

This is almost certainly something we can skip with a compute shader.

I'll bring a laptop on holidays and will get an RFC multipass pipeline done.

I almost feel it's worthwhile doing the work to fit this into a compute
shader, since we calculate float to byte and then do another conversion in
the shader - when we could just feed in floats...

Being more pragmatic: this code works now - but it is certainly the most
obvious use-case for ...

Actually, why can't we just pass floats to the shader?

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, data);

and then

rgb = rgb * texture2D(lsc_tex, textureOut).rgb;

We would cut out a lot of hoop-jumping that way. And we don't really need a
compute shader just to upload a float texture.

Really thinking about this - the conversion to bytes is unnecessary. Just
calculate everything in float and upload it to the GPU. Our worst case is
passing GL_RGBA16F and losing some precision, which is probably preferable
unless the maximum value of the float exceeds 65.5k, which it probably
doesn't.

16f is plenty of precision for a gain LUT and halves the texture
memory/bandwidth required.

I'd try:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, data);
rgb = rgb * texture2D(lsc_tex, textureOut).rgb;

and dropping that conversion to bytes - it seems like just overhead to me.
Check for "GL_OES_texture_half_float_linear" in GL_EXTENSIONS - we already
interrogate these and can just not do LSC if float/half-float support is
missing.

BTW the internet machine tells me these two extensions became part of GLES
3.0 in 2012 and were optional before that, so IMO we are "pretty safe"
making LSC depend on "GL_OES_texture_half_float_linear" and gracefully
refusing to run LSC without it.

---
bod
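The gating Bryan suggests boils down to a token lookup in the space-separated extension string. A minimal sketch, with `hasGlExtension` as a hypothetical helper (a real implementation would obtain the string from glGetString(GL_EXTENSIONS), or enumerate with glGetStringi on GLES 3, rather than take it as a parameter):

```cpp
#include <sstream>
#include <string>

/*
 * Check for an extension token in a space-separated extension string such
 * as the one returned by glGetString(GL_EXTENSIONS). Comparing whole
 * tokens avoids false positives from substring matches, e.g.
 * "GL_OES_texture_half_float" matching inside
 * "GL_OES_texture_half_float_linear".
 */
static bool hasGlExtension(const std::string &extensions,
			   const std::string &name)
{
	std::istringstream tokens(extensions);
	std::string token;
	while (tokens >> token) {
		if (token == name)
			return true;
	}
	return false;
}
```

With something like this, LSC setup could gracefully bail out when half-float textures cannot be sampled with linear filtering, instead of failing at render time.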
Hi Bryan,

Bryan O'Donoghue <bod.linux@nxsw.ie> writes:

> On 24/04/2026 21:02, Milan Zamazal wrote:
>> +	/*
>> +	 * The constructed table is compressed by converting from floats to bytes.
>> +	 * This makes the texture uploaded to a GPU smaller and we don't have to
>> +	 * deal with textures containing float values.
>> +	 * The byte range 0..255 represents floating point values 1.0..4.0. Values
>> +	 * outside this range are clamped. When accessed in the shader, the byte
>> +	 * range is represented by 0.0..1.0 range. Then the resulting pixel value
>> +	 * can be computed as
>> +	 *   rgb + rgb * 3.0 * LUT_VALUE
>> +	 */
>
> This is almost certainly something we can skip with a compute shader.
>
> I'll bring a laptop on holidays and will get an RFC multipass pipeline done.
>
> I almost feel it's worthwhile doing the work to fit this into a compute
> shader, since we calculate float to byte and then do another conversion in
> the shader - when we could just feed in floats...
>
> Being more pragmatic: this code works now - but it is certainly the most
> obvious use-case for ...
>
> Actually, why can't we just pass floats to the shader?
>
> glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, data);
>
> and then
>
> rgb = rgb * texture2D(lsc_tex, textureOut).rgb;
>
> We would cut out a lot of hoop-jumping that way. And we don't really need
> a compute shader just to upload a float texture.
>
> Really thinking about this - the conversion to bytes is unnecessary. Just
> calculate everything in float and upload it to the GPU. Our worst case is
> passing GL_RGBA16F and losing some precision, which is probably preferable
> unless the maximum value of the float exceeds 65.5k, which it probably
> doesn't.
>
> 16f is plenty of precision for a gain LUT and halves the texture
> memory/bandwidth required.
>
> I'd try:
>
> glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, data);

Ah, that's it, thank you for the tip. My previous float attempts resulted
in a black screen; using 16 bits does the trick. With your suggestion (and
s/RGBA/RGB/), it works and is ~3 % faster in my environment than the
compressed bytes version. So I'll switch to floats in v4.

> rgb = rgb * texture2D(lsc_tex, textureOut).rgb;
>
> And dropping that conversion to bytes - it seems like just overhead to me.
>
> Check for "GL_OES_texture_half_float_linear" in GL_EXTENSIONS - we already
> interrogate these and can just not do LSC if float/half-float support is
> missing.
>
> BTW the internet machine tells me these two extensions became part of GLES
> 3.0 in 2012 and were optional before that, so IMO we are "pretty safe"
> making LSC depend on "GL_OES_texture_half_float_linear" and gracefully
> refusing to run LSC without it.
>
> ---
> bod
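For reference, the byte-compression scheme being dropped can be checked numerically. This sketch mirrors the patch's `float2byte` lambda and the shader-side reconstruction `rgb + rgb * 3.0 * LUT_VALUE`; the function names are illustrative, not from the patch:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

/* CPU side: map a gain in [1.0, 4.0] to a byte, as the patch does. */
static uint8_t float2byte(float factor)
{
	return std::round((std::clamp(factor, 1.0f, 4.0f) - 1.0f) / 3.0f * 255.0f);
}

/*
 * GPU side: a GL_UNSIGNED_BYTE texture is sampled as 0.0..1.0, and the
 * shader computes rgb + rgb * 3.0 * lut, i.e. rgb * (1.0 + 3.0 * lut).
 */
static float applyGain(float rgb, uint8_t lut)
{
	float l = lut / 255.0f;
	return rgb + rgb * 3.0f * l;
}
```

The round trip is exact for gains that land on a code point and otherwise quantizes the gain in steps of 3/255 ≈ 0.012, which the float texture path avoids entirely.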
Milan Zamazal <mzamazal@redhat.com> writes:

> Hi Bryan,
>
> Bryan O'Donoghue <bod.linux@nxsw.ie> writes:
>
>> On 24/04/2026 21:02, Milan Zamazal wrote:
>>> +	/*
>>> +	 * The constructed table is compressed by converting from floats to bytes.
>>> +	 * This makes the texture uploaded to a GPU smaller and we don't have to
>>> +	 * deal with textures containing float values.
>>> +	 * The byte range 0..255 represents floating point values 1.0..4.0. Values
>>> +	 * outside this range are clamped. When accessed in the shader, the byte
>>> +	 * range is represented by 0.0..1.0 range. Then the resulting pixel value
>>> +	 * can be computed as
>>> +	 *   rgb + rgb * 3.0 * LUT_VALUE
>>> +	 */
>>
>> This is almost certainly something we can skip with a compute shader.
>>
>> I'll bring a laptop on holidays and will get an RFC multipass pipeline done.
>>
>> I almost feel it's worthwhile doing the work to fit this into a compute
>> shader, since we calculate float to byte and then do another conversion in
>> the shader - when we could just feed in floats...
>>
>> Being more pragmatic: this code works now - but it is certainly the most
>> obvious use-case for ...
>>
>> Actually, why can't we just pass floats to the shader?
>>
>> glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, data);
>>
>> and then
>>
>> rgb = rgb * texture2D(lsc_tex, textureOut).rgb;
>>
>> We would cut out a lot of hoop-jumping that way. And we don't really need
>> a compute shader just to upload a float texture.
>>
>> Really thinking about this - the conversion to bytes is unnecessary. Just
>> calculate everything in float and upload it to the GPU. Our worst case is
>> passing GL_RGBA16F and losing some precision, which is probably preferable
>> unless the maximum value of the float exceeds 65.5k, which it probably
>> doesn't.
>>
>> 16f is plenty of precision for a gain LUT and halves the texture
>> memory/bandwidth required.
>>
>> I'd try:
>>
>> glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, data);
>
> Ah, that's it, thank you for the tip. My previous float attempts
> resulted in a black screen; using 16 bits does the trick.

And I wonder why...

> With your suggestion (and s/RGBA/RGB/), it works and is ~3 % faster in
> my environment than the compressed bytes version. So I'll switch to
> floats in v4.
>
>> rgb = rgb * texture2D(lsc_tex, textureOut).rgb;
>>
>> And dropping that conversion to bytes - it seems like just overhead to me.
>>
>> Check for "GL_OES_texture_half_float_linear" in GL_EXTENSIONS - we already
>> interrogate these and can just not do LSC if float/half-float support is
>> missing.

It's not listed among EGL_EXTENSIONS reported in my environment, but it
still works.

>> BTW the internet machine tells me these two extensions became part of GLES
>> 3.0 in 2012 and were optional before that, so IMO we are "pretty safe"
>> making LSC depend on "GL_OES_texture_half_float_linear" and gracefully
>> refusing to run LSC without it.
>>
>> ---
>> bod
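On the precision question raised above: for a gain LUT clamped at 4.0, the half-float quantization error can be bounded directly. IEEE binary16 has 10 explicit mantissa bits and a maximum finite value of 65504, so Bryan's "65.5k" ceiling is correct. This is a standalone check, not libcamera code:

```cpp
#include <cmath>

/*
 * Spacing between adjacent IEEE binary16 values around v, for v in the
 * normal range: with 10 explicit mantissa bits, ulp = 2^(exponent - 10).
 */
static float halfFloatUlp(float v)
{
	return std::ldexp(1.0f, std::ilogb(v) - 10);
}
```

For gains in [1.0, 4.0) the exponent is 0 or 1, so the ulp is at most 2^-9 ≈ 0.002 and rounding to binary16 costs at most ~0.1 % of the gain, versus steps of 3/255 ≈ 1.2 % of the full range for the byte scheme.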
diff --git a/src/ipa/simple/algorithms/lsc.cpp b/src/ipa/simple/algorithms/lsc.cpp
new file mode 100644
index 000000000..585957e74
--- /dev/null
+++ b/src/ipa/simple/algorithms/lsc.cpp
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * Lens shading correction
+ */
+
+#include "lsc.h"
+
+#include <algorithm>
+#include <cmath>
+
+#include <libcamera/base/log.h>
+
+#include "awb.h"
+
+namespace libcamera {
+
+namespace ipa::soft::algorithms {
+
+LOG_DEFINE_CATEGORY(IPASoftLsc)
+
+int Lsc::init([[maybe_unused]] IPAContext &context, const YamlObject &tuningData)
+{
+	int ret_r = lscR.readYaml(tuningData["grids"], "ct", "r");
+	int ret_g = lscG.readYaml(tuningData["grids"], "ct", "g");
+	int ret_b = lscB.readYaml(tuningData["grids"], "ct", "b");
+
+	if (ret_r < 0 || ret_g < 0 || ret_b < 0) {
+		LOG(IPASoftLsc, Error)
+			<< "Failed to parse 'lsc' parameter from tuning file.";
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int Lsc::configure([[maybe_unused]] IPAContext &context,
+		   [[maybe_unused]] const IPAConfigInfo &configInfo)
+{
+	return 0;
+}
+
+void Lsc::prepare(IPAContext &context, [[maybe_unused]] const uint32_t frame,
+		  [[maybe_unused]] IPAFrameContext &frameContext,
+		  DebayerParams *params)
+{
+	unsigned int ct =
+		context.activeState.awb.temperatureK.value_or(kDefaultTemperature);
+
+	const LscMatrix matrixR = lscR.getInterpolated(ct);
+	const LscMatrix matrixG = lscG.getInterpolated(ct);
+	const LscMatrix matrixB = lscB.getInterpolated(ct);
+
+	/*
+	 * The constructed table is compressed by converting from floats to bytes.
+	 * This makes the texture uploaded to a GPU smaller and we don't have to
+	 * deal with textures containing float values.
+	 * The byte range 0..255 represents floating point values 1.0..4.0. Values
+	 * outside this range are clamped. When accessed in the shader, the byte
+	 * range is represented by 0.0..1.0 range. Then the resulting pixel value
+	 * can be computed as
+	 *   rgb + rgb * 3.0 * LUT_VALUE
+	 */
+	DebayerParams::LscLookupTable lut;
+	constexpr unsigned int gridSize = DebayerParams::kLscGridSize;
+	auto float2byte = [](float factor) -> uint8_t {
+		return std::round((std::clamp(factor, 1.0f, 4.0f) - 1.0) / 3.0 * 255);
+	};
+	for (unsigned int i = 0, j = 0; i < gridSize * gridSize; i++) {
+		lut[j++] = float2byte(matrixR.data()[i]);
+		lut[j++] = float2byte(matrixG.data()[i]);
+		lut[j++] = float2byte(matrixB.data()[i]);
+	}
+	params->lscLut = lut;
+}
+
+REGISTER_IPA_ALGORITHM(Lsc, "Lsc")
+
+} /* namespace ipa::soft::algorithms */
+
+} /* namespace libcamera */
diff --git a/src/ipa/simple/algorithms/lsc.h b/src/ipa/simple/algorithms/lsc.h
new file mode 100644
index 000000000..ef3474aec
--- /dev/null
+++ b/src/ipa/simple/algorithms/lsc.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * Lens shading correction
+ */
+
+#pragma once
+
+#include "libcamera/internal/matrix.h"
+
+#include <libipa/interpolator.h>
+
+#include "algorithm.h"
+
+namespace libcamera {
+
+namespace ipa::soft::algorithms {
+
+class Lsc : public Algorithm
+{
+public:
+	Lsc() = default;
+	~Lsc() = default;
+
+	int init(IPAContext &context, const YamlObject &tuningData) override;
+	int configure(IPAContext &context,
+		      const IPAConfigInfo &configInfo) override;
+	void prepare(IPAContext &context,
+		     const uint32_t frame,
+		     IPAFrameContext &frameContext,
+		     DebayerParams *params) override;
+
+private:
+	using LscMatrix = Matrix<float, DebayerParams::kLscGridSize, DebayerParams::kLscGridSize>;
+	Interpolator<LscMatrix> lscR;
+	Interpolator<LscMatrix> lscG;
+	Interpolator<LscMatrix> lscB;
+};
+
+} /* namespace ipa::soft::algorithms */
+
+} /* namespace libcamera */
diff --git a/src/ipa/simple/algorithms/meson.build b/src/ipa/simple/algorithms/meson.build
index 73c637220..c9f6e5590 100644
--- a/src/ipa/simple/algorithms/meson.build
+++ b/src/ipa/simple/algorithms/meson.build
@@ -6,4 +6,5 @@ soft_simple_ipa_algorithms = files([
     'agc.cpp',
     'blc.cpp',
     'ccm.cpp',
+    'lsc.cpp',
 ])
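For comparison, the v4 direction discussed in the thread (skip the byte conversion and upload floats) would reduce the table construction in prepare() to a plain interleave. A sketch under stated assumptions: flat vectors stand in for libcamera's Matrix and LscLookupTable types, and `buildFloatLut` is a hypothetical helper, not part of the patch:

```cpp
#include <cstddef>
#include <vector>

/*
 * Interleave three per-channel gain grids into one RGB float LUT, suitable
 * for an upload along the lines of
 *   glTexImage2D(..., GL_RGB16F, w, h, 0, GL_RGB, GL_FLOAT, lut.data());
 * No clamping or quantization is needed: the shader just multiplies the
 * pixel by the sampled gain.
 */
static std::vector<float> buildFloatLut(const std::vector<float> &r,
					const std::vector<float> &g,
					const std::vector<float> &b)
{
	std::vector<float> lut;
	lut.reserve(r.size() * 3);
	for (size_t i = 0; i < r.size(); i++) {
		lut.push_back(r[i]);
		lut.push_back(g[i]);
		lut.push_back(b[i]);
	}
	return lut;
}
```

The shader side then collapses to the single line from the thread, `rgb = rgb * texture2D(lsc_tex, textureOut).rgb;`.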