Message ID | 20241120180104.1221846-10-mzamazal@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series |
|
Related | show |
Hi, awesome to see this series! I currently giving this a try - found two issues so far: 1. Currently things crash because of the wrong gamma clamp, see below. 2. With that fixed I get a Red<->Blue channel switch. Haven't found the line yet, but it also happens without this patch, so probably happens earlier in the series. On 20.11.24 19:01, Milan Zamazal wrote: > This patch applies color correction matrix (CCM) in debayering if the > CCM is specified. Not using CCM must still be supported for performance > reasons. > > The CCM is applied as follows: > > [r1 r2 r3] > [r g b] * [g1 g2 g3] > [b1 b2 b3] > > The CCM matrix (the right side of the multiplication) is constant during > single frame processing, while the input pixel (the left side) changes. > Because each of the color channels is only 8-bit in software ISP, we can > make 9 lookup tables with 256 input values for multiplications of each > of the r_i, g_i, b_i values. This way we don't have to multiply each > pixel, we can use table lookups and additions instead. Gamma (which is > non-linear and thus cannot be a part of the 9 lookup tables values) is > applied on the final values rounded to integers using another lookup > table. > > We use int16_t to store the precomputed multiplications. This seems to > be noticeably (>10%) faster than `float' for the price of slightly less > accuracy and it covers the range of values that sane CCMs produce. The > selection and structure of data is performance critical, for example > using bytes would add significant (>10%) speedup but would be too short > to cover the value range. > > The color lookup tables are changed to unions to be able to serve as > both simple lookup tables or CCM lookup tables. This is arguable but it > keeps the code easier to understand if nothing else. It is important to > make the unions on the whole lookup tables rather than their values > because the latter might cause a noticeable slowdown on simple lookup > due to the sheer fact that the table is not compact. Using unions makes > the initialization of the tables dubious because it is done before the > decision about using CCM is made but the initial values are dubious > anyway. > > The tables are copied (as before), which is not elegant but also not a > big problem. There are patches posted that use shared buffers for > parameters passing in software ISP (see software ISP TODO #5) and they > can be adjusted for the new parameter format. > > With this patch, the reported per-frame slowdown when applying CCM is > about 45% on Debix Model A and about 85% on TI AM69 SK. > > Signed-off-by: Milan Zamazal<mzamazal@redhat.com> > --- > .../internal/software_isp/debayer_params.h | 20 ++++++- > src/ipa/simple/algorithms/lut.cpp | 58 ++++++++++++++----- > src/ipa/simple/algorithms/lut.h | 1 + > src/libcamera/software_isp/debayer.cpp | 53 ++++++++++++++++- > src/libcamera/software_isp/debayer_cpu.cpp | 34 ++++++++--- > src/libcamera/software_isp/debayer_cpu.h | 7 ++- > src/libcamera/software_isp/software_isp.cpp | 6 +- > 7 files changed, 145 insertions(+), 34 deletions(-) > > diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h > index 7d8fdd481..531db6968 100644 > --- a/include/libcamera/internal/software_isp/debayer_params.h > +++ b/include/libcamera/internal/software_isp/debayer_params.h > @@ -18,11 +18,25 @@ namespace libcamera { > struct DebayerParams { > static constexpr unsigned int kRGBLookupSize = 256; > > + struct CcmRow { > + int16_t c1; > + int16_t c2; > + int16_t c3; > + }; > + > using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>; > + using CcmLookupTable = std::array<CcmRow, kRGBLookupSize>; > + using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>; > + > + union LookupTable { > + ColorLookupTable simple; > + CcmLookupTable ccm; > + }; > > - ColorLookupTable red; > - ColorLookupTable green; > - ColorLookupTable blue; > + LookupTable red; > + LookupTable green; > + LookupTable blue; > + GammaLookupTable gammaLut; > }; > > } /* namespace libcamera */ > diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp > index 8fca96b0a..259976e08 100644 > --- a/src/ipa/simple/algorithms/lut.cpp > +++ b/src/ipa/simple/algorithms/lut.cpp > @@ -44,6 +44,11 @@ void Lut::updateGammaTable(IPAContext &context) > context.activeState.gamma.blackLevel = blackLevel; > } > > +int16_t Lut::ccmValue(unsigned int i, float ccm) const > +{ > + return std::round(i * ccm); > +} > + > void Lut::prepare(IPAContext &context, > [[maybe_unused]] const uint32_t frame, > [[maybe_unused]] IPAFrameContext &frameContext, > @@ -55,27 +60,48 @@ void Lut::prepare(IPAContext &context, > * observed, it's not permanently prone to minor fluctuations or > * rounding errors. > */ > - if (context.activeState.gamma.blackLevel != context.activeState.blc.level) > + const bool gammaUpdateNeeded = > + context.activeState.gamma.blackLevel != context.activeState.blc.level; > + if (gammaUpdateNeeded) > updateGammaTable(context); > > auto &gains = context.activeState.awb.gains; > auto &gammaTable = context.activeState.gamma.gammaTable; > const unsigned int gammaTableSize = gammaTable.size(); > - > - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > - const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / > - gammaTableSize; > - /* Apply gamma after gain! */ > - unsigned int idx; > - idx = std::min({ static_cast<unsigned int>(i * gains.red / div), > - gammaTableSize - 1 }); > - params->red[i] = gammaTable[idx]; > - idx = std::min({ static_cast<unsigned int>(i * gains.green / div), > - gammaTableSize - 1 }); > - params->green[i] = gammaTable[idx]; > - idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), > - gammaTableSize - 1 }); > - params->blue[i] = gammaTable[idx]; > + const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / > + gammaTableSize; > + > + if (!context.activeState.ccm.enabled) { > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + /* Apply gamma after gain! */ > + unsigned int idx; > + idx = std::min({ static_cast<unsigned int>(i * gains.red / div), > + gammaTableSize - 1 }); > + params->red.simple[i] = gammaTable[idx]; > + idx = std::min({ static_cast<unsigned int>(i * gains.green / div), > + gammaTableSize - 1 }); > + params->green.simple[i] = gammaTable[idx]; > + idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), > + gammaTableSize - 1 }); > + params->blue.simple[i] = gammaTable[idx]; > + } > + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { > + auto &ccm = context.activeState.ccm.ccm; > + auto &red = params->red.ccm; > + auto &green = params->green.ccm; > + auto &blue = params->blue.ccm; > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + red[i].c1 = ccmValue(i, ccm[0][0]); > + red[i].c2 = ccmValue(i, ccm[0][1]); > + red[i].c3 = ccmValue(i, ccm[0][2]); > + green[i].c1 = ccmValue(i, ccm[1][0]); > + green[i].c2 = ccmValue(i, ccm[1][1]); > + green[i].c3 = ccmValue(i, ccm[1][2]); > + blue[i].c1 = ccmValue(i, ccm[2][0]); > + blue[i].c2 = ccmValue(i, ccm[2][1]); > + blue[i].c3 = ccmValue(i, ccm[2][2]); > + params->gammaLut[i] = gammaTable[i / div]; > + } > } > } > > diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h > index b635987d0..8d50a23fc 100644 > --- a/src/ipa/simple/algorithms/lut.h > +++ b/src/ipa/simple/algorithms/lut.h > @@ -27,6 +27,7 @@ public: > > private: > void updateGammaTable(IPAContext &context); > + int16_t ccmValue(unsigned int i, float ccm) const; > }; > > } /* namespace ipa::soft::algorithms */ > diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp > index f0b832619..97106b22b 100644 > --- a/src/libcamera/software_isp/debayer.cpp > +++ b/src/libcamera/software_isp/debayer.cpp > @@ -23,11 +23,56 @@ namespace libcamera { > * \brief Size of a color lookup table > */ > > +/** > + * \struct DebayerParams::CcmRow > + * \brief Type of a single row of a color correction matrix > + */ > + > +/** > + * \var DebayerParams::CcmRow::c1 > + * \brief First column of a CCM row > + */ > + > +/** > + * \var DebayerParams::CcmRow::c2 > + * \brief Second column of a CCM row > + */ > + > +/** > + * \var DebayerParams::CcmRow::c3 > + * \brief Third column of a CCM row > + */ > + > /** > * \typedef DebayerParams::ColorLookupTable > + * \brief Type of the simple lookup tables for red, green, blue values > + */ > + > +/** > + * \typedef DebayerParams::CcmLookupTable > + * \brief Type of the CCM lookup tables for red, green, blue values > + */ > + > +/** > + * \typedef DebayerParams::GammaLookupTable > + * \brief Type of the gamma lookup tables for CCM > + */ > + > +/** > + * \union DebayerParams::LookupTable > * \brief Type of the lookup tables for red, green, blue values > */ > > +/** > + * \var DebayerParams::LookupTable::simple > + * \brief Simple lookup table for red, green, blue values > + */ > + > +/** > + * \var DebayerParams::LookupTable::ccm > + * \brief CCM lookup table for red, green, blue values > + */ > + > /** > * \var DebayerParams::red > * \brief Lookup table for red color, mapping input values to output values > @@ -43,6 +88,11 @@ namespace libcamera { > * \brief Lookup table for blue color, mapping input values to output values > */ > > +/** > + * \var DebayerParams::gammaLut > + * \brief Gamma lookup table used with color correction matrix > + */ > + > /** > * \class Debayer > * \brief Base debayering class > @@ -57,10 +107,11 @@ Debayer::~Debayer() > } > > /** > - * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs) > + * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs, ccmEnabled) > * \brief Configure the debayer object according to the passed in parameters > * \param[in] inputCfg The input configuration > * \param[in] outputCfgs The output configurations > + * \param[in] ccmEnabled Whether a color correction matrix is applied > * > * \return 0 on success, a negative errno on failure > */ > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > index 52baf43dc..d8976f183 100644 > --- a/src/libcamera/software_isp/debayer_cpu.cpp > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > @@ -11,6 +11,7 @@ > > #include "debayer_cpu.h" > > +#include <algorithm> > #include <stdlib.h> > #include <sys/ioctl.h> > #include <time.h> > @@ -50,8 +51,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) > enableInputMemcpy_ = true; > > /* Initialize color lookup tables */ > - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) > - red_[i] = green_[i] = blue_[i] = i; > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; > + red_.ccm[i].c1 = red_.ccm[i].c2 = red_.ccm[i].c3 = 0; > + green_.ccm[i].c1 = green_.ccm[i].c2 = green_.ccm[i].c3 = 0; > + blue_.ccm[i].c1 = blue_.ccm[i].c2 = blue_.ccm[i].c3 = 0; > + } > } > > DebayerCpu::~DebayerCpu() = default; > @@ -61,12 +66,24 @@ DebayerCpu::~DebayerCpu() = default; > const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ > const pixel_t *next = (const pixel_t *)src[2] + xShift_; > > -#define STORE_PIXEL(b, g, r) \ > - *dst++ = blue_[r]; \ > - *dst++ = green_[g]; \ > - *dst++ = red_[b]; \ > - if constexpr (addAlphaByte) \ > - *dst++ = 255; \ > +#define GAMMA(value) \ > + *dst++ = gammaLut_[std::clamp(value, 0, 1023)] This should be 255 instead 1023, correct? > + > +#define STORE_PIXEL(b, g, r) \ > + if constexpr (ccmEnabled) { \ > + DebayerParams::CcmRow &red = red_.ccm[r]; \ > + DebayerParams::CcmRow &green = green_.ccm[g]; \ > + DebayerParams::CcmRow &blue = blue_.ccm[b]; \ > + GAMMA(red.c3 + green.c3 + blue.c3); \ > + GAMMA(red.c2 + green.c2 + blue.c2); \ > + GAMMA(red.c1 + green.c1 + blue.c1); \ > + } else { \ > + *dst++ = blue_.simple[b]; \ > + *dst++ = green_.simple[g]; \ > + *dst++ = red_.simple[r]; \ > + } \ > + if constexpr (addAlphaByte) \ > + *dst++ = 255; \ > x++; > > /* > @@ -764,6 +781,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > green_ = params.green; > red_ = swapRedBlueGains_ ? params.blue : params.red; > blue_ = swapRedBlueGains_ ? params.red : params.blue; > + gammaLut_ = params.gammaLut; > > /* Copy metadata from the input buffer */ > FrameMetadata &metadata = output->_d()->metadata(); > diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h > index b2ec8f1bd..35c51e4fd 100644 > --- a/src/libcamera/software_isp/debayer_cpu.h > +++ b/src/libcamera/software_isp/debayer_cpu.h > @@ -137,9 +137,10 @@ private: > /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ > static constexpr unsigned int kMaxLineBuffers = 5; > > - DebayerParams::ColorLookupTable red_; > - DebayerParams::ColorLookupTable green_; > - DebayerParams::ColorLookupTable blue_; > + DebayerParams::LookupTable red_; > + DebayerParams::LookupTable green_; > + DebayerParams::LookupTable blue_; > + DebayerParams::GammaLookupTable gammaLut_; > debayerFn debayer0_; > debayerFn debayer1_; > debayerFn debayer2_; > diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp > index 0e1d3a457..08eca192e 100644 > --- a/src/libcamera/software_isp/software_isp.cpp > +++ b/src/libcamera/software_isp/software_isp.cpp > @@ -79,9 +79,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor) > for (unsigned int i = 0; i < 256; i++) > gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); > for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > - debayerParams_.red[i] = gammaTable[i]; > - debayerParams_.green[i] = gammaTable[i]; > - debayerParams_.blue[i] = gammaTable[i]; > + debayerParams_.red.simple[i] = gammaTable[i]; > + debayerParams_.green.simple[i] = gammaTable[i]; > + debayerParams_.blue.simple[i] = gammaTable[i]; > } > > if (!dmaHeap_.isValid()) {
There's a red <-> blue color swap when the ccm is active, see below: On 20.11.24 19:01, Milan Zamazal wrote: > This patch applies color correction matrix (CCM) in debayering if the > CCM is specified. Not using CCM must still be supported for performance > reasons. > > The CCM is applied as follows: > > [r1 r2 r3] > [r g b] * [g1 g2 g3] > [b1 b2 b3] > > The CCM matrix (the right side of the multiplication) is constant during > single frame processing, while the input pixel (the left side) changes. > Because each of the color channels is only 8-bit in software ISP, we can > make 9 lookup tables with 256 input values for multiplications of each > of the r_i, g_i, b_i values. This way we don't have to multiply each > pixel, we can use table lookups and additions instead. Gamma (which is > non-linear and thus cannot be a part of the 9 lookup tables values) is > applied on the final values rounded to integers using another lookup > table. > > We use int16_t to store the precomputed multiplications. This seems to > be noticeably (>10%) faster than `float' for the price of slightly less > accuracy and it covers the range of values that sane CCMs produce. The > selection and structure of data is performance critical, for example > using bytes would add significant (>10%) speedup but would be too short > to cover the value range. > > The color lookup tables are changed to unions to be able to serve as > both simple lookup tables or CCM lookup tables. This is arguable but it > keeps the code easier to understand if nothing else. It is important to > make the unions on the whole lookup tables rather than their values > because the latter might cause a noticeable slowdown on simple lookup > due to the sheer fact that the table is not compact. Using unions makes > the initialization of the tables dubious because it is done before the > decision about using CCM is made but the initial values are dubious > anyway. > > The tables are copied (as before), which is not elegant but also not a > big problem. There are patches posted that use shared buffers for > parameters passing in software ISP (see software ISP TODO #5) and they > can be adjusted for the new parameter format. > > With this patch, the reported per-frame slowdown when applying CCM is > about 45% on Debix Model A and about 85% on TI AM69 SK. > > Signed-off-by: Milan Zamazal<mzamazal@redhat.com> > --- > .../internal/software_isp/debayer_params.h | 20 ++++++- > src/ipa/simple/algorithms/lut.cpp | 58 ++++++++++++++----- > src/ipa/simple/algorithms/lut.h | 1 + > src/libcamera/software_isp/debayer.cpp | 53 ++++++++++++++++- > src/libcamera/software_isp/debayer_cpu.cpp | 34 ++++++++--- > src/libcamera/software_isp/debayer_cpu.h | 7 ++- > src/libcamera/software_isp/software_isp.cpp | 6 +- > 7 files changed, 145 insertions(+), 34 deletions(-) > > diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h > index 7d8fdd481..531db6968 100644 > --- a/include/libcamera/internal/software_isp/debayer_params.h > +++ b/include/libcamera/internal/software_isp/debayer_params.h > @@ -18,11 +18,25 @@ namespace libcamera { > struct DebayerParams { > static constexpr unsigned int kRGBLookupSize = 256; > > + struct CcmRow { > + int16_t c1; > + int16_t c2; > + int16_t c3; > + }; > + > using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>; > + using CcmLookupTable = std::array<CcmRow, kRGBLookupSize>; > + using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>; > + > + union LookupTable { > + ColorLookupTable simple; > + CcmLookupTable ccm; > + }; > > - ColorLookupTable red; > - ColorLookupTable green; > - ColorLookupTable blue; > + LookupTable red; > + LookupTable green; > + LookupTable blue; > + GammaLookupTable gammaLut; > }; > > } /* namespace libcamera */ > diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp > index 8fca96b0a..259976e08 100644 > --- a/src/ipa/simple/algorithms/lut.cpp > +++ b/src/ipa/simple/algorithms/lut.cpp > @@ -44,6 +44,11 @@ void Lut::updateGammaTable(IPAContext &context) > context.activeState.gamma.blackLevel = blackLevel; > } > > +int16_t Lut::ccmValue(unsigned int i, float ccm) const > +{ > + return std::round(i * ccm); > +} > + > void Lut::prepare(IPAContext &context, > [[maybe_unused]] const uint32_t frame, > [[maybe_unused]] IPAFrameContext &frameContext, > @@ -55,27 +60,48 @@ void Lut::prepare(IPAContext &context, > * observed, it's not permanently prone to minor fluctuations or > * rounding errors. > */ > - if (context.activeState.gamma.blackLevel != context.activeState.blc.level) > + const bool gammaUpdateNeeded = > + context.activeState.gamma.blackLevel != context.activeState.blc.level; > + if (gammaUpdateNeeded) > updateGammaTable(context); > > auto &gains = context.activeState.awb.gains; > auto &gammaTable = context.activeState.gamma.gammaTable; > const unsigned int gammaTableSize = gammaTable.size(); > - > - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > - const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / > - gammaTableSize; > - /* Apply gamma after gain! */ > - unsigned int idx; > - idx = std::min({ static_cast<unsigned int>(i * gains.red / div), > - gammaTableSize - 1 }); > - params->red[i] = gammaTable[idx]; > - idx = std::min({ static_cast<unsigned int>(i * gains.green / div), > - gammaTableSize - 1 }); > - params->green[i] = gammaTable[idx]; > - idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), > - gammaTableSize - 1 }); > - params->blue[i] = gammaTable[idx]; > + const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / > + gammaTableSize; > + > + if (!context.activeState.ccm.enabled) { > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + /* Apply gamma after gain! */ > + unsigned int idx; > + idx = std::min({ static_cast<unsigned int>(i * gains.red / div), > + gammaTableSize - 1 }); > + params->red.simple[i] = gammaTable[idx]; > + idx = std::min({ static_cast<unsigned int>(i * gains.green / div), > + gammaTableSize - 1 }); > + params->green.simple[i] = gammaTable[idx]; > + idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), > + gammaTableSize - 1 }); > + params->blue.simple[i] = gammaTable[idx]; > + } > + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { > + auto &ccm = context.activeState.ccm.ccm; > + auto &red = params->red.ccm; > + auto &green = params->green.ccm; > + auto &blue = params->blue.ccm; > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + red[i].c1 = ccmValue(i, ccm[0][0]); > + red[i].c2 = ccmValue(i, ccm[0][1]); > + red[i].c3 = ccmValue(i, ccm[0][2]); > + green[i].c1 = ccmValue(i, ccm[1][0]); > + green[i].c2 = ccmValue(i, ccm[1][1]); > + green[i].c3 = ccmValue(i, ccm[1][2]); > + blue[i].c1 = ccmValue(i, ccm[2][0]); > + blue[i].c2 = ccmValue(i, ccm[2][1]); > + blue[i].c3 = ccmValue(i, ccm[2][2]); > + params->gammaLut[i] = gammaTable[i / div]; > + } > } > } > > diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h > index b635987d0..8d50a23fc 100644 > --- a/src/ipa/simple/algorithms/lut.h > +++ b/src/ipa/simple/algorithms/lut.h > @@ -27,6 +27,7 @@ public: > > private: > void updateGammaTable(IPAContext &context); > + int16_t ccmValue(unsigned int i, float ccm) const; > }; > > } /* namespace ipa::soft::algorithms */ > diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp > index f0b832619..97106b22b 100644 > --- a/src/libcamera/software_isp/debayer.cpp > +++ b/src/libcamera/software_isp/debayer.cpp > @@ -23,11 +23,56 @@ namespace libcamera { > * \brief Size of a color lookup table > */ > > +/** > + * \struct DebayerParams::CcmRow > + * \brief Type of a single row of a color correction matrix > + */ > + > +/** > + * \var DebayerParams::CcmRow::c1 > + * \brief First column of a CCM row > + */ > + > +/** > + * \var DebayerParams::CcmRow::c2 > + * \brief Second column of a CCM row > + */ > + > +/** > + * \var DebayerParams::CcmRow::c3 > + * \brief Third column of a CCM row > + */ > + > /** > * \typedef DebayerParams::ColorLookupTable > + * \brief Type of the simple lookup tables for red, green, blue values > + */ > + > +/** > + * \typedef DebayerParams::CcmLookupTable > + * \brief Type of the CCM lookup tables for red, green, blue values > + */ > + > +/** > + * \typedef DebayerParams::GammaLookupTable > + * \brief Type of the gamma lookup tables for CCM > + */ > + > +/** > + * \union DebayerParams::LookupTable > * \brief Type of the lookup tables for red, green, blue values > */ > > +/** > + * \var DebayerParams::LookupTable::simple > + * \brief Simple lookup table for red, green, blue values > + */ > + > +/** > + * \var DebayerParams::LookupTable::ccm > + * \brief CCM lookup table for red, green, blue values > + */ > + > /** > * \var DebayerParams::red > * \brief Lookup table for red color, mapping input values to output values > @@ -43,6 +88,11 @@ namespace libcamera { > * \brief Lookup table for blue color, mapping input values to output values > */ > > +/** > + * \var DebayerParams::gammaLut > + * \brief Gamma lookup table used with color correction matrix > + */ > + > /** > * \class Debayer > * \brief Base debayering class > @@ -57,10 +107,11 @@ Debayer::~Debayer() > } > > /** > - * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs) > + * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs, ccmEnabled) > * \brief Configure the debayer object according to the passed in parameters > * \param[in] inputCfg The input configuration > * \param[in] outputCfgs The output configurations > + * \param[in] ccmEnabled Whether a color correction matrix is applied > * > * \return 0 on success, a negative errno on failure > */ > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > index 52baf43dc..d8976f183 100644 > --- a/src/libcamera/software_isp/debayer_cpu.cpp > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > @@ -11,6 +11,7 @@ > > #include "debayer_cpu.h" > > +#include <algorithm> > #include <stdlib.h> > #include <sys/ioctl.h> > #include <time.h> > @@ -50,8 +51,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) > enableInputMemcpy_ = true; > > /* Initialize color lookup tables */ > - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) > - red_[i] = green_[i] = blue_[i] = i; > + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; > + red_.ccm[i].c1 = red_.ccm[i].c2 = red_.ccm[i].c3 = 0; > + green_.ccm[i].c1 = green_.ccm[i].c2 = green_.ccm[i].c3 = 0; > + blue_.ccm[i].c1 = blue_.ccm[i].c2 = blue_.ccm[i].c3 = 0; > + } > } > > DebayerCpu::~DebayerCpu() = default; > @@ -61,12 +66,24 @@ DebayerCpu::~DebayerCpu() = default; > const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ > const pixel_t *next = (const pixel_t *)src[2] + xShift_; > > -#define STORE_PIXEL(b, g, r) \ > - *dst++ = blue_[r]; \ > - *dst++ = green_[g]; \ > - *dst++ = red_[b]; \ > - if constexpr (addAlphaByte) \ > - *dst++ = 255; \ > +#define GAMMA(value) \ > + *dst++ = gammaLut_[std::clamp(value, 0, 1023)] > + > +#define STORE_PIXEL(b, g, r) \ > + if constexpr (ccmEnabled) { \ > + DebayerParams::CcmRow &red = red_.ccm[r]; \ > + DebayerParams::CcmRow &green = green_.ccm[g]; \ > + DebayerParams::CcmRow &blue = blue_.ccm[b]; \ > + GAMMA(red.c3 + green.c3 + blue.c3); \ > + GAMMA(red.c2 + green.c2 + blue.c2); \ > + GAMMA(red.c1 + green.c1 + blue.c1); \ This apparently should be: + GAMMA(red.c1 + green.c1 + blue.c1); \ + GAMMA(red.c2 + green.c2 + blue.c2); \ + GAMMA(red.c3 + green.c3 + blue.c3); \ > + } else { \ > + *dst++ = blue_.simple[b]; \ > + *dst++ = green_.simple[g]; \ > + *dst++ = red_.simple[r]; \ > + } \ > + if constexpr (addAlphaByte) \ > + *dst++ = 255; \ > x++; > > /* > @@ -764,6 +781,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > green_ = params.green; > red_ = swapRedBlueGains_ ? params.blue : params.red; > blue_ = swapRedBlueGains_ ? params.red : params.blue; > + gammaLut_ = params.gammaLut; > > /* Copy metadata from the input buffer */ > FrameMetadata &metadata = output->_d()->metadata(); > diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h > index b2ec8f1bd..35c51e4fd 100644 > --- a/src/libcamera/software_isp/debayer_cpu.h > +++ b/src/libcamera/software_isp/debayer_cpu.h > @@ -137,9 +137,10 @@ private: > /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ > static constexpr unsigned int kMaxLineBuffers = 5; > > - DebayerParams::ColorLookupTable red_; > - DebayerParams::ColorLookupTable green_; > - DebayerParams::ColorLookupTable blue_; > + DebayerParams::LookupTable red_; > + DebayerParams::LookupTable green_; > + DebayerParams::LookupTable blue_; > + DebayerParams::GammaLookupTable gammaLut_; > debayerFn debayer0_; > debayerFn debayer1_; > debayerFn debayer2_; > diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp > index 0e1d3a457..08eca192e 100644 > --- a/src/libcamera/software_isp/software_isp.cpp > +++ b/src/libcamera/software_isp/software_isp.cpp > @@ -79,9 +79,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor) > for (unsigned int i = 0; i < 256; i++) > gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); > for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { > - debayerParams_.red[i] = gammaTable[i]; > - debayerParams_.green[i] = gammaTable[i]; > - debayerParams_.blue[i] = gammaTable[i]; > + debayerParams_.red.simple[i] = gammaTable[i]; > + debayerParams_.green.simple[i] = gammaTable[i]; > + debayerParams_.blue.simple[i] = gammaTable[i]; > } > > if (!dmaHeap_.isValid()) {
Hi Robert, thank you for testing. Robert Mader <robert.mader@collabora.com> writes: > Hi, awesome to see this series! I currently giving this a try - found two issues so far: > > 1. Currently things crash because of the wrong gamma clamp, see below. > 2. With that fixed I get a Red<->Blue channel switch. Haven't found the > line yet, but it also happens without this patch, so probably > happens earlier in the series. > > On 20.11.24 19:01, Milan Zamazal wrote: >> This patch applies color correction matrix (CCM) in debayering if the >> CCM is specified. Not using CCM must still be supported for performance >> reasons. >> >> The CCM is applied as follows: >> >> [r1 r2 r3] >> [r g b] * [g1 g2 g3] >> [b1 b2 b3] >> >> The CCM matrix (the right side of the multiplication) is constant during >> single frame processing, while the input pixel (the left side) changes. >> Because each of the color channels is only 8-bit in software ISP, we can >> make 9 lookup tables with 256 input values for multiplications of each >> of the r_i, g_i, b_i values. This way we don't have to multiply each >> pixel, we can use table lookups and additions instead. Gamma (which is >> non-linear and thus cannot be a part of the 9 lookup tables values) is >> applied on the final values rounded to integers using another lookup >> table. >> >> We use int16_t to store the precomputed multiplications. This seems to >> be noticeably (>10%) faster than `float' for the price of slightly less >> accuracy and it covers the range of values that sane CCMs produce. The >> selection and structure of data is performance critical, for example >> using bytes would add significant (>10%) speedup but would be too short >> to cover the value range. >> >> The color lookup tables are changed to unions to be able to serve as >> both simple lookup tables or CCM lookup tables. This is arguable but it >> keeps the code easier to understand if nothing else. It is important to >> make the unions on the whole lookup tables rather than their values >> because the latter might cause a noticeable slowdown on simple lookup >> due to the sheer fact that the table is not compact. Using unions makes >> the initialization of the tables dubious because it is done before the >> decision about using CCM is made but the initial values are dubious >> anyway. >> >> The tables are copied (as before), which is not elegant but also not a >> big problem. There are patches posted that use shared buffers for >> parameters passing in software ISP (see software ISP TODO #5) and they >> can be adjusted for the new parameter format. >> >> With this patch, the reported per-frame slowdown when applying CCM is >> about 45% on Debix Model A and about 85% on TI AM69 SK. >> >> Signed-off-by: Milan Zamazal<mzamazal@redhat.com> >> --- >> .../internal/software_isp/debayer_params.h | 20 ++++++- >> src/ipa/simple/algorithms/lut.cpp | 58 ++++++++++++++----- >> src/ipa/simple/algorithms/lut.h | 1 + >> src/libcamera/software_isp/debayer.cpp | 53 ++++++++++++++++- >> src/libcamera/software_isp/debayer_cpu.cpp | 34 ++++++++--- >> src/libcamera/software_isp/debayer_cpu.h | 7 ++- >> src/libcamera/software_isp/software_isp.cpp | 6 +- >> 7 files changed, 145 insertions(+), 34 deletions(-) >> >> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h >> index 7d8fdd481..531db6968 100644 >> --- a/include/libcamera/internal/software_isp/debayer_params.h >> +++ b/include/libcamera/internal/software_isp/debayer_params.h >> @@ -18,11 +18,25 @@ namespace libcamera { >> struct DebayerParams { >> static constexpr unsigned int kRGBLookupSize = 256; >> + struct CcmRow { >> + int16_t c1; >> + int16_t c2; >> + int16_t c3; >> + }; >> + >> using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>; >> + using CcmLookupTable = std::array<CcmRow, kRGBLookupSize>; >> + using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>; >> + >> + union LookupTable { >> + ColorLookupTable simple; >> + CcmLookupTable ccm; >> + }; >> - ColorLookupTable red; >> - ColorLookupTable green; >> - ColorLookupTable blue; >> + LookupTable red; >> + LookupTable green; >> + LookupTable blue; >> + GammaLookupTable gammaLut; >> }; >> } /* namespace libcamera */ >> diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp >> index 8fca96b0a..259976e08 100644 >> --- a/src/ipa/simple/algorithms/lut.cpp >> +++ b/src/ipa/simple/algorithms/lut.cpp >> @@ -44,6 +44,11 @@ void Lut::updateGammaTable(IPAContext &context) >> context.activeState.gamma.blackLevel = blackLevel; >> } >> +int16_t Lut::ccmValue(unsigned int i, float ccm) const >> +{ >> + return std::round(i * ccm); >> +} >> + >> void Lut::prepare(IPAContext &context, >> [[maybe_unused]] const uint32_t frame, >> [[maybe_unused]] IPAFrameContext &frameContext, >> @@ -55,27 +60,48 @@ void Lut::prepare(IPAContext &context, >> * observed, it's not permanently prone to minor fluctuations or >> * rounding errors. >> */ >> - if (context.activeState.gamma.blackLevel != context.activeState.blc.level) >> + const bool gammaUpdateNeeded = >> + context.activeState.gamma.blackLevel != context.activeState.blc.level; >> + if (gammaUpdateNeeded) >> updateGammaTable(context); >> auto &gains = context.activeState.awb.gains; >> auto &gammaTable = context.activeState.gamma.gammaTable; >> const unsigned int gammaTableSize = gammaTable.size(); >> - >> - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> - const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / >> - gammaTableSize; >> - /* Apply gamma after gain! */ >> - unsigned int idx; >> - idx = std::min({ static_cast<unsigned int>(i * gains.red / div), >> - gammaTableSize - 1 }); >> - params->red[i] = gammaTable[idx]; >> - idx = std::min({ static_cast<unsigned int>(i * gains.green / div), >> - gammaTableSize - 1 }); >> - params->green[i] = gammaTable[idx]; >> - idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), >> - gammaTableSize - 1 }); >> - params->blue[i] = gammaTable[idx]; >> + const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / >> + gammaTableSize; >> + >> + if (!context.activeState.ccm.enabled) { >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + /* Apply gamma after gain! */ >> + unsigned int idx; >> + idx = std::min({ static_cast<unsigned int>(i * gains.red / div), >> + gammaTableSize - 1 }); >> + params->red.simple[i] = gammaTable[idx]; >> + idx = std::min({ static_cast<unsigned int>(i * gains.green / div), >> + gammaTableSize - 1 }); >> + params->green.simple[i] = gammaTable[idx]; >> + idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), >> + gammaTableSize - 1 }); >> + params->blue.simple[i] = gammaTable[idx]; >> + } >> + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { >> + auto &ccm = context.activeState.ccm.ccm; >> + auto &red = params->red.ccm; >> + auto &green = params->green.ccm; >> + auto &blue = params->blue.ccm; >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + red[i].c1 = ccmValue(i, ccm[0][0]); >> + red[i].c2 = ccmValue(i, ccm[0][1]); >> + red[i].c3 = ccmValue(i, ccm[0][2]); >> + green[i].c1 = ccmValue(i, ccm[1][0]); >> + green[i].c2 = ccmValue(i, ccm[1][1]); >> + green[i].c3 = ccmValue(i, ccm[1][2]); >> + blue[i].c1 = ccmValue(i, ccm[2][0]); >> + blue[i].c2 = ccmValue(i, ccm[2][1]); >> + blue[i].c3 = ccmValue(i, ccm[2][2]); >> + params->gammaLut[i] = gammaTable[i / div]; >> + } >> } >> } >> diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h >> index b635987d0..8d50a23fc 100644 >> --- a/src/ipa/simple/algorithms/lut.h >> +++ b/src/ipa/simple/algorithms/lut.h >> @@ -27,6 +27,7 @@ public: >> private: >> void updateGammaTable(IPAContext &context); >> + int16_t ccmValue(unsigned int i, float ccm) const; >> }; >> } /* namespace ipa::soft::algorithms */ >> diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp >> index f0b832619..97106b22b 100644 >> --- a/src/libcamera/software_isp/debayer.cpp >> +++ b/src/libcamera/software_isp/debayer.cpp >> @@ -23,11 +23,56 @@ namespace libcamera { >> * \brief Size of a color lookup table >> */ >> +/** >> + * \struct DebayerParams::CcmRow >> + * \brief Type of a single row of a color correction matrix >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c1 >> + * \brief First column of a CCM row >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c2 >> + * \brief Second column of a CCM row >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c3 >> + * \brief Third column of a CCM row >> + */ >> + >> /** >> * \typedef DebayerParams::ColorLookupTable >> + * \brief Type of the simple lookup tables for red, green, blue values >> + */ >> + >> +/** >> + * \typedef DebayerParams::CcmLookupTable >> + * \brief Type of the CCM lookup tables for red, green, blue values >> + */ >> + >> +/** >> + * \typedef DebayerParams::GammaLookupTable >> + * \brief Type of the gamma lookup tables for CCM >> + */ >> + >> +/** >> + * \union DebayerParams::LookupTable >> * \brief Type of the lookup tables for red, green, blue values >> */ >> +/** >> + * \var DebayerParams::LookupTable::simple >> + * \brief Simple lookup table for red, green, blue values >> + */ >> + >> +/** >> + * \var DebayerParams::LookupTable::ccm >> + * \brief CCM lookup table for red, green, blue values >> + */ >> + >> /** >> * \var DebayerParams::red >> * \brief Lookup table for red color, mapping input values to output values >> @@ -43,6 +88,11 @@ namespace libcamera { >> * \brief Lookup table for blue color, mapping input values to output values >> */ >> +/** >> + * \var DebayerParams::gammaLut >> + * \brief Gamma lookup table used with color correction matrix >> + */ >> + >> /** >> * \class Debayer >> * \brief Base debayering class >> @@ -57,10 +107,11 @@ Debayer::~Debayer() >> } >> /** >> - * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs) >> + * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs, ccmEnabled) >> * \brief Configure the debayer object according to the passed in parameters >> * \param[in] inputCfg The input configuration >> * \param[in] outputCfgs The output configurations >> + * \param[in] ccmEnabled Whether a color correction matrix is applied >> * >> * \return 0 on success, a negative errno on failure >> */ >> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp >> index 52baf43dc..d8976f183 100644 >> --- a/src/libcamera/software_isp/debayer_cpu.cpp >> +++ b/src/libcamera/software_isp/debayer_cpu.cpp >> @@ -11,6 +11,7 @@ >> #include "debayer_cpu.h" >> +#include <algorithm> >> #include <stdlib.h> >> #include <sys/ioctl.h> >> #include <time.h> >> @@ -50,8 +51,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) >> enableInputMemcpy_ = true; >> /* Initialize color lookup tables */ >> - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) >> - red_[i] = green_[i] = blue_[i] = i; >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; >> + red_.ccm[i].c1 = red_.ccm[i].c2 = red_.ccm[i].c3 = 0; >> + green_.ccm[i].c1 = green_.ccm[i].c2 = green_.ccm[i].c3 = 0; >> + blue_.ccm[i].c1 = blue_.ccm[i].c2 = blue_.ccm[i].c3 = 0; >> + } >> } >> DebayerCpu::~DebayerCpu() = default; >> @@ -61,12 +66,24 @@ DebayerCpu::~DebayerCpu() = default; >> const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ >> const pixel_t *next = (const pixel_t *)src[2] + xShift_; >> -#define STORE_PIXEL(b, g, r) \ >> - *dst++ = blue_[r]; \ >> - *dst++ = green_[g]; \ >> - *dst++ = red_[b]; \ >> - if constexpr (addAlphaByte) \ >> - *dst++ = 255; \ >> +#define GAMMA(value) \ >> + *dst++ = gammaLut_[std::clamp(value, 0, 1023)] > > This should be 255 instead 1023, correct? Indeed, thank you for finding this. I'll change the upper boundary to static_cast<int>(gammaLut_.size()) - 1 in v2. >> + >> +#define STORE_PIXEL(b, g, r) \ >> + if constexpr (ccmEnabled) { \ >> + DebayerParams::CcmRow &red = red_.ccm[r]; \ >> + DebayerParams::CcmRow &green = green_.ccm[g]; \ >> + DebayerParams::CcmRow &blue = blue_.ccm[b]; \ >> + GAMMA(red.c3 + green.c3 + blue.c3); \ >> + GAMMA(red.c2 + green.c2 + blue.c2); \ >> + GAMMA(red.c1 + green.c1 + blue.c1); \ >> + } else { \ >> + *dst++ = blue_.simple[b]; \ >> + *dst++ = green_.simple[g]; \ >> + *dst++ = red_.simple[r]; \ >> + } \ >> + if constexpr (addAlphaByte) \ >> + *dst++ = 255; \ >> x++; >> /* >> @@ -764,6 +781,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output >> green_ = params.green; >> red_ = swapRedBlueGains_ ? params.blue : params.red; >> blue_ = swapRedBlueGains_ ? params.red : params.blue; >> + gammaLut_ = params.gammaLut; >> /* Copy metadata from the input buffer */ >> FrameMetadata &metadata = output->_d()->metadata(); >> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h >> index b2ec8f1bd..35c51e4fd 100644 >> --- a/src/libcamera/software_isp/debayer_cpu.h >> +++ b/src/libcamera/software_isp/debayer_cpu.h >> @@ -137,9 +137,10 @@ private: >> /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ >> static constexpr unsigned int kMaxLineBuffers = 5; >> - DebayerParams::ColorLookupTable red_; >> - DebayerParams::ColorLookupTable green_; >> - DebayerParams::ColorLookupTable blue_; >> + DebayerParams::LookupTable red_; >> + DebayerParams::LookupTable green_; >> + DebayerParams::LookupTable blue_; >> + DebayerParams::GammaLookupTable gammaLut_; >> debayerFn debayer0_; >> debayerFn debayer1_; >> debayerFn debayer2_; >> diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp >> index 0e1d3a457..08eca192e 100644 >> --- a/src/libcamera/software_isp/software_isp.cpp >> +++ b/src/libcamera/software_isp/software_isp.cpp >> @@ -79,9 +79,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor) >> for (unsigned int i = 0; i < 256; i++) >> gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); >> for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> - debayerParams_.red[i] = gammaTable[i]; >> - debayerParams_.green[i] = gammaTable[i]; >> - debayerParams_.blue[i] = gammaTable[i]; >> + debayerParams_.red.simple[i] = gammaTable[i]; >> + debayerParams_.green.simple[i] = gammaTable[i]; >> + debayerParams_.blue.simple[i] = gammaTable[i]; >> } >> if (!dmaHeap_.isValid()) { Robert Mader <robert.mader@collabora.com> writes: > There's a red <-> blue color swap when the ccm is active, see below: > > On 20.11.24 19:01, Milan Zamazal wrote: >> This patch applies color correction matrix (CCM) in debayering if the >> CCM is specified. Not using CCM must still be supported for performance >> reasons. >> >> The CCM is applied as follows: >> >> [r1 r2 r3] >> [r g b] * [g1 g2 g3] >> [b1 b2 b3] >> >> The CCM matrix (the right side of the multiplication) is constant during >> single frame processing, while the input pixel (the left side) changes. >> Because each of the color channels is only 8-bit in software ISP, we can >> make 9 lookup tables with 256 input values for multiplications of each >> of the r_i, g_i, b_i values. This way we don't have to multiply each >> pixel, we can use table lookups and additions instead. Gamma (which is >> non-linear and thus cannot be a part of the 9 lookup tables values) is >> applied on the final values rounded to integers using another lookup >> table. >> >> We use int16_t to store the precomputed multiplications. This seems to >> be noticeably (>10%) faster than `float' for the price of slightly less >> accuracy and it covers the range of values that sane CCMs produce. The >> selection and structure of data is performance critical, for example >> using bytes would add significant (>10%) speedup but would be too short >> to cover the value range. >> >> The color lookup tables are changed to unions to be able to serve as >> both simple lookup tables or CCM lookup tables. This is arguable but it >> keeps the code easier to understand if nothing else. It is important to >> make the unions on the whole lookup tables rather than their values >> because the latter might cause a noticeable slowdown on simple lookup >> due to the sheer fact that the table is not compact. Using unions makes >> the initialization of the tables dubious because it is done before the >> decision about using CCM is made but the initial values are dubious >> anyway. >> >> The tables are copied (as before), which is not elegant but also not a >> big problem. There are patches posted that use shared buffers for >> parameters passing in software ISP (see software ISP TODO #5) and they >> can be adjusted for the new parameter format. >> >> With this patch, the reported per-frame slowdown when applying CCM is >> about 45% on Debix Model A and about 85% on TI AM69 SK. >> >> Signed-off-by: Milan Zamazal<mzamazal@redhat.com> >> --- >> .../internal/software_isp/debayer_params.h | 20 ++++++- >> src/ipa/simple/algorithms/lut.cpp | 58 ++++++++++++++----- >> src/ipa/simple/algorithms/lut.h | 1 + >> src/libcamera/software_isp/debayer.cpp | 53 ++++++++++++++++- >> src/libcamera/software_isp/debayer_cpu.cpp | 34 ++++++++--- >> src/libcamera/software_isp/debayer_cpu.h | 7 ++- >> src/libcamera/software_isp/software_isp.cpp | 6 +- >> 7 files changed, 145 insertions(+), 34 deletions(-) >> >> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h >> index 7d8fdd481..531db6968 100644 >> --- a/include/libcamera/internal/software_isp/debayer_params.h >> +++ b/include/libcamera/internal/software_isp/debayer_params.h >> @@ -18,11 +18,25 @@ namespace libcamera { >> struct DebayerParams { >> static constexpr unsigned int kRGBLookupSize = 256; >> + struct CcmRow { >> + int16_t c1; >> + int16_t c2; >> + int16_t c3; >> + }; >> + >> using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>; >> + using CcmLookupTable = std::array<CcmRow, kRGBLookupSize>; >> + using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>; >> + >> + union LookupTable { >> + ColorLookupTable simple; >> + CcmLookupTable ccm; >> + }; >> - ColorLookupTable red; >> - ColorLookupTable green; >> - ColorLookupTable blue; >> + LookupTable red; >> + LookupTable green; >> + LookupTable blue; >> + GammaLookupTable gammaLut; >> }; >> } /* namespace libcamera */ >> diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp >> index 8fca96b0a..259976e08 100644 >> --- a/src/ipa/simple/algorithms/lut.cpp >> +++ b/src/ipa/simple/algorithms/lut.cpp >> @@ -44,6 +44,11 @@ void Lut::updateGammaTable(IPAContext &context) >> context.activeState.gamma.blackLevel = blackLevel; >> } >> +int16_t Lut::ccmValue(unsigned int i, float ccm) const >> +{ >> + return std::round(i * ccm); >> +} >> + >> void Lut::prepare(IPAContext &context, >> [[maybe_unused]] const uint32_t frame, >> [[maybe_unused]] IPAFrameContext &frameContext, >> @@ -55,27 +60,48 @@ void Lut::prepare(IPAContext &context, >> * observed, it's not permanently prone to minor fluctuations or >> * rounding errors. >> */ >> - if (context.activeState.gamma.blackLevel != context.activeState.blc.level) >> + const bool gammaUpdateNeeded = >> + context.activeState.gamma.blackLevel != context.activeState.blc.level; >> + if (gammaUpdateNeeded) >> updateGammaTable(context); >> auto &gains = context.activeState.awb.gains; >> auto &gammaTable = context.activeState.gamma.gammaTable; >> const unsigned int gammaTableSize = gammaTable.size(); >> - >> - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> - const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / >> - gammaTableSize; >> - /* Apply gamma after gain! */ >> - unsigned int idx; >> - idx = std::min({ static_cast<unsigned int>(i * gains.red / div), >> - gammaTableSize - 1 }); >> - params->red[i] = gammaTable[idx]; >> - idx = std::min({ static_cast<unsigned int>(i * gains.green / div), >> - gammaTableSize - 1 }); >> - params->green[i] = gammaTable[idx]; >> - idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), >> - gammaTableSize - 1 }); >> - params->blue[i] = gammaTable[idx]; >> + const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / >> + gammaTableSize; >> + >> + if (!context.activeState.ccm.enabled) { >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + /* Apply gamma after gain! */ >> + unsigned int idx; >> + idx = std::min({ static_cast<unsigned int>(i * gains.red / div), >> + gammaTableSize - 1 }); >> + params->red.simple[i] = gammaTable[idx]; >> + idx = std::min({ static_cast<unsigned int>(i * gains.green / div), >> + gammaTableSize - 1 }); >> + params->green.simple[i] = gammaTable[idx]; >> + idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), >> + gammaTableSize - 1 }); >> + params->blue.simple[i] = gammaTable[idx]; >> + } >> + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { >> + auto &ccm = context.activeState.ccm.ccm; >> + auto &red = params->red.ccm; >> + auto &green = params->green.ccm; >> + auto &blue = params->blue.ccm; >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + red[i].c1 = ccmValue(i, ccm[0][0]); >> + red[i].c2 = ccmValue(i, ccm[0][1]); >> + red[i].c3 = ccmValue(i, ccm[0][2]); >> + green[i].c1 = ccmValue(i, ccm[1][0]); >> + green[i].c2 = ccmValue(i, ccm[1][1]); >> + green[i].c3 = ccmValue(i, ccm[1][2]); >> + blue[i].c1 = ccmValue(i, ccm[2][0]); >> + blue[i].c2 = ccmValue(i, ccm[2][1]); >> + blue[i].c3 = ccmValue(i, ccm[2][2]); >> + params->gammaLut[i] = gammaTable[i / div]; >> + } >> } >> } >> diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h >> index b635987d0..8d50a23fc 100644 >> --- a/src/ipa/simple/algorithms/lut.h >> +++ b/src/ipa/simple/algorithms/lut.h >> @@ -27,6 +27,7 @@ public: >> private: >> void updateGammaTable(IPAContext &context); >> + int16_t ccmValue(unsigned int i, float ccm) const; >> }; >> } /* namespace ipa::soft::algorithms */ >> diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp >> index f0b832619..97106b22b 100644 >> --- a/src/libcamera/software_isp/debayer.cpp >> +++ b/src/libcamera/software_isp/debayer.cpp >> @@ -23,11 +23,56 @@ namespace libcamera { >> * \brief Size of a color lookup table >> */ >> +/** >> + * \struct DebayerParams::CcmRow >> + * \brief Type of a single row of a color correction matrix >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c1 >> + * \brief First column of a CCM row >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c2 >> + * \brief Second column of a CCM row >> + */ >> + >> +/** >> + * \var DebayerParams::CcmRow::c3 >> + * \brief Third column of a CCM row >> + */ >> + >> /** >> * \typedef DebayerParams::ColorLookupTable >> + * \brief Type of the simple lookup tables for red, green, blue values >> + */ >> + >> +/** >> + * \typedef DebayerParams::CcmLookupTable >> + * \brief Type of the CCM lookup tables for red, green, blue values >> + */ >> + >> +/** >> + * \typedef DebayerParams::GammaLookupTable >> + * \brief Type of the gamma lookup tables for CCM >> + */ >> + >> +/** >> + * \union DebayerParams::LookupTable >> * \brief Type of the lookup tables for red, green, blue values >> */ >> +/** >> + * \var DebayerParams::LookupTable::simple >> + * \brief Simple lookup table for red, green, blue values >> + */ >> + >> +/** >> + * \var DebayerParams::LookupTable::ccm >> + * \brief CCM lookup table for red, green, blue values >> + */ >> + >> /** >> * \var DebayerParams::red >> * \brief Lookup table for red color, mapping input values to output values >> @@ -43,6 +88,11 @@ namespace libcamera { >> * \brief Lookup table for blue color, mapping input values to output values >> */ >> +/** >> + * \var DebayerParams::gammaLut >> + * \brief Gamma lookup table used with color correction matrix >> + */ >> + >> /** >> * \class Debayer >> * \brief Base debayering class >> @@ -57,10 +107,11 @@ Debayer::~Debayer() >> } >> /** >> - * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs) >> + * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs, ccmEnabled) >> * \brief Configure the debayer object according to the passed in parameters >> * \param[in] inputCfg The input configuration >> * \param[in] outputCfgs The output configurations >> + * \param[in] ccmEnabled Whether a color correction matrix is applied >> * >> * \return 0 on success, a negative errno on failure >> */ >> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp >> index 52baf43dc..d8976f183 100644 >> --- a/src/libcamera/software_isp/debayer_cpu.cpp >> +++ b/src/libcamera/software_isp/debayer_cpu.cpp >> @@ -11,6 +11,7 @@ >> #include "debayer_cpu.h" >> +#include <algorithm> >> #include <stdlib.h> >> #include <sys/ioctl.h> >> #include <time.h> >> @@ -50,8 +51,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) >> enableInputMemcpy_ = true; >> /* Initialize color lookup tables */ >> - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) >> - red_[i] = green_[i] = blue_[i] = i; >> + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; >> + red_.ccm[i].c1 = red_.ccm[i].c2 = red_.ccm[i].c3 = 0; >> + green_.ccm[i].c1 = green_.ccm[i].c2 = green_.ccm[i].c3 = 0; >> + blue_.ccm[i].c1 = blue_.ccm[i].c2 = blue_.ccm[i].c3 = 0; >> + } >> } >> DebayerCpu::~DebayerCpu() = default; >> @@ -61,12 +66,24 @@ DebayerCpu::~DebayerCpu() = default; >> const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ >> const pixel_t *next = (const pixel_t *)src[2] + xShift_; >> -#define STORE_PIXEL(b, g, r) \ >> - *dst++ = blue_[r]; \ >> - *dst++ = green_[g]; \ >> - *dst++ = red_[b]; \ >> - if constexpr (addAlphaByte) \ >> - *dst++ = 255; \ >> +#define GAMMA(value) \ >> + *dst++ = gammaLut_[std::clamp(value, 0, 1023)] >> + >> +#define STORE_PIXEL(b, g, r) \ >> + if constexpr (ccmEnabled) { \ >> + DebayerParams::CcmRow &red = red_.ccm[r]; \ >> + DebayerParams::CcmRow &green = green_.ccm[g]; \ >> + DebayerParams::CcmRow &blue = blue_.ccm[b]; \ >> + GAMMA(red.c3 + green.c3 + blue.c3); \ >> + GAMMA(red.c2 + green.c2 + blue.c2); \ >> + GAMMA(red.c1 + green.c1 + blue.c1); \ > > This apparently should be: > > + GAMMA(red.c1 + green.c1 + blue.c1); \ > + GAMMA(red.c2 + green.c2 + blue.c2); \ > + GAMMA(red.c3 + green.c3 + blue.c3); \ Not exactly. If red is red and blue is blue then the order is correct but the case where "red" and "blue" are swapped (swapRedBlueGains_) is handled incorrectly for the CCM case. I'll fix it in v2 too. Thank you for catching this. >> + } else { \ >> + *dst++ = blue_.simple[b]; \ >> + *dst++ = green_.simple[g]; \ >> + *dst++ = red_.simple[r]; \ >> + } \ >> + if constexpr (addAlphaByte) \ >> + *dst++ = 255; \ >> x++; >> /* >> @@ -764,6 +781,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output >> green_ = params.green; >> red_ = swapRedBlueGains_ ? params.blue : params.red; >> blue_ = swapRedBlueGains_ ? params.red : params.blue; >> + gammaLut_ = params.gammaLut; >> /* Copy metadata from the input buffer */ >> FrameMetadata &metadata = output->_d()->metadata(); >> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h >> index b2ec8f1bd..35c51e4fd 100644 >> --- a/src/libcamera/software_isp/debayer_cpu.h >> +++ b/src/libcamera/software_isp/debayer_cpu.h >> @@ -137,9 +137,10 @@ private: >> /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ >> static constexpr unsigned int kMaxLineBuffers = 5; >> - DebayerParams::ColorLookupTable red_; >> - DebayerParams::ColorLookupTable green_; >> - DebayerParams::ColorLookupTable blue_; >> + DebayerParams::LookupTable red_; >> + DebayerParams::LookupTable green_; >> + DebayerParams::LookupTable blue_; >> + DebayerParams::GammaLookupTable gammaLut_; >> debayerFn debayer0_; >> debayerFn debayer1_; >> debayerFn debayer2_; >> diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp >> index 0e1d3a457..08eca192e 100644 >> --- a/src/libcamera/software_isp/software_isp.cpp >> +++ b/src/libcamera/software_isp/software_isp.cpp >> @@ -79,9 +79,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor) >> for (unsigned int i = 0; i < 256; i++) >> gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); >> for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { >> - debayerParams_.red[i] = gammaTable[i]; >> - debayerParams_.green[i] = gammaTable[i]; >> - debayerParams_.blue[i] = gammaTable[i]; >> + debayerParams_.red.simple[i] = gammaTable[i]; >> + debayerParams_.green.simple[i] = gammaTable[i]; >> + debayerParams_.blue.simple[i] = gammaTable[i]; >> } >> if (!dmaHeap_.isValid()) {
diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h index 7d8fdd481..531db6968 100644 --- a/include/libcamera/internal/software_isp/debayer_params.h +++ b/include/libcamera/internal/software_isp/debayer_params.h @@ -18,11 +18,25 @@ namespace libcamera { struct DebayerParams { static constexpr unsigned int kRGBLookupSize = 256; + struct CcmRow { + int16_t c1; + int16_t c2; + int16_t c3; + }; + using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>; + using CcmLookupTable = std::array<CcmRow, kRGBLookupSize>; + using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>; + + union LookupTable { + ColorLookupTable simple; + CcmLookupTable ccm; + }; - ColorLookupTable red; - ColorLookupTable green; - ColorLookupTable blue; + LookupTable red; + LookupTable green; + LookupTable blue; + GammaLookupTable gammaLut; }; } /* namespace libcamera */ diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp index 8fca96b0a..259976e08 100644 --- a/src/ipa/simple/algorithms/lut.cpp +++ b/src/ipa/simple/algorithms/lut.cpp @@ -44,6 +44,11 @@ void Lut::updateGammaTable(IPAContext &context) context.activeState.gamma.blackLevel = blackLevel; } +int16_t Lut::ccmValue(unsigned int i, float ccm) const +{ + return std::round(i * ccm); +} + void Lut::prepare(IPAContext &context, [[maybe_unused]] const uint32_t frame, [[maybe_unused]] IPAFrameContext &frameContext, @@ -55,27 +60,48 @@ void Lut::prepare(IPAContext &context, * observed, it's not permanently prone to minor fluctuations or * rounding errors. */ - if (context.activeState.gamma.blackLevel != context.activeState.blc.level) + const bool gammaUpdateNeeded = + context.activeState.gamma.blackLevel != context.activeState.blc.level; + if (gammaUpdateNeeded) updateGammaTable(context); auto &gains = context.activeState.awb.gains; auto &gammaTable = context.activeState.gamma.gammaTable; const unsigned int gammaTableSize = gammaTable.size(); - - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { - const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / - gammaTableSize; - /* Apply gamma after gain! */ - unsigned int idx; - idx = std::min({ static_cast<unsigned int>(i * gains.red / div), - gammaTableSize - 1 }); - params->red[i] = gammaTable[idx]; - idx = std::min({ static_cast<unsigned int>(i * gains.green / div), - gammaTableSize - 1 }); - params->green[i] = gammaTable[idx]; - idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), - gammaTableSize - 1 }); - params->blue[i] = gammaTable[idx]; + const double div = static_cast<double>(DebayerParams::kRGBLookupSize) / + gammaTableSize; + + if (!context.activeState.ccm.enabled) { + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + /* Apply gamma after gain! */ + unsigned int idx; + idx = std::min({ static_cast<unsigned int>(i * gains.red / div), + gammaTableSize - 1 }); + params->red.simple[i] = gammaTable[idx]; + idx = std::min({ static_cast<unsigned int>(i * gains.green / div), + gammaTableSize - 1 }); + params->green.simple[i] = gammaTable[idx]; + idx = std::min({ static_cast<unsigned int>(i * gains.blue / div), + gammaTableSize - 1 }); + params->blue.simple[i] = gammaTable[idx]; + } + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { + auto &ccm = context.activeState.ccm.ccm; + auto &red = params->red.ccm; + auto &green = params->green.ccm; + auto &blue = params->blue.ccm; + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + red[i].c1 = ccmValue(i, ccm[0][0]); + red[i].c2 = ccmValue(i, ccm[0][1]); + red[i].c3 = ccmValue(i, ccm[0][2]); + green[i].c1 = ccmValue(i, ccm[1][0]); + green[i].c2 = ccmValue(i, ccm[1][1]); + green[i].c3 = ccmValue(i, ccm[1][2]); + blue[i].c1 = ccmValue(i, ccm[2][0]); + blue[i].c2 = ccmValue(i, ccm[2][1]); + blue[i].c3 = ccmValue(i, ccm[2][2]); + params->gammaLut[i] = gammaTable[i / div]; + } } } diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h index b635987d0..8d50a23fc 100644 --- a/src/ipa/simple/algorithms/lut.h +++ b/src/ipa/simple/algorithms/lut.h @@ -27,6 +27,7 @@ public: private: void updateGammaTable(IPAContext &context); + int16_t ccmValue(unsigned int i, float ccm) const; }; } /* namespace ipa::soft::algorithms */ diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp index f0b832619..97106b22b 100644 --- a/src/libcamera/software_isp/debayer.cpp +++ b/src/libcamera/software_isp/debayer.cpp @@ -23,11 +23,56 @@ namespace libcamera { * \brief Size of a color lookup table */ +/** + * \struct DebayerParams::CcmRow + * \brief Type of a single row of a color correction matrix + */ + +/** + * \var DebayerParams::CcmRow::c1 + * \brief First column of a CCM row + */ + +/** + * \var DebayerParams::CcmRow::c2 + * \brief Second column of a CCM row + */ + +/** + * \var DebayerParams::CcmRow::c3 + * \brief Third column of a CCM row + */ + /** * \typedef DebayerParams::ColorLookupTable + * \brief Type of the simple lookup tables for red, green, blue values + */ + +/** + * \typedef DebayerParams::CcmLookupTable + * \brief Type of the CCM lookup tables for red, green, blue values + */ + +/** + * \typedef DebayerParams::GammaLookupTable + * \brief Type of the gamma lookup tables for CCM + */ + +/** + * \union DebayerParams::LookupTable * \brief Type of the lookup tables for red, green, blue values */ +/** + * \var DebayerParams::LookupTable::simple + * \brief Simple lookup table for red, green, blue values + */ + +/** + * \var DebayerParams::LookupTable::ccm + * \brief CCM lookup table for red, green, blue values + */ + /** * \var DebayerParams::red * \brief Lookup table for red color, mapping input values to output values @@ -43,6 +88,11 @@ namespace libcamera { * \brief Lookup table for blue color, mapping input values to output values */ +/** + * \var DebayerParams::gammaLut + * \brief Gamma lookup table used with color correction matrix + */ + /** * \class Debayer * \brief Base debayering class @@ -57,10 +107,11 @@ Debayer::~Debayer() } /** - * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs) + * \fn int Debayer::configure(const StreamConfiguration &inputCfg, const std::vector<std::reference_wrapper<StreamConfiguration>> &outputCfgs, ccmEnabled) * \brief Configure the debayer object according to the passed in parameters * \param[in] inputCfg The input configuration * \param[in] outputCfgs The output configurations + * \param[in] ccmEnabled Whether a color correction matrix is applied * * \return 0 on success, a negative errno on failure */ diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp index 52baf43dc..d8976f183 100644 --- a/src/libcamera/software_isp/debayer_cpu.cpp +++ b/src/libcamera/software_isp/debayer_cpu.cpp @@ -11,6 +11,7 @@ #include "debayer_cpu.h" +#include <algorithm> #include <stdlib.h> #include <sys/ioctl.h> #include <time.h> @@ -50,8 +51,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) enableInputMemcpy_ = true; /* Initialize color lookup tables */ - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) - red_[i] = green_[i] = blue_[i] = i; + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; + red_.ccm[i].c1 = red_.ccm[i].c2 = red_.ccm[i].c3 = 0; + green_.ccm[i].c1 = green_.ccm[i].c2 = green_.ccm[i].c3 = 0; + blue_.ccm[i].c1 = blue_.ccm[i].c2 = blue_.ccm[i].c3 = 0; + } } DebayerCpu::~DebayerCpu() = default; @@ -61,12 +66,24 @@ DebayerCpu::~DebayerCpu() = default; const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ const pixel_t *next = (const pixel_t *)src[2] + xShift_; -#define STORE_PIXEL(b, g, r) \ - *dst++ = blue_[r]; \ - *dst++ = green_[g]; \ - *dst++ = red_[b]; \ - if constexpr (addAlphaByte) \ - *dst++ = 255; \ +#define GAMMA(value) \ + *dst++ = gammaLut_[std::clamp(value, 0, 1023)] + +#define STORE_PIXEL(b, g, r) \ + if constexpr (ccmEnabled) { \ + DebayerParams::CcmRow &red = red_.ccm[r]; \ + DebayerParams::CcmRow &green = green_.ccm[g]; \ + DebayerParams::CcmRow &blue = blue_.ccm[b]; \ + GAMMA(red.c3 + green.c3 + blue.c3); \ + GAMMA(red.c2 + green.c2 + blue.c2); \ + GAMMA(red.c1 + green.c1 + blue.c1); \ + } else { \ + *dst++ = blue_.simple[b]; \ + *dst++ = green_.simple[g]; \ + *dst++ = red_.simple[r]; \ + } \ + if constexpr (addAlphaByte) \ + *dst++ = 255; \ x++; /* @@ -764,6 +781,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output green_ = params.green; red_ = swapRedBlueGains_ ? params.blue : params.red; blue_ = swapRedBlueGains_ ? params.red : params.blue; + gammaLut_ = params.gammaLut; /* Copy metadata from the input buffer */ FrameMetadata &metadata = output->_d()->metadata(); diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h index b2ec8f1bd..35c51e4fd 100644 --- a/src/libcamera/software_isp/debayer_cpu.h +++ b/src/libcamera/software_isp/debayer_cpu.h @@ -137,9 +137,10 @@ private: /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ static constexpr unsigned int kMaxLineBuffers = 5; - DebayerParams::ColorLookupTable red_; - DebayerParams::ColorLookupTable green_; - DebayerParams::ColorLookupTable blue_; + DebayerParams::LookupTable red_; + DebayerParams::LookupTable green_; + DebayerParams::LookupTable blue_; + DebayerParams::GammaLookupTable gammaLut_; debayerFn debayer0_; debayerFn debayer1_; debayerFn debayer2_; diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp index 0e1d3a457..08eca192e 100644 --- a/src/libcamera/software_isp/software_isp.cpp +++ b/src/libcamera/software_isp/software_isp.cpp @@ -79,9 +79,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor) for (unsigned int i = 0; i < 256; i++) gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { - debayerParams_.red[i] = gammaTable[i]; - debayerParams_.green[i] = gammaTable[i]; - debayerParams_.blue[i] = gammaTable[i]; + debayerParams_.red.simple[i] = gammaTable[i]; + debayerParams_.green.simple[i] = gammaTable[i]; + debayerParams_.blue.simple[i] = gammaTable[i]; } if (!dmaHeap_.isValid()) {
This patch applies color correction matrix (CCM) in debayering if the CCM is specified. Not using CCM must still be supported for performance reasons. The CCM is applied as follows: [r1 r2 r3] [r g b] * [g1 g2 g3] [b1 b2 b3] The CCM matrix (the right side of the multiplication) is constant during single frame processing, while the input pixel (the left side) changes. Because each of the color channels is only 8-bit in software ISP, we can make 9 lookup tables with 256 input values for multiplications of each of the r_i, g_i, b_i values. This way we don't have to multiply each pixel, we can use table lookups and additions instead. Gamma (which is non-linear and thus cannot be a part of the 9 lookup tables values) is applied on the final values rounded to integers using another lookup table. We use int16_t to store the precomputed multiplications. This seems to be noticeably (>10%) faster than `float' for the price of slightly less accuracy and it covers the range of values that sane CCMs produce. The selection and structure of data is performance critical, for example using bytes would add significant (>10%) speedup but would be too short to cover the value range. The color lookup tables are changed to unions to be able to serve as both simple lookup tables or CCM lookup tables. This is arguable but it keeps the code easier to understand if nothing else. It is important to make the unions on the whole lookup tables rather than their values because the latter might cause a noticeable slowdown on simple lookup due to the sheer fact that the table is not compact. Using unions makes the initialization of the tables dubious because it is done before the decision about using CCM is made but the initial values are dubious anyway. The tables are copied (as before), which is not elegant but also not a big problem. There are patches posted that use shared buffers for parameters passing in software ISP (see software ISP TODO #5) and they can be adjusted for the new parameter format. With this patch, the reported per-frame slowdown when applying CCM is about 45% on Debix Model A and about 85% on TI AM69 SK. Signed-off-by: Milan Zamazal <mzamazal@redhat.com> --- .../internal/software_isp/debayer_params.h | 20 ++++++- src/ipa/simple/algorithms/lut.cpp | 58 ++++++++++++++----- src/ipa/simple/algorithms/lut.h | 1 + src/libcamera/software_isp/debayer.cpp | 53 ++++++++++++++++- src/libcamera/software_isp/debayer_cpu.cpp | 34 ++++++++--- src/libcamera/software_isp/debayer_cpu.h | 7 ++- src/libcamera/software_isp/software_isp.cpp | 6 +- 7 files changed, 145 insertions(+), 34 deletions(-)