Message ID: 20240319123622.675599-18-mzamazal@redhat.com
State: Accepted
Hi Milan, (CC'ing David) Thank you for the patch. On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: > Black may not be represented as 0 pixel value for given hardware, it may be > higher. If this is not compensated then various problems may occur such as low > contrast or suboptimal exposure. > > The black pixel value can be either retrieved from a tuning file for the given > hardware, or automatically on fly. The former is the right and correct method, s/on fly/on the fly/ > while the latter can be used when a tuning file is not available for the given > hardware. Since there is currently no support for tuning files in software ISP, > the automatic, hardware independent way, is always used. Support for tuning > files should be added in future but it will require more work than this patch. I don't think this is quite right. Strictly speaking, the black level compensation is about subtracting the black level as produced by the sensor. This requires tuning, and shouldn't be done on-the-fly. On the other hand, looking at the histogram to try and stretch it is called contrast adjustment. There's nothing wrong in doing so, but it's usually done towards the output of the processing pipeline, and is associated with manual adjustments of the contrast and brightness. See src/ipa/rpi/controller/rpi/contrast.cpp for instance. David may be able to comment further as to why BLC and contrast are quite different. As this patch may need more work, I propose not including it in v7 to avoid delay merging the rest of the implementation. > The patch looks at the image histogram and assumes that black starts when pixel > values start occurring on the left. A certain amount of the darkest pixels is > ignored; it doesn't matter whether they represent various kinds of noise or are > real, they are better to omit in any case to make the image looking better. It > also doesn't matter whether the darkest pixels occur around the supposed black > level or are spread between 0 and the black level, the difference is not > important. > > An arbitrary threshold of 2% darkest pixels is applied; there is no magic about > that value. > > The patch assumes that the black values for different colors are the same and > doesn't attempt any other non-primitive enhancements. It cannot completely > replace tuning files and simplicity, while providing visible benefit, is its > goal. Anything more sophisticated is left for future patches. > > A possible cheap enhancement, if needed, could be setting exposure + gain to > minimum values temporarily, before setting the black level. In theory, the > black level should be fixed but it may not be reached in all images. For this > reason, the patch updates black level only if the observed value is lower than > the current one; it should be never increased. > > The purpose of the patch is to compensate for hardware properties. General > image contrast enhancements are out of scope of this patch. > > Stats are still gathered as an uncorrected histogram, to avoid any confusion and > to represent the raw image data. Exposure must be determined after the black > level correction -- it has no influence on the sub-black area and must be > correct after applying the black level correction. The granularity of the > histogram is increased from 16 to 64 to provide a better precision (there is no > theory behind either of those numbers). 
> > Reviewed-by: Hans de Goede <hdegoede@redhat.com> > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > .../internal/software_isp/debayer_params.h | 4 + > .../internal/software_isp/swisp_stats.h | 10 ++- > src/ipa/simple/black_level.cpp | 88 +++++++++++++++++++ > src/ipa/simple/black_level.h | 28 ++++++ > src/ipa/simple/meson.build | 7 +- > src/ipa/simple/soft_simple.cpp | 28 ++++-- > src/libcamera/software_isp/debayer_cpu.cpp | 13 ++- > src/libcamera/software_isp/debayer_cpu.h | 1 + > src/libcamera/software_isp/software_isp.cpp | 2 +- > 9 files changed, 164 insertions(+), 17 deletions(-) > create mode 100644 src/ipa/simple/black_level.cpp > create mode 100644 src/ipa/simple/black_level.h > > diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h > index 98965fa1..5e38e08b 100644 > --- a/include/libcamera/internal/software_isp/debayer_params.h > +++ b/include/libcamera/internal/software_isp/debayer_params.h > @@ -43,6 +43,10 @@ struct DebayerParams { > * \brief Gamma correction, 1.0 is no correction > */ > float gamma; > + /** > + * \brief Level of the black point, 0..255, 0 is no correction. > + */ > + unsigned int blackLevel; > }; > > } /* namespace libcamera */ > diff --git a/include/libcamera/internal/software_isp/swisp_stats.h b/include/libcamera/internal/software_isp/swisp_stats.h > index afe42c9a..25cd5abd 100644 > --- a/include/libcamera/internal/software_isp/swisp_stats.h > +++ b/include/libcamera/internal/software_isp/swisp_stats.h > @@ -7,6 +7,8 @@ > > #pragma once > > +#include <array> > + > namespace libcamera { > > /** > @@ -28,11 +30,15 @@ struct SwIspStats { > /** > * \brief Number of bins in the yHistogram. > */ > - static constexpr unsigned int kYHistogramSize = 16; > + static constexpr unsigned int kYHistogramSize = 64; > + /** > + * \brief Type of the histogram. > + */ > + using histogram = std::array<unsigned int, kYHistogramSize>; > /** > * \brief A histogram of luminance values. > */ > - std::array<unsigned int, kYHistogramSize> yHistogram; > + histogram yHistogram; > }; > > } /* namespace libcamera */ > diff --git a/src/ipa/simple/black_level.cpp b/src/ipa/simple/black_level.cpp > new file mode 100644 > index 00000000..9d4aa800 > --- /dev/null > +++ b/src/ipa/simple/black_level.cpp > @@ -0,0 +1,88 @@ > +/* SPDX-License-Identifier: LGPL-2.1-or-later */ > +/* > + * Copyright (C) 2024, Red Hat Inc. > + * > + * black_level.cpp - black level handling > + */ > + > +#include "black_level.h" > + > +#include <numeric> > + > +#include <libcamera/base/log.h> > + > +namespace libcamera { > + > +LOG_DEFINE_CATEGORY(IPASoftBL) > + > +/** > + * \class BlackLevel > + * \brief Object providing black point level for software ISP > + * > + * Black level can be provided in hardware tuning files or, if no tuning file is > + * available for the given hardware, guessed automatically, with less accuracy. > + * As tuning files are not yet implemented for software ISP, BlackLevel > + * currently provides only guessed black levels. > + * > + * This class serves for tracking black level as a property of the underlying > + * hardware, not as means of enhancing a particular scene or image. > + * > + * The class is supposed to be instantiated for the given camera stream. > + * The black level can be retrieved using BlackLevel::get() method. It is > + * initially 0 and may change when updated using BlackLevel::update() method. 
> + */ > + > +BlackLevel::BlackLevel() > + : blackLevel_(255), blackLevelSet_(false) > +{ > +} > + > +/** > + * \brief Return the current black level > + * > + * \return The black level, in the range from 0 (minimum) to 255 (maximum). > + * If the black level couldn't be determined yet, return 0. > + */ > +unsigned int BlackLevel::get() const > +{ > + return blackLevelSet_ ? blackLevel_ : 0; > +} > + > +/** > + * \brief Update black level from the provided histogram > + * \param[in] yHistogram The histogram to be used for updating black level > + * > + * The black level is property of the given hardware, not image. It is updated > + * only if it has not been yet set or if it is lower than the lowest value seen > + * so far. > + */ > +void BlackLevel::update(SwIspStats::histogram &yHistogram) > +{ > + /* > + * The constant is selected to be "good enough", not overly conservative or > + * aggressive. There is no magic about the given value. > + */ > + constexpr float ignoredPercentage_ = 0.02; > + const unsigned int total = > + std::accumulate(begin(yHistogram), end(yHistogram), 0); > + const unsigned int pixelThreshold = ignoredPercentage_ * total; > + const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize; > + const unsigned int currentBlackIdx = blackLevel_ / histogramRatio; > + > + for (unsigned int i = 0, seen = 0; > + i < currentBlackIdx && i < SwIspStats::kYHistogramSize; > + i++) { > + seen += yHistogram[i]; > + if (seen >= pixelThreshold) { > + blackLevel_ = i * histogramRatio; > + blackLevelSet_ = true; > + LOG(IPASoftBL, Debug) > + << "Auto-set black level: " > + << i << "/" << SwIspStats::kYHistogramSize > + << " (" << 100 * (seen - yHistogram[i]) / total << "% below, " > + << 100 * seen / total << "% at or below)"; > + break; > + } > + }; > +} > +} // namespace libcamera } /* namespace libcamera */ Same below. > diff --git a/src/ipa/simple/black_level.h b/src/ipa/simple/black_level.h > new file mode 100644 > index 00000000..b3785db0 > --- /dev/null > +++ b/src/ipa/simple/black_level.h > @@ -0,0 +1,28 @@ > +/* SPDX-License-Identifier: LGPL-2.1-or-later */ > +/* > + * Copyright (C) 2024, Red Hat Inc. > + * > + * black_level.h - black level handling > + */ > + > +#pragma once > + > +#include <array> > + > +#include "libcamera/internal/software_isp/swisp_stats.h" > + > +namespace libcamera { > + > +class BlackLevel Wouldn't it be better to move to using the libipa Algorithm class before introducing new algorithms ? 
> +{ > +public: > + BlackLevel(); > + unsigned int get() const; > + void update(std::array<unsigned int, SwIspStats::kYHistogramSize> &yHistogram); > + > +private: > + unsigned int blackLevel_; > + bool blackLevelSet_; > +}; > + > +} // namespace libcamera > diff --git a/src/ipa/simple/meson.build b/src/ipa/simple/meson.build > index 3e863db7..44b5f1d7 100644 > --- a/src/ipa/simple/meson.build > +++ b/src/ipa/simple/meson.build > @@ -2,8 +2,13 @@ > > ipa_name = 'ipa_soft_simple' > > +soft_simple_sources = files([ > + 'soft_simple.cpp', > + 'black_level.cpp', > +]) > + > mod = shared_module(ipa_name, > - ['soft_simple.cpp', libcamera_generated_ipa_headers], > + [soft_simple_sources, libcamera_generated_ipa_headers], > name_prefix : '', > include_directories : [ipa_includes, libipa_includes], > dependencies : libcamera_private, > diff --git a/src/ipa/simple/soft_simple.cpp b/src/ipa/simple/soft_simple.cpp > index 312df4ba..ac027568 100644 > --- a/src/ipa/simple/soft_simple.cpp > +++ b/src/ipa/simple/soft_simple.cpp > @@ -22,6 +22,8 @@ > #include "libcamera/internal/software_isp/debayer_params.h" > #include "libcamera/internal/software_isp/swisp_stats.h" > > +#include "black_level.h" > + > namespace libcamera { > > LOG_DEFINE_CATEGORY(IPASoft) > @@ -33,7 +35,8 @@ class IPASoftSimple : public ipa::soft::IPASoftInterface > public: > IPASoftSimple() > : params_(static_cast<DebayerParams *>(MAP_FAILED)), > - stats_(static_cast<SwIspStats *>(MAP_FAILED)), ignore_updates_(0) > + stats_(static_cast<SwIspStats *>(MAP_FAILED)), > + blackLevel_(BlackLevel()), ignore_updates_(0) > { > } > > @@ -63,6 +66,7 @@ private: > SharedFD fdParams_; > DebayerParams *params_; > SwIspStats *stats_; > + BlackLevel blackLevel_; > > int32_t exposure_min_, exposure_max_; > int32_t again_min_, again_max_; > @@ -196,6 +200,10 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) > params_->gainG = 256; > params_->gamma = 0.5; > > + if (ignore_updates_ > 0) > + blackLevel_.update(stats_->yHistogram); > + params_->blackLevel = blackLevel_.get(); > + > setIspParams.emit(0); > > /* > @@ -211,18 +219,19 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) > * Calculate Mean Sample Value (MSV) according to formula from: > * https://www.araa.asn.au/acra/acra2007/papers/paper84final.pdf > */ > - constexpr unsigned int yHistValsPerBin = > - SwIspStats::kYHistogramSize / kExposureBinsCount; > - constexpr unsigned int yHistValsPerBinMod = > - SwIspStats::kYHistogramSize / > - (SwIspStats::kYHistogramSize % kExposureBinsCount + 1); > + const unsigned int blackLevelHistIdx = > + params_->blackLevel / (256 / SwIspStats::kYHistogramSize); > + const unsigned int histogramSize = SwIspStats::kYHistogramSize - blackLevelHistIdx; > + const unsigned int yHistValsPerBin = histogramSize / kExposureBinsCount; > + const unsigned int yHistValsPerBinMod = > + histogramSize / (histogramSize % kExposureBinsCount + 1); > int ExposureBins[kExposureBinsCount] = {}; > unsigned int denom = 0; > unsigned int num = 0; > > - for (unsigned int i = 0; i < SwIspStats::kYHistogramSize; i++) { > + for (unsigned int i = 0; i < histogramSize; i++) { > unsigned int idx = (i - (i / yHistValsPerBinMod)) / yHistValsPerBin; > - ExposureBins[idx] += stats_->yHistogram[i]; > + ExposureBins[idx] += stats_->yHistogram[blackLevelHistIdx + i]; > } > > for (unsigned int i = 0; i < kExposureBinsCount; i++) { > @@ -256,7 +265,8 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) > > LOG(IPASoft, Debug) << "exposureMSV " << 
exposureMSV > << " exp " << exposure_ << " again " << again_ > - << " gain R/B " << params_->gainR << "/" << params_->gainB; > + << " gain R/B " << params_->gainR << "/" << params_->gainB > + << " black level " << params_->blackLevel; > } > > void IPASoftSimple::updateExposure(double exposureMSV) > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > index a1692693..3be3cdfe 100644 > --- a/src/libcamera/software_isp/debayer_cpu.cpp > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > @@ -35,7 +35,7 @@ namespace libcamera { > * \param[in] stats Pointer to the stats object to use. > */ > DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) > - : stats_(std::move(stats)), gamma_correction_(1.0) > + : stats_(std::move(stats)), gamma_correction_(1.0), blackLevel_(0) > { > #ifdef __x86_64__ > enableInputMemcpy_ = false; > @@ -683,11 +683,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > } > > /* Apply DebayerParams */ > - if (params.gamma != gamma_correction_) { > - for (unsigned int i = 0; i < kGammaLookupSize; i++) > - gamma_[i] = UINT8_MAX * powf(i / (kGammaLookupSize - 1.0), params.gamma); > + if (params.gamma != gamma_correction_ || params.blackLevel != blackLevel_) { > + const unsigned int blackIndex = > + params.blackLevel * kGammaLookupSize / 256; > + std::fill(gamma_.begin(), gamma_.begin() + blackIndex, 0); > + const float divisor = kGammaLookupSize - blackIndex - 1.0; > + for (unsigned int i = blackIndex; i < kGammaLookupSize; i++) > + gamma_[i] = UINT8_MAX * powf((i - blackIndex) / divisor, params.gamma); > > gamma_correction_ = params.gamma; > + blackLevel_ = params.blackLevel; > } > > if (swapRedBlueGains_) > diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h > index 5f44fc65..ea02f909 100644 > --- a/src/libcamera/software_isp/debayer_cpu.h > +++ b/src/libcamera/software_isp/debayer_cpu.h > @@ -147,6 +147,7 @@ private: > bool enableInputMemcpy_; > bool swapRedBlueGains_; > float gamma_correction_; > + unsigned int blackLevel_; > unsigned int measuredFrames_; > int64_t frameProcessTime_; > /* Skip 30 frames for things to stabilize then measure 30 frames */ > diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp > index 388b4496..9b49be41 100644 > --- a/src/libcamera/software_isp/software_isp.cpp > +++ b/src/libcamera/software_isp/software_isp.cpp > @@ -64,7 +64,7 @@ LOG_DEFINE_CATEGORY(SoftwareIsp) > */ > SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const ControlInfoMap &sensorControls) > : debayer_(nullptr), > - debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f }, > + debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f, 0 }, > dmaHeap_(DmaHeap::DmaHeapFlag::Cma | DmaHeap::DmaHeapFlag::System) > { > if (!dmaHeap_.isValid()) {
Hi Laurent, thank you for the review. Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > Hi Milan, > > (CC'ing David) > > Thank you for the patch. > > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: >> Black may not be represented as 0 pixel value for given hardware, it may be >> higher. If this is not compensated then various problems may occur such as low >> contrast or suboptimal exposure. >> >> The black pixel value can be either retrieved from a tuning file for the given >> hardware, or automatically on fly. The former is the right and correct method, > > s/on fly/on the fly/ > >> while the latter can be used when a tuning file is not available for the given >> hardware. Since there is currently no support for tuning files in software ISP, >> the automatic, hardware independent way, is always used. Support for tuning >> files should be added in future but it will require more work than this patch. > > I don't think this is quite right. Strictly speaking, the black level > compensation is about subtracting the black level as produced by the > sensor. This requires tuning, and shouldn't be done on-the-fly. A general agreement from those who tried the patch was that it makes a noticeable improvement. As explained in my comments above, this is a fallback mechanism that we can use cheaply and immediately, until or unless tuning file for software ISP is available. So far, it seems to be beneficial rather than harmful. Unless we have something better or perfect soon, which doesn't seem to be the case to me, why not to use this simple and useful (I believe) fallback mechanism? Do you have a better suggestion what could be done right now regarding this issue (not considering black level at all is a real issue)? > On the other hand, looking at the histogram to try and stretch it is > called contrast adjustment. There's nothing wrong in doing so, but it's > usually done towards the output of the processing pipeline, and is > associated with manual adjustments of the contrast and brightness. See > src/ipa/rpi/controller/rpi/contrast.cpp for instance. Contrast adjustment is a different issue than the one this patch tries to address and less important now. I plan to work on it and similar functionality later. > David may be able to comment further as to why BLC and contrast are > quite different. I think I understand the difference but feel free to comment if you think I'm missing something. > As this patch may need more work, I propose not including it in v7 to > avoid delay merging the rest of the implementation. Please consider my comments above. I'm open to suggestions how to do better at the moment. Less comfortable with nothing should be done about black level until we have a perfect solution. Thanks, Milan
Hi everyone Thanks for cc'ing me on this question. Always happy to try and answer! On Thu, 28 Mar 2024 at 10:11, Milan Zamazal <mzamazal@redhat.com> wrote: > > Hi Laurent, > > thank you for the review. > > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > > > Hi Milan, > > > > (CC'ing David) > > > > Thank you for the patch. > > > > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: > >> Black may not be represented as 0 pixel value for given hardware, it may be > >> higher. If this is not compensated then various problems may occur such as low > >> contrast or suboptimal exposure. > >> > >> The black pixel value can be either retrieved from a tuning file for the given > >> hardware, or automatically on fly. The former is the right and correct method, > > > > s/on fly/on the fly/ > > > >> while the latter can be used when a tuning file is not available for the given > >> hardware. Since there is currently no support for tuning files in software ISP, > >> the automatic, hardware independent way, is always used. Support for tuning > >> files should be added in future but it will require more work than this patch. > > > > I don't think this is quite right. Strictly speaking, the black level > > compensation is about subtracting the black level as produced by the > > sensor. This requires tuning, and shouldn't be done on-the-fly. > > A general agreement from those who tried the patch was that it makes a > noticeable improvement. As explained in my comments above, this is a fallback > mechanism that we can use cheaply and immediately, until or unless tuning file > for software ISP is available. So far, it seems to be beneficial rather than > harmful. > > Unless we have something better or perfect soon, which doesn't seem to be the > case to me, why not to use this simple and useful (I believe) fallback > mechanism? Do you have a better suggestion what could be done right now > regarding this issue (not considering black level at all is a real issue)? This is of course partly a technical issue - what's the right thing to do? - and particularly a pragmatic issue - whether to do something imperfect right now or not. To tackle the technical question, black level is normally removed fairly early in the pipeline, because not doing so will make typical operations in the Bayer pipe go wrong. For example: * Lens shading (vignetting correction). I don't know whether you have this feature currently, but maybe it will be required at some point to improve image quality? Doing this without first removing the black level will make all the colours go wrong. * White balance. This too will go wrong if you haven't subtracted the black level. White balance is normally applied before demosaic because demosaic typically works better with white-balanced pixels. Obviously it depends what your demosaic algorithm does, but I wonder if there's a future in which you have higher quality versions of some algorithms for certain use cases (e.g. stills capture) where this could become an issue. * Colour correction. Like white balance, colour correction matrices will not do the right thing if black level has not been removed. You can probably guess that I belong more to the "calibrate the camera properly" view of the world because doing that fixes problems and makes them go away _forever_. But I'm not familiar with the background here, so I can appreciate that interim solutions may have a place. > > > On the other hand, looking at the histogram to try and stretch it is > > called contrast adjustment. 
There's nothing wrong in doing so, but it's > > usually done towards the output of the processing pipeline, and is > > associated with manual adjustments of the contrast and brightness. See > > src/ipa/rpi/controller/rpi/contrast.cpp for instance. > > Contrast adjustment is a different issue than the one this patch tries to > address and less important now. I plan to work on it and similar functionality > later. > > > David may be able to comment further as to why BLC and contrast are > > quite different. We treat contrast adjustment as a more cosmetic feature of an image which users can adjust according to their own personal taste. They can have more of it or less of it, as they wish, and it doesn't really affect the other aspects of image, such as colours or image quality. Black level is quite different. There's a known value that you should use. There's basically no choice. Speaking for the Pi, I would be highly reluctant to couple these two things together - you certainly wouldn't want contrast adjustment to start changing the colours, for example! Anyway, I hope that provides a little bit of a comparison, at least from the point of view of a different implementation. Good luck! David > > I think I understand the difference but feel free to comment if you think I'm > missing something. > > > As this patch may need more work, I propose not including it in v7 to > > avoid delay merging the rest of the implementation. > > Please consider my comments above. I'm open to suggestions how to do better at > the moment. Less comfortable with nothing should be done about black level > until we have a perfect solution. > > Thanks, > Milan >
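As a concrete illustration of David's white balance point above, here is a hedged numeric sketch (the sensor values, gain and helper are made up for illustration, this is not libcamera code): applying a colour gain to a sample that still contains the black pedestal amplifies the pedestal together with the signal, so the balanced channels no longer end up in the intended ratio.

#include <cstdio>

int main()
{
	/* Hypothetical 8-bit sensor with a black pedestal of 16. */
	const int blackLevel = 16;
	const int rawRed = 66;      /* 50 of real signal + 16 of pedestal */
	const int rawGreen = 116;   /* 100 of real signal + 16 of pedestal */
	const float gainR = 2.0f;   /* white-balance gain meant to equalise R and G */

	/* Correct order: subtract the pedestal, then apply the gain. */
	float correct = gainR * (rawRed - blackLevel);  /* 100, matches green */

	/* Wrong order: the pedestal is amplified along with the signal. */
	float wrong = gainR * rawRed - blackLevel;      /* 116, red is now too high */

	printf("correct=%.0f wrong=%.0f green=%d\n",
	       correct, wrong, rawGreen - blackLevel);
	return 0;
}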
David Plowman <david.plowman@raspberrypi.com> writes: > Hi everyone > > Thanks for cc'ing me on this question. Always happy to try and answer! Hi David, thank you for your answer and explanations. > On Thu, 28 Mar 2024 at 10:11, Milan Zamazal <mzamazal@redhat.com> wrote: >> >> Hi Laurent, >> >> thank you for the review. >> >> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: >> >> > Hi Milan, >> > >> > (CC'ing David) >> > >> > Thank you for the patch. >> > >> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: >> >> Black may not be represented as 0 pixel value for given hardware, it may be >> >> higher. If this is not compensated then various problems may occur such as low >> >> contrast or suboptimal exposure. >> >> >> >> The black pixel value can be either retrieved from a tuning file for the given >> >> hardware, or automatically on fly. The former is the right and correct method, >> > >> > s/on fly/on the fly/ >> > >> >> while the latter can be used when a tuning file is not available for the given >> >> hardware. Since there is currently no support for tuning files in software ISP, >> >> the automatic, hardware independent way, is always used. Support for tuning >> >> files should be added in future but it will require more work than this patch. >> > >> > I don't think this is quite right. Strictly speaking, the black level >> > compensation is about subtracting the black level as produced by the >> > sensor. This requires tuning, and shouldn't be done on-the-fly. >> >> A general agreement from those who tried the patch was that it makes a >> noticeable improvement. As explained in my comments above, this is a fallback >> mechanism that we can use cheaply and immediately, until or unless tuning file >> for software ISP is available. So far, it seems to be beneficial rather than >> harmful. >> >> Unless we have something better or perfect soon, which doesn't seem to be the >> case to me, why not to use this simple and useful (I believe) fallback >> mechanism? Do you have a better suggestion what could be done right now >> regarding this issue (not considering black level at all is a real issue)? > > This is of course partly a technical issue - what's the right thing to > do? - and particularly a pragmatic issue - whether to do something > imperfect right now or not. Yes. > To tackle the technical question, black level is normally removed > fairly early in the pipeline, because not doing so will make typical > operations in the Bayer pipe go wrong. For example: > > * Lens shading (vignetting correction). I don't know whether you have > this feature currently, but maybe it will be required at some point to > improve image quality? Doing this without first removing the black > level will make all the colours go wrong. > > * White balance. This too will go wrong if you haven't subtracted the > black level. White balance is normally applied before demosaic because > demosaic typically works better with white-balanced pixels. Obviously > it depends what your demosaic algorithm does, but I wonder if there's > a future in which you have higher quality versions of some algorithms > for certain use cases (e.g. stills capture) where this could become an > issue. > > * Colour correction. Like white balance, colour correction matrices > will not do the right thing if black level has not been removed. We don't have lens shading or color correction, once we have support for these there is no reason not to have proper black level correction as well. 
As for white balance, we currently use probably the most primitive method possible (green/red and green/blue ratios) and we don't subtract black level even with this patch (which should get fixed later). The patch subtracts black level when correcting exposure though, which is also influenced by this. Considering all the imperfections at the moment, the actual effects on white balance and exposure may not be that big to care very much. But even then ... > You can probably guess that I belong more to the "calibrate the camera > properly" view of the world because doing that fixes problems and > makes them go away _forever_. But I'm not familiar with the background > here, so I can appreciate that interim solutions may have a place. ... the thing is that not applying any black level correction, even when not considering white balance and exposure deformations, often makes the image looking noticeably flat as there are no blacks. I'd say better imperfect blacks than no blacks, similarly to having imperfect rather than no white balance correction (yes, comparing apples and oranges technically, but both have their impacts on the user satisfaction with the output). >> > On the other hand, looking at the histogram to try and stretch it is >> > called contrast adjustment. There's nothing wrong in doing so, but it's >> > usually done towards the output of the processing pipeline, and is >> > associated with manual adjustments of the contrast and brightness. See >> > src/ipa/rpi/controller/rpi/contrast.cpp for instance. >> >> Contrast adjustment is a different issue than the one this patch tries to >> address and less important now. I plan to work on it and similar functionality >> later. >> >> > David may be able to comment further as to why BLC and contrast are >> > quite different. > > We treat contrast adjustment as a more cosmetic feature of an image > which users can adjust according to their own personal taste. They can > have more of it or less of it, as they wish, and it doesn't really > affect the other aspects of image, such as colours or image quality. Yes, as for contrast, we quickly rejected the idea of handling it automatically and decided to implement the proper user adjustable settings later. > Black level is quite different. There's a known value that you should > use. There's basically no choice. Speaking for the Pi, I would be > highly reluctant to couple these two things together - you certainly > wouldn't want contrast adjustment to start changing the colours, for > example! Exactly. This is why I'm advocating for still considering the patch rather than dropping it. At least in my setup the auto-determined black value doesn't seem to be way off and serves its purpose even without access to exact tuning data. > Anyway, I hope that provides a little bit of a comparison, at least > from the point of view of a different implementation. Good luck! Thanks, Milan > David > >> >> I think I understand the difference but feel free to comment if you think I'm >> missing something. >> >> > As this patch may need more work, I propose not including it in v7 to >> > avoid delay merging the rest of the implementation. >> >> Please consider my comments above. I'm open to suggestions how to do better at >> the moment. Less comfortable with nothing should be done about black level >> until we have a perfect solution. >> >> Thanks, >> Milan >>
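For what it's worth, a minimal sketch of what that later fix could look like (a hypothetical helper, not the current soft_simple.cpp code): the same primitive green/red and green/blue ratios, but computed on channel averages from which the black pedestal has been removed first.

void computeWbGains(double avgR, double avgG, double avgB,
		    unsigned int blackLevel,
		    double &gainR, double &gainB)
{
	/* Black-level-subtracted channel averages, guarded against going negative. */
	const double r = avgR > blackLevel ? avgR - blackLevel : 1.0;
	const double g = avgG > blackLevel ? avgG - blackLevel : 1.0;
	const double b = avgB > blackLevel ? avgB - blackLevel : 1.0;

	/* Same green/red and green/blue ratios as today, now on the true signal. */
	gainR = g / r;
	gainB = g / b;
}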
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > As this patch may need more work, I propose not including it in v7 to > avoid delay merging the rest of the implementation. I think David and I have provided sufficient additional information to make the decision. I'll keep the patch in v7 but it will be the last patch in the series again so it's easy to either take or omit it. [...] >> +class BlackLevel > > Wouldn't it be better to move to using the libipa Algorithm class before > introducing new algorithms ? I suppose we will need this sooner or later but it will need a non-trivial amount of work to integrate software ISP with this in a meaningful way and to decide how to de-wire exposure, white balance, and possibly black level from the current code, without wasting the precious CPU cycles. Regards, Milan
Hi Milan, On Tue, Apr 02, 2024 at 09:48:34PM +0200, Milan Zamazal wrote: > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > > > As this patch may need more work, I propose not including it in v7 to > > avoid delay merging the rest of the implementation. > > I think David and I have provided sufficient additional information to make the > decision. I'll keep the patch in v7 but it will be the last patch in the series > again so it's easy to either take or omit it. Thank you. > [...] > > >> +class BlackLevel > > > > Wouldn't it be better to move to using the libipa Algorithm class before > > introducing new algorithms ? > > I suppose we will need this sooner or later but it will need a non-trivial > amount of work to integrate software ISP with this in a meaningful way and to > decide how to de-wire exposure, white balance, and possibly black level from the > current code, without wasting the precious CPU cycles. The CPU-intensive part is the image processing itself, the IPA side should hopefully not be an issue, even if we complexify it a little bit by using the more modular architecture that the Algorithm class provides.
Hello, On Thu, Mar 28, 2024 at 03:47:32PM +0100, Milan Zamazal wrote: > David Plowman writes: > > > Hi everyone > > > > Thanks for cc'ing me on this question. Always happy to try and answer! > > Hi David, > > thank you for your answer and explanations. Likewise, thank you. > > On Thu, 28 Mar 2024 at 10:11, Milan Zamazal wrote: > >> Laurent Pinchart writes: > >> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: > >> >> Black may not be represented as 0 pixel value for given hardware, it may be > >> >> higher. If this is not compensated then various problems may occur such as low > >> >> contrast or suboptimal exposure. > >> >> > >> >> The black pixel value can be either retrieved from a tuning file for the given > >> >> hardware, or automatically on fly. The former is the right and correct method, > >> > > >> > s/on fly/on the fly/ > >> > > >> >> while the latter can be used when a tuning file is not available for the given > >> >> hardware. Since there is currently no support for tuning files in software ISP, > >> >> the automatic, hardware independent way, is always used. Support for tuning > >> >> files should be added in future but it will require more work than this patch. > >> > > >> > I don't think this is quite right. Strictly speaking, the black level > >> > compensation is about subtracting the black level as produced by the > >> > sensor. This requires tuning, and shouldn't be done on-the-fly. > >> > >> A general agreement from those who tried the patch was that it makes a > >> noticeable improvement. As explained in my comments above, this is a fallback > >> mechanism that we can use cheaply and immediately, until or unless tuning file > >> for software ISP is available. So far, it seems to be beneficial rather than > >> harmful. > >> > >> Unless we have something better or perfect soon, which doesn't seem to be the > >> case to me, why not to use this simple and useful (I believe) fallback > >> mechanism? Do you have a better suggestion what could be done right now > >> regarding this issue (not considering black level at all is a real issue)? > > > > This is of course partly a technical issue - what's the right thing to > > do? - and particularly a pragmatic issue - whether to do something > > imperfect right now or not. > > Yes. > > > To tackle the technical question, black level is normally removed > > fairly early in the pipeline, because not doing so will make typical > > operations in the Bayer pipe go wrong. For example: > > > > * Lens shading (vignetting correction). I don't know whether you have > > this feature currently, but maybe it will be required at some point to > > improve image quality? Doing this without first removing the black > > level will make all the colours go wrong. > > > > * White balance. This too will go wrong if you haven't subtracted the > > black level. White balance is normally applied before demosaic because > > demosaic typically works better with white-balanced pixels. Obviously > > it depends what your demosaic algorithm does, but I wonder if there's > > a future in which you have higher quality versions of some algorithms > > for certain use cases (e.g. stills capture) where this could become an > > issue. > > > > * Colour correction. Like white balance, colour correction matrices > > will not do the right thing if black level has not been removed. > > We don't have lens shading or color correction, once we have support for these > there is no reason not to have proper black level correction as well. 
As for > white balance, we currently use probably the most primitive method possible > (green/red and green/blue ratios) and we don't subtract black level even with > this patch (which should get fixed later). The patch subtracts black level when > correcting exposure though, which is also influenced by this. Considering all > the imperfections at the moment, the actual effects on white balance and > exposure may not be that big to care very much. But even then ... > > > You can probably guess that I belong more to the "calibrate the camera > > properly" view of the world because doing that fixes problems and > > makes them go away _forever_. But I'm not familiar with the background > > here, so I can appreciate that interim solutions may have a place. > > ... the thing is that not applying any black level correction, even when not > considering white balance and exposure deformations, often makes the image > looking noticeably flat as there are no blacks. I'd say better imperfect blacks > than no blacks, similarly to having imperfect rather than no white balance > correction (yes, comparing apples and oranges technically, but both have their > impacts on the user satisfaction with the output). > > >> > On the other hand, looking at the histogram to try and stretch it is > >> > called contrast adjustment. There's nothing wrong in doing so, but it's > >> > usually done towards the output of the processing pipeline, and is > >> > associated with manual adjustments of the contrast and brightness. See > >> > src/ipa/rpi/controller/rpi/contrast.cpp for instance. > >> > >> Contrast adjustment is a different issue than the one this patch tries to > >> address and less important now. I plan to work on it and similar functionality > >> later. > >> > >> > David may be able to comment further as to why BLC and contrast are > >> > quite different. > > > > We treat contrast adjustment as a more cosmetic feature of an image > > which users can adjust according to their own personal taste. They can > > have more of it or less of it, as they wish, and it doesn't really > > affect the other aspects of image, such as colours or image quality. > > Yes, as for contrast, we quickly rejected the idea of handling it automatically > and decided to implement the proper user adjustable settings later. > > > Black level is quite different. There's a known value that you should > > use. There's basically no choice. Speaking for the Pi, I would be > > highly reluctant to couple these two things together - you certainly > > wouldn't want contrast adjustment to start changing the colours, for > > example! > > Exactly. This is why I'm advocating for still considering the patch rather than > dropping it. At least in my setup the auto-determined black value doesn't seem > to be way off and serves its purpose even without access to exact tuning data. As for other part of this series, I'm fine starting with this initial implementation and improving it, but I think the black level should eventually be moved before debayering, and ideally the colour gains as well. I understand the need for optimizations to lower the CPU consumption, but at the same time I don't feel comfortable building up on top of an implementation that may work a bit more by chance than by correctness, as that's not very maintainable. > > Anyway, I hope that provides a little bit of a comparison, at least > > from the point of view of a different implementation. Good luck! 
> > > >> I think I understand the difference but feel free to comment if you think I'm > >> missing something. > >> > >> > As this patch may need more work, I propose not including it in v7 to > >> > avoid delay merging the rest of the implementation. > >> > >> Please consider my comments above. I'm open to suggestions how to do better at > >> the moment. Less comfortable with nothing should be done about black level > >> until we have a perfect solution.
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > Hello, > > On Thu, Mar 28, 2024 at 03:47:32PM +0100, Milan Zamazal wrote: >> David Plowman writes: >> >> > Hi everyone >> > >> > Thanks for cc'ing me on this question. Always happy to try and answer! >> >> Hi David, >> >> thank you for your answer and explanations. > > Likewise, thank you. > >> > On Thu, 28 Mar 2024 at 10:11, Milan Zamazal wrote: >> >> Laurent Pinchart writes: >> >> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote: >> >> >> Black may not be represented as 0 pixel value for given hardware, it may be >> >> >> higher. If this is not compensated then various problems may occur such as low >> >> >> contrast or suboptimal exposure. >> >> >> >> >> >> The black pixel value can be either retrieved from a tuning file for the given >> >> >> hardware, or automatically on fly. The former is the right and correct method, >> >> > >> >> > s/on fly/on the fly/ >> >> > >> >> >> while the latter can be used when a tuning file is not available for the given >> >> >> hardware. Since there is currently no support for tuning files in software ISP, >> >> >> the automatic, hardware independent way, is always used. Support for tuning >> >> >> files should be added in future but it will require more work than this patch. >> >> > >> >> > I don't think this is quite right. Strictly speaking, the black level >> >> > compensation is about subtracting the black level as produced by the >> >> > sensor. This requires tuning, and shouldn't be done on-the-fly. >> >> >> >> A general agreement from those who tried the patch was that it makes a >> >> noticeable improvement. As explained in my comments above, this is a fallback >> >> mechanism that we can use cheaply and immediately, until or unless tuning file >> >> for software ISP is available. So far, it seems to be beneficial rather than >> >> harmful. >> >> >> >> Unless we have something better or perfect soon, which doesn't seem to be the >> >> case to me, why not to use this simple and useful (I believe) fallback >> >> mechanism? Do you have a better suggestion what could be done right now >> >> regarding this issue (not considering black level at all is a real issue)? >> > >> > This is of course partly a technical issue - what's the right thing to >> > do? - and particularly a pragmatic issue - whether to do something >> > imperfect right now or not. >> >> Yes. >> >> > To tackle the technical question, black level is normally removed >> > fairly early in the pipeline, because not doing so will make typical >> > operations in the Bayer pipe go wrong. For example: >> > >> > * Lens shading (vignetting correction). I don't know whether you have >> > this feature currently, but maybe it will be required at some point to >> > improve image quality? Doing this without first removing the black >> > level will make all the colours go wrong. >> > >> > * White balance. This too will go wrong if you haven't subtracted the >> > black level. White balance is normally applied before demosaic because >> > demosaic typically works better with white-balanced pixels. Obviously >> > it depends what your demosaic algorithm does, but I wonder if there's >> > a future in which you have higher quality versions of some algorithms >> > for certain use cases (e.g. stills capture) where this could become an >> > issue. >> > >> > * Colour correction. Like white balance, colour correction matrices >> > will not do the right thing if black level has not been removed. 
>> >> We don't have lens shading or color correction, once we have support for these >> there is no reason not to have proper black level correction as well. As for >> white balance, we currently use probably the most primitive method possible >> (green/red and green/blue ratios) and we don't subtract black level even with >> this patch (which should get fixed later). The patch subtracts black level when >> correcting exposure though, which is also influenced by this. Considering all >> the imperfections at the moment, the actual effects on white balance and >> exposure may not be that big to care very much. But even then ... >> >> > You can probably guess that I belong more to the "calibrate the camera >> > properly" view of the world because doing that fixes problems and >> > makes them go away _forever_. But I'm not familiar with the background >> > here, so I can appreciate that interim solutions may have a place. >> >> ... the thing is that not applying any black level correction, even when not >> considering white balance and exposure deformations, often makes the image >> looking noticeably flat as there are no blacks. I'd say better imperfect blacks >> than no blacks, similarly to having imperfect rather than no white balance >> correction (yes, comparing apples and oranges technically, but both have their >> impacts on the user satisfaction with the output). >> >> >> > On the other hand, looking at the histogram to try and stretch it is >> >> > called contrast adjustment. There's nothing wrong in doing so, but it's >> >> > usually done towards the output of the processing pipeline, and is >> >> > associated with manual adjustments of the contrast and brightness. See >> >> > src/ipa/rpi/controller/rpi/contrast.cpp for instance. >> >> >> >> Contrast adjustment is a different issue than the one this patch tries to >> >> address and less important now. I plan to work on it and similar functionality >> >> later. >> >> >> >> > David may be able to comment further as to why BLC and contrast are >> >> > quite different. >> > >> > We treat contrast adjustment as a more cosmetic feature of an image >> > which users can adjust according to their own personal taste. They can >> > have more of it or less of it, as they wish, and it doesn't really >> > affect the other aspects of image, such as colours or image quality. >> >> Yes, as for contrast, we quickly rejected the idea of handling it automatically >> and decided to implement the proper user adjustable settings later. >> >> > Black level is quite different. There's a known value that you should >> > use. There's basically no choice. Speaking for the Pi, I would be >> > highly reluctant to couple these two things together - you certainly >> > wouldn't want contrast adjustment to start changing the colours, for >> > example! >> >> Exactly. This is why I'm advocating for still considering the patch rather than >> dropping it. At least in my setup the auto-determined black value doesn't seem >> to be way off and serves its purpose even without access to exact tuning data. > > As for other part of this series, I'm fine starting with this initial > implementation and improving it, but I think the black level should > eventually be moved before debayering, and ideally the colour gains as > well. I understand the need for optimizations to lower the CPU > consumption, but at the same time I don't feel comfortable building up > on top of an implementation that may work a bit more by chance than by > correctness, as that's not very maintainable. 
I agree with this approach. I added your comment to the TODO file. >> > Anyway, I hope that provides a little bit of a comparison, at least >> > from the point of view of a different implementation. Good luck! >> > >> >> I think I understand the difference but feel free to comment if you think I'm >> >> missing something. >> >> >> >> > As this patch may need more work, I propose not including it in v7 to >> >> > avoid delay merging the rest of the implementation. >> >> >> >> Please consider my comments above. I'm open to suggestions how to do better at >> >> the moment. Less comfortable with nothing should be done about black level >> >> until we have a perfect solution.
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > As for other part of this series, I'm fine starting with this initial > implementation and improving it, but I think the black level should > eventually be moved before debayering, and ideally the colour gains as > well. I understand the need for optimizations to lower the CPU > consumption, but at the same time I don't feel comfortable building up > on top of an implementation that may work a bit more by chance than by > correctness, as that's not very maintainable. Hi Laurent, to make sure we understand each other correctly: Can we agree that "be moved before debayering" above means primarily separating the level adjustments computations from debayering code and building them (together with other stuff) on top of libipa/algorithm.h? Rather than first applying the corrections on the data and only then perform debayering, which luxury we cannot afford, at least on CPU? I.e. the first step could be something like https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? Regards, Milan
Hi Milan, On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote: > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > > > As for other part of this series, I'm fine starting with this initial > > implementation and improving it, but I think the black level should > > eventually be moved before debayering, and ideally the colour gains as > > well. I understand the need for optimizations to lower the CPU > > consumption, but at the same time I don't feel comfortable building up > > on top of an implementation that may work a bit more by chance than by > > correctness, as that's not very maintainable. > > Hi Laurent, > > to make sure we understand each other correctly: Can we agree that "be moved > before debayering" above means primarily separating the level adjustments > computations from debayering code and building them (together with other stuff) > on top of libipa/algorithm.h? Rather than first applying the corrections on the > data and only then perform debayering, which luxury we cannot afford, at least > on CPU? I.e. the first step could be something like > https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? What I meant is subtracting the black level before applying the colour gains and performing the CFA interpolation (a.k.a. debayering). I'm fine with the black level subtraction and CFA interpolation being performed in a single pass by a single function for performance reason, but the order needs to be swapped.
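A minimal sketch of the ordering described above, under the single-pass constraint (the helper name and 8-bit range are assumptions, this is not the actual debayer code): each raw sample gets the black level subtracted and the colour gain applied before the corrected value is fed into the CFA interpolation.

#include <cstdint>

/* Correct one raw 8-bit sample before it enters the CFA interpolation. */
static inline uint8_t correctRaw(uint8_t raw, uint8_t blackLevel, float gain)
{
	int v = raw - blackLevel;        /* black level subtraction comes first */
	if (v < 0)
		v = 0;

	float scaled = v * gain;         /* colour gain comes second */
	return scaled > 255.0f ? 255 : static_cast<uint8_t>(scaled);
}

The debayer kernel would then average neighbours that are already corrected, so the single-pass structure is preserved while the order of operations is swapped.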
Hi, On 4/20/24 12:42 PM, Laurent Pinchart wrote: > Hi Milan, > > On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote: >> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: >> >>> As for other part of this series, I'm fine starting with this initial >>> implementation and improving it, but I think the black level should >>> eventually be moved before debayering, and ideally the colour gains as >>> well. I understand the need for optimizations to lower the CPU >>> consumption, but at the same time I don't feel comfortable building up >>> on top of an implementation that may work a bit more by chance than by >>> correctness, as that's not very maintainable. >> >> Hi Laurent, >> >> to make sure we understand each other correctly: Can we agree that "be moved >> before debayering" above means primarily separating the level adjustments >> computations from debayering code and building them (together with other stuff) >> on top of libipa/algorithm.h? Rather than first applying the corrections on the >> data and only then perform debayering, which luxury we cannot afford, at least >> on CPU? I.e. the first step could be something like >> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? > > What I meant is subtracting the black level before applying the colour > gains and performing the CFA interpolation (a.k.a. debayering). I'm fine > with the black level subtraction and CFA interpolation being performed > in a single pass by a single function for performance reason, but the > order needs to be swapped. If we apply the per color lookup table (which does black-level compensation + color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to do this on some extra copy before debayering or we need to do it multiple times per pixel before averaging the surrounding pixels, neither one of which is ideal. I guess we could make the memcpy to temp buffer behavior done to deal with uncached mem mandatory (so always do this) and then first apply the rgb lookups to a just copied raw line before debayering, then we still only need to do this once per color. And this could also help wrt cache behavior of the lookups since we then linearly travel through 1 line touching only the 1 line (in temp buffer) + read 2 lookup tables. So then there will be less cache contention then when using the lookups during debayering when were are reading 3 lines + writing 1 line + accessing all 3 lookup tables (so I guess this might even speed things up especially on the imx8). I don't expect this to make much difference for the resulting image though, OTOH it will make some difference and testing has shown that the overhead of using the extra memcpy buffers in cases where this is not necessary is small. So I guess we could do this. Milan can you extract what I have in mind from the above and is this something you can work on, or shall I take a shot at this when I can schedule some time for this? Regards, Hans
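A rough sketch of the per-line idea above (hypothetical helper, not DebayerCpu code, and an even line width is assumed): the raw line is first copied into a cached temporary buffer, the two per-colour lookup tables for that Bayer row are applied in place, and only then is the line handed to the debayer kernels, so each value is looked up exactly once.

#include <cstdint>
#include <cstring>

void prepareLine(uint8_t *tmp, const uint8_t *src, unsigned int width,
		 const uint8_t lut0[256], const uint8_t lut1[256])
{
	std::memcpy(tmp, src, width);          /* copy out of uncached memory */

	for (unsigned int x = 0; x < width; x += 2) {
		tmp[x] = lut0[tmp[x]];         /* first colour of this Bayer row */
		tmp[x + 1] = lut1[tmp[x + 1]]; /* second colour of this Bayer row */
	}
}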
Hi Hans, On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote: > On 4/20/24 12:42 PM, Laurent Pinchart wrote: > > On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote: > >> Laurent Pinchart writes: > >> > >>> As for other part of this series, I'm fine starting with this initial > >>> implementation and improving it, but I think the black level should > >>> eventually be moved before debayering, and ideally the colour gains as > >>> well. I understand the need for optimizations to lower the CPU > >>> consumption, but at the same time I don't feel comfortable building up > >>> on top of an implementation that may work a bit more by chance than by > >>> correctness, as that's not very maintainable. > >> > >> Hi Laurent, > >> > >> to make sure we understand each other correctly: Can we agree that "be moved > >> before debayering" above means primarily separating the level adjustments > >> computations from debayering code and building them (together with other stuff) > >> on top of libipa/algorithm.h? Rather than first applying the corrections on the > >> data and only then perform debayering, which luxury we cannot afford, at least > >> on CPU? I.e. the first step could be something like > >> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? > > > > What I meant is subtracting the black level before applying the colour > > gains and performing the CFA interpolation (a.k.a. debayering). I'm fine > > with the black level subtraction and CFA interpolation being performed > > in a single pass by a single function for performance reason, but the > > order needs to be swapped. > > If we apply the per color lookup table (which does black-level compensation + > color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to > do this on some extra copy before debayering or we need to do it multiple times > per pixel before averaging the surrounding pixels, neither one of which is ideal. It's not the full lookup table that needs to be applied before debayering, it's only the black level subtraction and the colour gains. The gamma curve needs to go after debayering, and additional histogram stretching could go there too. There's also a need to add a CCM matrix to the pipeline, multiplying the RGB values obtained by debayering by a 3x3 matrix before applying the gamma curve. With that I think we would have a reasonable image quality already. Additional processing, such as lens shading correction, defective pixel correction and denoising, are out of scope for CPU processing in my opinion. They could be implemented with a GPU soft ISP though. > I guess we could make the memcpy to temp buffer behavior done to deal with > uncached mem mandatory (so always do this) and then first apply the rgb lookups > to a just copied raw line before debayering, then we still only need to do > this once per color. And this could also help wrt cache behavior of the > lookups since we then linearly travel through 1 line touching only the > 1 line (in temp buffer) + read 2 lookup tables. So then there will be less > cache contention then when using the lookups during debayering when > were are reading 3 lines + writing 1 line + accessing all 3 lookup tables > (so I guess this might even speed things up especially on the imx8). 
> > I don't expect this to make much difference for the resulting image though, > OTOH it will make some difference and testing has shown that the overhead > of using the extra memcpy buffers in cases where this is not necessary is > small. So I guess we could do this. > > Milan can you extract what I have in mind from the above and is this > something you can work on, or shall I take a shot at this when I can schedule > some time for this?
Hi Laurent, On 4/21/24 2:22 PM, Laurent Pinchart wrote: > Hi Hans, > > On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote: >> On 4/20/24 12:42 PM, Laurent Pinchart wrote: >>> On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote: >>>> Laurent Pinchart writes: >>>> >>>>> As for other part of this series, I'm fine starting with this initial >>>>> implementation and improving it, but I think the black level should >>>>> eventually be moved before debayering, and ideally the colour gains as >>>>> well. I understand the need for optimizations to lower the CPU >>>>> consumption, but at the same time I don't feel comfortable building up >>>>> on top of an implementation that may work a bit more by chance than by >>>>> correctness, as that's not very maintainable. >>>> >>>> Hi Laurent, >>>> >>>> to make sure we understand each other correctly: Can we agree that "be moved >>>> before debayering" above means primarily separating the level adjustments >>>> computations from debayering code and building them (together with other stuff) >>>> on top of libipa/algorithm.h? Rather than first applying the corrections on the >>>> data and only then perform debayering, which luxury we cannot afford, at least >>>> on CPU? I.e. the first step could be something like >>>> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? >>> >>> What I meant is subtracting the black level before applying the colour >>> gains and performing the CFA interpolation (a.k.a. debayering). I'm fine >>> with the black level subtraction and CFA interpolation being performed >>> in a single pass by a single function for performance reason, but the >>> order needs to be swapped. >> >> If we apply the per color lookup table (which does black-level compensation + >> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to >> do this on some extra copy before debayering or we need to do it multiple times >> per pixel before averaging the surrounding pixels, neither one of which is ideal. > > It's not the full lookup table that needs to be applied before > debayering, it's only the black level subtraction and the colour gains. The lookup table also gives us clamping for free. If we do actual substract + multiply we need to clamp and that means adding 2 conditional jumps per pixel and testing has shown that that really really hurts on modern processors. > The gamma curve needs to go after debayering, and additional histogram > stretching could go there too. > > There's also a need to add a CCM matrix to the pipeline, multiplying the > RGB values obtained by debayering by a 3x3 matrix before applying the > gamma curve. I have looked into CCM already, but IMHO that is too heavy to do on the CPU. We might get away with that at the big Intel Cores, but at a significant CPU usage and battery cost. As for gamma correction I think that we need to live with that being applied before debayering too (our debayering is a straight averaging of surrounding pixels, so the impact of doing this before / after should be small). > With that I think we would have a reasonable image quality already. I agree that a CCM is the biggest missing item. But I think we can still get reasonable quality without that and the CPU SoftwareISP really should be a fallback for when all else fails. I would rather see us focus on working on a GPU based SoftwareISP and then we can add CCM and proper post debay gamma correction there. 
> Additional processing, such as lens shading correction, defective pixel > correction and denoising, are out of scope for CPU processing in my > opinion. They could be implemented with a GPU soft ISP though. Ack. Regards, Hans
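To make the clamping argument above concrete, a rough sketch of the two forms follows; this is illustrative only, not the libcamera routines, and the fixed-point convention of 256 standing for a gain of 1.0 is an assumption made for the example.

#include <algorithm>
#include <cstdint>

/* Direct form: subtract the black level, apply the gain, clamp to 0..255.
 * The clamping implies per-pixel branches (or saturating instructions). */
uint8_t correctPixel(uint8_t v, int blackLevel, int gain)
{
        int c = (static_cast<int>(v) - blackLevel) * gain / 256;
        return static_cast<uint8_t>(std::clamp(c, 0, 255));
}

/* LUT form: the same maths, including the clamping, precomputed once per
 * parameter change, so the per-pixel work is a single table load. */
void buildLut(uint8_t lut[256], int blackLevel, int gain)
{
        for (int i = 0; i < 256; i++) {
                int c = (i - blackLevel) * gain / 256;
                lut[i] = static_cast<uint8_t>(std::clamp(c, 0, 255));
        }
}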
Hi Hans, On Sun, Apr 21, 2024 at 04:47:30PM +0200, Hans de Goede wrote: > On 4/21/24 2:22 PM, Laurent Pinchart wrote: > > On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote: > >> On 4/20/24 12:42 PM, Laurent Pinchart wrote: > >>> On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote: > >>>> Laurent Pinchart writes: > >>>> > >>>>> As for other part of this series, I'm fine starting with this initial > >>>>> implementation and improving it, but I think the black level should > >>>>> eventually be moved before debayering, and ideally the colour gains as > >>>>> well. I understand the need for optimizations to lower the CPU > >>>>> consumption, but at the same time I don't feel comfortable building up > >>>>> on top of an implementation that may work a bit more by chance than by > >>>>> correctness, as that's not very maintainable. > >>>> > >>>> Hi Laurent, > >>>> > >>>> to make sure we understand each other correctly: Can we agree that "be moved > >>>> before debayering" above means primarily separating the level adjustments > >>>> computations from debayering code and building them (together with other stuff) > >>>> on top of libipa/algorithm.h? Rather than first applying the corrections on the > >>>> data and only then perform debayering, which luxury we cannot afford, at least > >>>> on CPU? I.e. the first step could be something like > >>>> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ? > >>> > >>> What I meant is subtracting the black level before applying the colour > >>> gains and performing the CFA interpolation (a.k.a. debayering). I'm fine > >>> with the black level subtraction and CFA interpolation being performed > >>> in a single pass by a single function for performance reason, but the > >>> order needs to be swapped. > >> > >> If we apply the per color lookup table (which does black-level compensation + > >> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to > >> do this on some extra copy before debayering or we need to do it multiple times > >> per pixel before averaging the surrounding pixels, neither one of which is ideal. > > > > It's not the full lookup table that needs to be applied before > > debayering, it's only the black level subtraction and the colour gains. > > The lookup table also gives us clamping for free. If we do actual substract > + multiply we need to clamp and that means adding 2 conditional jumps > per pixel and testing has shown that that really really hurts on modern > processors. Could this be implemented with a LUT separate from the gamma LUT ? > > The gamma curve needs to go after debayering, and additional histogram > > stretching could go there too. > > > > There's also a need to add a CCM matrix to the pipeline, multiplying the > > RGB values obtained by debayering by a 3x3 matrix before applying the > > gamma curve. > > I have looked into CCM already, but IMHO that is too heavy to do on > the CPU. We might get away with that at the big Intel Cores, but at > a significant CPU usage and battery cost. Would SIMD help ? > As for gamma correction I think that we need to live with that being applied > before debayering too (our debayering is a straight averaging of surrounding > pixels, so the impact of doing this before / after should be small). As long as we have no CCM that may be an option. > > With that I think we would have a reasonable image quality already. > > I agree that a CCM is the biggest missing item. 
> But I think we can still get reasonable quality without that and the CPU SoftwareISP really should be a fallback for when all else fails. I think the CPU-based ISP is also a useful platform to experiment with ISP algorithms, even if it consumes more CPU. Once we have GPU offload support, when do you expect we'll fall back to the CPU ? > I would rather see us focus on working on a GPU based SoftwareISP and then > we can add CCM and proper post debay gamma correction there. I would certainly welcome GPU support :-) > > Additional processing, such as lens shading correction, defective pixel > > correction and denoising, are out of scope for CPU processing in my > > opinion. They could be implemented with a GPU soft ISP though. > > Ack.
Hi! > > > The gamma curve needs to go after debayering, and additional histogram > > > stretching could go there too. > > > > > > There's also a need to add a CCM matrix to the pipeline, multiplying the > > > RGB values obtained by debayering by a 3x3 matrix before applying the > > > gamma curve. > > > > I have looked into CCM already, but IMHO that is too heavy to do on > > the CPU. We might get away with that at the big Intel Cores, but at > > a significant CPU usage and battery cost. > > Would SIMD help ? I tried to do debayering using SIMD, and the good news is that gcc now has support for this, so playing with SIMD is easy. The bad news is that I was not able to get speedups... neither on x86-64 nor on arm64. (But for matrix multiplication, it might make sense... not sure.) Anyway, SIMD is _not_ easy. The bad news with the GPU is that there's quite some overhead in moving data to and from the GPU on my platforms. Best regards, Pavel
Hi! > If we apply the per color lookup table (which does black-level compensation + > color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to > do this on some extra copy before debayering or we need to do it multiple times > per pixel before averaging the surrounding pixels, neither one of which is ideal. > > I guess we could make the memcpy to temp buffer behavior done to deal with > uncached mem mandatory (so always do this) and then first apply the rgb lookups > to a just copied raw line before debayering, then we still only need to do > this once per color. And this could also help wrt cache behavior of > the I believe that doing an extra memcpy will hurt (but black-level during the memcpy will be free); OTOH, doing the pass at a convenient time, when the data is already in the L1 cache, should be free, too. Anyway, a point to consider. If we are talking about correcting gammas etc., we'll probably need to switch processing to 16-bit values. Those sensors are pretty non-linear... https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/ Anyway, there's a ton of work, and it would be good to get the basics merged first. For example, for digital photography, it is important to do statistics and 3A algorithms, but not debayering... Best regards, Pavel
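As a rough idea of what such a linearization step could look like, here is a toy sketch; the knee position and slope are invented numbers rather than measurements of any real sensor, and the function name is hypothetical. It maps 10-bit raw codes, assumed to be compressed above a knee, back to a linear 16-bit scale.

#include <algorithm>
#include <array>
#include <cstdint>

std::array<uint16_t, 1024> buildLinearizationLut()
{
        std::array<uint16_t, 1024> lut;
        const double knee = 0.7;  /* assumed start of the compressed region */
        const double slope = 0.5; /* assumed sensor slope above the knee */

        for (unsigned int code = 0; code < lut.size(); code++) {
                const double s = code / 1023.0; /* normalized raw value */
                /* Invert the assumed response: identity below the knee,
                 * expansion above it; clamp codes beyond the modeled range. */
                const double x = s <= knee ? s : knee + (s - knee) / slope;
                lut[code] = static_cast<uint16_t>(std::clamp(x, 0.0, 1.0) * 65535.0 + 0.5);
        }

        return lut;
}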
Hi, thank you all for your comments and clarifications. Let me summarize the discussion, with my notes merged in, before I get lost in it: - Since our current black level and color gain operations are linear, we average only pixels of the same color, and the lookup table is applied after averaging the pixels, the current debayering implementation is correct. It doesn't matter whether we do "average pixels -> subtract black -> multiply by color gain -> apply gamma" or "subtract black -> multiply by color gain -> average pixels -> apply gamma". - This may no longer hold if we change the operations, for example: - The optimization suggested by Hans (preprocessing while copying lines from the input buffer). It may or may not have a significant impact on performance, and may or may not have a significant impact on image quality. I think we need some experiments here. Performance on the CPU is generally challenging, so if we can demonstrate a performance benefit, we should consider including it (most likely configurable), provided it doesn't turn the implementation into a complete mess. - Or if we add support for non-linear gains, as mentioned by Pavel. - Laurent suggested adding a CCM matrix between debayering and gamma to achieve a reasonable image quality. According to Hans, this is a heavy operation on CPUs. Laurent still sees CPU processing as useful for experiments. I think we can simply make this configurable (if it gets implemented). - If we split the processing into pre-debayering and post-debayering parts, we should probably work with uint16_t or floats, which may have an impact on performance. - Pavel couldn't get better performance by using SIMD CPU instructions for debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on the CPU is hard to use and may differ between architectures, so the question is whether it's worth investing in it. - GPU processing should make many of these things easier and more things possible. - Do we already know how to map the buffers to the GPU efficiently? My conclusions are: - The current CPU debayering computation itself is fine at the moment and we shouldn't change it without a good reason. It can be replaced in the future if needed, once we have a better understanding of what we can realistically achieve and how. General cleanups, like moving table computations elsewhere, would still be useful now. - We can experiment with anything mentioned above to get a better understanding, but: - GPU processing is clearly a priority. - We also have other items in the TODO file! (I'm already working on some of them.) - We should probably change the e-mail thread subject. Thanks, Milan
Hi, On 4/22/24 1:24 PM, Milan Zamazal wrote: > Hi, > > thank you all for your comments and clarifications. Let me summarize the > discussion, with my notes merged in, before I get lost in it: > > - Since our current black level and color gain operations are linear, we average > only the same color pixels and the lookup table is applied after averaging the > pixels, the current debayering implementation is correct. It doesn't matter > whether we do "average pixels -> subtract black -> multiply by color gain -> > apply gamma" or "subtract black -> multiply by color gain -> average pixels -> > apply gamma". This is not entirely true, because of the clamping involved in black-level subtraction (and the same for clamping at the top for gains), e.g. Lets say that for the red component we average 2 pixels with a blacklevel compensation of 8 for all colors and the value of the 2 pixels is: 6 and 10, then if we average first and then do blacklevel compensation the result of blacklevel correction is 2 pixels with a value of 0 and 2, which averages to 1. Where as if we first average we get an average of 8 and then after blacklevel compensation we end up with 0. > > - This may no longer hold if we change the operations, for example: > > - The optimization suggested by Hans (preprocessing while copying lines from the > input buffer). It may or may not have a significant impact on the performance > and may or may not have a significant impact on the image quality. I think we > need some experiments here. Performance on the CPU is generally challenging, > so if we can demonstrate a performance benefit, we should consider including > it (most likely configurable), if it doesn't turn the implementation into a > complete mess. I think that if we want to move to applying blacklevel + gain before debayering that we then should do this unconditionally, which means also unconditionally doing the memcpy to a temporary line buffer. > > - Or if we add support for non-linear gains, as mentioned by Pavel. > > - Laurent suggested adding a CCM matrix between debayering and gamma to achieve > a reasonable image quality. According to Hans, this is a heavy operation on > CPUs. Laurent still sees CPU processing as useful for experiments. I think > we can make this simply configurable (if it gets implemented). I agree that adding a CCM matrix as an optional step sounds interesting and this should indeed be configurable. This can be done as an optional (gaurded by a single if per line) post-debayer step run on the output buffer. > - If we split the processing to pre-bayer and post-bayer parts, we should > probably work with uint16_t or float's, which may have impact on performance. > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > CPU is hard to use and may differ on architectures, so the question is whether > it's worth to invest into it. > > - GPU processing should make many of these things easier and more things > possible. > > - Do we already know how to map the buffers to GPU efficiently? > > My conclusions are: > > - The current CPU debayering computation itself is fine at the moment and we > shouldn't change it without a good reason. It can be replaced in future if > needed, once we have a better understanding of what and how we can > realistically achieve. General cleanups, like moving table computations > elsewhere, would be still useful already now. 
> > - We can experiment with whatever mentioned above to get the better > understanding, but: > > - GPU processing is clearly a priority. > > - We have also other items in the TODO file! (I already work on some of them.) > > - We should probably change the e-mail thread subject. I agree with all of your conclusions :) Regards, Hans
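A standalone check of the numbers in Hans's counterexample above (two red pixels of 6 and 10, black level 8); this is a demo only, not pipeline code.

#include <algorithm>
#include <cstdio>

int main()
{
        const int black = 8, a = 6, b = 10;

        /* Subtract the black level (clamping at 0) first, then average. */
        const int subtractFirst = (std::max(a - black, 0) + std::max(b - black, 0)) / 2;

        /* Average first, then subtract the black level with clamping. */
        const int averageFirst = std::max((a + b) / 2 - black, 0);

        std::printf("%d vs %d\n", subtractFirst, averageFirst); /* prints "1 vs 0" */

        return 0;
}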
Hello, On Mon, Apr 22, 2024 at 01:38:58PM +0200, Hans de Goede wrote: > On 4/22/24 1:24 PM, Milan Zamazal wrote: > > Hi, > > > > thank you all for your comments and clarifications. Let me summarize the > > discussion, with my notes merged in, before I get lost in it: > > > > - Since our current black level and color gain operations are linear, we average > > only the same color pixels and the lookup table is applied after averaging the > > pixels, the current debayering implementation is correct. It doesn't matter > > whether we do "average pixels -> subtract black -> multiply by color gain -> > > apply gamma" or "subtract black -> multiply by color gain -> average pixels -> > > apply gamma". > > This is not entirely true, because of the clamping involved in black-level > subtraction (and the same for clamping at the top for gains), e.g. > > Lets say that for the red component we average 2 pixels with a blacklevel > compensation of 8 for all colors and the value of the 2 pixels is: > 6 and 10, then if we average first and then do blacklevel compensation the > result of blacklevel correction is 2 pixels with a value of 0 and 2, > which averages to 1. Where as if we first average we get an average of 8 > and then after blacklevel compensation we end up with 0. > > > - This may no longer hold if we change the operations, for example: > > > > - The optimization suggested by Hans (preprocessing while copying lines from the > > input buffer). It may or may not have a significant impact on the performance > > and may or may not have a significant impact on the image quality. I think we > > need some experiments here. Performance on the CPU is generally challenging, > > so if we can demonstrate a performance benefit, we should consider including > > it (most likely configurable), if it doesn't turn the implementation into a > > complete mess. > > I think that if we want to move to applying blacklevel + gain before > debayering that we then should do this unconditionally, which means also > unconditionally doing the memcpy to a temporary line buffer. If we can move it before debayering with acceptable performance, then I wouldn't make it configurable. > > - Or if we add support for non-linear gains, as mentioned by Pavel. Sensors are mostly linear these days. That is, until they reach saturation of course. I don't think the soft ISP needs a linearization LUT. > > - Laurent suggested adding a CCM matrix between debayering and gamma to achieve > > a reasonable image quality. According to Hans, this is a heavy operation on > > CPUs. Laurent still sees CPU processing as useful for experiments. I think > > we can make this simply configurable (if it gets implemented). > > I agree that adding a CCM matrix as an optional step sounds interesting > and this should indeed be configurable. This can be done as an optional > (gaurded by a single if per line) post-debayer step run on the output buffer. Given that the debayering currently produces an RGB value, do you think an additional multiplication by a 3x3 matrix (ideally accelerated by SIMD) would be that costly ? > > - If we split the processing to pre-bayer and post-bayer parts, we should > > probably work with uint16_t or float's, which may have impact on performance. > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > > CPU is hard to use and may differ on architectures, so the question is whether > > it's worth to invest into it. 
Good question :-) > > - GPU processing should make many of these things easier and more things > > possible. > > > > - Do we already know how to map the buffers to GPU efficiently? I haven't looked into it yet personally. > > My conclusions are: > > > > - The current CPU debayering computation itself is fine at the moment and we > > shouldn't change it without a good reason. It can be replaced in future if > > needed, once we have a better understanding of what and how we can > > realistically achieve. General cleanups, like moving table computations > > elsewhere, would be still useful already now. > > > > - We can experiment with whatever mentioned above to get the better > > understanding, but: > > > > - GPU processing is clearly a priority. > > > > - We have also other items in the TODO file! (I already work on some of them.) On this topic, I'm having a look at how we could use the soft ISP with the vimc pipeline handler. > > - We should probably change the e-mail thread subject. Done :-) > I agree with all of your conclusions :)
Hi, On 4/22/24 2:12 PM, Laurent Pinchart wrote: > Hello, > > On Mon, Apr 22, 2024 at 01:38:58PM +0200, Hans de Goede wrote: >> On 4/22/24 1:24 PM, Milan Zamazal wrote: >>> Hi, >>> >>> thank you all for your comments and clarifications. Let me summarize the >>> discussion, with my notes merged in, before I get lost in it: >>> >>> - Since our current black level and color gain operations are linear, we average >>> only the same color pixels and the lookup table is applied after averaging the >>> pixels, the current debayering implementation is correct. It doesn't matter >>> whether we do "average pixels -> subtract black -> multiply by color gain -> >>> apply gamma" or "subtract black -> multiply by color gain -> average pixels -> >>> apply gamma". >> >> This is not entirely true, because of the clamping involved in black-level >> subtraction (and the same for clamping at the top for gains), e.g. >> >> Lets say that for the red component we average 2 pixels with a blacklevel >> compensation of 8 for all colors and the value of the 2 pixels is: >> 6 and 10, then if we average first and then do blacklevel compensation the >> result of blacklevel correction is 2 pixels with a value of 0 and 2, >> which averages to 1. Where as if we first average we get an average of 8 >> and then after blacklevel compensation we end up with 0. >> >>> - This may no longer hold if we change the operations, for example: >>> >>> - The optimization suggested by Hans (preprocessing while copying lines from the >>> input buffer). It may or may not have a significant impact on the performance >>> and may or may not have a significant impact on the image quality. I think we >>> need some experiments here. Performance on the CPU is generally challenging, >>> so if we can demonstrate a performance benefit, we should consider including >>> it (most likely configurable), if it doesn't turn the implementation into a >>> complete mess. >> >> I think that if we want to move to applying blacklevel + gain before >> debayering that we then should do this unconditionally, which means also >> unconditionally doing the memcpy to a temporary line buffer. > > If we can move it before debayering with acceptable performance, then I > wouldn't make it configurable. > >>> - Or if we add support for non-linear gains, as mentioned by Pavel. > > Sensors are mostly linear these days. That is, until they reach > saturation of course. I don't think the soft ISP needs a linearization > LUT. > >>> - Laurent suggested adding a CCM matrix between debayering and gamma to achieve >>> a reasonable image quality. According to Hans, this is a heavy operation on >>> CPUs. Laurent still sees CPU processing as useful for experiments. I think >>> we can make this simply configurable (if it gets implemented). >> >> I agree that adding a CCM matrix as an optional step sounds interesting >> and this should indeed be configurable. This can be done as an optional >> (gaurded by a single if per line) post-debayer step run on the output buffer. > > Given that the debayering currently produces an RGB value, do you think > an additional multiplication by a 3x3 matrix (ideally accelerated by > SIMD) would be that costly ? Currently we do max 4 additions + 1 divide per pixel. this adds * 9 * multiplications + 6 adds per pixel. With FHD at 30 FPS that is ~560 million multiplications per second! And many sensors do way higher resolutions/fps. 
Maybe with SIMD we can do some of these in parallel and it certainly is interesting to try at least on some beefy x86 laptops (or upcoming beefy ARM laptops) but this is significantly going to hurt performance. I'm not saying this is not interesting to experiment with, but I have doubts this is something which we will enable by default anywhere. Regards, Hans > >>> - If we split the processing to pre-bayer and post-bayer parts, we should >>> probably work with uint16_t or float's, which may have impact on performance. >>> >>> - Pavel couldn't get a better performance by using SIMD CPU instructions for >>> debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on >>> CPU is hard to use and may differ on architectures, so the question is whether >>> it's worth to invest into it. > > Good question :-) > >>> - GPU processing should make many of these things easier and more things >>> possible. >>> >>> - Do we already know how to map the buffers to GPU efficiently? > > I haven't looked into it yet personally. > >>> My conclusions are: >>> >>> - The current CPU debayering computation itself is fine at the moment and we >>> shouldn't change it without a good reason. It can be replaced in future if >>> needed, once we have a better understanding of what and how we can >>> realistically achieve. General cleanups, like moving table computations >>> elsewhere, would be still useful already now. >>> >>> - We can experiment with whatever mentioned above to get the better >>> understanding, but: >>> >>> - GPU processing is clearly a priority. >>> >>> - We have also other items in the TODO file! (I already work on some of them.) > > On this topic, I'm having a look at how we could use the soft ISP with > the vimc pipeline handler. > >>> - We should probably change the e-mail thread subject. > > Done :-) > >> I agree with all of your conclusions :) >
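To make the cost estimate concrete, here is a sketch of the per-pixel CCM step being discussed; it is illustrative only, not libcamera code, and the fixed-point convention of 256 standing for 1.0 as well as the function name are assumptions. Per output pixel this is 9 multiplications and 6 additions; at 1920x1080 and 30 fps that is roughly 560 million multiplications per second, matching the figure above.

#include <algorithm>
#include <cstdint>

struct Ccm {
        int m[3][3]; /* fixed-point coefficients, 256 == 1.0 */
};

/* Multiply one RGB pixel by the 3x3 colour correction matrix in place. */
void applyCcm(uint8_t rgb[3], const Ccm &ccm)
{
        const int in[3] = { rgb[0], rgb[1], rgb[2] };

        for (unsigned int row = 0; row < 3; row++) {
                int v = (ccm.m[row][0] * in[0] +
                         ccm.m[row][1] * in[1] +
                         ccm.m[row][2] * in[2]) / 256;
                rgb[row] = static_cast<uint8_t>(std::clamp(v, 0, 255));
        }
}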
Hi! > > > - Or if we add support for non-linear gains, as mentioned by Pavel. > > Sensors are mostly linear these days. That is, until they reach > saturation of course. I don't think the soft ISP needs a linearization > LUT. That's not what Martijn measured: https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/ The Librem 5 is reasonably modern. Basically it is linear in the 0..70% range, and then it is linear again in the 90%..100% range, but with a different slope. Take a look, it is a fun article. > > > - If we split the processing to pre-bayer and post-bayer parts, we should > > > probably work with uint16_t or float's, which may have impact on performance. > > > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > > > CPU is hard to use and may differ on architectures, so the question is whether > > > it's worth to invest into it. > > Good question :-) Oh, so the good news is that you write SIMD code once with gcc intrinsics, and gcc does its magic. You don't have to know assembly for that, but it certainly helps to look at the assembly to check it looks reasonable. (Debayering needs too much byte shuffling to work well with SIMD extensions. Matrix multiply could be better, but I'm not sure if a 3x3 matrix is big enough to get an advantage.) Best regards, Pavel
On Mon, Apr 22, 2024 at 09:28:08PM +0200, Pavel Machek wrote: > Hi! > > > > > - Or if we add support for non-linear gains, as mentioned by Pavel. > > > > Sensors are mostly linear these days. That is, until they reach > > saturation of course. I don't think the soft ISP needs a linearization > > LUT. > > That's not what Martijn measured: > https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/ > Librem 5 is reasonably modern. > > Basically it is linear in 0..70% range, and then it is linear again in > 90%..100% range, but at different slope. Take a look, it is fun article. I did, and I don't reach the same conclusion. Obviously sensors saturate, so close to saturation there will be non linearities. This is not something that really requires compensation as far as I understand. What linearization LUTs are meant to compensate for, unless I'm mistaken, is non-linearities in the linear part of the dynamic range. On a side note, I wonder if lens shading could play a role in shape of the curve in the above article. Central pixels will saturate quicker than pixels closer to the corners. It would be interesting to make the same measurements taking only the central part of the image into account. Another interesting experiment would be to measure the sensor linearity without the lens (but that will be difficult in many cases without destroying the camera). > > > > - If we split the processing to pre-bayer and post-bayer parts, we should > > > > probably work with uint16_t or float's, which may have impact on performance. > > > > > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > > > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > > > > CPU is hard to use and may differ on architectures, so the question is whether > > > > it's worth to invest into it. > > > > Good question :-) > > Oh, so good news is you write SIMD code once with gcc intristics, and > gcc does its magic. You don't have to know assembly for that, but it > certainly helps to look at the assembly if it looks reasonable. There are also potentially interesting helper libraries such as https://github.com/vectorclass/version2 (I haven't checked the license compatibility). > (Debayering needs too much of byte shuffling to work well with SIMD > extensions. Matrix multiply could be better, but I'm not sure if 3x3 > matrix is big enough to get advantage).
Hi! > > > > > - If we split the processing to pre-bayer and post-bayer parts, we should > > > > > probably work with uint16_t or float's, which may have impact on performance. > > > > > > > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > > > > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > > > > > CPU is hard to use and may differ on architectures, so the question is whether > > > > > it's worth to invest into it. > > > > > > Good question :-) > > > > Oh, so good news is you write SIMD code once with gcc intristics, and > > gcc does its magic. You don't have to know assembly for that, but it > > certainly helps to look at the assembly if it looks reasonable. > > There are also potentially interesting helper libraries such as > https://github.com/vectorclass/version2 (I haven't checked the license > compatibility). Ok, so I played with black level correction a bit and got pleasant surprise: [Ignore wrong name]. void debayer8(uint8_t *dst, const uint8_t *src) { for (int x = 0; x < (int)WIDTH; x++) { uint8_t v = src[x]; if (v < 16) dst[x] = 0; else dst[x] = v-16; } } gcc translates it to vector code automatically, and results is only 10% slower than plain memcpy. Test was done on thinkpad x60. If I disable vector instructions, result is 4x time of plain memcpy. I'm quite impressed both by vector unit and by the gcc :-). Best regards, Pavel 00001340 <debayer8>: 1340: e8 f0 ff ff ff call 1335 <__x86.get_pc_thunk.dx> 1345: 81 c2 af 2c 00 00 add $0x2caf,%edx 134b: 56 push %esi 134c: 53 push %ebx 134d: 8b 4c 24 0c mov 0xc(%esp),%ecx 1351: 8b 5c 24 10 mov 0x10(%esp),%ebx 1355: 89 c8 mov %ecx,%eax 1357: 8d 73 01 lea 0x1(%ebx),%esi 135a: 29 f0 sub %esi,%eax 135c: 83 f8 0e cmp $0xe,%eax 135f: b8 00 00 00 00 mov $0x0,%eax 1364: 76 58 jbe 13be <debayer8+0x7e> 1366: 66 0f 6f a2 4c e0 ff movdqa -0x1fb4(%edx),%xmm4 136d: ff 136e: 66 0f 6f 9a 5c e0 ff movdqa -0x1fa4(%edx),%xmm3 1375: ff 1376: 66 0f ef d2 pxor %xmm2,%xmm2 137a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 1380: f3 0f 6f 04 03 movdqu (%ebx,%eax,1),%xmm0 1385: f3 0f 6f 0c 03 movdqu (%ebx,%eax,1),%xmm1 138a: 66 0f d8 c4 psubusb %xmm4,%xmm0 138e: 66 0f fc cb paddb %xmm3,%xmm1 1392: 66 0f 74 c2 pcmpeqb %xmm2,%xmm0 1396: 66 0f df c1 pandn %xmm1,%xmm0 139a: 0f 11 04 01 movups %xmm0,(%ecx,%eax,1) 139e: 83 c0 10 add $0x10,%eax 13a1: 3d 80 07 00 00 cmp $0x780,%eax 13a6: 75 d8 jne 1380 <debayer8+0x40> 13a8: 5b pop %ebx 13a9: 5e pop %esi 13aa: c3 ret 13ab: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi 13af: 90 nop 13b0: c6 04 01 00 movb $0x0,(%ecx,%eax,1) 13b4: 83 c0 01 add $0x1,%eax 13b7: 3d 80 07 00 00 cmp $0x780,%eax 13bc: 74 ea je 13a8 <debayer8+0x68> 13be: 0f b6 14 03 movzbl (%ebx,%eax,1),%edx 13c2: 80 fa 0f cmp $0xf,%dl 13c5: 76 e9 jbe 13b0 <debayer8+0x70> 13c7: 83 ea 10 sub $0x10,%edx 13ca: 88 14 01 mov %dl,(%ecx,%eax,1) 13cd: 83 c0 01 add $0x1,%eax 13d0: 3d 80 07 00 00 cmp $0x780,%eax 13d5: 75 e7 jne 13be <debayer8+0x7e> 13d7: eb cf jmp 13a8 <debayer8+0x68> 13d9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
Hi! > > > > > > - If we split the processing to pre-bayer and post-bayer parts, we should > > > > > > probably work with uint16_t or float's, which may have impact on performance. > > > > > > > > > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for > > > > > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on > > > > > > CPU is hard to use and may differ on architectures, so the question is whether > > > > > > it's worth to invest into it. > > > > > > > > Good question :-) > > > > > > Oh, so good news is you write SIMD code once with gcc intristics, and > > > gcc does its magic. You don't have to know assembly for that, but it > > > certainly helps to look at the assembly if it looks reasonable. > > > > There are also potentially interesting helper libraries such as > > https://github.com/vectorclass/version2 (I haven't checked the license > > compatibility). > > Ok, so I played with black level correction a bit and got pleasant > surprise: [Ignore wrong name]. > > void debayer8(uint8_t *dst, const uint8_t *src) > { > for (int x = 0; x < (int)WIDTH; x++) { > uint8_t v = src[x]; > if (v < 16) > dst[x] = 0; > else > dst[x] = v-16; > } > } > > gcc translates it to vector code automatically, and results is only > 10% slower than plain memcpy. Test was done on thinkpad x60. If I > disable vector instructions, result is 4x time of plain memcpy. I'm > quite impressed both by vector unit and by the gcc :-). Ok, disassembly below was for different function than benchmark was running due to inlining, but you got the idea. Code is at https://gitlab.com/tui/tui/-/blob/master/cam/blacklevel.c?ref_type=heads if someone wants to play. I tried to do matrix multiply, and while I do get small improvoement from "tree-vectorize", it is from 1.05 sec to 0.94 sec... additional improvement to cca 0.8 sec is possible with "fma". But this is still many times slower than memcpy(), so I'm not sure if we can get good performance there. matmult.c code in same directory. Best regards, Pavel
diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h index 98965fa1..5e38e08b 100644 --- a/include/libcamera/internal/software_isp/debayer_params.h +++ b/include/libcamera/internal/software_isp/debayer_params.h @@ -43,6 +43,10 @@ struct DebayerParams { * \brief Gamma correction, 1.0 is no correction */ float gamma; + /** + * \brief Level of the black point, 0..255, 0 is no correction. + */ + unsigned int blackLevel; }; } /* namespace libcamera */ diff --git a/include/libcamera/internal/software_isp/swisp_stats.h b/include/libcamera/internal/software_isp/swisp_stats.h index afe42c9a..25cd5abd 100644 --- a/include/libcamera/internal/software_isp/swisp_stats.h +++ b/include/libcamera/internal/software_isp/swisp_stats.h @@ -7,6 +7,8 @@ #pragma once +#include <array> + namespace libcamera { /** @@ -28,11 +30,15 @@ struct SwIspStats { /** * \brief Number of bins in the yHistogram. */ - static constexpr unsigned int kYHistogramSize = 16; + static constexpr unsigned int kYHistogramSize = 64; + /** + * \brief Type of the histogram. + */ + using histogram = std::array<unsigned int, kYHistogramSize>; /** * \brief A histogram of luminance values. */ - std::array<unsigned int, kYHistogramSize> yHistogram; + histogram yHistogram; }; } /* namespace libcamera */ diff --git a/src/ipa/simple/black_level.cpp b/src/ipa/simple/black_level.cpp new file mode 100644 index 00000000..9d4aa800 --- /dev/null +++ b/src/ipa/simple/black_level.cpp @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: LGPL-2.1-or-later */ +/* + * Copyright (C) 2024, Red Hat Inc. + * + * black_level.cpp - black level handling + */ + +#include "black_level.h" + +#include <numeric> + +#include <libcamera/base/log.h> + +namespace libcamera { + +LOG_DEFINE_CATEGORY(IPASoftBL) + +/** + * \class BlackLevel + * \brief Object providing black point level for software ISP + * + * Black level can be provided in hardware tuning files or, if no tuning file is + * available for the given hardware, guessed automatically, with less accuracy. + * As tuning files are not yet implemented for software ISP, BlackLevel + * currently provides only guessed black levels. + * + * This class serves for tracking black level as a property of the underlying + * hardware, not as means of enhancing a particular scene or image. + * + * The class is supposed to be instantiated for the given camera stream. + * The black level can be retrieved using BlackLevel::get() method. It is + * initially 0 and may change when updated using BlackLevel::update() method. + */ + +BlackLevel::BlackLevel() + : blackLevel_(255), blackLevelSet_(false) +{ +} + +/** + * \brief Return the current black level + * + * \return The black level, in the range from 0 (minimum) to 255 (maximum). + * If the black level couldn't be determined yet, return 0. + */ +unsigned int BlackLevel::get() const +{ + return blackLevelSet_ ? blackLevel_ : 0; +} + +/** + * \brief Update black level from the provided histogram + * \param[in] yHistogram The histogram to be used for updating black level + * + * The black level is property of the given hardware, not image. It is updated + * only if it has not been yet set or if it is lower than the lowest value seen + * so far. + */ +void BlackLevel::update(SwIspStats::histogram &yHistogram) +{ + /* + * The constant is selected to be "good enough", not overly conservative or + * aggressive. There is no magic about the given value. 
+ */ + constexpr float ignoredPercentage_ = 0.02; + const unsigned int total = + std::accumulate(begin(yHistogram), end(yHistogram), 0); + const unsigned int pixelThreshold = ignoredPercentage_ * total; + const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize; + const unsigned int currentBlackIdx = blackLevel_ / histogramRatio; + + for (unsigned int i = 0, seen = 0; + i < currentBlackIdx && i < SwIspStats::kYHistogramSize; + i++) { + seen += yHistogram[i]; + if (seen >= pixelThreshold) { + blackLevel_ = i * histogramRatio; + blackLevelSet_ = true; + LOG(IPASoftBL, Debug) + << "Auto-set black level: " + << i << "/" << SwIspStats::kYHistogramSize + << " (" << 100 * (seen - yHistogram[i]) / total << "% below, " + << 100 * seen / total << "% at or below)"; + break; + } + }; +} +} // namespace libcamera diff --git a/src/ipa/simple/black_level.h b/src/ipa/simple/black_level.h new file mode 100644 index 00000000..b3785db0 --- /dev/null +++ b/src/ipa/simple/black_level.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: LGPL-2.1-or-later */ +/* + * Copyright (C) 2024, Red Hat Inc. + * + * black_level.h - black level handling + */ + +#pragma once + +#include <array> + +#include "libcamera/internal/software_isp/swisp_stats.h" + +namespace libcamera { + +class BlackLevel +{ +public: + BlackLevel(); + unsigned int get() const; + void update(std::array<unsigned int, SwIspStats::kYHistogramSize> &yHistogram); + +private: + unsigned int blackLevel_; + bool blackLevelSet_; +}; + +} // namespace libcamera diff --git a/src/ipa/simple/meson.build b/src/ipa/simple/meson.build index 3e863db7..44b5f1d7 100644 --- a/src/ipa/simple/meson.build +++ b/src/ipa/simple/meson.build @@ -2,8 +2,13 @@ ipa_name = 'ipa_soft_simple' +soft_simple_sources = files([ + 'soft_simple.cpp', + 'black_level.cpp', +]) + mod = shared_module(ipa_name, - ['soft_simple.cpp', libcamera_generated_ipa_headers], + [soft_simple_sources, libcamera_generated_ipa_headers], name_prefix : '', include_directories : [ipa_includes, libipa_includes], dependencies : libcamera_private, diff --git a/src/ipa/simple/soft_simple.cpp b/src/ipa/simple/soft_simple.cpp index 312df4ba..ac027568 100644 --- a/src/ipa/simple/soft_simple.cpp +++ b/src/ipa/simple/soft_simple.cpp @@ -22,6 +22,8 @@ #include "libcamera/internal/software_isp/debayer_params.h" #include "libcamera/internal/software_isp/swisp_stats.h" +#include "black_level.h" + namespace libcamera { LOG_DEFINE_CATEGORY(IPASoft) @@ -33,7 +35,8 @@ class IPASoftSimple : public ipa::soft::IPASoftInterface public: IPASoftSimple() : params_(static_cast<DebayerParams *>(MAP_FAILED)), - stats_(static_cast<SwIspStats *>(MAP_FAILED)), ignore_updates_(0) + stats_(static_cast<SwIspStats *>(MAP_FAILED)), + blackLevel_(BlackLevel()), ignore_updates_(0) { } @@ -63,6 +66,7 @@ private: SharedFD fdParams_; DebayerParams *params_; SwIspStats *stats_; + BlackLevel blackLevel_; int32_t exposure_min_, exposure_max_; int32_t again_min_, again_max_; @@ -196,6 +200,10 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) params_->gainG = 256; params_->gamma = 0.5; + if (ignore_updates_ > 0) + blackLevel_.update(stats_->yHistogram); + params_->blackLevel = blackLevel_.get(); + setIspParams.emit(0); /* @@ -211,18 +219,19 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) * Calculate Mean Sample Value (MSV) according to formula from: * https://www.araa.asn.au/acra/acra2007/papers/paper84final.pdf */ - constexpr unsigned int yHistValsPerBin = - SwIspStats::kYHistogramSize / 
kExposureBinsCount; - constexpr unsigned int yHistValsPerBinMod = - SwIspStats::kYHistogramSize / - (SwIspStats::kYHistogramSize % kExposureBinsCount + 1); + const unsigned int blackLevelHistIdx = + params_->blackLevel / (256 / SwIspStats::kYHistogramSize); + const unsigned int histogramSize = SwIspStats::kYHistogramSize - blackLevelHistIdx; + const unsigned int yHistValsPerBin = histogramSize / kExposureBinsCount; + const unsigned int yHistValsPerBinMod = + histogramSize / (histogramSize % kExposureBinsCount + 1); int ExposureBins[kExposureBinsCount] = {}; unsigned int denom = 0; unsigned int num = 0; - for (unsigned int i = 0; i < SwIspStats::kYHistogramSize; i++) { + for (unsigned int i = 0; i < histogramSize; i++) { unsigned int idx = (i - (i / yHistValsPerBinMod)) / yHistValsPerBin; - ExposureBins[idx] += stats_->yHistogram[i]; + ExposureBins[idx] += stats_->yHistogram[blackLevelHistIdx + i]; } for (unsigned int i = 0; i < kExposureBinsCount; i++) { @@ -256,7 +265,8 @@ void IPASoftSimple::processStats(const ControlList &sensorControls) LOG(IPASoft, Debug) << "exposureMSV " << exposureMSV << " exp " << exposure_ << " again " << again_ - << " gain R/B " << params_->gainR << "/" << params_->gainB; + << " gain R/B " << params_->gainR << "/" << params_->gainB + << " black level " << params_->blackLevel; } void IPASoftSimple::updateExposure(double exposureMSV) diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp index a1692693..3be3cdfe 100644 --- a/src/libcamera/software_isp/debayer_cpu.cpp +++ b/src/libcamera/software_isp/debayer_cpu.cpp @@ -35,7 +35,7 @@ namespace libcamera { * \param[in] stats Pointer to the stats object to use. */ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats) - : stats_(std::move(stats)), gamma_correction_(1.0) + : stats_(std::move(stats)), gamma_correction_(1.0), blackLevel_(0) { #ifdef __x86_64__ enableInputMemcpy_ = false; @@ -683,11 +683,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams } /* Apply DebayerParams */ - if (params.gamma != gamma_correction_) { - for (unsigned int i = 0; i < kGammaLookupSize; i++) - gamma_[i] = UINT8_MAX * powf(i / (kGammaLookupSize - 1.0), params.gamma); + if (params.gamma != gamma_correction_ || params.blackLevel != blackLevel_) { + const unsigned int blackIndex = + params.blackLevel * kGammaLookupSize / 256; + std::fill(gamma_.begin(), gamma_.begin() + blackIndex, 0); + const float divisor = kGammaLookupSize - blackIndex - 1.0; + for (unsigned int i = blackIndex; i < kGammaLookupSize; i++) + gamma_[i] = UINT8_MAX * powf((i - blackIndex) / divisor, params.gamma); gamma_correction_ = params.gamma; + blackLevel_ = params.blackLevel; } if (swapRedBlueGains_) diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h index 5f44fc65..ea02f909 100644 --- a/src/libcamera/software_isp/debayer_cpu.h +++ b/src/libcamera/software_isp/debayer_cpu.h @@ -147,6 +147,7 @@ private: bool enableInputMemcpy_; bool swapRedBlueGains_; float gamma_correction_; + unsigned int blackLevel_; unsigned int measuredFrames_; int64_t frameProcessTime_; /* Skip 30 frames for things to stabilize then measure 30 frames */ diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp index 388b4496..9b49be41 100644 --- a/src/libcamera/software_isp/software_isp.cpp +++ b/src/libcamera/software_isp/software_isp.cpp @@ -64,7 +64,7 @@ LOG_DEFINE_CATEGORY(SoftwareIsp) */ 
SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const ControlInfoMap &sensorControls) : debayer_(nullptr), - debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f }, + debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f, 0 }, dmaHeap_(DmaHeap::DmaHeapFlag::Cma | DmaHeap::DmaHeapFlag::System) { if (!dmaHeap_.isValid()) {
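To close, a worked example of the histogram walk in BlackLevel::update() above, using an invented 64-bin histogram; this is a standalone demo of the arithmetic only, not the class itself. Two percent of the darkest pixels are ignored, and the black level snaps to the start of the bin in which the cumulative count crosses that threshold, with a bin width of 256 / 64 = 4.

#include <array>
#include <cstdio>
#include <numeric>

int main()
{
        std::array<unsigned int, 64> hist{};
        hist[2] = 30;   /* a little noise below the real black level */
        hist[5] = 2000; /* actual black level around pixel value 20 */
        for (unsigned int i = 6; i < hist.size(); i++)
                hist[i] = 100; /* the rest of the scene */

        const unsigned int total = std::accumulate(hist.begin(), hist.end(), 0u);
        const unsigned int threshold = static_cast<unsigned int>(0.02 * total);

        unsigned int seen = 0;
        for (unsigned int i = 0; i < hist.size(); i++) {
                seen += hist[i];
                if (seen >= threshold) {
                        std::printf("black level: %u\n", i * (256 / 64)); /* prints 20 */
                        break;
                }
        }

        return 0;
}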