[v6,17/18] libcamera: software_isp: Apply black level compensation

Message ID 20240319123622.675599-18-mzamazal@redhat.com
State Accepted
Series
  • libcamera: introduce Software ISP and Software IPA

Commit Message

Milan Zamazal March 19, 2024, 12:36 p.m. UTC
Black may not be represented as a 0 pixel value on given hardware; it may be
higher.  If this is not compensated for, various problems may occur, such as
low contrast or suboptimal exposure.

The black pixel value can either be retrieved from a tuning file for the given
hardware, or determined automatically on the fly.  The former is the correct
method, while the latter can be used when a tuning file is not available for
the given hardware.  Since there is currently no support for tuning files in
software ISP, the automatic, hardware-independent way is always used.  Support
for tuning files should be added in the future, but that will require more
work than this patch.

The patch looks at the image histogram and assumes that black starts where
pixel values begin to occur at the left end.  A certain amount of the darkest
pixels is ignored; whether they represent various kinds of noise or are real,
they are better omitted in any case to make the image look better.  It also
doesn't matter whether the darkest pixels occur around the supposed black
level or are spread between 0 and the black level; the difference is not
important.

An arbitrary threshold of the 2% darkest pixels is applied; there is nothing
magic about that value.
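
Condensed, the core of the scan in black_level.cpp below works like this
(the full code additionally caps the scan at the previously determined
level, so the value can only ever decrease):

  constexpr float ignoredPercentage = 0.02;
  const unsigned int total =
      std::accumulate(begin(yHistogram), end(yHistogram), 0);
  const unsigned int pixelThreshold = ignoredPercentage * total;
  /* with 64 bins, 256 / 64 = 4 luminance values per bin */
  const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize;

  unsigned int seen = 0;
  for (unsigned int i = 0; i < SwIspStats::kYHistogramSize; i++) {
      seen += yHistogram[i];
      if (seen >= pixelThreshold) {
          blackLevel = i * histogramRatio; /* e.g. bin 4 -> level 16 */
          break;
      }
  }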

The patch assumes that the black values for different colors are the same and
doesn't attempt any other non-primitive enhancements.  It cannot completely
replace tuning files; its goal is simplicity while still providing a visible
benefit.  Anything more sophisticated is left for future patches.

A possible cheap enhancement, if needed, could be temporarily setting exposure
and gain to their minimum values before determining the black level.  In
theory, the black level should be fixed, but it may not be reached in all
images.  For this reason, the patch updates the black level only if the
observed value is lower than the current one; it should never be increased.

The purpose of the patch is to compensate for hardware properties.  General
image contrast enhancements are out of scope of this patch.

Stats are still gathered as an uncorrected histogram, to avoid any confusion
and to represent the raw image data.  Exposure must be determined after the
black level correction -- it has no influence on the sub-black area and must
be correct after the black level correction is applied.  The granularity of
the histogram is increased from 16 to 64 bins to provide better precision
(there is no theory behind either of those numbers).
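
For example, with 64 bins over the 0..255 range, each bin covers 4
luminance values, so the auto-detected black level has a granularity of 4
(it would be 16 with the old 16-bin histogram); the level maps back to a
histogram index the same way, as in soft_simple.cpp below:

  /* 256 / 64 = 4 luminance values per bin */
  const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize;
  /* e.g. black level 16 -> index 4; bins below this index are excluded
   * from the exposure (MSV) computation */
  const unsigned int blackLevelHistIdx = params_->blackLevel / histogramRatio;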

Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 .../internal/software_isp/debayer_params.h    |  4 +
 .../internal/software_isp/swisp_stats.h       | 10 ++-
 src/ipa/simple/black_level.cpp                | 88 +++++++++++++++++++
 src/ipa/simple/black_level.h                  | 28 ++++++
 src/ipa/simple/meson.build                    |  7 +-
 src/ipa/simple/soft_simple.cpp                | 28 ++++--
 src/libcamera/software_isp/debayer_cpu.cpp    | 13 ++-
 src/libcamera/software_isp/debayer_cpu.h      |  1 +
 src/libcamera/software_isp/software_isp.cpp   |  2 +-
 9 files changed, 164 insertions(+), 17 deletions(-)
 create mode 100644 src/ipa/simple/black_level.cpp
 create mode 100644 src/ipa/simple/black_level.h

Comments

Laurent Pinchart March 27, 2024, 6:55 p.m. UTC | #1
Hi Milan,

(CC'ing David)

Thank you for the patch.

On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
> Black may not be represented as 0 pixel value for given hardware, it may be
> higher.  If this is not compensated then various problems may occur such as low
> contrast or suboptimal exposure.
> 
> The black pixel value can be either retrieved from a tuning file for the given
> hardware, or automatically on fly.  The former is the right and correct method,

s/on fly/on the fly/

> while the latter can be used when a tuning file is not available for the given
> hardware.  Since there is currently no support for tuning files in software ISP,
> the automatic, hardware independent way, is always used.  Support for tuning
> files should be added in future but it will require more work than this patch.

I don't think this is quite right. Strictly speaking, the black level
compensation is about subtracting the black level as produced by the
sensor. This requires tuning, and shouldn't be done on-the-fly.

On the other hand, looking at the histogram to try and stretch it is
called contrast adjustment. There's nothing wrong in doing so, but it's
usually done towards the output of the processing pipeline, and is
associated with manual adjustments of the contrast and brightness. See
src/ipa/rpi/controller/rpi/contrast.cpp for instance.

David may be able to comment further as to why BLC and contrast are
quite different.
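
Roughly, as a sketch (not actual libcamera code; the names and numbers
are made up):

  /* BLC: a fixed, sensor-specific subtraction, applied early in the
   * Bayer pipe, before shading, white balance and colour correction. */
  constexpr uint16_t kSensorBlackLevel = 64; /* hypothetical value */
  uint16_t blc(uint16_t raw)
  {
      return raw >= kSensorBlackLevel ? raw - kSensorBlackLevel : 0;
  }

  /* Contrast: a scene- or user-driven remap of the output luminance,
   * applied towards the end of the pipeline. */
  uint8_t contrast(const std::array<uint8_t, 256> &curve, uint8_t y)
  {
      return curve[y]; /* e.g. a histogram-derived or manual curve */
  }

The first rescales the data that the rest of the pipeline computes with;
the second only changes how the final image looks.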

As this patch may need more work, I propose not including it in v7 to
avoid delaying the merge of the rest of the implementation.

> The patch looks at the image histogram and assumes that black starts when pixel
> values start occurring on the left.  A certain amount of the darkest pixels is
> ignored; it doesn't matter whether they represent various kinds of noise or are
> real, they are better to omit in any case to make the image looking better.  It
> also doesn't matter whether the darkest pixels occur around the supposed black
> level or are spread between 0 and the black level, the difference is not
> important.
> 
> An arbitrary threshold of 2% darkest pixels is applied; there is no magic about
> that value.
> 
> The patch assumes that the black values for different colors are the same and
> doesn't attempt any other non-primitive enhancements.  It cannot completely
> replace tuning files and simplicity, while providing visible benefit, is its
> goal.  Anything more sophisticated is left for future patches.
> 
> A possible cheap enhancement, if needed, could be setting exposure + gain to
> minimum values temporarily, before setting the black level.  In theory, the
> black level should be fixed but it may not be reached in all images.  For this
> reason, the patch updates black level only if the observed value is lower than
> the current one; it should be never increased.
> 
> The purpose of the patch is to compensate for hardware properties.  General
> image contrast enhancements are out of scope of this patch.
> 
> Stats are still gathered as an uncorrected histogram, to avoid any confusion and
> to represent the raw image data.  Exposure must be determined after the black
> level correction -- it has no influence on the sub-black area and must be
> correct after applying the black level correction.  The granularity of the
> histogram is increased from 16 to 64 to provide a better precision (there is no
> theory behind either of those numbers).
> 
> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  .../internal/software_isp/debayer_params.h    |  4 +
>  .../internal/software_isp/swisp_stats.h       | 10 ++-
>  src/ipa/simple/black_level.cpp                | 88 +++++++++++++++++++
>  src/ipa/simple/black_level.h                  | 28 ++++++
>  src/ipa/simple/meson.build                    |  7 +-
>  src/ipa/simple/soft_simple.cpp                | 28 ++++--
>  src/libcamera/software_isp/debayer_cpu.cpp    | 13 ++-
>  src/libcamera/software_isp/debayer_cpu.h      |  1 +
>  src/libcamera/software_isp/software_isp.cpp   |  2 +-
>  9 files changed, 164 insertions(+), 17 deletions(-)
>  create mode 100644 src/ipa/simple/black_level.cpp
>  create mode 100644 src/ipa/simple/black_level.h
> 
> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h
> index 98965fa1..5e38e08b 100644
> --- a/include/libcamera/internal/software_isp/debayer_params.h
> +++ b/include/libcamera/internal/software_isp/debayer_params.h
> @@ -43,6 +43,10 @@ struct DebayerParams {
>  	 * \brief Gamma correction, 1.0 is no correction
>  	 */
>  	float gamma;
> +	/**
> +	 * \brief Level of the black point, 0..255, 0 is no correction.
> +	 */
> +	unsigned int blackLevel;
>  };
>  
>  } /* namespace libcamera */
> diff --git a/include/libcamera/internal/software_isp/swisp_stats.h b/include/libcamera/internal/software_isp/swisp_stats.h
> index afe42c9a..25cd5abd 100644
> --- a/include/libcamera/internal/software_isp/swisp_stats.h
> +++ b/include/libcamera/internal/software_isp/swisp_stats.h
> @@ -7,6 +7,8 @@
>  
>  #pragma once
>  
> +#include <array>
> +
>  namespace libcamera {
>  
>  /**
> @@ -28,11 +30,15 @@ struct SwIspStats {
>  	/**
>  	 * \brief Number of bins in the yHistogram.
>  	 */
> -	static constexpr unsigned int kYHistogramSize = 16;
> +	static constexpr unsigned int kYHistogramSize = 64;
> +	/**
> +	 * \brief Type of the histogram.
> +	 */
> +	using histogram = std::array<unsigned int, kYHistogramSize>;
>  	/**
>  	 * \brief A histogram of luminance values.
>  	 */
> -	std::array<unsigned int, kYHistogramSize> yHistogram;
> +	histogram yHistogram;
>  };
>  
>  } /* namespace libcamera */
> diff --git a/src/ipa/simple/black_level.cpp b/src/ipa/simple/black_level.cpp
> new file mode 100644
> index 00000000..9d4aa800
> --- /dev/null
> +++ b/src/ipa/simple/black_level.cpp
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> +/*
> + * Copyright (C) 2024, Red Hat Inc.
> + *
> + * black_level.cpp - black level handling
> + */
> +
> +#include "black_level.h"
> +
> +#include <numeric>
> +
> +#include <libcamera/base/log.h>
> +
> +namespace libcamera {
> +
> +LOG_DEFINE_CATEGORY(IPASoftBL)
> +
> +/**
> + * \class BlackLevel
> + * \brief Object providing black point level for software ISP
> + *
> + * Black level can be provided in hardware tuning files or, if no tuning file is
> + * available for the given hardware, guessed automatically, with less accuracy.
> + * As tuning files are not yet implemented for software ISP, BlackLevel
> + * currently provides only guessed black levels.
> + *
> + * This class serves for tracking black level as a property of the underlying
> + * hardware, not as means of enhancing a particular scene or image.
> + *
> + * The class is supposed to be instantiated for the given camera stream.
> + * The black level can be retrieved using BlackLevel::get() method. It is
> + * initially 0 and may change when updated using BlackLevel::update() method.
> + */
> +
> +BlackLevel::BlackLevel()
> +	: blackLevel_(255), blackLevelSet_(false)
> +{
> +}
> +
> +/**
> + * \brief Return the current black level
> + *
> + * \return The black level, in the range from 0 (minimum) to 255 (maximum).
> + * If the black level couldn't be determined yet, return 0.
> + */
> +unsigned int BlackLevel::get() const
> +{
> +	return blackLevelSet_ ? blackLevel_ : 0;
> +}
> +
> +/**
> + * \brief Update black level from the provided histogram
> + * \param[in] yHistogram The histogram to be used for updating black level
> + *
> + * The black level is property of the given hardware, not image. It is updated
> + * only if it has not been yet set or if it is lower than the lowest value seen
> + * so far.
> + */
> +void BlackLevel::update(SwIspStats::histogram &yHistogram)
> +{
> +	/*
> +	 * The constant is selected to be "good enough", not overly conservative or
> +	 * aggressive. There is no magic about the given value.
> +	 */
> +	constexpr float ignoredPercentage_ = 0.02;
> +	const unsigned int total =
> +		std::accumulate(begin(yHistogram), end(yHistogram), 0);
> +	const unsigned int pixelThreshold = ignoredPercentage_ * total;
> +	const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize;
> +	const unsigned int currentBlackIdx = blackLevel_ / histogramRatio;
> +
> +	for (unsigned int i = 0, seen = 0;
> +	     i < currentBlackIdx && i < SwIspStats::kYHistogramSize;
> +	     i++) {
> +		seen += yHistogram[i];
> +		if (seen >= pixelThreshold) {
> +			blackLevel_ = i * histogramRatio;
> +			blackLevelSet_ = true;
> +			LOG(IPASoftBL, Debug)
> +				<< "Auto-set black level: "
> +				<< i << "/" << SwIspStats::kYHistogramSize
> +				<< " (" << 100 * (seen - yHistogram[i]) / total << "% below, "
> +				<< 100 * seen / total << "% at or below)";
> +			break;
> +		}
> +	};
> +}
> +} // namespace libcamera

} /* namespace libcamera */

Same below.

> diff --git a/src/ipa/simple/black_level.h b/src/ipa/simple/black_level.h
> new file mode 100644
> index 00000000..b3785db0
> --- /dev/null
> +++ b/src/ipa/simple/black_level.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> +/*
> + * Copyright (C) 2024, Red Hat Inc.
> + *
> + * black_level.h - black level handling
> + */
> +
> +#pragma once
> +
> +#include <array>
> +
> +#include "libcamera/internal/software_isp/swisp_stats.h"
> +
> +namespace libcamera {
> +
> +class BlackLevel

Wouldn't it be better to move to using the libipa Algorithm class before
introducing new algorithms ?

> +{
> +public:
> +	BlackLevel();
> +	unsigned int get() const;
> +	void update(std::array<unsigned int, SwIspStats::kYHistogramSize> &yHistogram);
> +
> +private:
> +	unsigned int blackLevel_;
> +	bool blackLevelSet_;
> +};
> +
> +} // namespace libcamera
> diff --git a/src/ipa/simple/meson.build b/src/ipa/simple/meson.build
> index 3e863db7..44b5f1d7 100644
> --- a/src/ipa/simple/meson.build
> +++ b/src/ipa/simple/meson.build
> @@ -2,8 +2,13 @@
>  
>  ipa_name = 'ipa_soft_simple'
>  
> +soft_simple_sources = files([
> +    'soft_simple.cpp',
> +    'black_level.cpp',
> +])
> +
>  mod = shared_module(ipa_name,
> -                    ['soft_simple.cpp', libcamera_generated_ipa_headers],
> +                    [soft_simple_sources, libcamera_generated_ipa_headers],
>                      name_prefix : '',
>                      include_directories : [ipa_includes, libipa_includes],
>                      dependencies : libcamera_private,
> diff --git a/src/ipa/simple/soft_simple.cpp b/src/ipa/simple/soft_simple.cpp
> index 312df4ba..ac027568 100644
> --- a/src/ipa/simple/soft_simple.cpp
> +++ b/src/ipa/simple/soft_simple.cpp
> @@ -22,6 +22,8 @@
>  #include "libcamera/internal/software_isp/debayer_params.h"
>  #include "libcamera/internal/software_isp/swisp_stats.h"
>  
> +#include "black_level.h"
> +
>  namespace libcamera {
>  
>  LOG_DEFINE_CATEGORY(IPASoft)
> @@ -33,7 +35,8 @@ class IPASoftSimple : public ipa::soft::IPASoftInterface
>  public:
>  	IPASoftSimple()
>  		: params_(static_cast<DebayerParams *>(MAP_FAILED)),
> -		  stats_(static_cast<SwIspStats *>(MAP_FAILED)), ignore_updates_(0)
> +		  stats_(static_cast<SwIspStats *>(MAP_FAILED)),
> +		  blackLevel_(BlackLevel()), ignore_updates_(0)
>  	{
>  	}
>  
> @@ -63,6 +66,7 @@ private:
>  	SharedFD fdParams_;
>  	DebayerParams *params_;
>  	SwIspStats *stats_;
> +	BlackLevel blackLevel_;
>  
>  	int32_t exposure_min_, exposure_max_;
>  	int32_t again_min_, again_max_;
> @@ -196,6 +200,10 @@ void IPASoftSimple::processStats(const ControlList &sensorControls)
>  	params_->gainG = 256;
>  	params_->gamma = 0.5;
>  
> +	if (ignore_updates_ > 0)
> +		blackLevel_.update(stats_->yHistogram);
> +	params_->blackLevel = blackLevel_.get();
> +
>  	setIspParams.emit(0);
>  
>  	/*
> @@ -211,18 +219,19 @@ void IPASoftSimple::processStats(const ControlList &sensorControls)
>  	 * Calculate Mean Sample Value (MSV) according to formula from:
>  	 * https://www.araa.asn.au/acra/acra2007/papers/paper84final.pdf
>  	 */
> -	constexpr unsigned int yHistValsPerBin =
> -		SwIspStats::kYHistogramSize / kExposureBinsCount;
> -	constexpr unsigned int yHistValsPerBinMod =
> -		SwIspStats::kYHistogramSize /
> -		(SwIspStats::kYHistogramSize % kExposureBinsCount + 1);
> +	const unsigned int blackLevelHistIdx =
> +		params_->blackLevel / (256 / SwIspStats::kYHistogramSize);
> +	const unsigned int histogramSize = SwIspStats::kYHistogramSize - blackLevelHistIdx;
> +	const unsigned int yHistValsPerBin = histogramSize / kExposureBinsCount;
> +	const unsigned int yHistValsPerBinMod =
> +		histogramSize / (histogramSize % kExposureBinsCount + 1);
>  	int ExposureBins[kExposureBinsCount] = {};
>  	unsigned int denom = 0;
>  	unsigned int num = 0;
>  
> -	for (unsigned int i = 0; i < SwIspStats::kYHistogramSize; i++) {
> +	for (unsigned int i = 0; i < histogramSize; i++) {
>  		unsigned int idx = (i - (i / yHistValsPerBinMod)) / yHistValsPerBin;
> -		ExposureBins[idx] += stats_->yHistogram[i];
> +		ExposureBins[idx] += stats_->yHistogram[blackLevelHistIdx + i];
>  	}
>  
>  	for (unsigned int i = 0; i < kExposureBinsCount; i++) {
> @@ -256,7 +265,8 @@ void IPASoftSimple::processStats(const ControlList &sensorControls)
>  
>  	LOG(IPASoft, Debug) << "exposureMSV " << exposureMSV
>  			    << " exp " << exposure_ << " again " << again_
> -			    << " gain R/B " << params_->gainR << "/" << params_->gainB;
> +			    << " gain R/B " << params_->gainR << "/" << params_->gainB
> +			    << " black level " << params_->blackLevel;
>  }
>  
>  void IPASoftSimple::updateExposure(double exposureMSV)
> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
> index a1692693..3be3cdfe 100644
> --- a/src/libcamera/software_isp/debayer_cpu.cpp
> +++ b/src/libcamera/software_isp/debayer_cpu.cpp
> @@ -35,7 +35,7 @@ namespace libcamera {
>   * \param[in] stats Pointer to the stats object to use.
>   */
>  DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats)
> -	: stats_(std::move(stats)), gamma_correction_(1.0)
> +	: stats_(std::move(stats)), gamma_correction_(1.0), blackLevel_(0)
>  {
>  #ifdef __x86_64__
>  	enableInputMemcpy_ = false;
> @@ -683,11 +683,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams
>  	}
>  
>  	/* Apply DebayerParams */
> -	if (params.gamma != gamma_correction_) {
> -		for (unsigned int i = 0; i < kGammaLookupSize; i++)
> -			gamma_[i] = UINT8_MAX * powf(i / (kGammaLookupSize - 1.0), params.gamma);
> +	if (params.gamma != gamma_correction_ || params.blackLevel != blackLevel_) {
> +		const unsigned int blackIndex =
> +			params.blackLevel * kGammaLookupSize / 256;
> +		std::fill(gamma_.begin(), gamma_.begin() + blackIndex, 0);
> +		const float divisor = kGammaLookupSize - blackIndex - 1.0;
> +		for (unsigned int i = blackIndex; i < kGammaLookupSize; i++)
> +			gamma_[i] = UINT8_MAX * powf((i - blackIndex) / divisor, params.gamma);
>  
>  		gamma_correction_ = params.gamma;
> +		blackLevel_ = params.blackLevel;
>  	}
>  
>  	if (swapRedBlueGains_)
> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
> index 5f44fc65..ea02f909 100644
> --- a/src/libcamera/software_isp/debayer_cpu.h
> +++ b/src/libcamera/software_isp/debayer_cpu.h
> @@ -147,6 +147,7 @@ private:
>  	bool enableInputMemcpy_;
>  	bool swapRedBlueGains_;
>  	float gamma_correction_;
> +	unsigned int blackLevel_;
>  	unsigned int measuredFrames_;
>  	int64_t frameProcessTime_;
>  	/* Skip 30 frames for things to stabilize then measure 30 frames */
> diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp
> index 388b4496..9b49be41 100644
> --- a/src/libcamera/software_isp/software_isp.cpp
> +++ b/src/libcamera/software_isp/software_isp.cpp
> @@ -64,7 +64,7 @@ LOG_DEFINE_CATEGORY(SoftwareIsp)
>   */
>  SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const ControlInfoMap &sensorControls)
>  	: debayer_(nullptr),
> -	  debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f },
> +	  debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f, 0 },
>  	  dmaHeap_(DmaHeap::DmaHeapFlag::Cma | DmaHeap::DmaHeapFlag::System)
>  {
>  	if (!dmaHeap_.isValid()) {
Milan Zamazal March 28, 2024, 10:11 a.m. UTC | #2
Hi Laurent,

thank you for the review.  

Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> Hi Milan,
>
> (CC'ing David)
>
> Thank you for the patch.
>
> On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
>> Black may not be represented as 0 pixel value for given hardware, it may be
>> higher.  If this is not compensated then various problems may occur such as low
>> contrast or suboptimal exposure.
>> 
>> The black pixel value can be either retrieved from a tuning file for the given
>> hardware, or automatically on fly.  The former is the right and correct method,
>
> s/on fly/on the fly/
>
>> while the latter can be used when a tuning file is not available for the given
>> hardware.  Since there is currently no support for tuning files in software ISP,
>> the automatic, hardware independent way, is always used.  Support for tuning
>> files should be added in future but it will require more work than this patch.
>
> I don't think this is quite right. Strictly speaking, the black level
> compensation is about subtracting the black level as produced by the
> sensor. This requires tuning, and shouldn't be done on-the-fly.

A general agreement from those who tried the patch was that it makes a
noticeable improvement.  As explained in my comments above, this is a fallback
mechanism that we can use cheaply and immediately, until or unless a tuning
file for software ISP is available.  So far, it seems to be beneficial rather
than harmful.

Unless we have something better or perfect soon, which doesn't seem to be the
case to me, why not use this simple and useful (I believe) fallback mechanism?
Do you have a better suggestion as to what could be done right now regarding
this issue (not considering black level at all is a real issue)?

> On the other hand, looking at the histogram to try and stretch it is
> called contrast adjustment. There's nothing wrong in doing so, but it's
> usually done towards the output of the processing pipeline, and is
> associated with manual adjustments of the contrast and brightness. See
> src/ipa/rpi/controller/rpi/contrast.cpp for instance.

Contrast adjustment is a different issue than the one this patch tries to
address and less important now.  I plan to work on it and similar functionality
later.

> David may be able to comment further as to why BLC and contrast are
> quite different.

I think I understand the difference but feel free to comment if you think I'm
missing something.

> As this patch may need more work, I propose not including it in v7 to
> avoid delay merging the rest of the implementation.

Please consider my comments above.  I'm open to suggestions on how to do
better at the moment.  I'm less comfortable with the idea that nothing should
be done about the black level until we have a perfect solution.

Thanks,
Milan
David Plowman March 28, 2024, 11:51 a.m. UTC | #3
Hi everyone

Thanks for cc'ing me on this question. Always happy to try and answer!

On Thu, 28 Mar 2024 at 10:11, Milan Zamazal <mzamazal@redhat.com> wrote:
>
> Hi Laurent,
>
> thank you for the review.
>
> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:
>
> > Hi Milan,
> >
> > (CC'ing David)
> >
> > Thank you for the patch.
> >
> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
> >> Black may not be represented as 0 pixel value for given hardware, it may be
> >> higher.  If this is not compensated then various problems may occur such as low
> >> contrast or suboptimal exposure.
> >>
> >> The black pixel value can be either retrieved from a tuning file for the given
> >> hardware, or automatically on fly.  The former is the right and correct method,
> >
> > s/on fly/on the fly/
> >
> >> while the latter can be used when a tuning file is not available for the given
> >> hardware.  Since there is currently no support for tuning files in software ISP,
> >> the automatic, hardware independent way, is always used.  Support for tuning
> >> files should be added in future but it will require more work than this patch.
> >
> > I don't think this is quite right. Strictly speaking, the black level
> > compensation is about subtracting the black level as produced by the
> > sensor. This requires tuning, and shouldn't be done on-the-fly.
>
> A general agreement from those who tried the patch was that it makes a
> noticeable improvement.  As explained in my comments above, this is a fallback
> mechanism that we can use cheaply and immediately, until or unless tuning file
> for software ISP is available.  So far, it seems to be beneficial rather than
> harmful.
>
> Unless we have something better or perfect soon, which doesn't seem to be the
> case to me, why not to use this simple and useful (I believe) fallback
> mechanism?  Do you have a better suggestion what could be done right now
> regarding this issue (not considering black level at all is a real issue)?

This is of course partly a technical issue - what's the right thing to
do? - and partly a pragmatic issue - whether to do something
imperfect right now or not.

To tackle the technical question, black level is normally removed
fairly early in the pipeline, because not doing so will make typical
operations in the Bayer pipe go wrong. For example:

* Lens shading (vignetting correction). I don't know whether you have
this feature currently, but maybe it will be required at some point to
improve image quality? Doing this without first removing the black
level will make all the colours go wrong.

* White balance. This too will go wrong if you haven't subtracted the
black level. White balance is normally applied before demosaic because
demosaic typically works better with white-balanced pixels. Obviously
it depends what your demosaic algorithm does, but I wonder if there's
a future in which you have higher quality versions of some algorithms
for certain use cases (e.g. stills capture) where this could become an
issue.

* Colour correction. Like white balance, colour correction matrices
will not do the right thing if black level has not been removed.
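
To put some made-up numbers on the white balance example: say the black
level is 16 and a grey patch measures R = 80, G = 112 (both still
including the black offset). The red gain computed without subtracting
the black level is G/R = 112/80 = 1.40; subtracting it first gives
(112-16)/(80-16) = 96/64 = 1.50. And because the offset is constant
while the signal scales with exposure, the error changes with scene
brightness, so the colours drift instead of staying fixed.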

You can probably guess that I belong more to the "calibrate the camera
properly" view of the world because doing that fixes problems and
makes them go away _forever_. But I'm not familiar with the background
here, so I can appreciate that interim solutions may have a place.

>
> > On the other hand, looking at the histogram to try and stretch it is
> > called contrast adjustment. There's nothing wrong in doing so, but it's
> > usually done towards the output of the processing pipeline, and is
> > associated with manual adjustments of the contrast and brightness. See
> > src/ipa/rpi/controller/rpi/contrast.cpp for instance.
>
> Contrast adjustment is a different issue than the one this patch tries to
> address and less important now.  I plan to work on it and similar functionality
> later.
>
> > David may be able to comment further as to why BLC and contrast are
> > quite different.

We treat contrast adjustment as a more cosmetic feature of an image
which users can adjust according to their own personal taste. They can
have more of it or less of it, as they wish, and it doesn't really
affect the other aspects of the image, such as colours or image quality.

Black level is quite different. There's a known value that you should
use. There's basically no choice. Speaking for the Pi, I would be
highly reluctant to couple these two things together - you certainly
wouldn't want contrast adjustment to start changing the colours, for
example!

Anyway, I hope that provides a little bit of a comparison, at least
from the point of view of a different implementation. Good luck!

David

>
> I think I understand the difference but feel free to comment if you think I'm
> missing something.
>
> > As this patch may need more work, I propose not including it in v7 to
> > avoid delay merging the rest of the implementation.
>
> Please consider my comments above.  I'm open to suggestions how to do better at
> the moment.  Less comfortable with nothing should be done about black level
> until we have a perfect solution.
>
> Thanks,
> Milan
>
Milan Zamazal March 28, 2024, 2:47 p.m. UTC | #4
David Plowman <david.plowman@raspberrypi.com> writes:

> Hi everyone
>
> Thanks for cc'ing me on this question. Always happy to try and answer!

Hi David,

thank you for your answer and explanations.

> On Thu, 28 Mar 2024 at 10:11, Milan Zamazal <mzamazal@redhat.com> wrote:
>>
>> Hi Laurent,
>>
>> thank you for the review.
>>
>> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:
>>
>> > Hi Milan,
>> >
>> > (CC'ing David)
>> >
>> > Thank you for the patch.
>> >
>> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
>> >> Black may not be represented as 0 pixel value for given hardware, it may be
>> >> higher.  If this is not compensated then various problems may occur such as low
>> >> contrast or suboptimal exposure.
>> >>
>> >> The black pixel value can be either retrieved from a tuning file for the given
>> >> hardware, or automatically on fly.  The former is the right and correct method,
>> >
>> > s/on fly/on the fly/
>> >
>> >> while the latter can be used when a tuning file is not available for the given
>> >> hardware.  Since there is currently no support for tuning files in software ISP,
>> >> the automatic, hardware independent way, is always used.  Support for tuning
>> >> files should be added in future but it will require more work than this patch.
>> >
>> > I don't think this is quite right. Strictly speaking, the black level
>> > compensation is about subtracting the black level as produced by the
>> > sensor. This requires tuning, and shouldn't be done on-the-fly.
>>
>> A general agreement from those who tried the patch was that it makes a
>> noticeable improvement.  As explained in my comments above, this is a fallback
>> mechanism that we can use cheaply and immediately, until or unless tuning file
>> for software ISP is available.  So far, it seems to be beneficial rather than
>> harmful.
>>
>> Unless we have something better or perfect soon, which doesn't seem to be the
>> case to me, why not to use this simple and useful (I believe) fallback
>> mechanism?  Do you have a better suggestion what could be done right now
>> regarding this issue (not considering black level at all is a real issue)?
>
> This is of course partly a technical issue - what's the right thing to
> do? - and particularly a pragmatic issue - whether to do something
> imperfect right now or not.

Yes.

> To tackle the technical question, black level is normally removed
> fairly early in the pipeline, because not doing so will make typical
> operations in the Bayer pipe go wrong. For example:
>
> * Lens shading (vignetting correction). I don't know whether you have
> this feature currently, but maybe it will be required at some point to
> improve image quality? Doing this without first removing the black
> level will make all the colours go wrong.
>
> * White balance. This too will go wrong if you haven't subtracted the
> black level. White balance is normally applied before demosaic because
> demosaic typically works better with white-balanced pixels. Obviously
> it depends what your demosaic algorithm does, but I wonder if there's
> a future in which you have higher quality versions of some algorithms
> for certain use cases (e.g. stills capture) where this could become an
> issue.
>
> * Colour correction. Like white balance, colour correction matrices
> will not do the right thing if black level has not been removed.

We don't have lens shading or color correction; once we have support for
these, there is no reason not to have proper black level correction as well.
As for white balance, we currently use probably the most primitive method
possible (green/red and green/blue ratios), and we don't subtract the black
level even with this patch (which should get fixed later).  The patch does
subtract the black level when correcting exposure, though, which is also
influenced by this.  Considering all the imperfections at the moment, the
actual effects on white balance and exposure may not be big enough to care
about very much.  But even then ...

> You can probably guess that I belong more to the "calibrate the camera
> properly" view of the world because doing that fixes problems and
> makes them go away _forever_. But I'm not familiar with the background
> here, so I can appreciate that interim solutions may have a place.

... the thing is that not applying any black level correction, even leaving
aside the white balance and exposure deformations, often makes the image look
noticeably flat, as there are no blacks.  I'd say better imperfect blacks than
no blacks, similarly to having imperfect rather than no white balance
correction (yes, technically comparing apples and oranges, but both have their
impact on user satisfaction with the output).

>> > On the other hand, looking at the histogram to try and stretch it is
>> > called contrast adjustment. There's nothing wrong in doing so, but it's
>> > usually done towards the output of the processing pipeline, and is
>> > associated with manual adjustments of the contrast and brightness. See
>> > src/ipa/rpi/controller/rpi/contrast.cpp for instance.
>>
>> Contrast adjustment is a different issue than the one this patch tries to
>> address and less important now.  I plan to work on it and similar functionality
>> later.
>>
>> > David may be able to comment further as to why BLC and contrast are
>> > quite different.
>
> We treat contrast adjustment as a more cosmetic feature of an image
> which users can adjust according to their own personal taste. They can
> have more of it or less of it, as they wish, and it doesn't really
> affect the other aspects of image, such as colours or image quality.

Yes, as for contrast, we quickly rejected the idea of handling it automatically
and decided to implement proper user-adjustable settings later.

> Black level is quite different. There's a known value that you should
> use. There's basically no choice. Speaking for the Pi, I would be
> highly reluctant to couple these two things together - you certainly
> wouldn't want contrast adjustment to start changing the colours, for
> example!

Exactly.  This is why I'm advocating for still considering the patch rather than
dropping it.  At least in my setup the auto-determined black value doesn't seem
to be way off and serves its purpose even without access to exact tuning data.

> Anyway, I hope that provides a little bit of a comparison, at least
> from the point of view of a different implementation. Good luck!

Thanks,
Milan

> David
>
>>
>> I think I understand the difference but feel free to comment if you think I'm
>> missing something.
>>
>> > As this patch may need more work, I propose not including it in v7 to
>> > avoid delay merging the rest of the implementation.
>>
>> Please consider my comments above.  I'm open to suggestions how to do better at
>> the moment.  Less comfortable with nothing should be done about black level
>> until we have a perfect solution.
>>
>> Thanks,
>> Milan
>>
Milan Zamazal April 2, 2024, 7:48 p.m. UTC | #5
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> As this patch may need more work, I propose not including it in v7 to
> avoid delay merging the rest of the implementation.

I think David and I have provided sufficient additional information to make
the decision.  I'll keep the patch in v7, but it will be the last patch in the
series again, so it's easy to either take it or omit it.

[...]

>> +class BlackLevel
>
> Wouldn't it be better to move to using the libipa Algorithm class before
> introducing new algorithms ?

I suppose we will need this sooner or later, but it will take a non-trivial
amount of work to integrate software ISP with this in a meaningful way and to
decide how to de-wire exposure, white balance, and possibly black level from
the current code without wasting precious CPU cycles.

Regards,
Milan
Laurent Pinchart April 2, 2024, 9:31 p.m. UTC | #6
Hi Milan,

On Tue, Apr 02, 2024 at 09:48:34PM +0200, Milan Zamazal wrote:
> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:
> 
> > As this patch may need more work, I propose not including it in v7 to
> > avoid delay merging the rest of the implementation.
> 
> I think David and I have provided sufficient additional information to make the
> decision.  I'll keep the patch in v7 but it will be the last patch in the series
> again so it's easy to either take or omit it.

Thank you.

> [...]
> 
> >> +class BlackLevel
> >
> > Wouldn't it be better to move to using the libipa Algorithm class before
> > introducing new algorithms ?
> 
> I suppose we will need this sooner or later but it will need a non-trivial
> amount of work to integrate software ISP with this in a meaningful way and to
> decide how to de-wire exposure, white balance, and possibly black level from the
> current code, without wasting the precious CPU cycles.

The CPU-intensive part is the image processing itself; the IPA side
should hopefully not be an issue, even if we make it a little more
complex by using the more modular architecture that the Algorithm class
provides.
Laurent Pinchart April 2, 2024, 10:07 p.m. UTC | #7
Hello,

On Thu, Mar 28, 2024 at 03:47:32PM +0100, Milan Zamazal wrote:
> David Plowman writes:
> 
> > Hi everyone
> >
> > Thanks for cc'ing me on this question. Always happy to try and answer!
> 
> Hi David,
> 
> thank you for your answer and explanations.

Likewise, thank you.

> > On Thu, 28 Mar 2024 at 10:11, Milan Zamazal wrote:
> >> Laurent Pinchart writes:
> >> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
> >> >> Black may not be represented as 0 pixel value for given hardware, it may be
> >> >> higher.  If this is not compensated then various problems may occur such as low
> >> >> contrast or suboptimal exposure.
> >> >>
> >> >> The black pixel value can be either retrieved from a tuning file for the given
> >> >> hardware, or automatically on fly.  The former is the right and correct method,
> >> >
> >> > s/on fly/on the fly/
> >> >
> >> >> while the latter can be used when a tuning file is not available for the given
> >> >> hardware.  Since there is currently no support for tuning files in software ISP,
> >> >> the automatic, hardware independent way, is always used.  Support for tuning
> >> >> files should be added in future but it will require more work than this patch.
> >> >
> >> > I don't think this is quite right. Strictly speaking, the black level
> >> > compensation is about subtracting the black level as produced by the
> >> > sensor. This requires tuning, and shouldn't be done on-the-fly.
> >>
> >> A general agreement from those who tried the patch was that it makes a
> >> noticeable improvement.  As explained in my comments above, this is a fallback
> >> mechanism that we can use cheaply and immediately, until or unless tuning file
> >> for software ISP is available.  So far, it seems to be beneficial rather than
> >> harmful.
> >>
> >> Unless we have something better or perfect soon, which doesn't seem to be the
> >> case to me, why not to use this simple and useful (I believe) fallback
> >> mechanism?  Do you have a better suggestion what could be done right now
> >> regarding this issue (not considering black level at all is a real issue)?
> >
> > This is of course partly a technical issue - what's the right thing to
> > do? - and particularly a pragmatic issue - whether to do something
> > imperfect right now or not.
> 
> Yes.
> 
> > To tackle the technical question, black level is normally removed
> > fairly early in the pipeline, because not doing so will make typical
> > operations in the Bayer pipe go wrong. For example:
> >
> > * Lens shading (vignetting correction). I don't know whether you have
> > this feature currently, but maybe it will be required at some point to
> > improve image quality? Doing this without first removing the black
> > level will make all the colours go wrong.
> >
> > * White balance. This too will go wrong if you haven't subtracted the
> > black level. White balance is normally applied before demosaic because
> > demosaic typically works better with white-balanced pixels. Obviously
> > it depends what your demosaic algorithm does, but I wonder if there's
> > a future in which you have higher quality versions of some algorithms
> > for certain use cases (e.g. stills capture) where this could become an
> > issue.
> >
> > * Colour correction. Like white balance, colour correction matrices
> > will not do the right thing if black level has not been removed.
> 
> We don't have lens shading or color correction, once we have support for these
> there is no reason not to have proper black level correction as well.  As for
> white balance, we currently use probably the most primitive method possible
> (green/red and green/blue ratios) and we don't subtract black level even with
> this patch (which should get fixed later).  The patch subtracts black level when
> correcting exposure though, which is also influenced by this.  Considering all
> the imperfections at the moment, the actual effects on white balance and
> exposure may not be that big to care very much.  But even then ...
> 
> > You can probably guess that I belong more to the "calibrate the camera
> > properly" view of the world because doing that fixes problems and
> > makes them go away _forever_. But I'm not familiar with the background
> > here, so I can appreciate that interim solutions may have a place.
> 
> ... the thing is that not applying any black level correction, even when not
> considering white balance and exposure deformations, often makes the image
> looking noticeably flat as there are no blacks.  I'd say better imperfect blacks
> than no blacks, similarly to having imperfect rather than no white balance
> correction (yes, comparing apples and oranges technically, but both have their
> impacts on the user satisfaction with the output).
> 
> >> > On the other hand, looking at the histogram to try and stretch it is
> >> > called contrast adjustment. There's nothing wrong in doing so, but it's
> >> > usually done towards the output of the processing pipeline, and is
> >> > associated with manual adjustments of the contrast and brightness. See
> >> > src/ipa/rpi/controller/rpi/contrast.cpp for instance.
> >>
> >> Contrast adjustment is a different issue than the one this patch tries to
> >> address and less important now.  I plan to work on it and similar functionality
> >> later.
> >>
> >> > David may be able to comment further as to why BLC and contrast are
> >> > quite different.
> >
> > We treat contrast adjustment as a more cosmetic feature of an image
> > which users can adjust according to their own personal taste. They can
> > have more of it or less of it, as they wish, and it doesn't really
> > affect the other aspects of image, such as colours or image quality.
> 
> Yes, as for contrast, we quickly rejected the idea of handling it automatically
> and decided to implement the proper user adjustable settings later.
> 
> > Black level is quite different. There's a known value that you should
> > use. There's basically no choice. Speaking for the Pi, I would be
> > highly reluctant to couple these two things together - you certainly
> > wouldn't want contrast adjustment to start changing the colours, for
> > example!
> 
> Exactly.  This is why I'm advocating for still considering the patch rather than
> dropping it.  At least in my setup the auto-determined black value doesn't seem
> to be way off and serves its purpose even without access to exact tuning data.

As for the other parts of this series, I'm fine starting with this initial
implementation and improving it, but I think the black level should
eventually be moved before debayering, and ideally the colour gains as
well. I understand the need for optimizations to lower the CPU
consumption, but at the same time I don't feel comfortable building
on top of an implementation that may work a bit more by chance than by
correctness, as that's not very maintainable.

> > Anyway, I hope that provides a little bit of a comparison, at least
> > from the point of view of a different implementation. Good luck!
> >
> >> I think I understand the difference but feel free to comment if you think I'm
> >> missing something.
> >>
> >> > As this patch may need more work, I propose not including it in v7 to
> >> > avoid delay merging the rest of the implementation.
> >>
> >> Please consider my comments above.  I'm open to suggestions how to do better at
> >> the moment.  Less comfortable with nothing should be done about black level
> >> until we have a perfect solution.
Milan Zamazal April 3, 2024, 10:11 a.m. UTC | #8
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> Hello,
>
> On Thu, Mar 28, 2024 at 03:47:32PM +0100, Milan Zamazal wrote:
>> David Plowman writes:
>> 
>> > Hi everyone
>> >
>> > Thanks for cc'ing me on this question. Always happy to try and answer!
>> 
>> Hi David,
>> 
>> thank you for your answer and explanations.
>
> Likewise, thank you.
>
>> > On Thu, 28 Mar 2024 at 10:11, Milan Zamazal wrote:
>> >> Laurent Pinchart writes:
>> >> > On Tue, Mar 19, 2024 at 01:36:04PM +0100, Milan Zamazal wrote:
>> >> >> Black may not be represented as 0 pixel value for given hardware, it may be
>> >> >> higher.  If this is not compensated then various problems may occur such as low
>> >> >> contrast or suboptimal exposure.
>> >> >>
>> >> >> The black pixel value can be either retrieved from a tuning file for the given
>> >> >> hardware, or automatically on fly.  The former is the right and correct method,
>> >> >
>> >> > s/on fly/on the fly/
>> >> >
>> >> >> while the latter can be used when a tuning file is not available for the given
>> >> >> hardware.  Since there is currently no support for tuning files in software ISP,
>> >> >> the automatic, hardware independent way, is always used.  Support for tuning
>> >> >> files should be added in future but it will require more work than this patch.
>> >> >
>> >> > I don't think this is quite right. Strictly speaking, the black level
>> >> > compensation is about subtracting the black level as produced by the
>> >> > sensor. This requires tuning, and shouldn't be done on-the-fly.
>> >>
>> >> A general agreement from those who tried the patch was that it makes a
>> >> noticeable improvement.  As explained in my comments above, this is a fallback
>> >> mechanism that we can use cheaply and immediately, until or unless tuning file
>> >> for software ISP is available.  So far, it seems to be beneficial rather than
>> >> harmful.
>> >>
>> >> Unless we have something better or perfect soon, which doesn't seem to be the
>> >> case to me, why not to use this simple and useful (I believe) fallback
>> >> mechanism?  Do you have a better suggestion what could be done right now
>> >> regarding this issue (not considering black level at all is a real issue)?
>> >
>> > This is of course partly a technical issue - what's the right thing to
>> > do? - and particularly a pragmatic issue - whether to do something
>> > imperfect right now or not.
>> 
>> Yes.
>> 
>> > To tackle the technical question, black level is normally removed
>> > fairly early in the pipeline, because not doing so will make typical
>> > operations in the Bayer pipe go wrong. For example:
>> >
>> > * Lens shading (vignetting correction). I don't know whether you have
>> > this feature currently, but maybe it will be required at some point to
>> > improve image quality? Doing this without first removing the black
>> > level will make all the colours go wrong.
>> >
>> > * White balance. This too will go wrong if you haven't subtracted the
>> > black level. White balance is normally applied before demosaic because
>> > demosaic typically works better with white-balanced pixels. Obviously
>> > it depends what your demosaic algorithm does, but I wonder if there's
>> > a future in which you have higher quality versions of some algorithms
>> > for certain use cases (e.g. stills capture) where this could become an
>> > issue.
>> >
>> > * Colour correction. Like white balance, colour correction matrices
>> > will not do the right thing if black level has not been removed.
>> 
>> We don't have lens shading or color correction, once we have support for these
>> there is no reason not to have proper black level correction as well.  As for
>> white balance, we currently use probably the most primitive method possible
>> (green/red and green/blue ratios) and we don't subtract black level even with
>> this patch (which should get fixed later).  The patch subtracts black level when
>> correcting exposure though, which is also influenced by this.  Considering all
>> the imperfections at the moment, the actual effects on white balance and
>> exposure may not be that big to care very much.  But even then ...
>> 
>> > You can probably guess that I belong more to the "calibrate the camera
>> > properly" view of the world because doing that fixes problems and
>> > makes them go away _forever_. But I'm not familiar with the background
>> > here, so I can appreciate that interim solutions may have a place.
>> 
>> ... the thing is that not applying any black level correction, even when not
>> considering white balance and exposure deformations, often makes the image
>> looking noticeably flat as there are no blacks.  I'd say better imperfect blacks
>> than no blacks, similarly to having imperfect rather than no white balance
>> correction (yes, comparing apples and oranges technically, but both have their
>> impacts on the user satisfaction with the output).
>> 
>> >> > On the other hand, looking at the histogram to try and stretch it is
>> >> > called contrast adjustment. There's nothing wrong in doing so, but it's
>> >> > usually done towards the output of the processing pipeline, and is
>> >> > associated with manual adjustments of the contrast and brightness. See
>> >> > src/ipa/rpi/controller/rpi/contrast.cpp for instance.
>> >>
>> >> Contrast adjustment is a different issue than the one this patch tries to
>> >> address and less important now.  I plan to work on it and similar functionality
>> >> later.
>> >>
>> >> > David may be able to comment further as to why BLC and contrast are
>> >> > quite different.
>> >
>> > We treat contrast adjustment as a more cosmetic feature of an image
>> > which users can adjust according to their own personal taste. They can
>> > have more of it or less of it, as they wish, and it doesn't really
>> > affect the other aspects of image, such as colours or image quality.
>> 
>> Yes, as for contrast, we quickly rejected the idea of handling it automatically
>> and decided to implement the proper user adjustable settings later.
>> 
>> > Black level is quite different. There's a known value that you should
>> > use. There's basically no choice. Speaking for the Pi, I would be
>> > highly reluctant to couple these two things together - you certainly
>> > wouldn't want contrast adjustment to start changing the colours, for
>> > example!
>> 
>> Exactly.  This is why I'm advocating for still considering the patch rather than
>> dropping it.  At least in my setup the auto-determined black value doesn't seem
>> to be way off and serves its purpose even without access to exact tuning data.
>
> As for other part of this series, I'm fine starting with this initial
> implementation and improving it, but I think the black level should
> eventually be moved before debayering, and ideally the colour gains as
> well. I understand the need for optimizations to lower the CPU
> consumption, but at the same time I don't feel comfortable building up
> on top of an implementation that may work a bit more by chance than by
> correctness, as that's not very maintainable.

I agree with this approach.  I added your comment to the TODO file.

>> > Anyway, I hope that provides a little bit of a comparison, at least
>> > from the point of view of a different implementation. Good luck!
>> >
>> >> I think I understand the difference but feel free to comment if you think I'm
>> >> missing something.
>> >>
>> >> > As this patch may need more work, I propose not including it in v7 to
>> >> > avoid delay merging the rest of the implementation.
>> >>
>> >> Please consider my comments above.  I'm open to suggestions how to do better at
>> >> the moment.  Less comfortable with nothing should be done about black level
>> >> until we have a perfect solution.
Milan Zamazal April 16, 2024, 3:31 p.m. UTC | #9
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> As for other part of this series, I'm fine starting with this initial
> implementation and improving it, but I think the black level should
> eventually be moved before debayering, and ideally the colour gains as
> well. I understand the need for optimizations to lower the CPU
> consumption, but at the same time I don't feel comfortable building up
> on top of an implementation that may work a bit more by chance than by
> correctness, as that's not very maintainable.

Hi Laurent,

to make sure we understand each other correctly: Can we agree that "be moved
before debayering" above means primarily separating the level adjustment
computations from the debayering code and building them (together with other
stuff) on top of libipa/algorithm.h?  Rather than first applying the
corrections to the data and only then performing debayering, a luxury we
cannot afford, at least on the CPU?  I.e. the first step could be something
like
https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?

Regards,
Milan
Laurent Pinchart April 20, 2024, 10:42 a.m. UTC | #10
Hi Milan,

On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote:
> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:
> 
> > As for other part of this series, I'm fine starting with this initial
> > implementation and improving it, but I think the black level should
> > eventually be moved before debayering, and ideally the colour gains as
> > well. I understand the need for optimizations to lower the CPU
> > consumption, but at the same time I don't feel comfortable building up
> > on top of an implementation that may work a bit more by chance than by
> > correctness, as that's not very maintainable.
> 
> Hi Laurent,
> 
> to make sure we understand each other correctly: Can we agree that "be moved
> before debayering" above means primarily separating the level adjustments
> computations from debayering code and building them (together with other stuff)
> on top of libipa/algorithm.h?  Rather than first applying the corrections on the
> data and only then perform debayering, which luxury we cannot afford, at least
> on CPU?  I.e. the first step could be something like
> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?

What I meant is subtracting the black level before applying the colour
gains and performing the CFA interpolation (a.k.a. debayering). I'm fine
with the black level subtraction and CFA interpolation being performed
in a single pass by a single function for performance reasons, but the
order needs to be swapped.
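
In other words, per pixel and in a single pass, something like this
sketch (not the actual code; gains assumed in 1/256 units, 256 == 1.0 as
in the current parameters):

  /* 1. black level */
  unsigned int v = raw >= blackLevel ? raw - blackLevel : 0;
  /* 2. colour gain */
  v = std::min(v * gain / 256, 255u);
  /* 3. the CFA interpolation then averages already corrected values */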
Hans de Goede April 21, 2024, 12:16 p.m. UTC | #11
Hi,

On 4/20/24 12:42 PM, Laurent Pinchart wrote:
> Hi Milan,
> 
> On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote:
>> Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:
>>
>>> As for other part of this series, I'm fine starting with this initial
>>> implementation and improving it, but I think the black level should
>>> eventually be moved before debayering, and ideally the colour gains as
>>> well. I understand the need for optimizations to lower the CPU
>>> consumption, but at the same time I don't feel comfortable building up
>>> on top of an implementation that may work a bit more by chance than by
>>> correctness, as that's not very maintainable.
>>
>> Hi Laurent,
>>
>> to make sure we understand each other correctly: Can we agree that "be moved
>> before debayering" above means primarily separating the level adjustments
>> computations from debayering code and building them (together with other stuff)
>> on top of libipa/algorithm.h?  Rather than first applying the corrections on the
>> data and only then perform debayering, which luxury we cannot afford, at least
>> on CPU?  I.e. the first step could be something like
>> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?
> 
> What I meant is subtracting the black level before applying the colour
> gains and performing the CFA interpolation (a.k.a. debayering). I'm fine
> with the black level subtraction and CFA interpolation being performed
> in a single pass by a single function for performance reason, but the
> order needs to be swapped.

If we apply the per-color lookup table (which does black-level compensation +
color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to
do this on some extra copy before debayering or we need to do it multiple times
per pixel before averaging the surrounding pixels, neither of which is ideal.

I guess we could make the memcpy to a temp buffer, currently done to deal with
uncached mem, mandatory (so always do this) and then first apply the rgb
lookups to the just-copied raw line before debayering, then we still only need
to do this once per color. And this could also help wrt cache behavior of the
lookups, since we then travel linearly through 1 line, touching only that
1 line (in the temp buffer) + reading 2 lookup tables. So there will be less
cache contention than when using the lookups during debayering, where
we are reading 3 lines + writing 1 line + accessing all 3 lookup tables
(so I guess this might even speed things up, especially on the imx8).
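
To sketch the idea (hypothetical names; assuming an 8-bit bayer line where
even/odd pixels alternate between two colors and width is even):

	/* Apply the two per-color lookups for this bayer row while
	 * copying it into the temporary line buffer. */
	static void copyLineWithLookups(uint8_t *dst, const uint8_t *src,
					unsigned int width,
					const uint8_t lut0[256],
					const uint8_t lut1[256])
	{
		for (unsigned int x = 0; x < width; x += 2) {
			dst[x] = lut0[src[x]];
			dst[x + 1] = lut1[src[x + 1]];
		}
	}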

I don't expect this to make much difference for the resulting image though;
OTOH it will make some difference, and testing has shown that the overhead
of using the extra memcpy buffers in cases where this is not necessary is
small. So I guess we could do this.

Milan, can you extract what I have in mind from the above, and is this
something you can work on, or shall I take a shot at it when I can schedule
some time?

Regards,

Hans
Laurent Pinchart April 21, 2024, 12:22 p.m. UTC | #12
Hi Hans,

On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote:
> On 4/20/24 12:42 PM, Laurent Pinchart wrote:
> > On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote:
> >> Laurent Pinchart writes:
> >>
> >>> As for other part of this series, I'm fine starting with this initial
> >>> implementation and improving it, but I think the black level should
> >>> eventually be moved before debayering, and ideally the colour gains as
> >>> well. I understand the need for optimizations to lower the CPU
> >>> consumption, but at the same time I don't feel comfortable building up
> >>> on top of an implementation that may work a bit more by chance than by
> >>> correctness, as that's not very maintainable.
> >>
> >> Hi Laurent,
> >>
> >> to make sure we understand each other correctly: Can we agree that "be moved
> >> before debayering" above means primarily separating the level adjustments
> >> computations from debayering code and building them (together with other stuff)
> >> on top of libipa/algorithm.h?  Rather than first applying the corrections on the
> >> data and only then perform debayering, which luxury we cannot afford, at least
> >> on CPU?  I.e. the first step could be something like
> >> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?
> > 
> > What I meant is subtracting the black level before applying the colour
> > gains and performing the CFA interpolation (a.k.a. debayering). I'm fine
> > with the black level subtraction and CFA interpolation being performed
> > in a single pass by a single function for performance reason, but the
> > order needs to be swapped.
> 
> If we apply the per color lookup table (which does black-level compensation +
> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to
> do this on some extra copy before debayering or we need to do it multiple times
> per pixel before averaging the surrounding pixels, neither one of which is ideal.

It's not the full lookup table that needs to be applied before
debayering, it's only the black level subtraction and the colour gains.
The gamma curve needs to go after debayering, and additional histogram
stretching could go there too.

There's also a need to add a CCM matrix to the pipeline, multiplying the
RGB values obtained by debayering by a 3x3 matrix before applying the
gamma curve.
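
Roughly, per output pixel (a sketch only; ccm is a row-major 3x3 matrix and
rgb the debayered pixel):

	for (unsigned int i = 0; i < 3; i++)
		out[i] = ccm[i][0] * rgb[0] + ccm[i][1] * rgb[1] + ccm[i][2] * rgb[2];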

With that I think we would have a reasonable image quality already.
Additional processing, such as lens shading correction, defective pixel
correction and denoising, are out of scope for CPU processing in my
opinion. They could be implemented with a GPU soft ISP though.

> I guess we could make the memcpy to temp buffer behavior done to deal with
> uncached mem mandatory (so always do this) and then first apply the rgb lookups
> to a just copied raw line before debayering, then we still only need to do
> this once per color. And this could also help wrt cache behavior of the
> lookups since we then linearly travel through 1 line touching only the
> 1 line (in temp buffer) + read 2 lookup tables. So then there will be less
> cache contention then when using the lookups during debayering when
> were are reading 3 lines + writing 1 line + accessing all 3 lookup tables
> (so I guess this might even speed things up especially on the imx8).
> 
> I don't expect this to make much difference for the resulting image though,
> OTOH it will make some difference and testing has shown that the overhead
> of using the extra memcpy buffers in cases where this is not necessary is
> small. So I guess we could do this.
> 
> Milan can you extract what I have in mind from the above and is this
> something you can work on, or shall I take a shot at this when I can schedule
> some time for this?
Hans de Goede April 21, 2024, 2:47 p.m. UTC | #13
Hi Laurent,

On 4/21/24 2:22 PM, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote:
>> On 4/20/24 12:42 PM, Laurent Pinchart wrote:
>>> On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote:
>>>> Laurent Pinchart writes:
>>>>
>>>>> As for other part of this series, I'm fine starting with this initial
>>>>> implementation and improving it, but I think the black level should
>>>>> eventually be moved before debayering, and ideally the colour gains as
>>>>> well. I understand the need for optimizations to lower the CPU
>>>>> consumption, but at the same time I don't feel comfortable building up
>>>>> on top of an implementation that may work a bit more by chance than by
>>>>> correctness, as that's not very maintainable.
>>>>
>>>> Hi Laurent,
>>>>
>>>> to make sure we understand each other correctly: Can we agree that "be moved
>>>> before debayering" above means primarily separating the level adjustments
>>>> computations from debayering code and building them (together with other stuff)
>>>> on top of libipa/algorithm.h?  Rather than first applying the corrections on the
>>>> data and only then perform debayering, which luxury we cannot afford, at least
>>>> on CPU?  I.e. the first step could be something like
>>>> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?
>>>
>>> What I meant is subtracting the black level before applying the colour
>>> gains and performing the CFA interpolation (a.k.a. debayering). I'm fine
>>> with the black level subtraction and CFA interpolation being performed
>>> in a single pass by a single function for performance reason, but the
>>> order needs to be swapped.
>>
>> If we apply the per color lookup table (which does black-level compensation +
>> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to
>> do this on some extra copy before debayering or we need to do it multiple times
>> per pixel before averaging the surrounding pixels, neither one of which is ideal.
> 
> It's not the full lookup table that needs to be applied before
> debayering, it's only the black level subtraction and the colour gains.

The lookup table also gives us clamping for free. If we do an actual subtract
+ multiply we need to clamp, and that means adding 2 conditional jumps
per pixel, and testing has shown that that really hurts on modern
processors.
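
For illustration, the arithmetic path would need something like this
(hypothetical sketch; the two clamps below are the conditional jumps in
question), whereas the lookup path is a single unconditional
dst[x] = lut[src[x]]:

	int v = src[x] - blackLevel;
	if (v < 0)
		v = 0;
	v = (v * gain) >> shift;
	if (v > UINT8_MAX)
		v = UINT8_MAX;
	dst[x] = v;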

> The gamma curve needs to go after debayering, and additional histogram
> stretching could go there too.
> 
> There's also a need to add a CCM matrix to the pipeline, multiplying the
> RGB values obtained by debayering by a 3x3 matrix before applying the
> gamma curve.

I have looked into CCM already, but IMHO that is too heavy to do on
the CPU. We might get away with it on the big Intel cores, but at
a significant CPU usage and battery cost.

As for gamma correction, I think we need to live with that being applied
before debayering too (our debayering is a straight averaging of surrounding
pixels, so the impact of doing this before / after should be small).
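
For example, with gamma 0.5 and two neighbouring normalized values 0.25 and
0.36: applying gamma after averaging gives sqrt(0.305) ≈ 0.552, applying it
before gives (0.5 + 0.6) / 2 = 0.55, a difference well under 1%.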

> With that I think we would have a reasonable image quality already.

I agree that a CCM is the biggest missing item. But I think we can still
get reasonable quality without it, and the CPU SoftwareISP really should
be a fallback for when all else fails.

I would rather see us focus on working on a GPU-based SoftwareISP, and then
we can add CCM and proper post-debayer gamma correction there.

> Additional processing, such as lens shading correction, defective pixel
> correction and denoising, are out of scope for CPU processing in my
> opinion. They could be implemented with a GPU soft ISP though.

Ack.

Regards,

Hans
Laurent Pinchart April 21, 2024, 2:59 p.m. UTC | #14
Hi Hans,

On Sun, Apr 21, 2024 at 04:47:30PM +0200, Hans de Goede wrote:
> On 4/21/24 2:22 PM, Laurent Pinchart wrote:
> > On Sun, Apr 21, 2024 at 02:16:06PM +0200, Hans de Goede wrote:
> >> On 4/20/24 12:42 PM, Laurent Pinchart wrote:
> >>> On Tue, Apr 16, 2024 at 05:31:38PM +0200, Milan Zamazal wrote:
> >>>> Laurent Pinchart writes:
> >>>>
> >>>>> As for other part of this series, I'm fine starting with this initial
> >>>>> implementation and improving it, but I think the black level should
> >>>>> eventually be moved before debayering, and ideally the colour gains as
> >>>>> well. I understand the need for optimizations to lower the CPU
> >>>>> consumption, but at the same time I don't feel comfortable building up
> >>>>> on top of an implementation that may work a bit more by chance than by
> >>>>> correctness, as that's not very maintainable.
> >>>>
> >>>> Hi Laurent,
> >>>>
> >>>> to make sure we understand each other correctly: Can we agree that "be moved
> >>>> before debayering" above means primarily separating the level adjustments
> >>>> computations from debayering code and building them (together with other stuff)
> >>>> on top of libipa/algorithm.h?  Rather than first applying the corrections on the
> >>>> data and only then perform debayering, which luxury we cannot afford, at least
> >>>> on CPU?  I.e. the first step could be something like
> >>>> https://gitlab.freedesktop.org/mzamazal/libcamera-softisp/-/commit/6180d853005bb956b3d2469576b139ff523b76ad ?
> >>>
> >>> What I meant is subtracting the black level before applying the colour
> >>> gains and performing the CFA interpolation (a.k.a. debayering). I'm fine
> >>> with the black level subtraction and CFA interpolation being performed
> >>> in a single pass by a single function for performance reason, but the
> >>> order needs to be swapped.
> >>
> >> If we apply the per color lookup table (which does black-level compensation +
> >> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to
> >> do this on some extra copy before debayering or we need to do it multiple times
> >> per pixel before averaging the surrounding pixels, neither one of which is ideal.
> > 
> > It's not the full lookup table that needs to be applied before
> > debayering, it's only the black level subtraction and the colour gains.
> 
> The lookup table also gives us clamping for free. If we do actual substract
> + multiply we need to clamp and that means adding 2 conditional jumps
> per pixel and testing has shown that that really really hurts on modern
> processors.

Could this be implemented with a LUT separate from the gamma LUT ?

> > The gamma curve needs to go after debayering, and additional histogram
> > stretching could go there too.
> > 
> > There's also a need to add a CCM matrix to the pipeline, multiplying the
> > RGB values obtained by debayering by a 3x3 matrix before applying the
> > gamma curve.
> 
> I have looked into CCM already, but IMHO that is too heavy to do on
> the CPU. We might get away with that at the big Intel Cores, but at
> a significant CPU usage and battery cost.

Would SIMD help ?

> As for gamma correction I think that we need to live with that being applied
> before debayering too (our debayering is a straight averaging of surrounding
> pixels, so the impact of doing this before / after should be small).

As long as we have no CCM that may be an option.

> > With that I think we would have a reasonable image quality already.
> 
> I agree that a CCM is the biggest missing item. But I think we can still
> get reasonable quality without that and the CPU SoftwareISP really should
> be a fallback for when all else fails.

I think the CPU-based ISP is also a useful platform to experiment with
ISP algorithms, even if it consumes more CPU. Once we have GPU
offload support, when do you expect we'll fall back to the CPU ?

> I would rather see us focus on working on a GPU based SoftwareISP and then
> we can add CCM and proper post debay gamma correction there.

I would certainly welcome GPU support :-)

> > Additional processing, such as lens shading correction, defective pixel
> > correction and denoising, are out of scope for CPU processing in my
> > opinion. They could be implemented with a GPU soft ISP though.
> 
> Ack.
Pavel Machek April 21, 2024, 8:29 p.m. UTC | #15
Hi!


> > > The gamma curve needs to go after debayering, and additional histogram
> > > stretching could go there too.
> > > 
> > > There's also a need to add a CCM matrix to the pipeline, multiplying the
> > > RGB values obtained by debayering by a 3x3 matrix before applying the
> > > gamma curve.
> > 
> > I have looked into CCM already, but IMHO that is too heavy to do on
> > the CPU. We might get away with that at the big Intel Cores, but at
> > a significant CPU usage and battery cost.
> 
> Would SIMD help ?

I tried to do debayering using SIMD, and the good news is that gcc now has
support, so playing with SIMD is easy. The bad news was that I was not able
to get speedups... neither on x86-64 nor on arm64.

(But for matrix multiplication, it might make sense... not sure.)
Anyway, SIMD is _not_ easy.

The bad news with the GPU is that there's quite some overhead in moving data
to and from the GPU on my platforms.

Best regards,
								Pavel
Pavel Machek April 21, 2024, 9:33 p.m. UTC | #16
Hi!

> If we apply the per color lookup table (which does black-level compensation +
> color-gain + gamma-curve in 1 step) to the raw bayer data then we either need to
> do this on some extra copy before debayering or we need to do it multiple times
> per pixel before averaging the surrounding pixels, neither one of which is ideal.
> 
> I guess we could make the memcpy to temp buffer behavior done to deal with
> uncached mem mandatory (so always do this) and then first apply the rgb lookups
> to a just copied raw line before debayering, then we still only need to do
> this once per color. And this could also help wrt cache behavior of
> the

I believe that doing an extra memcpy will hurt (but black-level correction
during the memcpy will be free).

OTOH doing the pass at a convenient time, when the data is already in the L1
cache, should be free, too.

Anyway, a point to consider: if we are talking about correcting gammas
etc., we'll probably need to switch processing to 16-bit values. Those
sensors are pretty non-linear...

https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/

Anyway, there's a ton of work, and it would be good to get the basics
merged first. For example, for digital photography it is important
to do statistics and the 3A algorithms, but not to debayer...

Best regards,
								Pavel
Milan Zamazal April 22, 2024, 11:24 a.m. UTC | #17
Hi,

thank you all for your comments and clarifications.  Let me summarize the
discussion, with my notes merged in, before I get lost in it:

- Since our current black level and color gain operations are linear, we average
  only same-color pixels, and the lookup table is applied after averaging the
  pixels, the current debayering implementation is correct.  It doesn't matter
  whether we do "average pixels -> subtract black -> multiply by color gain ->
  apply gamma" or "subtract black -> multiply by color gain -> average pixels ->
  apply gamma".

- This may no longer hold if we change the operations, for example:

- The optimization suggested by Hans (preprocessing while copying lines from the
  input buffer).  It may or may not have a significant impact on the performance
  and may or may not have a significant impact on the image quality.  I think we
  need some experiments here.  Performance on the CPU is generally challenging,
  so if we can demonstrate a performance benefit, we should consider including
  it (most likely configurable), if it doesn't turn the implementation into a
  complete mess.

- Or if we add support for non-linear gains, as mentioned by Pavel.

- Laurent suggested adding a CCM matrix between debayering and gamma to achieve
  a reasonable image quality.  According to Hans, this is a heavy operation on
  CPUs.  Laurent still sees CPU processing as useful for experiments.  I think
  we can make this simply configurable (if it gets implemented).

- If we split the processing into pre-debayer and post-debayer parts, we should
  probably work with uint16_t or floats, which may have an impact on performance.

- Pavel couldn't get better performance by using SIMD CPU instructions for
  debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
  the CPU is hard to use and may differ between architectures, so the question is
  whether it's worth investing in it.

- GPU processing should make many of these things easier and more things
  possible.

- Do we already know how to map the buffers to GPU efficiently?

My conclusions are:

- The current CPU debayering computation itself is fine at the moment and we
  shouldn't change it without a good reason.  It can be replaced in the future if
  needed, once we have a better understanding of what we can realistically
  achieve and how.  General cleanups, like moving table computations elsewhere,
  would still be useful now.

- We can experiment with whatever is mentioned above to get a better
  understanding, but:

- GPU processing is clearly a priority.

- We also have other items in the TODO file!  (I'm already working on some of them.)

- We should probably change the e-mail thread subject.

Thanks,
Milan
Hans de Goede April 22, 2024, 11:38 a.m. UTC | #18
Hi,

On 4/22/24 1:24 PM, Milan Zamazal wrote:
> Hi,
> 
> thank you all for your comments and clarifications.  Let me summarize the
> discussion, with my notes merged in, before I get lost in it:
> 
> - Since our current black level and color gain operations are linear, we average
>   only the same color pixels and the lookup table is applied after averaging the
>   pixels, the current debayering implementation is correct.  It doesn't matter
>   whether we do "average pixels -> subtract black -> multiply by color gain ->
>   apply gamma" or "subtract black -> multiply by color gain -> average pixels ->
>   apply gamma".

This is not entirely true, because of the clamping involved in black-level
subtraction (and the same for clamping at the top for gains), e.g.:

Let's say that for the red component we average 2 pixels, with a blacklevel
compensation of 8 for all colors, and the values of the 2 pixels are
6 and 10. If we do blacklevel compensation first, the result is 2 pixels
with values of 0 and 2, which average to 1.  Whereas if we average first,
we get an average of 8, and then after blacklevel compensation we end up
with 0.
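
In short:

  black level first: max(6 - 8, 0) = 0, max(10 - 8, 0) = 2, average = 1
  average first:     max((6 + 10) / 2 - 8, 0) = 0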

>
> - This may no longer hold if we change the operations, for example:
> 
> - The optimization suggested by Hans (preprocessing while copying lines from the
>   input buffer).  It may or may not have a significant impact on the performance
>   and may or may not have a significant impact on the image quality.  I think we
>   need some experiments here.  Performance on the CPU is generally challenging,
>   so if we can demonstrate a performance benefit, we should consider including
>   it (most likely configurable), if it doesn't turn the implementation into a
>   complete mess.

I think that if we want to move to applying blacklevel + gain before
debayering, then we should do this unconditionally, which means also
unconditionally doing the memcpy to a temporary line buffer.

> 
> - Or if we add support for non-linear gains, as mentioned by Pavel.
> 
> - Laurent suggested adding a CCM matrix between debayering and gamma to achieve
>   a reasonable image quality.  According to Hans, this is a heavy operation on
>   CPUs.  Laurent still sees CPU processing as useful for experiments.  I think
>   we can make this simply configurable (if it gets implemented).

I agree that adding a CCM matrix as an optional step sounds interesting,
and this should indeed be configurable. This can be done as an optional
(guarded by a single if per line) post-debayer step run on the output buffer.
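
Something along these lines (hypothetical sketch; applyCcm and ccmEnabled_
are made-up names):

	/* Optional post-debayer step, one branch per output line. */
	if (ccmEnabled_)
		applyCcm(dstLine, width_, ccm_);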

> - If we split the processing to pre-bayer and post-bayer parts, we should
>   probably work with uint16_t or float's, which may have impact on performance.
> 
> - Pavel couldn't get a better performance by using SIMD CPU instructions for
>   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
>   CPU is hard to use and may differ on architectures, so the question is whether
>   it's worth to invest into it.
> 
> - GPU processing should make many of these things easier and more things
>   possible.
> 
> - Do we already know how to map the buffers to GPU efficiently?
> 
> My conclusions are:
> 
> - The current CPU debayering computation itself is fine at the moment and we
>   shouldn't change it without a good reason.  It can be replaced in future if
>   needed, once we have a better understanding of what and how we can
>   realistically achieve.  General cleanups, like moving table computations
>   elsewhere, would be still useful already now.
> 
> - We can experiment with whatever mentioned above to get the better
>   understanding, but:
> 
> - GPU processing is clearly a priority.
> 
> - We have also other items in the TODO file!  (I already work on some of them.)
> 
> - We should probably change the e-mail thread subject.

I agree with all of your conclusions :)

Regards,

Hans
Laurent Pinchart April 22, 2024, 12:12 p.m. UTC | #19
Hello,

On Mon, Apr 22, 2024 at 01:38:58PM +0200, Hans de Goede wrote:
> On 4/22/24 1:24 PM, Milan Zamazal wrote:
> > Hi,
> > 
> > thank you all for your comments and clarifications.  Let me summarize the
> > discussion, with my notes merged in, before I get lost in it:
> > 
> > - Since our current black level and color gain operations are linear, we average
> >   only the same color pixels and the lookup table is applied after averaging the
> >   pixels, the current debayering implementation is correct.  It doesn't matter
> >   whether we do "average pixels -> subtract black -> multiply by color gain ->
> >   apply gamma" or "subtract black -> multiply by color gain -> average pixels ->
> >   apply gamma".
> 
> This is not entirely true, because of the clamping involved in black-level
> subtraction (and the same for clamping at the top for gains), e.g.
> 
> Lets say that for the red component we average 2 pixels with a blacklevel
> compensation of 8 for all colors and the value of the 2 pixels is:
> 6 and 10, then if we average first and then do blacklevel compensation the
> result of blacklevel correction is 2 pixels with a value of 0 and 2,
> which averages to 1.  Where as if we first average we get an average of 8
> and then after blacklevel compensation we end up with 0.
> 
> > - This may no longer hold if we change the operations, for example:
> > 
> > - The optimization suggested by Hans (preprocessing while copying lines from the
> >   input buffer).  It may or may not have a significant impact on the performance
> >   and may or may not have a significant impact on the image quality.  I think we
> >   need some experiments here.  Performance on the CPU is generally challenging,
> >   so if we can demonstrate a performance benefit, we should consider including
> >   it (most likely configurable), if it doesn't turn the implementation into a
> >   complete mess.
> 
> I think that if we want to move to applying blacklevel + gain before
> debayering that we then should do this unconditionally, which means also
> unconditionally doing the memcpy to a temporary line buffer.

If we can move it before debayering with acceptable performance, then I
wouldn't make it configurable.

> > - Or if we add support for non-linear gains, as mentioned by Pavel.

Sensors are mostly linear these days. That is, until they reach
saturation of course. I don't think the soft ISP needs a linearization
LUT.

> > - Laurent suggested adding a CCM matrix between debayering and gamma to achieve
> >   a reasonable image quality.  According to Hans, this is a heavy operation on
> >   CPUs.  Laurent still sees CPU processing as useful for experiments.  I think
> >   we can make this simply configurable (if it gets implemented).
> 
> I agree that adding a CCM matrix as an optional step sounds interesting
> and this should indeed be configurable. This can be done as an optional
> (gaurded by a single if per line) post-debayer step run on the output buffer.

Given that the debayering currently produces an RGB value, do you think
an additional multiplication by a 3x3 matrix (ideally accelerated by
SIMD) would be that costly ?

> > - If we split the processing to pre-bayer and post-bayer parts, we should
> >   probably work with uint16_t or float's, which may have impact on performance.
> > 
> > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> >   CPU is hard to use and may differ on architectures, so the question is whether
> >   it's worth to invest into it.

Good question :-)

> > - GPU processing should make many of these things easier and more things
> >   possible.
> > 
> > - Do we already know how to map the buffers to GPU efficiently?

I haven't looked into it yet personally.

> > My conclusions are:
> > 
> > - The current CPU debayering computation itself is fine at the moment and we
> >   shouldn't change it without a good reason.  It can be replaced in future if
> >   needed, once we have a better understanding of what and how we can
> >   realistically achieve.  General cleanups, like moving table computations
> >   elsewhere, would be still useful already now.
> > 
> > - We can experiment with whatever mentioned above to get the better
> >   understanding, but:
> > 
> > - GPU processing is clearly a priority.
> > 
> > - We have also other items in the TODO file!  (I already work on some of them.)

On this topic, I'm having a look at how we could use the soft ISP with
the vimc pipeline handler.

> > - We should probably change the e-mail thread subject.

Done :-)

> I agree with all of your conclusions :)
Hans de Goede April 22, 2024, 12:42 p.m. UTC | #20
Hi,

On 4/22/24 2:12 PM, Laurent Pinchart wrote:
> Hello,
> 
> On Mon, Apr 22, 2024 at 01:38:58PM +0200, Hans de Goede wrote:
>> On 4/22/24 1:24 PM, Milan Zamazal wrote:
>>> Hi,
>>>
>>> thank you all for your comments and clarifications.  Let me summarize the
>>> discussion, with my notes merged in, before I get lost in it:
>>>
>>> - Since our current black level and color gain operations are linear, we average
>>>   only the same color pixels and the lookup table is applied after averaging the
>>>   pixels, the current debayering implementation is correct.  It doesn't matter
>>>   whether we do "average pixels -> subtract black -> multiply by color gain ->
>>>   apply gamma" or "subtract black -> multiply by color gain -> average pixels ->
>>>   apply gamma".
>>
>> This is not entirely true, because of the clamping involved in black-level
>> subtraction (and the same for clamping at the top for gains), e.g.
>>
>> Lets say that for the red component we average 2 pixels with a blacklevel
>> compensation of 8 for all colors and the value of the 2 pixels is:
>> 6 and 10, then if we average first and then do blacklevel compensation the
>> result of blacklevel correction is 2 pixels with a value of 0 and 2,
>> which averages to 1.  Where as if we first average we get an average of 8
>> and then after blacklevel compensation we end up with 0.
>>
>>> - This may no longer hold if we change the operations, for example:
>>>
>>> - The optimization suggested by Hans (preprocessing while copying lines from the
>>>   input buffer).  It may or may not have a significant impact on the performance
>>>   and may or may not have a significant impact on the image quality.  I think we
>>>   need some experiments here.  Performance on the CPU is generally challenging,
>>>   so if we can demonstrate a performance benefit, we should consider including
>>>   it (most likely configurable), if it doesn't turn the implementation into a
>>>   complete mess.
>>
>> I think that if we want to move to applying blacklevel + gain before
>> debayering that we then should do this unconditionally, which means also
>> unconditionally doing the memcpy to a temporary line buffer.
> 
> If we can move it before debayering with acceptable performance, then I
> wouldn't make it configurable.
> 
>>> - Or if we add support for non-linear gains, as mentioned by Pavel.
> 
> Sensors are mostly linear these days. That is, until they reach
> saturation of course. I don't think the soft ISP needs a linearization
> LUT.
> 
>>> - Laurent suggested adding a CCM matrix between debayering and gamma to achieve
>>>   a reasonable image quality.  According to Hans, this is a heavy operation on
>>>   CPUs.  Laurent still sees CPU processing as useful for experiments.  I think
>>>   we can make this simply configurable (if it gets implemented).
>>
>> I agree that adding a CCM matrix as an optional step sounds interesting
>> and this should indeed be configurable. This can be done as an optional
>> (gaurded by a single if per line) post-debayer step run on the output buffer.
> 
> Given that the debayering currently produces an RGB value, do you think
> an additional multiplication by a 3x3 matrix (ideally accelerated by
> SIMD) would be that costly ?

Currently we do at most 4 additions + 1 divide per pixel. This adds *9*
multiplications + 6 additions per pixel. With FHD at 30 FPS that is
~560 million multiplications per second!
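
(For reference: 1920 x 1080 x 30 ≈ 62 million pixels/s, and 9 multiplications
per pixel gives ~560 million multiplications per second.)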

And many sensors do way higher resolutions/fps.

Maybe with SIMD we can do some of these in parallel, and it certainly
is interesting to try, at least on some beefy x86 laptops (or upcoming
beefy ARM laptops), but this is going to hurt performance significantly.

I'm not saying this is not interesting to experiment with, but I have
doubts this is something which we will enable by default anywhere.

Regards,

Hans

> 
>>> - If we split the processing to pre-bayer and post-bayer parts, we should
>>>   probably work with uint16_t or float's, which may have impact on performance.
>>>
>>> - Pavel couldn't get a better performance by using SIMD CPU instructions for
>>>   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
>>>   CPU is hard to use and may differ on architectures, so the question is whether
>>>   it's worth to invest into it.
> 
> Good question :-)
> 
>>> - GPU processing should make many of these things easier and more things
>>>   possible.
>>>
>>> - Do we already know how to map the buffers to GPU efficiently?
> 
> I haven't looked into it yet personally.
> 
>>> My conclusions are:
>>>
>>> - The current CPU debayering computation itself is fine at the moment and we
>>>   shouldn't change it without a good reason.  It can be replaced in future if
>>>   needed, once we have a better understanding of what and how we can
>>>   realistically achieve.  General cleanups, like moving table computations
>>>   elsewhere, would be still useful already now.
>>>
>>> - We can experiment with whatever mentioned above to get the better
>>>   understanding, but:
>>>
>>> - GPU processing is clearly a priority.
>>>
>>> - We have also other items in the TODO file!  (I already work on some of them.)
> 
> On this topic, I'm having a look at how we could use the soft ISP with
> the vimc pipeline handler.
> 
>>> - We should probably change the e-mail thread subject.
> 
> Done :-)
> 
>> I agree with all of your conclusions :)
>
Pavel Machek April 22, 2024, 7:28 p.m. UTC | #21
Hi!

> > > - Or if we add support for non-linear gains, as mentioned by Pavel.
> 
> Sensors are mostly linear these days. That is, until they reach
> saturation of course. I don't think the soft ISP needs a linearization
> LUT.

That's not what Martijn measured:
https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/
Librem 5 is reasonably modern.

Basically it is linear in the 0..70% range, and then it is linear again in the
90%..100% range, but at a different slope. Take a look, it is a fun article.

> > > - If we split the processing to pre-bayer and post-bayer parts, we should
> > >   probably work with uint16_t or float's, which may have impact on performance.
> > > 
> > > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> > >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> > >   CPU is hard to use and may differ on architectures, so the question is whether
> > >   it's worth to invest into it.
> 
> Good question :-)

Oh, so the good news is that you write SIMD code once with gcc intrinsics, and
gcc does its magic. You don't have to know assembly for that, but it
certainly helps to look at the assembly to check that it looks reasonable.

(Debayering needs too much byte shuffling to work well with SIMD
extensions. Matrix multiply could be better, but I'm not sure if a 3x3
matrix is big enough to get an advantage.)

Best regards,
								Pavel
Laurent Pinchart April 22, 2024, 7:47 p.m. UTC | #22
On Mon, Apr 22, 2024 at 09:28:08PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > > - Or if we add support for non-linear gains, as mentioned by Pavel.
> > 
> > Sensors are mostly linear these days. That is, until they reach
> > saturation of course. I don't think the soft ISP needs a linearization
> > LUT.
> 
> That's not what Martijn measured:
> https://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/
> Librem 5 is reasonably modern.
> 
> Basically it is linear in 0..70% range, and then it is linear again in
> 90%..100% range, but at different slope. Take a look, it is fun article.

I did, and I don't reach the same conclusion. Obviously sensors
saturate, so close to saturation there will be non linearities. This is
not something that really requires compensation as far as I understand.
What linearization LUTs are meant to compensate for, unless I'm
mistaken, is non-linearities in the linear part of the dynamic range.

On a side note, I wonder if lens shading could play a role in shape of
the curve in the above article. Central pixels will saturate quicker
than pixels closer to the corners. It would be interesting to make the
same measurements taking only the central part of the image into
account. Another interesting experiment would be to measure the sensor
linearity without the lens (but that will be difficult in many cases
without destroying the camera).

> > > > - If we split the processing to pre-bayer and post-bayer parts, we should
> > > >   probably work with uint16_t or float's, which may have impact on performance.
> > > > 
> > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> > > >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> > > >   CPU is hard to use and may differ on architectures, so the question is whether
> > > >   it's worth to invest into it.
> > 
> > Good question :-)
> 
> Oh, so good news is you write SIMD code once with gcc intristics, and
> gcc does its magic. You don't have to know assembly for that, but it
> certainly helps to look at the assembly if it looks reasonable.

There are also potentially interesting helper libraries such as
https://github.com/vectorclass/version2 (I haven't checked the license
compatibility).

> (Debayering needs too much of byte shuffling to work well with SIMD
> extensions. Matrix multiply could be better, but I'm not sure if 3x3
> matrix is big enough to get advantage).
Pavel Machek April 23, 2024, 6:59 a.m. UTC | #23
Hi!

> > > > > - If we split the processing to pre-bayer and post-bayer parts, we should
> > > > >   probably work with uint16_t or float's, which may have impact on performance.
> > > > > 
> > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> > > > >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> > > > >   CPU is hard to use and may differ on architectures, so the question is whether
> > > > >   it's worth to invest into it.
> > > 
> > > Good question :-)
> > 
> > Oh, so good news is you write SIMD code once with gcc intristics, and
> > gcc does its magic. You don't have to know assembly for that, but it
> > certainly helps to look at the assembly if it looks reasonable.
> 
> There are also potentially interesting helper libraries such as
> https://github.com/vectorclass/version2 (I haven't checked the license
> compatibility).

Ok, so I played with black level correction a bit and got a pleasant
surprise [ignore the wrong function name]:

void debayer8(uint8_t *dst, const uint8_t *src)
{
        /* Subtract a black level of 16 from every pixel, clamping at 0. */
        for (int x = 0; x < (int)WIDTH; x++) {
                uint8_t v = src[x];
                if (v < 16)
                        dst[x] = 0;
                else
                        dst[x] = v - 16;
        }
}

gcc translates it to vector code automatically, and the result is only
10% slower than plain memcpy. The test was done on a Thinkpad X60. If I
disable vector instructions, the result is 4x the time of plain memcpy. I'm
quite impressed, both by the vector unit and by gcc :-).
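
(For reference, gcc's auto-vectorizer is -ftree-vectorize, enabled by default
at -O3 and, in recent gcc versions, at -O2; -fno-tree-vectorize disables it.)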

Best regards,
								Pavel

00001340 <debayer8>:
    1340:       e8 f0 ff ff ff          call   1335 <__x86.get_pc_thunk.dx>
    1345:       81 c2 af 2c 00 00       add    $0x2caf,%edx
    134b:       56                      push   %esi
    134c:       53                      push   %ebx
    134d:       8b 4c 24 0c             mov    0xc(%esp),%ecx
    1351:       8b 5c 24 10             mov    0x10(%esp),%ebx
    1355:       89 c8                   mov    %ecx,%eax
    1357:       8d 73 01                lea    0x1(%ebx),%esi
    135a:       29 f0                   sub    %esi,%eax
    135c:       83 f8 0e                cmp    $0xe,%eax
    135f:       b8 00 00 00 00          mov    $0x0,%eax
    1364:       76 58                   jbe    13be <debayer8+0x7e>
    1366:       66 0f 6f a2 4c e0 ff    movdqa -0x1fb4(%edx),%xmm4
    136d:       ff 
    136e:       66 0f 6f 9a 5c e0 ff    movdqa -0x1fa4(%edx),%xmm3
    1375:       ff 
    1376:       66 0f ef d2             pxor   %xmm2,%xmm2
    137a:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
    1380:       f3 0f 6f 04 03          movdqu (%ebx,%eax,1),%xmm0
    1385:       f3 0f 6f 0c 03          movdqu (%ebx,%eax,1),%xmm1
    138a:       66 0f d8 c4             psubusb %xmm4,%xmm0
    138e:       66 0f fc cb             paddb  %xmm3,%xmm1
    1392:       66 0f 74 c2             pcmpeqb %xmm2,%xmm0
    1396:       66 0f df c1             pandn  %xmm1,%xmm0
    139a:       0f 11 04 01             movups %xmm0,(%ecx,%eax,1)
    139e:       83 c0 10                add    $0x10,%eax
    13a1:       3d 80 07 00 00          cmp    $0x780,%eax
    13a6:       75 d8                   jne    1380 <debayer8+0x40>
    13a8:       5b                      pop    %ebx
    13a9:       5e                      pop    %esi
    13aa:       c3                      ret
    13ab:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
    13af:       90                      nop
    13b0:       c6 04 01 00             movb   $0x0,(%ecx,%eax,1)
    13b4:       83 c0 01                add    $0x1,%eax
    13b7:       3d 80 07 00 00          cmp    $0x780,%eax
    13bc:       74 ea                   je     13a8 <debayer8+0x68>
    13be:       0f b6 14 03             movzbl (%ebx,%eax,1),%edx
    13c2:       80 fa 0f                cmp    $0xf,%dl
    13c5:       76 e9                   jbe    13b0 <debayer8+0x70>
    13c7:       83 ea 10                sub    $0x10,%edx
    13ca:       88 14 01                mov    %dl,(%ecx,%eax,1)
    13cd:       83 c0 01                add    $0x1,%eax
    13d0:       3d 80 07 00 00          cmp    $0x780,%eax
    13d5:       75 e7                   jne    13be <debayer8+0x7e>
    13d7:       eb cf                   jmp    13a8 <debayer8+0x68>
    13d9:       8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
Pavel Machek April 23, 2024, 11:11 a.m. UTC | #24
Hi!

> > > > > > - If we split the processing to pre-bayer and post-bayer parts, we should
> > > > > >   probably work with uint16_t or float's, which may have impact on performance.
> > > > > > 
> > > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> > > > > >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> > > > > >   CPU is hard to use and may differ on architectures, so the question is whether
> > > > > >   it's worth to invest into it.
> > > > 
> > > > Good question :-)
> > > 
> > > Oh, so good news is you write SIMD code once with gcc intristics, and
> > > gcc does its magic. You don't have to know assembly for that, but it
> > > certainly helps to look at the assembly if it looks reasonable.
> > 
> > There are also potentially interesting helper libraries such as
> > https://github.com/vectorclass/version2 (I haven't checked the license
> > compatibility).
> 
> Ok, so I played with black level correction a bit and got pleasant
> surprise: [Ignore wrong name].
> 
> void debayer8(uint8_t *dst, const uint8_t *src)
> {
>         for (int x = 0; x < (int)WIDTH; x++)  {
>                 uint8_t v = src[x];
>                 if (v < 16)
>                         dst[x] = 0;
>                 else
>                         dst[x] = v-16;
> 	}
> }
> 
> gcc translates it to vector code automatically, and results is only
> 10% slower than plain memcpy. Test was done on thinkpad x60. If I
> disable vector instructions, result is 4x time of plain memcpy. I'm
> quite impressed both by vector unit and by the gcc :-).

Ok, the disassembly above was for a different function than the one the
benchmark was running, due to inlining, but you get the idea. The code is at

https://gitlab.com/tui/tui/-/blob/master/cam/blacklevel.c?ref_type=heads

if someone wants to play.

I tried to do the matrix multiply, and while I do get a small improvement
from "tree-vectorize", it is from 1.05 sec to 0.94 sec... an additional
improvement to ca. 0.8 sec is possible with "fma". But this is still
many times slower than memcpy(), so I'm not sure if we can get good
performance there.

matmult.c code is in the same directory.
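
For reference, a minimal sketch of this kind of per-pixel CCM loop (not the
actual matmult.c; the names are made up), doing the 9 multiplies + 6 adds per
pixel discussed above, plus clamping:

	void applyCcm(uint8_t *dst, const uint8_t *src, unsigned int pixels,
		      const float m[3][3])
	{
		for (unsigned int i = 0; i < pixels; i++) {
			const uint8_t *p = &src[3 * i];
			for (unsigned int c = 0; c < 3; c++) {
				float v = m[c][0] * p[0] + m[c][1] * p[1] +
					  m[c][2] * p[2];
				dst[3 * i + c] = v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
			}
		}
	}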

Best regards,
								Pavel

Patch
diff mbox series

diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h
index 98965fa1..5e38e08b 100644
--- a/include/libcamera/internal/software_isp/debayer_params.h
+++ b/include/libcamera/internal/software_isp/debayer_params.h
@@ -43,6 +43,10 @@  struct DebayerParams {
 	 * \brief Gamma correction, 1.0 is no correction
 	 */
 	float gamma;
+	/**
+	 * \brief Level of the black point, 0..255, 0 is no correction.
+	 */
+	unsigned int blackLevel;
 };
 
 } /* namespace libcamera */
diff --git a/include/libcamera/internal/software_isp/swisp_stats.h b/include/libcamera/internal/software_isp/swisp_stats.h
index afe42c9a..25cd5abd 100644
--- a/include/libcamera/internal/software_isp/swisp_stats.h
+++ b/include/libcamera/internal/software_isp/swisp_stats.h
@@ -7,6 +7,8 @@ 
 
 #pragma once
 
+#include <array>
+
 namespace libcamera {
 
 /**
@@ -28,11 +30,15 @@  struct SwIspStats {
 	/**
 	 * \brief Number of bins in the yHistogram.
 	 */
-	static constexpr unsigned int kYHistogramSize = 16;
+	static constexpr unsigned int kYHistogramSize = 64;
+	/**
+	 * \brief Type of the histogram.
+	 */
+	using histogram = std::array<unsigned int, kYHistogramSize>;
 	/**
 	 * \brief A histogram of luminance values.
 	 */
-	std::array<unsigned int, kYHistogramSize> yHistogram;
+	histogram yHistogram;
 };
 
 } /* namespace libcamera */
diff --git a/src/ipa/simple/black_level.cpp b/src/ipa/simple/black_level.cpp
new file mode 100644
index 00000000..9d4aa800
--- /dev/null
+++ b/src/ipa/simple/black_level.cpp
@@ -0,0 +1,88 @@ 
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * Copyright (C) 2024, Red Hat Inc.
+ *
+ * black_level.cpp - black level handling
+ */
+
+#include "black_level.h"
+
+#include <numeric>
+
+#include <libcamera/base/log.h>
+
+namespace libcamera {
+
+LOG_DEFINE_CATEGORY(IPASoftBL)
+
+/**
+ * \class BlackLevel
+ * \brief Object providing black point level for software ISP
+ *
+ * Black level can be provided in hardware tuning files or, if no tuning file is
+ * available for the given hardware, guessed automatically, with less accuracy.
+ * As tuning files are not yet implemented for software ISP, BlackLevel
+ * currently provides only guessed black levels.
+ *
+ * This class serves for tracking black level as a property of the underlying
+ * hardware, not as means of enhancing a particular scene or image.
+ *
+ * The class is supposed to be instantiated for the given camera stream.
+ * The black level can be retrieved using BlackLevel::get() method. It is
+ * initially 0 and may change when updated using BlackLevel::update() method.
+ */
+
+BlackLevel::BlackLevel()
+	: blackLevel_(255), blackLevelSet_(false)
+{
+}
+
+/**
+ * \brief Return the current black level
+ *
+ * \return The black level, in the range from 0 (minimum) to 255 (maximum).
+ * If the black level couldn't be determined yet, return 0.
+ */
+unsigned int BlackLevel::get() const
+{
+	return blackLevelSet_ ? blackLevel_ : 0;
+}
+
+/**
+ * \brief Update black level from the provided histogram
+ * \param[in] yHistogram The histogram to be used for updating black level
+ *
+ * The black level is property of the given hardware, not image. It is updated
+ * only if it has not been yet set or if it is lower than the lowest value seen
+ * so far.
+ */
+void BlackLevel::update(SwIspStats::histogram &yHistogram)
+{
+	/*
+	 * The constant is selected to be "good enough", not overly conservative or
+	 * aggressive. There is no magic about the given value.
+	 */
+	constexpr float ignoredPercentage_ = 0.02;
+	const unsigned int total =
+		std::accumulate(begin(yHistogram), end(yHistogram), 0);
+	const unsigned int pixelThreshold = ignoredPercentage_ * total;
+	const unsigned int histogramRatio = 256 / SwIspStats::kYHistogramSize;
+	const unsigned int currentBlackIdx = blackLevel_ / histogramRatio;
+
+	for (unsigned int i = 0, seen = 0;
+	     i < currentBlackIdx && i < SwIspStats::kYHistogramSize;
+	     i++) {
+		seen += yHistogram[i];
+		if (seen >= pixelThreshold) {
+			blackLevel_ = i * histogramRatio;
+			blackLevelSet_ = true;
+			LOG(IPASoftBL, Debug)
+				<< "Auto-set black level: "
+				<< i << "/" << SwIspStats::kYHistogramSize
+				<< " (" << 100 * (seen - yHistogram[i]) / total << "% below, "
+				<< 100 * seen / total << "% at or below)";
+			break;
+		}
+	};
+}
+} // namespace libcamera
diff --git a/src/ipa/simple/black_level.h b/src/ipa/simple/black_level.h
new file mode 100644
index 00000000..b3785db0
--- /dev/null
+++ b/src/ipa/simple/black_level.h
@@ -0,0 +1,28 @@ 
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * Copyright (C) 2024, Red Hat Inc.
+ *
+ * black_level.h - black level handling
+ */
+
+#pragma once
+
+#include <array>
+
+#include "libcamera/internal/software_isp/swisp_stats.h"
+
+namespace libcamera {
+
+class BlackLevel
+{
+public:
+	BlackLevel();
+	unsigned int get() const;
+	void update(std::array<unsigned int, SwIspStats::kYHistogramSize> &yHistogram);
+
+private:
+	unsigned int blackLevel_;
+	bool blackLevelSet_;
+};
+
+} // namespace libcamera
diff --git a/src/ipa/simple/meson.build b/src/ipa/simple/meson.build
index 3e863db7..44b5f1d7 100644
--- a/src/ipa/simple/meson.build
+++ b/src/ipa/simple/meson.build
@@ -2,8 +2,13 @@ 
 
 ipa_name = 'ipa_soft_simple'
 
+soft_simple_sources = files([
+    'soft_simple.cpp',
+    'black_level.cpp',
+])
+
 mod = shared_module(ipa_name,
-                    ['soft_simple.cpp', libcamera_generated_ipa_headers],
+                    [soft_simple_sources, libcamera_generated_ipa_headers],
                     name_prefix : '',
                     include_directories : [ipa_includes, libipa_includes],
                     dependencies : libcamera_private,
diff --git a/src/ipa/simple/soft_simple.cpp b/src/ipa/simple/soft_simple.cpp
index 312df4ba..ac027568 100644
--- a/src/ipa/simple/soft_simple.cpp
+++ b/src/ipa/simple/soft_simple.cpp
@@ -22,6 +22,8 @@ 
 #include "libcamera/internal/software_isp/debayer_params.h"
 #include "libcamera/internal/software_isp/swisp_stats.h"
 
+#include "black_level.h"
+
 namespace libcamera {
 
 LOG_DEFINE_CATEGORY(IPASoft)
@@ -33,7 +35,8 @@  class IPASoftSimple : public ipa::soft::IPASoftInterface
 public:
 	IPASoftSimple()
 		: params_(static_cast<DebayerParams *>(MAP_FAILED)),
-		  stats_(static_cast<SwIspStats *>(MAP_FAILED)), ignore_updates_(0)
+		  stats_(static_cast<SwIspStats *>(MAP_FAILED)),
+		  blackLevel_(BlackLevel()), ignore_updates_(0)
 	{
 	}
 
@@ -63,6 +66,7 @@  private:
 	SharedFD fdParams_;
 	DebayerParams *params_;
 	SwIspStats *stats_;
+	BlackLevel blackLevel_;
 
 	int32_t exposure_min_, exposure_max_;
 	int32_t again_min_, again_max_;
@@ -196,6 +200,10 @@  void IPASoftSimple::processStats(const ControlList &sensorControls)
 	params_->gainG = 256;
 	params_->gamma = 0.5;
 
+	if (ignore_updates_ > 0)
+		blackLevel_.update(stats_->yHistogram);
+	params_->blackLevel = blackLevel_.get();
+
 	setIspParams.emit(0);
 
 	/*
@@ -211,18 +219,19 @@  void IPASoftSimple::processStats(const ControlList &sensorControls)
 	 * Calculate Mean Sample Value (MSV) according to formula from:
 	 * https://www.araa.asn.au/acra/acra2007/papers/paper84final.pdf
 	 */
-	constexpr unsigned int yHistValsPerBin =
-		SwIspStats::kYHistogramSize / kExposureBinsCount;
-	constexpr unsigned int yHistValsPerBinMod =
-		SwIspStats::kYHistogramSize /
-		(SwIspStats::kYHistogramSize % kExposureBinsCount + 1);
+	const unsigned int blackLevelHistIdx =
+		params_->blackLevel / (256 / SwIspStats::kYHistogramSize);
+	const unsigned int histogramSize = SwIspStats::kYHistogramSize - blackLevelHistIdx;
+	const unsigned int yHistValsPerBin = histogramSize / kExposureBinsCount;
+	const unsigned int yHistValsPerBinMod =
+		histogramSize / (histogramSize % kExposureBinsCount + 1);
 	int ExposureBins[kExposureBinsCount] = {};
 	unsigned int denom = 0;
 	unsigned int num = 0;
 
-	for (unsigned int i = 0; i < SwIspStats::kYHistogramSize; i++) {
+	for (unsigned int i = 0; i < histogramSize; i++) {
 		unsigned int idx = (i - (i / yHistValsPerBinMod)) / yHistValsPerBin;
-		ExposureBins[idx] += stats_->yHistogram[i];
+		ExposureBins[idx] += stats_->yHistogram[blackLevelHistIdx + i];
 	}
 
 	for (unsigned int i = 0; i < kExposureBinsCount; i++) {
@@ -256,7 +265,8 @@  void IPASoftSimple::processStats(const ControlList &sensorControls)
 
 	LOG(IPASoft, Debug) << "exposureMSV " << exposureMSV
 			    << " exp " << exposure_ << " again " << again_
-			    << " gain R/B " << params_->gainR << "/" << params_->gainB;
+			    << " gain R/B " << params_->gainR << "/" << params_->gainB
+			    << " black level " << params_->blackLevel;
 }
 
 void IPASoftSimple::updateExposure(double exposureMSV)
diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
index a1692693..3be3cdfe 100644
--- a/src/libcamera/software_isp/debayer_cpu.cpp
+++ b/src/libcamera/software_isp/debayer_cpu.cpp
@@ -35,7 +35,7 @@  namespace libcamera {
  * \param[in] stats Pointer to the stats object to use.
  */
 DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats)
-	: stats_(std::move(stats)), gamma_correction_(1.0)
+	: stats_(std::move(stats)), gamma_correction_(1.0), blackLevel_(0)
 {
 #ifdef __x86_64__
 	enableInputMemcpy_ = false;
@@ -683,11 +683,16 @@  void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams
 	}
 
 	/* Apply DebayerParams */
-	if (params.gamma != gamma_correction_) {
-		for (unsigned int i = 0; i < kGammaLookupSize; i++)
-			gamma_[i] = UINT8_MAX * powf(i / (kGammaLookupSize - 1.0), params.gamma);
+	if (params.gamma != gamma_correction_ || params.blackLevel != blackLevel_) {
+		const unsigned int blackIndex =
+			params.blackLevel * kGammaLookupSize / 256;
+		std::fill(gamma_.begin(), gamma_.begin() + blackIndex, 0);
+		const float divisor = kGammaLookupSize - blackIndex - 1.0;
+		for (unsigned int i = blackIndex; i < kGammaLookupSize; i++)
+			gamma_[i] = UINT8_MAX * powf((i - blackIndex) / divisor, params.gamma);
 
 		gamma_correction_ = params.gamma;
+		blackLevel_ = params.blackLevel;
 	}
 
 	if (swapRedBlueGains_)
diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
index 5f44fc65..ea02f909 100644
--- a/src/libcamera/software_isp/debayer_cpu.h
+++ b/src/libcamera/software_isp/debayer_cpu.h
@@ -147,6 +147,7 @@  private:
 	bool enableInputMemcpy_;
 	bool swapRedBlueGains_;
 	float gamma_correction_;
+	unsigned int blackLevel_;
 	unsigned int measuredFrames_;
 	int64_t frameProcessTime_;
 	/* Skip 30 frames for things to stabilize then measure 30 frames */
diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp
index 388b4496..9b49be41 100644
--- a/src/libcamera/software_isp/software_isp.cpp
+++ b/src/libcamera/software_isp/software_isp.cpp
@@ -64,7 +64,7 @@  LOG_DEFINE_CATEGORY(SoftwareIsp)
  */
 SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const ControlInfoMap &sensorControls)
 	: debayer_(nullptr),
-	  debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f },
+	  debayerParams_{ DebayerParams::kGain10, DebayerParams::kGain10, DebayerParams::kGain10, 0.5f, 0 },
 	  dmaHeap_(DmaHeap::DmaHeapFlag::Cma | DmaHeap::DmaHeapFlag::System)
 {
 	if (!dmaHeap_.isValid()) {