[4/5] libipa: histogram: Fix interQuantileMean() for small ranges
diff mbox series

Message ID 20250324170803.103296-5-stefan.klug@ideasonboard.com
State New
Headers show
Series
  • Fix histogram for some (corner) cases
Related show

Commit Message

Stefan Klug March 24, 2025, 5:07 p.m. UTC
The interQuantileMean() is supposed to return a weighted mean value
between two quantiles. This works for reasonably fine histograms, but
fails for coarse histograms and small quantile ranges because the weight
is always taken from the lower border of the bin.

Fix that by rewriting the algorithm to calculate a lower and upper bound
for every (partial) bin that goes into the mean calculation and weight
the bins by the middle of these bounds.

Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
---
 src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Kieran Bingham March 31, 2025, 6:12 p.m. UTC | #1
Quoting Stefan Klug (2025-03-24 17:07:39)
> The interQuantileMean() is supposed to return a weighted mean value
> between two quantiles. This works for reasonably fine histograms, but

reasonably 'for' fine histograms ...

> fails for coarse histograms and small quantile ranges because the weight
> is always taken from the lower border of the bin.
> 
> Fix that by rewriting the algorithm to calculate a lower and upper bound
> for every (partial) bin that goes into the mean calculation and weight
> the bins by the middle of these bounds.
> 
> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>

I haven't given this the attention and checks through all paths here
that it would deserve for a full RB tag, but because you've added tests
that validate it, which I'll put my faith in ...

Acked-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

> ---
>  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> index c19a4cbbf3cd..31f017af3458 100644
> --- a/src/ipa/libipa/histogram.cpp
> +++ b/src/ipa/libipa/histogram.cpp
> @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
>         double lowPoint = quantile(lowQuantile);
>         /* Proportion of pixels which lies below highQuantile */
>         double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> -       double sumBinFreq = 0, cumulFreq = 0;
> +       double sumBinFreq = 0;
> +       double cumulFreq = 0;
> +
> +       for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
> +               double lowBound = std::max(static_cast<double>(bin), lowPoint);
> +               double highBound = std::min(static_cast<double>(bin + 1), highPoint);
>  
> -       for (double p_next = floor(lowPoint) + 1.0;
> -            p_next <= ceil(highPoint);
> -            lowPoint = p_next, p_next += 1.0) {
> -               int bin = floor(lowPoint);
>                 double freq = (cumulative_[bin + 1] - cumulative_[bin])
> -                       * (std::min(p_next, highPoint) - lowPoint);
> +                       * (highBound - lowBound);
>  
>                 /* Accumulate weighted bin */
> -               sumBinFreq += bin * freq;
> +               sumBinFreq += 0.5 * (highBound + lowBound) * freq;
> +
>                 /* Accumulate weights */
>                 cumulFreq += freq;
>         }
> -       /* add 0.5 to give an average for bin mid-points */
> -       return sumBinFreq / cumulFreq + 0.5;
> +
> +       return sumBinFreq / cumulFreq;
>  }
>  
>  } /* namespace ipa */
> -- 
> 2.43.0
>
Laurent Pinchart April 1, 2025, 12:02 a.m. UTC | #2
Hi Stefan,

Thank you for the patch.

On Mon, Mar 24, 2025 at 06:07:39PM +0100, Stefan Klug wrote:
> The interQuantileMean() is supposed to return a weighted mean value
> between two quantiles. This works for reasonably fine histograms, but
> fails for coarse histograms and small quantile ranges because the weight
> is always taken from the lower border of the bin.
> 
> Fix that by rewriting the algorithm to calculate a lower and upper bound
> for every (partial) bin that goes into the mean calculation and weight
> the bins by the middle of these bounds.
> 
> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> ---
>  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> index c19a4cbbf3cd..31f017af3458 100644
> --- a/src/ipa/libipa/histogram.cpp
> +++ b/src/ipa/libipa/histogram.cpp
> @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
>  	double lowPoint = quantile(lowQuantile);
>  	/* Proportion of pixels which lies below highQuantile */
>  	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));

Those two variables can now be const. You can write

	ASSERT(highQuantile > lowQuantile);
	
	/* Proportion of pixels which lies below lowQuantile and highQuantile. */
	const double lowPoint = quantile(lowQuantile);
        const double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));

> -	double sumBinFreq = 0, cumulFreq = 0;
> +	double sumBinFreq = 0;
> +	double cumulFreq = 0;
> +

Let's document the algorithm (and see if I understand it correctly :-)).

	/*
	 * Calculate the mean pixel value between the low and high points by
	 * summing all the pixels between the two points, and dividing the sum
	 * by the number of pixels. Given the discrete nature of the histogram
	 * data, the sum of the pixels is approximated by accummulating the
	 * product of the bin values (calculated as the mid point of the bin) by
	 * the number of pixels they contain, for each bin in the internal.
	 */

> +	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {

It looks like bin can be unsigned.

> +		double lowBound = std::max(static_cast<double>(bin), lowPoint);

I think you can also write

		double lowBound = std::max<double>(bin, lowPoint);

Same for the next line. Up to you. Oh, and you can make them const too.

> +		double highBound = std::min(static_cast<double>(bin + 1), highPoint);

If I understand the code correctly, this is only meaningful for the
first and last iterations. I can't easily find a better construct that
wouldn't need to be run for each iteration, so this seems fine.

>  
> -	for (double p_next = floor(lowPoint) + 1.0;
> -	     p_next <= ceil(highPoint);
> -	     lowPoint = p_next, p_next += 1.0) {
> -		int bin = floor(lowPoint);
>  		double freq = (cumulative_[bin + 1] - cumulative_[bin])
> -			* (std::min(p_next, highPoint) - lowPoint);
> +			* (highBound - lowBound);

	 	/*
		 * The low and high quantile may not lie at bin boundaries, so
		 * the first and last bins need to be weighted accordingly. The
		 * best available approximation is to multiply the number of
		 * pixels by the partial bin width.
		 */
		const double freq = (cumulative_[bin + 1] - cumulative_[bin])
				  * (highBound - lowBound);

>  
>  		/* Accumulate weighted bin */
> -		sumBinFreq += bin * freq;
> +		sumBinFreq += 0.5 * (highBound + lowBound) * freq;

I wondered for a moment where the 0.5 came from. I think

		sumBinFreq += (highBound + lowBound) / 2 * freq;

would better reflect the intent.

> +
>  		/* Accumulate weights */
>  		cumulFreq += freq;

I wonder if we should rename sumBinFreq to sumPixelValues and numPixels.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

>  	}
> -	/* add 0.5 to give an average for bin mid-points */
> -	return sumBinFreq / cumulFreq + 0.5;
> +
> +	return sumBinFreq / cumulFreq;
>  }
>  
>  } /* namespace ipa */
Stefan Klug April 1, 2025, 10:38 a.m. UTC | #3
Hi Laurent,

Thank you for the review. 

On Tue, Apr 01, 2025 at 03:02:14AM +0300, Laurent Pinchart wrote:
> Hi Stefan,
> 
> Thank you for the patch.
> 
> On Mon, Mar 24, 2025 at 06:07:39PM +0100, Stefan Klug wrote:
> > The interQuantileMean() is supposed to return a weighted mean value
> > between two quantiles. This works for reasonably fine histograms, but
> > fails for coarse histograms and small quantile ranges because the weight
> > is always taken from the lower border of the bin.
> > 
> > Fix that by rewriting the algorithm to calculate a lower and upper bound
> > for every (partial) bin that goes into the mean calculation and weight
> > the bins by the middle of these bounds.
> > 
> > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> > ---
> >  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> > index c19a4cbbf3cd..31f017af3458 100644
> > --- a/src/ipa/libipa/histogram.cpp
> > +++ b/src/ipa/libipa/histogram.cpp
> > @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
> >  	double lowPoint = quantile(lowQuantile);
> >  	/* Proportion of pixels which lies below highQuantile */
> >  	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> 
> Those two variables can now be const. You can write

Does that technically help in any way? On compact algorithms I didn't
think about putting const anywhere. Anyways I added it.

> 
> 	ASSERT(highQuantile > lowQuantile);
> 	
> 	/* Proportion of pixels which lies below lowQuantile and highQuantile. */
> 	const double lowPoint = quantile(lowQuantile);
>         const double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> 
> > -	double sumBinFreq = 0, cumulFreq = 0;
> > +	double sumBinFreq = 0;
> > +	double cumulFreq = 0;
> > +
> 
> Let's document the algorithm (and see if I understand it correctly :-)).
> 
> 	/*
> 	 * Calculate the mean pixel value between the low and high points by
> 	 * summing all the pixels between the two points, and dividing the sum
> 	 * by the number of pixels. Given the discrete nature of the histogram
> 	 * data, the sum of the pixels is approximated by accummulating the
> 	 * product of the bin values (calculated as the mid point of the bin) by
> 	 * the number of pixels they contain, for each bin in the internal.
> 	 */

That nicely summarizes it. And actually it took me quite a while to
understand the algorithm. So that really helps. Thanks.

> 
> > +	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
> 
> It looks like bin can be unsigned.

I don't like unsigned :-) ... anyways, changed it.

> 
> > +		double lowBound = std::max(static_cast<double>(bin), lowPoint);
> 
> I think you can also write
> 
> 		double lowBound = std::max<double>(bin, lowPoint);

Oh yes. that looks way nicer.

> 
> Same for the next line. Up to you. Oh, and you can make them const too.
> 
> > +		double highBound = std::min(static_cast<double>(bin + 1), highPoint);
> 
> If I understand the code correctly, this is only meaningful for the
> first and last iterations. I can't easily find a better construct that
> wouldn't need to be run for each iteration, so this seems fine.
> 
> >  
> > -	for (double p_next = floor(lowPoint) + 1.0;
> > -	     p_next <= ceil(highPoint);
> > -	     lowPoint = p_next, p_next += 1.0) {
> > -		int bin = floor(lowPoint);
> >  		double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > -			* (std::min(p_next, highPoint) - lowPoint);
> > +			* (highBound - lowBound);
> 
> 	 	/*
> 		 * The low and high quantile may not lie at bin boundaries, so
> 		 * the first and last bins need to be weighted accordingly. The
> 		 * best available approximation is to multiply the number of
> 		 * pixels by the partial bin width.
> 		 */
> 		const double freq = (cumulative_[bin + 1] - cumulative_[bin])
> 				  * (highBound - lowBound);
> 
> >  
> >  		/* Accumulate weighted bin */
> > -		sumBinFreq += bin * freq;
> > +		sumBinFreq += 0.5 * (highBound + lowBound) * freq;
> 
> I wondered for a moment where the 0.5 came from. I think
> 
> 		sumBinFreq += (highBound + lowBound) / 2 * freq;
> 
> would better reflect the intent.
> 
> > +
> >  		/* Accumulate weights */
> >  		cumulFreq += freq;
> 
> I wonder if we should rename sumBinFreq to sumPixelValues and numPixels.

Depends on the background. The math people only talk about frequency and
we even have a cumulativeFrequency() function. In the docs we often use
pixels as that is mostly what we count.

I left it as is, as that's not wrong either.

> 
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

Thank you!

Best regards,
Stefan

> 
> >  	}
> > -	/* add 0.5 to give an average for bin mid-points */
> > -	return sumBinFreq / cumulFreq + 0.5;
> > +
> > +	return sumBinFreq / cumulFreq;
> >  }
> >  
> >  } /* namespace ipa */
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Laurent Pinchart April 1, 2025, 1:22 p.m. UTC | #4
On Tue, Apr 01, 2025 at 12:38:52PM +0200, Stefan Klug wrote:
> On Tue, Apr 01, 2025 at 03:02:14AM +0300, Laurent Pinchart wrote:
> > On Mon, Mar 24, 2025 at 06:07:39PM +0100, Stefan Klug wrote:
> > > The interQuantileMean() is supposed to return a weighted mean value
> > > between two quantiles. This works for reasonably fine histograms, but
> > > fails for coarse histograms and small quantile ranges because the weight
> > > is always taken from the lower border of the bin.
> > > 
> > > Fix that by rewriting the algorithm to calculate a lower and upper bound
> > > for every (partial) bin that goes into the mean calculation and weight
> > > the bins by the middle of these bounds.
> > > 
> > > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> > > ---
> > >  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> > > index c19a4cbbf3cd..31f017af3458 100644
> > > --- a/src/ipa/libipa/histogram.cpp
> > > +++ b/src/ipa/libipa/histogram.cpp
> > > @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
> > >  	double lowPoint = quantile(lowQuantile);
> > >  	/* Proportion of pixels which lies below highQuantile */
> > >  	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> > 
> > Those two variables can now be const. You can write
> 
> Does that technically help in any way? On compact algorithms I didn't
> think about putting const anywhere. Anyways I added it.

It's not mandatory. I think I like it because it shows what variables
are not meant to be modified, and I can classify such variable
declarations in my mind as aliases when reading the code.

> > 
> > 	ASSERT(highQuantile > lowQuantile);
> > 	
> > 	/* Proportion of pixels which lies below lowQuantile and highQuantile. */
> > 	const double lowPoint = quantile(lowQuantile);
> >         const double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> > 
> > > -	double sumBinFreq = 0, cumulFreq = 0;
> > > +	double sumBinFreq = 0;
> > > +	double cumulFreq = 0;
> > > +
> > 
> > Let's document the algorithm (and see if I understand it correctly :-)).
> > 
> > 	/*
> > 	 * Calculate the mean pixel value between the low and high points by
> > 	 * summing all the pixels between the two points, and dividing the sum
> > 	 * by the number of pixels. Given the discrete nature of the histogram
> > 	 * data, the sum of the pixels is approximated by accummulating the
> > 	 * product of the bin values (calculated as the mid point of the bin) by
> > 	 * the number of pixels they contain, for each bin in the internal.
> > 	 */
> 
> That nicely summarizes it. And actually it took me quite a while to
> understand the algorithm. So that really helps. Thanks.

It took me way too long to write those few lines :-)

> > 
> > > +	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
> > 
> > It looks like bin can be unsigned.
> 
> I don't like unsigned :-) ... anyways, changed it.
> 
> > 
> > > +		double lowBound = std::max(static_cast<double>(bin), lowPoint);
> > 
> > I think you can also write
> > 
> > 		double lowBound = std::max<double>(bin, lowPoint);
> 
> Oh yes. that looks way nicer.
> 
> > 
> > Same for the next line. Up to you. Oh, and you can make them const too.
> > 
> > > +		double highBound = std::min(static_cast<double>(bin + 1), highPoint);
> > 
> > If I understand the code correctly, this is only meaningful for the
> > first and last iterations. I can't easily find a better construct that
> > wouldn't need to be run for each iteration, so this seems fine.
> > 
> > >  
> > > -	for (double p_next = floor(lowPoint) + 1.0;
> > > -	     p_next <= ceil(highPoint);
> > > -	     lowPoint = p_next, p_next += 1.0) {
> > > -		int bin = floor(lowPoint);
> > >  		double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > > -			* (std::min(p_next, highPoint) - lowPoint);
> > > +			* (highBound - lowBound);
> > 
> > 	 	/*
> > 		 * The low and high quantile may not lie at bin boundaries, so
> > 		 * the first and last bins need to be weighted accordingly. The
> > 		 * best available approximation is to multiply the number of
> > 		 * pixels by the partial bin width.
> > 		 */
> > 		const double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > 				  * (highBound - lowBound);
> > 
> > >  
> > >  		/* Accumulate weighted bin */
> > > -		sumBinFreq += bin * freq;
> > > +		sumBinFreq += 0.5 * (highBound + lowBound) * freq;
> > 
> > I wondered for a moment where the 0.5 came from. I think
> > 
> > 		sumBinFreq += (highBound + lowBound) / 2 * freq;
> > 
> > would better reflect the intent.
> > 
> > > +
> > >  		/* Accumulate weights */
> > >  		cumulFreq += freq;
> > 
> > I wonder if we should rename sumBinFreq to sumPixelValues and numPixels.
> 
> Depends on the background. The math people only talk about frequency and
> we even have a cumulativeFrequency() function. In the docs we often use
> pixels as that is mostly what we count.
> 
> I left it as is, as that's not wrong either.
> 
> > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> 
> Thank you!
> 
> > >  	}
> > > -	/* add 0.5 to give an average for bin mid-points */
> > > -	return sumBinFreq / cumulFreq + 0.5;
> > > +
> > > +	return sumBinFreq / cumulFreq;
> > >  }
> > >  
> > >  } /* namespace ipa */

Patch
diff mbox series

diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
index c19a4cbbf3cd..31f017af3458 100644
--- a/src/ipa/libipa/histogram.cpp
+++ b/src/ipa/libipa/histogram.cpp
@@ -153,22 +153,24 @@  double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
 	double lowPoint = quantile(lowQuantile);
 	/* Proportion of pixels which lies below highQuantile */
 	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
-	double sumBinFreq = 0, cumulFreq = 0;
+	double sumBinFreq = 0;
+	double cumulFreq = 0;
+
+	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
+		double lowBound = std::max(static_cast<double>(bin), lowPoint);
+		double highBound = std::min(static_cast<double>(bin + 1), highPoint);
 
-	for (double p_next = floor(lowPoint) + 1.0;
-	     p_next <= ceil(highPoint);
-	     lowPoint = p_next, p_next += 1.0) {
-		int bin = floor(lowPoint);
 		double freq = (cumulative_[bin + 1] - cumulative_[bin])
-			* (std::min(p_next, highPoint) - lowPoint);
+			* (highBound - lowBound);
 
 		/* Accumulate weighted bin */
-		sumBinFreq += bin * freq;
+		sumBinFreq += 0.5 * (highBound + lowBound) * freq;
+
 		/* Accumulate weights */
 		cumulFreq += freq;
 	}
-	/* add 0.5 to give an average for bin mid-points */
-	return sumBinFreq / cumulFreq + 0.5;
+
+	return sumBinFreq / cumulFreq;
 }
 
 } /* namespace ipa */