[3/5] software_isp: debayer_cpu: Group innerloop variables together
diff mbox series

Message ID 20260216190204.106922-4-johannes.goede@oss.qualcomm.com
State Superseded
Headers show
Series
  • software_isp: debayer_cpu: Add multi-threading support
Related show

Commit Message

Hans de Goede Feb. 16, 2026, 7:02 p.m. UTC
Group variables used every pixel together, followed by variables used
every lines and then lastly variables only used every frame.

The idea here is to have all the data used every pixel fit in as few
cachelines as possible.

Benchmarking does not show any differerence before after, possibly
because most of the per pixel lookup tables where already grouped
together.

Despite that this still seems like a good idea.

Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
---
 src/libcamera/software_isp/debayer_cpu.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Milan Zamazal Feb. 17, 2026, 9:33 p.m. UTC | #1
Hi Hans,

thank you for the patch.

Hans de Goede <johannes.goede@oss.qualcomm.com> writes:

> Group variables used every pixel together, followed by variables used
> every lines and then lastly variables only used every frame.
>
> The idea here is to have all the data used every pixel fit in as few
> cachelines as possible.
>
> Benchmarking does not show any differerence before after, possibly
> because most of the per pixel lookup tables where already grouped
> together.

The struct layout may be indeed critical, IIRC I got ~10% penalty when I
had to switch the lookup table arrangement to fix wrong CCM
multiplication order.  Maybe there is a hidden opportunity for
significant speedup somewhere.  But this is a different problem than
what this patch solves.

With the typo below fixed:

Reviewed-by: Milan Zamazal <mzamazal@redhat.com>

> Despite that this still seems like a good idea.
>
> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
> ---
>  src/libcamera/software_isp/debayer_cpu.h | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
> index 800b018c..a54418dc 100644
> --- a/src/libcamera/software_isp/debayer_cpu.h
> +++ b/src/libcamera/software_isp/debayer_cpu.h
> @@ -135,6 +135,7 @@ private:
>  	};
>  	using LookupTable = std::array<uint8_t, kRGBLookupSize>;
>  	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
> +	/* Variables used every pixel */
>  	LookupTable red_;
>  	LookupTable green_;
>  	LookupTable blue_;
> @@ -143,24 +144,26 @@ private:
>  	CcmLookupTable blueCcm_;
>  	std::array<double, kGammaLookupSize> gammaTable_;
>  	LookupTable gammaLut_;
> -	bool ccmEnabled_;
> -	DebayerParams params_;
> -	SwIspStats statsBuffer_;
> +	Rectangle window_;
>  
> +	/* Variables used every line */
> +	SwIspStats statsBuffer_;
>  	debayerFn debayer0_;
>  	debayerFn debayer1_;
>  	debayerFn debayer2_;
>  	debayerFn debayer3_;
> -	Rectangle window_;
>  	std::unique_ptr<SwStatsCpu> stats_;
>  	unsigned int lineBufferLength_;
>  	unsigned int lineBufferPadding_;
>  	unsigned int xShift_; /* Offset of 0/1 applied to window_.x */
>  	bool enableInputMemcpy_;
> -
>  	static constexpr unsigned int kMaxThreads = 4;
>  	struct DebayerCpuThreadData threadData_[kMaxThreads];
> +
> +	/* variables used every frame */

s/variables/Variables/

>  	unsigned int threadCount_;
> +	bool ccmEnabled_;
> +	DebayerParams params_;
>  };
>  
>  } /* namespace libcamera */
Laurent Pinchart Feb. 19, 2026, 2:16 p.m. UTC | #2
On Mon, Feb 16, 2026 at 08:02:02PM +0100, Hans de Goede wrote:
> Group variables used every pixel together, followed by variables used
> every lines and then lastly variables only used every frame.
> 
> The idea here is to have all the data used every pixel fit in as few
> cachelines as possible.
> 
> Benchmarking does not show any differerence before after, possibly
> because most of the per pixel lookup tables where already grouped
> together.
> 
> Despite that this still seems like a good idea.
> 
> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
> ---
>  src/libcamera/software_isp/debayer_cpu.h | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
> index 800b018c..a54418dc 100644
> --- a/src/libcamera/software_isp/debayer_cpu.h
> +++ b/src/libcamera/software_isp/debayer_cpu.h
> @@ -135,6 +135,7 @@ private:
>  	};
>  	using LookupTable = std::array<uint8_t, kRGBLookupSize>;
>  	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
> +	/* Variables used every pixel */
>  	LookupTable red_;
>  	LookupTable green_;
>  	LookupTable blue_;
> @@ -143,24 +144,26 @@ private:
>  	CcmLookupTable blueCcm_;
>  	std::array<double, kGammaLookupSize> gammaTable_;
>  	LookupTable gammaLut_;
> -	bool ccmEnabled_;
> -	DebayerParams params_;
> -	SwIspStats statsBuffer_;
> +	Rectangle window_;
>  
> +	/* Variables used every line */
> +	SwIspStats statsBuffer_;
>  	debayerFn debayer0_;
>  	debayerFn debayer1_;
>  	debayerFn debayer2_;
>  	debayerFn debayer3_;
> -	Rectangle window_;
>  	std::unique_ptr<SwStatsCpu> stats_;
>  	unsigned int lineBufferLength_;
>  	unsigned int lineBufferPadding_;
>  	unsigned int xShift_; /* Offset of 0/1 applied to window_.x */
>  	bool enableInputMemcpy_;
> -
>  	static constexpr unsigned int kMaxThreads = 4;
>  	struct DebayerCpuThreadData threadData_[kMaxThreads];
> +
> +	/* variables used every frame */
>  	unsigned int threadCount_;
> +	bool ccmEnabled_;
> +	DebayerParams params_;

Could you reverse the order and start with the per-frame, then per-line
and finally per-pixel variables ? That would make a more logical read
order.

>  };
>  
>  } /* namespace libcamera */
Hans de Goede Feb. 23, 2026, 3:36 p.m. UTC | #3
Hi,

On 17-Feb-26 10:33 PM, Milan Zamazal wrote:
> Hi Hans,
> 
> thank you for the patch.
> 
> Hans de Goede <johannes.goede@oss.qualcomm.com> writes:
> 
>> Group variables used every pixel together, followed by variables used
>> every lines and then lastly variables only used every frame.
>>
>> The idea here is to have all the data used every pixel fit in as few
>> cachelines as possible.
>>
>> Benchmarking does not show any differerence before after, possibly
>> because most of the per pixel lookup tables where already grouped
>> together.
> 
> The struct layout may be indeed critical, IIRC I got ~10% penalty when I
> had to switch the lookup table arrangement to fix wrong CCM
> multiplication order.  Maybe there is a hidden opportunity for
> significant speedup somewhere.  But this is a different problem than
> what this patch solves.
> 
> With the typo below fixed:
> 
> Reviewed-by: Milan Zamazal <mzamazal@redhat.com>

Thank you for the review.

After the refactoring in v2, there is very little left to move here,
so I've dropped this patch for the upcoming v2 series.

Regards,

Hans

Patch
diff mbox series

diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
index 800b018c..a54418dc 100644
--- a/src/libcamera/software_isp/debayer_cpu.h
+++ b/src/libcamera/software_isp/debayer_cpu.h
@@ -135,6 +135,7 @@  private:
 	};
 	using LookupTable = std::array<uint8_t, kRGBLookupSize>;
 	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
+	/* Variables used every pixel */
 	LookupTable red_;
 	LookupTable green_;
 	LookupTable blue_;
@@ -143,24 +144,26 @@  private:
 	CcmLookupTable blueCcm_;
 	std::array<double, kGammaLookupSize> gammaTable_;
 	LookupTable gammaLut_;
-	bool ccmEnabled_;
-	DebayerParams params_;
-	SwIspStats statsBuffer_;
+	Rectangle window_;
 
+	/* Variables used every line */
+	SwIspStats statsBuffer_;
 	debayerFn debayer0_;
 	debayerFn debayer1_;
 	debayerFn debayer2_;
 	debayerFn debayer3_;
-	Rectangle window_;
 	std::unique_ptr<SwStatsCpu> stats_;
 	unsigned int lineBufferLength_;
 	unsigned int lineBufferPadding_;
 	unsigned int xShift_; /* Offset of 0/1 applied to window_.x */
 	bool enableInputMemcpy_;
-
 	static constexpr unsigned int kMaxThreads = 4;
 	struct DebayerCpuThreadData threadData_[kMaxThreads];
+
+	/* variables used every frame */
 	unsigned int threadCount_;
+	bool ccmEnabled_;
+	DebayerParams params_;
 };
 
 } /* namespace libcamera */