[RFC,1/1] libcamera: debayer_cpu: Mask out unused bits from > 8bpp non packed src data

Message ID 20251112090924.46295-2-johannes.goede@oss.qualcomm.com
State New
Series
  • Fix softISP crash on 10/12bpp sparse input frames

Commit Message

Hans de Goede Nov. 12, 2025, 9:09 a.m. UTC
Users have been reporting invalid array access assert errors in
statsBGGR10Line0 inside the SWSTATS_ACCUMULATE_LINE_STATS() macro caused
by out of bounds accesses to the yHistogram array.

Another case of the same problem is out of bounds accesses to the various
lookup arrays used in the DebayerCpu code.

Both of these are caused by 10 bpp sparse (stored in 16 bit words) input
frames containing pixel values > 1023 leading to out of bounds array
accesses. IOW bits 15-10 of the 16 bit word are not 0 as they should be.

This can only happen if we somehow get a corrupt frame from the kernel.
The reported crashes show that this can actually happen so we will need
to harden the software ISP code to deal with this.

The cheapest (in CPU time) way to fix this is to simply mask out the high
unused bits of the words for sparse input formats directly after memcpy-ing
a line to one of the lineBuffers.
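
For illustration only (not the code in this patch), a minimal sketch of the
copy-then-mask idea for 10 bpp data stored in 16 bit words. The helper names
maskLine10() and copyAndMaskLine10() are hypothetical and the line is treated
as a plain array of 16 bit pixels:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

/* Hypothetical helper: clear bits 15-10 of every 16 bit word. */
void maskLine10(uint16_t *line, std::size_t pixels)
{
	for (std::size_t i = 0; i < pixels; i++)
		line[i] &= 0x03ff;
}

/* Copy one input line to a private buffer, then sanitise only the copy. */
std::vector<uint16_t> copyAndMaskLine10(const uint16_t *src, std::size_t pixels)
{
	std::vector<uint16_t> lineBuffer(pixels);

	std::memcpy(lineBuffer.data(), src, pixels * sizeof(uint16_t));
	maskLine10(lineBuffer.data(), pixels);

	return lineBuffer;
}
```

After masking, every value is at most 1023, so it is a valid index into a
1024 entry table. The input buffer itself is left untouched.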

This also means that the software ISP must always use the memcpy path for
sparse 10/12 bpp input data (it should never write to the input buffer).
Add a new forceInputMemcpy_ for the setting coming from the config file and
set enableInputMemcpy_ to true if either forceInputMemcpy_ is set or
the input format is sparse 10/12 bpp.
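
The rule above, written out as a stand-alone sketch. useInputMemcpy() and its
parameters are hypothetical names for illustration; the patch itself sets
member variables in setDebayerFunctions():

```cpp
#include <cassert>

/*
 * Hypothetical free function mirroring the described rule: copy the input
 * when the config asks for it, or when the format needs its high bits masked.
 */
bool useInputMemcpy(bool forceInputMemcpy, bool sparse, unsigned int bitDepth)
{
	bool needsMasking = sparse && (bitDepth == 10 || bitDepth == 12);

	return forceInputMemcpy || needsMasking;
}
```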

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2402746#c20
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
---
 src/libcamera/software_isp/debayer_cpu.cpp | 41 ++++++++++++++++++++--
 src/libcamera/software_isp/debayer_cpu.h   |  6 ++++
 2 files changed, 45 insertions(+), 2 deletions(-)

Comments

Milan Zamazal Nov. 13, 2025, 11:54 a.m. UTC | #1
Hi Hans,

thank you for the fix.

Hans de Goede <johannes.goede@oss.qualcomm.com> writes:

> Users have been reporting invalid array access assert errors in
> statsBGGR10Line0 inside the SWSTATS_ACCUMULATE_LINE_STATS() macro caused
> by out of bounds accesses to the yHistogram array.
>
> Another case of the same problem is out of bounds accesses to the various
> lookup arrays used in the DebayerCpu code.
>
> Both of these are caused by 10 bpp sparse (stored in 16 bit words) input
> frames containing pixels values > 1023 leading to out of bounds array
> accesses. IOW bits 15-10 of the 16 bit word are not 0 as they should be.
>
> This can only happen if we somehow get a corrupt frame from the kernel.
> The reported crashes show that this can actually happen so we will need
> to harden the software ISP code to deal with this.
>
> The cheapest (in CPU time) way to fix this is to simply mask out the high
> unused bits of the words for sparse input formats directly after memcpy-ing
> a line to one of the lineBuffers.

OK.  But how about GPU ISP?  We still run stats on a CPU.  And a
mechanism other than memcpy-ing should be used with GPU ISP.

> This also means that the software ISP must always use the memcpy path for
> sparse 10/12 bpp input data (it should never write to the input buffer).
> Add a new forceInputMemcpy_ for the setting coming from the config file and
> set enableInputMemcpy_ to true if either forceInputMemcpy_ is set or
> the input format is sparse 10/12 bpp.

Do we really need a new variable for the purpose?  Why not use
enableInputMemcpy_ directly?

I'm also thinking about honouring forceInputMemcpy_ unconditionally if
set.  The masking would be disabled in case forceInputMemcpy_ is
explicitly set to false.  A warning in the documentation should prevent
many more bug reports about the crashes.
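
One possible reading of this suggestion, as a hedged sketch. InputPolicy and
resolveInputPolicy() are hypothetical names, and the default of true when the
option is unset is assumed from the existing value_or(true):

```cpp
#include <cassert>
#include <optional>

struct InputPolicy {
	bool copy; /* copy each line into a line buffer */
	bool mask; /* clear the unused high bits of the copy */
};

/*
 * Honour an explicit copy_input_buffer setting unconditionally. Masking is
 * only possible on a copy, so it is off when copying is explicitly disabled.
 */
InputPolicy resolveInputPolicy(std::optional<bool> copyInputBuffer, bool sparse10or12)
{
	bool copy = copyInputBuffer.value_or(true);

	return { copy, copy && sparse10or12 };
}
```

With this reading, a user who sets the option to false gets no protection
against corrupt frames, hence the documentation warning.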

> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2402746#c20
> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
> ---
>  src/libcamera/software_isp/debayer_cpu.cpp | 41 ++++++++++++++++++++--
>  src/libcamera/software_isp/debayer_cpu.h   |  6 ++++
>  2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
> index 00738c56..616e4f13 100644
> --- a/src/libcamera/software_isp/debayer_cpu.cpp
> +++ b/src/libcamera/software_isp/debayer_cpu.cpp
> @@ -47,14 +47,14 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats, const GlobalConfigurat
>  	 * Reading from uncached buffers may be very slow.
>  	 * In such a case, it's better to copy input buffer data to normal memory.
>  	 * But in case of cached buffers, copying the data is unnecessary overhead.
> -	 * enable_input_memcpy_ makes this behavior configurable.  At the moment, we
> +	 * forceInputMemcpy_ makes this behavior configurable.  At the moment, we
>  	 * always set it to true as the safer choice but this should be changed in
>  	 * future.
>  	 *
>  	 * \todo Make memcpy automatic based on runtime detection of platform
>  	 * capabilities.
>  	 */
> -	enableInputMemcpy_ =
> +	forceInputMemcpy_ =
>  		configuration.option<bool>({ "software_isp", "copy_input_buffer" }).value_or(true);
>  }
>  
> @@ -288,6 +288,30 @@ void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])
>  	}
>  }
>  
> +/*
> + * Functions to mask out high unused bits from input buffers. These bits should
> + * already be 0, but sometimes with corrupt input frames these are not 0 causing
> + * out-of-bounds accesses to various lookup tables. These functions explicitly
> + * set the unused high bits to 0 to avoid corrupt frames causing crashes.
> + */
> +void DebayerCpu::inputMask10(uint8_t *data, unsigned int len)
> +{
> +	/* Everything is aligned to 2 16 bit pixels, mask 2 pixels at a time */
> +	uint32_t *input, 

Declare in the `for' loop?

> *end = (uint32_t *)(data + len);

static_cast<uint32_t *>(data + len)

(Similarly elsewhere.)
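
A sketch of how the loop could look with both comments applied, written as a
free function for illustration (the patch uses a member function). Note that
a C++-style cast between uint8_t * and uint32_t * has to be reinterpret_cast;
static_cast does not compile for unrelated pointer types. Like the patch,
this assumes the buffer is suitably aligned for 32 bit access and that len is
a multiple of 4:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Mask two 16 bit pixels at a time, keeping the low 10 bits of each. */
void inputMask10(uint8_t *data, unsigned int len)
{
	uint32_t *end = reinterpret_cast<uint32_t *>(data + len);

	/* Loop variable declared in the for statement. */
	for (uint32_t *input = reinterpret_cast<uint32_t *>(data); input < end; input++)
		*input &= 0x03ff03ff;
}
```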

> +
> +	for (input = (uint32_t *)data; input < end; input++)
> +		*input &= 0x03ff03ff;
> +}
> +
> +void DebayerCpu::inputMask12(uint8_t *data, unsigned int len)
> +{
> +	/* Everything is aligned to 2 16 bit pixels, mask 2 pixels at a time */
> +	uint32_t *input, *end = (uint32_t *)(data + len);
> +
> +	for (input = (uint32_t *)data; input < end; input++)
> +		*input &= 0x0fff0fff;
> +}

There is a measurable performance penalty with this, ~4 % in my
environment.  Would perhaps changing the methods to perform the memcpy
directly, rather than fixing the input as an additional step, help?
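
To make the question concrete, a hedged sketch of a fused copy-and-mask pass.
copyMasked() is a hypothetical helper, not something in the patch, and
whether it actually recovers the ~4 % would need measuring. It assumes len is
a multiple of 4, as the patch does:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

/*
 * Copy len bytes from src to dst, masking each 32 bit word (two 16 bit
 * pixels) on the way, so the line is only traversed once.
 */
void copyMasked(uint8_t *dst, const uint8_t *src, std::size_t len, uint32_t mask)
{
	for (std::size_t i = 0; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
		uint32_t word;

		/* memcpy keeps the word access free of alignment assumptions. */
		std::memcpy(&word, src + i, sizeof(word));
		word &= mask;
		std::memcpy(dst + i, &word, sizeof(word));
	}
}
```

The mask would be 0x03ff03ff for 10 bpp and 0x0fff0fff for 12 bpp, matching
the constants in the patch.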

> +
>  /*
>   * Setup the Debayer object according to the passed in parameters.
>   * Return 0 on success, a negative errno value on failure
> @@ -395,6 +419,8 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat,
>  
>  	xShift_ = 0;
>  	swapRedBlueGains_ = false;
> +	inputMask_ = NULL;

nullptr

> +	enableInputMemcpy_ = forceInputMemcpy_;
>  
>  	auto invalidFmt = []() -> int {
>  		LOG(Debayer, Error) << "Unsupported input output format combination";
> @@ -446,12 +472,17 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat,
>  			break;
>  		case 10:
>  			SET_DEBAYER_METHODS(debayer10_BGBG_BGR888, debayer10_GRGR_BGR888)
> +			inputMask_ = &DebayerCpu::inputMask10;
>  			break;
>  		case 12:
>  			SET_DEBAYER_METHODS(debayer12_BGBG_BGR888, debayer12_GRGR_BGR888)
> +			inputMask_ = &DebayerCpu::inputMask12;
>  			break;
>  		}
>  		setupStandardBayerOrder(bayerFormat.order);
> +		if (inputMask_)
> +			enableInputMemcpy_ = true;
> +
>  		return 0;
>  	}
>  
> @@ -601,6 +632,8 @@ void DebayerCpu::setupInputMemcpy(const uint8_t *linePointers[])
>  		memcpy(lineBuffers_[i].data(),
>  		       linePointers[i + 1] - lineBufferPadding_,
>  		       lineBufferLength_);
> +		if (inputMask_)
> +			(this->*inputMask_)(lineBuffers_[i].data(), lineBufferLength_);
>  		linePointers[i + 1] = lineBuffers_[i].data() + lineBufferPadding_;
>  	}
>  
> @@ -629,6 +662,10 @@ void DebayerCpu::memcpyNextLine(const uint8_t *linePointers[])
>  	memcpy(lineBuffers_[lineBufferIndex_].data(),
>  	       linePointers[patternHeight] - lineBufferPadding_,
>  	       lineBufferLength_);
> +	if (inputMask_)
> +		(this->*inputMask_)(lineBuffers_[lineBufferIndex_].data(),
> +				    lineBufferLength_);
> +
>  	linePointers[patternHeight] = lineBuffers_[lineBufferIndex_].data() + lineBufferPadding_;
>  
>  	lineBufferIndex_ = (lineBufferIndex_ + 1) % (patternHeight + 1);
> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
> index 3bf34ac3..a5feefa2 100644
> --- a/src/libcamera/software_isp/debayer_cpu.h
> +++ b/src/libcamera/software_isp/debayer_cpu.h
> @@ -79,6 +79,8 @@ private:
>  	 */
>  	using debayerFn = void (DebayerCpu::*)(uint8_t *dst, const uint8_t *src[]);
>  
> +	using inputMaskFn = void (DebayerCpu::*)(uint8_t *data, unsigned int len);
> +
>  	/* 8-bit raw bayer format */
>  	template<bool addAlphaByte, bool ccmEnabled>
>  	void debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
> @@ -89,11 +91,13 @@ private:
>  	void debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
>  	template<bool addAlphaByte, bool ccmEnabled>
>  	void debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);
> +	void inputMask10(uint8_t *data, unsigned int len);
>  	/* unpacked 12-bit raw bayer format */
>  	template<bool addAlphaByte, bool ccmEnabled>
>  	void debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
>  	template<bool addAlphaByte, bool ccmEnabled>
>  	void debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);
> +	void inputMask12(uint8_t *data, unsigned int len);
>  	/* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */
>  	template<bool addAlphaByte, bool ccmEnabled>
>  	void debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
> @@ -123,6 +127,7 @@ private:
>  	debayerFn debayer1_;
>  	debayerFn debayer2_;
>  	debayerFn debayer3_;
> +	inputMaskFn inputMask_;
>  	Rectangle window_;
>  	std::unique_ptr<SwStatsCpu> stats_;
>  	std::vector<uint8_t> lineBuffers_[kMaxLineBuffers];
> @@ -130,6 +135,7 @@ private:
>  	unsigned int lineBufferPadding_;
>  	unsigned int lineBufferIndex_;
>  	unsigned int xShift_; /* Offset of 0/1 applied to window_.x */
> +	bool forceInputMemcpy_;
>  	bool enableInputMemcpy_;
>  };
Laurent Pinchart Nov. 13, 2025, 1:12 p.m. UTC | #2
On Thu, Nov 13, 2025 at 12:54:01PM +0100, Milan Zamazal wrote:
> There is a measurable performance penalty with this, ~4 % in my
> environment.  Would perhaps changing the methods to perform the memcpy
> directly, rather than fixing the input as an additional step, help?

I was about to suggest that, or even masking when reading the pixels to
support cases where memcpy is disabled.
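
A minimal sketch of what masking at read time could look like, assuming a
table indexed by the 10 bit pixel value. accumulate(), kBitDepth and
kPixelMask are hypothetical names, not the actual stats or debayer code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned int kBitDepth = 10;
constexpr uint16_t kPixelMask = (1u << kBitDepth) - 1;

/*
 * Count pixel values in a table with one entry per valid value. The mask is
 * applied when the pixel is read, so the input buffer is never written and
 * no intermediate copy is needed.
 */
void accumulate(std::array<uint32_t, 1u << kBitDepth> &table,
		const uint16_t *line, std::size_t pixels)
{
	for (std::size_t i = 0; i < pixels; i++)
		table[line[i] & kPixelMask]++;
}
```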


Patch

diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
index 00738c56..616e4f13 100644
--- a/src/libcamera/software_isp/debayer_cpu.cpp
+++ b/src/libcamera/software_isp/debayer_cpu.cpp
@@ -47,14 +47,14 @@  DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats, const GlobalConfigurat
 	 * Reading from uncached buffers may be very slow.
 	 * In such a case, it's better to copy input buffer data to normal memory.
 	 * But in case of cached buffers, copying the data is unnecessary overhead.
-	 * enable_input_memcpy_ makes this behavior configurable.  At the moment, we
+	 * forceInputMemcpy_ makes this behavior configurable.  At the moment, we
 	 * always set it to true as the safer choice but this should be changed in
 	 * future.
 	 *
 	 * \todo Make memcpy automatic based on runtime detection of platform
 	 * capabilities.
 	 */
-	enableInputMemcpy_ =
+	forceInputMemcpy_ =
 		configuration.option<bool>({ "software_isp", "copy_input_buffer" }).value_or(true);
 }
 
@@ -288,6 +288,30 @@  void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])
 	}
 }
 
+/*
+ * Functions to mask out high unused bits from input buffers. These bits should
+ * already be 0, but sometimes with corrupt input frames these are not 0 causing
+ * out-of-bounds accesses to various lookup tables. These functions explicitly
+ * set the unused high bits to 0 to avoid corrupt frames causing crashes.
+ */
+void DebayerCpu::inputMask10(uint8_t *data, unsigned int len)
+{
+	/* Everything is aligned to 2 16 bit pixels, mask 2 pixels at a time */
+	uint32_t *input, *end = (uint32_t *)(data + len);
+
+	for (input = (uint32_t *)data; input < end; input++)
+		*input &= 0x03ff03ff;
+}
+
+void DebayerCpu::inputMask12(uint8_t *data, unsigned int len)
+{
+	/* Everything is aligned to 2 16 bit pixels, mask 2 pixels at a time */
+	uint32_t *input, *end = (uint32_t *)(data + len);
+
+	for (input = (uint32_t *)data; input < end; input++)
+		*input &= 0x0fff0fff;
+}
+
 /*
  * Setup the Debayer object according to the passed in parameters.
  * Return 0 on success, a negative errno value on failure
@@ -395,6 +419,8 @@  int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat,
 
 	xShift_ = 0;
 	swapRedBlueGains_ = false;
+	inputMask_ = NULL;
+	enableInputMemcpy_ = forceInputMemcpy_;
 
 	auto invalidFmt = []() -> int {
 		LOG(Debayer, Error) << "Unsupported input output format combination";
@@ -446,12 +472,17 @@  int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat,
 			break;
 		case 10:
 			SET_DEBAYER_METHODS(debayer10_BGBG_BGR888, debayer10_GRGR_BGR888)
+			inputMask_ = &DebayerCpu::inputMask10;
 			break;
 		case 12:
 			SET_DEBAYER_METHODS(debayer12_BGBG_BGR888, debayer12_GRGR_BGR888)
+			inputMask_ = &DebayerCpu::inputMask12;
 			break;
 		}
 		setupStandardBayerOrder(bayerFormat.order);
+		if (inputMask_)
+			enableInputMemcpy_ = true;
+
 		return 0;
 	}
 
@@ -601,6 +632,8 @@  void DebayerCpu::setupInputMemcpy(const uint8_t *linePointers[])
 		memcpy(lineBuffers_[i].data(),
 		       linePointers[i + 1] - lineBufferPadding_,
 		       lineBufferLength_);
+		if (inputMask_)
+			(this->*inputMask_)(lineBuffers_[i].data(), lineBufferLength_);
 		linePointers[i + 1] = lineBuffers_[i].data() + lineBufferPadding_;
 	}
 
@@ -629,6 +662,10 @@  void DebayerCpu::memcpyNextLine(const uint8_t *linePointers[])
 	memcpy(lineBuffers_[lineBufferIndex_].data(),
 	       linePointers[patternHeight] - lineBufferPadding_,
 	       lineBufferLength_);
+	if (inputMask_)
+		(this->*inputMask_)(lineBuffers_[lineBufferIndex_].data(),
+				    lineBufferLength_);
+
 	linePointers[patternHeight] = lineBuffers_[lineBufferIndex_].data() + lineBufferPadding_;
 
 	lineBufferIndex_ = (lineBufferIndex_ + 1) % (patternHeight + 1);
diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
index 3bf34ac3..a5feefa2 100644
--- a/src/libcamera/software_isp/debayer_cpu.h
+++ b/src/libcamera/software_isp/debayer_cpu.h
@@ -79,6 +79,8 @@  private:
 	 */
 	using debayerFn = void (DebayerCpu::*)(uint8_t *dst, const uint8_t *src[]);
 
+	using inputMaskFn = void (DebayerCpu::*)(uint8_t *data, unsigned int len);
+
 	/* 8-bit raw bayer format */
 	template<bool addAlphaByte, bool ccmEnabled>
 	void debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
@@ -89,11 +91,13 @@  private:
 	void debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
 	template<bool addAlphaByte, bool ccmEnabled>
 	void debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);
+	void inputMask10(uint8_t *data, unsigned int len);
 	/* unpacked 12-bit raw bayer format */
 	template<bool addAlphaByte, bool ccmEnabled>
 	void debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
 	template<bool addAlphaByte, bool ccmEnabled>
 	void debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);
+	void inputMask12(uint8_t *data, unsigned int len);
 	/* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */
 	template<bool addAlphaByte, bool ccmEnabled>
 	void debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);
@@ -123,6 +127,7 @@  private:
 	debayerFn debayer1_;
 	debayerFn debayer2_;
 	debayerFn debayer3_;
+	inputMaskFn inputMask_;
 	Rectangle window_;
 	std::unique_ptr<SwStatsCpu> stats_;
 	std::vector<uint8_t> lineBuffers_[kMaxLineBuffers];
@@ -130,6 +135,7 @@  private:
 	unsigned int lineBufferPadding_;
 	unsigned int lineBufferIndex_;
 	unsigned int xShift_; /* Offset of 0/1 applied to window_.x */
+	bool forceInputMemcpy_;
 	bool enableInputMemcpy_;
 };