[v7,0/5] software_isp: debayer_cpu: Add multi-threading support
mbox series

Message ID 20260310120106.79922-1-johannes.goede@oss.qualcomm.com
Headers show
Series
  • software_isp: debayer_cpu: Add multi-threading support
Related show

Message

Hans de Goede March 10, 2026, 12:01 p.m. UTC
Hi All,

The QCM2290 SoC used on the Arduino Uno-Q seems to have a weak GPU, so weak
that it is barely faster then a single CPU core (CPU without CCM).

This has made me code-up the long envisioned multi-threading support
for the CPU softISP.

Changes in v7:
- Small update to DebayerCpuThread::configure() doc
- Add missing space between type and name for threads_ declaration
- Add Debug message logging thread count
- Gather various tags

Changes in v6:
- Use "LIBCAMERA_SOFTISP_MODE, software_isp.mode" as item title for patch 5/5
- Send out patch 5/5 as a standalone v6 as the rest of the series looked
  ready for merging (bad idea in hindsight)

Changes in v5:
- Extend software_isp.threads docs in runtime_configuration.rst
- New patch: "Documentation/runtime_configuration: Add missing
  software_isp.mode doc"

Changes in v4:
- Use const in for (const auto &s : stats_) {} in SwStatsCpu::finishFrame()
- Move kMaxLineBuffers constant to DebayerCpuThread class
- Document software_isp.threads option in runtime_configuration.rst
- Add an use constants for min/max/default number of threads

Changes in v3:
- Use std::unique_ptr for the DebayerCpuThread pointers
- Document new DebayerCpuThread class
- Make DebayerCpuThread inherit from both Thread and Object
- Use for (auto &thread : threads_)
- Use for (auto &s : stats_) {}
- Move input format logging from DebayerCpu::configure() to
  SoftwareIsp::configure()

Changes in v2:
- Quite a bit of refactoring based on v1 feedback, dropped patch 3/5 and 4/5
  from v2 since these now no longer make sense
- Move the allocation of the vector of SwIspStats objects to inside
  the SwStatsCpu class, controlled by a configure() arguments instead
  of making the caller allocate the objects
- Replace the DebayerCpuThreadData struct from v1 with a DebayerCpuThread
  class, derived from Object to allow calling invokeMethod for thread re-use
  in followup patches
- As part of this also move a bunch of methods which primarily deal with
  per thread data: setupInputMemcpy(), shiftLinePointers(), memcpyNextLine(),
  process*() to the new DebayerCpuThread class
- Re-use threads instead of starting new threads every frame
- Add a new patch adding some extra DebayerCpu input format logging

Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464
without CCM:

1 thread :  147ms / frame, ~6.5  fps
2 threads:   80ms / frame, ~12.5 fps
3 threads:   65ms / frame, ~15   fps
GPU:        130ms / frame, ~7,5  fps
GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)

Regards,

Hans


Hans de Goede (5):
  software_isp: swstats_cpu: Prepare for multi-threading support
  software_isp: debayer_cpu: Add DebayerCpuThread class
  software_isp: debayer_cpu: Add multi-threading support
  software_isp: Log input config from configure()
  Documentation/runtime_configuration: Add missing software_isp.mode doc

 Documentation/runtime_configuration.rst       |  17 ++
 .../internal/software_isp/swstats_cpu.h       |  25 +-
 src/libcamera/software_isp/debayer_cpu.cpp    | 288 ++++++++++++++----
 src/libcamera/software_isp/debayer_cpu.h      |  33 +-
 src/libcamera/software_isp/software_isp.cpp   |  12 +-
 src/libcamera/software_isp/swstats_cpu.cpp    |  54 ++--
 6 files changed, 320 insertions(+), 109 deletions(-)

Comments

Kieran Bingham April 1, 2026, 11:07 a.m. UTC | #1
Hi Hans,

This looks pretty good for merge, but there are a couple of niggles from
checkpatch:

Running checkstyle for patchwork/5825 (2a1c75504c789b686b5b37ed8729f632ec81d2f0..40a88692e6959511760590ecd976bb2c6723064a) 00:03
-------------------------------------------------------------------------------------------------------
1d3fd7a90cfb0c5a9c131f000a687b3e9bed164c software_isp: swstats_cpu: Prepare for multi-threading support
-------------------------------------------------------------------------------------------------------
--- src/libcamera/software_isp/swstats_cpu.cpp
+++ src/libcamera/software_isp/swstats_cpu.cpp
@@ -77,9 +77,9 @@
  * \fn void SwStatsCpu::processLine0(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
  * \brief Process line 0
  * \param[in] frame The frame number
- * \param[in] y The y coordinate.
- * \param[in] src The input data.
- * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
+ * \param[in] y The y coordinate
+ * \param[in] src The input data
+ * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading
  *
  * This function processes line 0 for input formats with
  * patternSize height == 1.
@@ -107,9 +107,9 @@
  * \fn void SwStatsCpu::processLine2(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
  * \brief Process line 2 and 3
  * \param[in] frame The frame number
- * \param[in] y The y coordinate.
- * \param[in] src The input data.
- * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
+ * \param[in] y The y coordinate
+ * \param[in] src The input data
+ * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading
  *
  * This function processes line 2 and 3 for input formats with
  * patternSize height == 4.
@@ -190,8 +190,8 @@
 	stats.yHistogram[yVal * SwIspStats::kYHistogramSize / (256 * 256 * (div))]++;
 
 #define SWSTATS_FINISH_LINE_STATS() \
-	stats.sum_.r() += sumR;    \
-	stats.sum_.g() += sumG;    \
+	stats.sum_.r() += sumR;     \
+	stats.sum_.g() += sumG;     \
 	stats.sum_.b() += sumB;
 
 void SwStatsCpu::statsBGGR8Line0(const uint8_t *src[], SwIspStats &stats)
---
3 potential issues detected, please review
----------------------------------------------------------------------------------------------
3e10daf140a229e0dcb8a4da59497b373c148f9e software_isp: debayer_cpu: Add DebayerCpuThread class
----------------------------------------------------------------------------------------------
No issue detected
-----------------------------------------------------------------------------------------------
10ad3ee318f67362169a03cfc8414afce007158d software_isp: debayer_cpu: Add multi-threading support
-----------------------------------------------------------------------------------------------
--- src/libcamera/software_isp/debayer_cpu.h
+++ src/libcamera/software_isp/debayer_cpu.h
@@ -15,8 +15,8 @@
 #include <stdint.h>
 #include <vector>
 
+#include <libcamera/base/mutex.h>
 #include <libcamera/base/object.h>
-#include <libcamera/base/mutex.h>
 
 #include "libcamera/internal/bayer_format.h"
 #include "libcamera/internal/global_configuration.h"
---
1 potential issue detected, please review
----------------------------------------------------------------------------------------
9afb093946cba43d5a8fdc195561c706c0452011 software_isp: Log input config from configure()
----------------------------------------------------------------------------------------
No issue detected
---------------------------------------------------------------------------------------------------------------
40a88692e6959511760590ecd976bb2c6723064a Documentation/runtime_configuration: Add missing software_isp.mode doc
---------------------------------------------------------------------------------------------------------------
No issue detected


Please add the checkpatch hooks to your local builds so you can see this
while you develop.

My preferred way is to :

  cp utils/hooks/post-commit .git/hooks/post-commit

these are really tiny, so I can fix them up with the following which
I'll do now: Reporting my workflow in case it's helpful to others:


git checkout patchwork/5825
git rebase -i origin/master -x ./utils/checkstyle.py
...
(let it run and report the break)
(Breakage on software_isp: swstats_cpu: Prepare for multi-threading support)

./utils/checkstyle.py | patch -p0
git add -p
git commit --amend
git rebase --continue

(next breakage on software_isp: debayer_cpu: Add multi-threading support)
./utils/checkstyle.py | patch -p0
git add -p
git commit --amend
git rebase --continue

Successfully rebased and updated refs/heads/patchwork/5825.

and with that I'll re-run CI and then be able to merge cleanly.

--
Kieran



Quoting Hans de Goede (2026-03-10 12:01:01)
> Hi All,
> 
> The QCM2290 SoC used on the Arduino Uno-Q seems to have a weak GPU, so weak
> that it is barely faster then a single CPU core (CPU without CCM).
> 
> This has made me code-up the long envisioned multi-threading support
> for the CPU softISP.
> 
> Changes in v7:
> - Small update to DebayerCpuThread::configure() doc
> - Add missing space between type and name for threads_ declaration
> - Add Debug message logging thread count
> - Gather various tags
> 
> Changes in v6:
> - Use "LIBCAMERA_SOFTISP_MODE, software_isp.mode" as item title for patch 5/5
> - Send out patch 5/5 as a standalone v6 as the rest of the series looked
>   ready for merging (bad idea in hindsight)
> 
> Changes in v5:
> - Extend software_isp.threads docs in runtime_configuration.rst
> - New patch: "Documentation/runtime_configuration: Add missing
>   software_isp.mode doc"
> 
> Changes in v4:
> - Use const in for (const auto &s : stats_) {} in SwStatsCpu::finishFrame()
> - Move kMaxLineBuffers constant to DebayerCpuThread class
> - Document software_isp.threads option in runtime_configuration.rst
> - Add an use constants for min/max/default number of threads
> 
> Changes in v3:
> - Use std::unique_ptr for the DebayerCpuThread pointers
> - Document new DebayerCpuThread class
> - Make DebayerCpuThread inherit from both Thread and Object
> - Use for (auto &thread : threads_)
> - Use for (auto &s : stats_) {}
> - Move input format logging from DebayerCpu::configure() to
>   SoftwareIsp::configure()
> 
> Changes in v2:
> - Quite a bit of refactoring based on v1 feedback, dropped patch 3/5 and 4/5
>   from v2 since these now no longer make sense
> - Move the allocation of the vector of SwIspStats objects to inside
>   the SwStatsCpu class, controlled by a configure() arguments instead
>   of making the caller allocate the objects
> - Replace the DebayerCpuThreadData struct from v1 with a DebayerCpuThread
>   class, derived from Object to allow calling invokeMethod for thread re-use
>   in followup patches
> - As part of this also move a bunch of methods which primarily deal with
>   per thread data: setupInputMemcpy(), shiftLinePointers(), memcpyNextLine(),
>   process*() to the new DebayerCpuThread class
> - Re-use threads instead of starting new threads every frame
> - Add a new patch adding some extra DebayerCpu input format logging
> 
> Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464
> without CCM:
> 
> 1 thread :  147ms / frame, ~6.5  fps
> 2 threads:   80ms / frame, ~12.5 fps
> 3 threads:   65ms / frame, ~15   fps
> GPU:        130ms / frame, ~7,5  fps
> GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
> GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)
> 
> Regards,
> 
> Hans
> 
> 
> Hans de Goede (5):
>   software_isp: swstats_cpu: Prepare for multi-threading support
>   software_isp: debayer_cpu: Add DebayerCpuThread class
>   software_isp: debayer_cpu: Add multi-threading support
>   software_isp: Log input config from configure()
>   Documentation/runtime_configuration: Add missing software_isp.mode doc
> 
>  Documentation/runtime_configuration.rst       |  17 ++
>  .../internal/software_isp/swstats_cpu.h       |  25 +-
>  src/libcamera/software_isp/debayer_cpu.cpp    | 288 ++++++++++++++----
>  src/libcamera/software_isp/debayer_cpu.h      |  33 +-
>  src/libcamera/software_isp/software_isp.cpp   |  12 +-
>  src/libcamera/software_isp/swstats_cpu.cpp    |  54 ++--
>  6 files changed, 320 insertions(+), 109 deletions(-)
> 
> -- 
> 2.53.0
>
Hans de Goede April 3, 2026, 10:31 a.m. UTC | #2
Hi,

On 1-Apr-26 13:07, Kieran Bingham wrote:
> Hi Hans,
> 
> This looks pretty good for merge, but there are a couple of niggles from
> checkpatch:
> 
> Running checkstyle for patchwork/5825 (2a1c75504c789b686b5b37ed8729f632ec81d2f0..40a88692e6959511760590ecd976bb2c6723064a) 00:03
> -------------------------------------------------------------------------------------------------------
> 1d3fd7a90cfb0c5a9c131f000a687b3e9bed164c software_isp: swstats_cpu: Prepare for multi-threading support
> -------------------------------------------------------------------------------------------------------
> --- src/libcamera/software_isp/swstats_cpu.cpp
> +++ src/libcamera/software_isp/swstats_cpu.cpp
> @@ -77,9 +77,9 @@
>   * \fn void SwStatsCpu::processLine0(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
>   * \brief Process line 0
>   * \param[in] frame The frame number
> - * \param[in] y The y coordinate.
> - * \param[in] src The input data.
> - * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
> + * \param[in] y The y coordinate
> + * \param[in] src The input data
> + * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading
>   *
>   * This function processes line 0 for input formats with
>   * patternSize height == 1.
> @@ -107,9 +107,9 @@
>   * \fn void SwStatsCpu::processLine2(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
>   * \brief Process line 2 and 3
>   * \param[in] frame The frame number
> - * \param[in] y The y coordinate.
> - * \param[in] src The input data.
> - * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
> + * \param[in] y The y coordinate
> + * \param[in] src The input data
> + * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading
>   *
>   * This function processes line 2 and 3 for input formats with
>   * patternSize height == 4.
> @@ -190,8 +190,8 @@
>  	stats.yHistogram[yVal * SwIspStats::kYHistogramSize / (256 * 256 * (div))]++;
>  
>  #define SWSTATS_FINISH_LINE_STATS() \
> -	stats.sum_.r() += sumR;    \
> -	stats.sum_.g() += sumG;    \
> +	stats.sum_.r() += sumR;     \
> +	stats.sum_.g() += sumG;     \
>  	stats.sum_.b() += sumB;
>  
>  void SwStatsCpu::statsBGGR8Line0(const uint8_t *src[], SwIspStats &stats)
> ---
> 3 potential issues detected, please review
> ----------------------------------------------------------------------------------------------
> 3e10daf140a229e0dcb8a4da59497b373c148f9e software_isp: debayer_cpu: Add DebayerCpuThread class
> ----------------------------------------------------------------------------------------------
> No issue detected
> -----------------------------------------------------------------------------------------------
> 10ad3ee318f67362169a03cfc8414afce007158d software_isp: debayer_cpu: Add multi-threading support
> -----------------------------------------------------------------------------------------------
> --- src/libcamera/software_isp/debayer_cpu.h
> +++ src/libcamera/software_isp/debayer_cpu.h
> @@ -15,8 +15,8 @@
>  #include <stdint.h>
>  #include <vector>
>  
> +#include <libcamera/base/mutex.h>
>  #include <libcamera/base/object.h>
> -#include <libcamera/base/mutex.h>
>  
>  #include "libcamera/internal/bayer_format.h"
>  #include "libcamera/internal/global_configuration.h"
> ---
> 1 potential issue detected, please review
> ----------------------------------------------------------------------------------------
> 9afb093946cba43d5a8fdc195561c706c0452011 software_isp: Log input config from configure()
> ----------------------------------------------------------------------------------------
> No issue detected
> ---------------------------------------------------------------------------------------------------------------
> 40a88692e6959511760590ecd976bb2c6723064a Documentation/runtime_configuration: Add missing software_isp.mode doc
> ---------------------------------------------------------------------------------------------------------------
> No issue detected
> 
> 
> Please add the checkpatch hooks to your local builds so you can see this
> while you develop.
> 
> My preferred way is to :
> 
>   cp utils/hooks/post-commit .git/hooks/post-commit

Ack, I've done so now.

> these are really tiny, so I can fix them up with the following which
> I'll do now: Reporting my workflow in case it's helpful to others:

Thank you for fixing these up and for merging this
as well as the "const" patch.

Regards,

Hans




> 
> 
> git checkout patchwork/5825
> git rebase -i origin/master -x ./utils/checkstyle.py
> ...
> (let it run and report the break)
> (Breakage on software_isp: swstats_cpu: Prepare for multi-threading support)
> 
> ./utils/checkstyle.py | patch -p0
> git add -p
> git commit --amend
> git rebase --continue
> 
> (next breakage on software_isp: debayer_cpu: Add multi-threading support)
> ./utils/checkstyle.py | patch -p0
> git add -p
> git commit --amend
> git rebase --continue
> 
> Successfully rebased and updated refs/heads/patchwork/5825.
> 
> and with that I'll re-run CI and then be able to merge cleanly.
> 
> --
> Kieran
> 
> 
> 
> Quoting Hans de Goede (2026-03-10 12:01:01)
>> Hi All,
>>
>> The QCM2290 SoC used on the Arduino Uno-Q seems to have a weak GPU, so weak
>> that it is barely faster then a single CPU core (CPU without CCM).
>>
>> This has made me code-up the long envisioned multi-threading support
>> for the CPU softISP.
>>
>> Changes in v7:
>> - Small update to DebayerCpuThread::configure() doc
>> - Add missing space between type and name for threads_ declaration
>> - Add Debug message logging thread count
>> - Gather various tags
>>
>> Changes in v6:
>> - Use "LIBCAMERA_SOFTISP_MODE, software_isp.mode" as item title for patch 5/5
>> - Send out patch 5/5 as a standalone v6 as the rest of the series looked
>>   ready for merging (bad idea in hindsight)
>>
>> Changes in v5:
>> - Extend software_isp.threads docs in runtime_configuration.rst
>> - New patch: "Documentation/runtime_configuration: Add missing
>>   software_isp.mode doc"
>>
>> Changes in v4:
>> - Use const in for (const auto &s : stats_) {} in SwStatsCpu::finishFrame()
>> - Move kMaxLineBuffers constant to DebayerCpuThread class
>> - Document software_isp.threads option in runtime_configuration.rst
>> - Add an use constants for min/max/default number of threads
>>
>> Changes in v3:
>> - Use std::unique_ptr for the DebayerCpuThread pointers
>> - Document new DebayerCpuThread class
>> - Make DebayerCpuThread inherit from both Thread and Object
>> - Use for (auto &thread : threads_)
>> - Use for (auto &s : stats_) {}
>> - Move input format logging from DebayerCpu::configure() to
>>   SoftwareIsp::configure()
>>
>> Changes in v2:
>> - Quite a bit of refactoring based on v1 feedback, dropped patch 3/5 and 4/5
>>   from v2 since these now no longer make sense
>> - Move the allocation of the vector of SwIspStats objects to inside
>>   the SwStatsCpu class, controlled by a configure() arguments instead
>>   of making the caller allocate the objects
>> - Replace the DebayerCpuThreadData struct from v1 with a DebayerCpuThread
>>   class, derived from Object to allow calling invokeMethod for thread re-use
>>   in followup patches
>> - As part of this also move a bunch of methods which primarily deal with
>>   per thread data: setupInputMemcpy(), shiftLinePointers(), memcpyNextLine(),
>>   process*() to the new DebayerCpuThread class
>> - Re-use threads instead of starting new threads every frame
>> - Add a new patch adding some extra DebayerCpu input format logging
>>
>> Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464
>> without CCM:
>>
>> 1 thread :  147ms / frame, ~6.5  fps
>> 2 threads:   80ms / frame, ~12.5 fps
>> 3 threads:   65ms / frame, ~15   fps
>> GPU:        130ms / frame, ~7,5  fps
>> GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
>> GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)
>>
>> Regards,
>>
>> Hans
>>
>>
>> Hans de Goede (5):
>>   software_isp: swstats_cpu: Prepare for multi-threading support
>>   software_isp: debayer_cpu: Add DebayerCpuThread class
>>   software_isp: debayer_cpu: Add multi-threading support
>>   software_isp: Log input config from configure()
>>   Documentation/runtime_configuration: Add missing software_isp.mode doc
>>
>>  Documentation/runtime_configuration.rst       |  17 ++
>>  .../internal/software_isp/swstats_cpu.h       |  25 +-
>>  src/libcamera/software_isp/debayer_cpu.cpp    | 288 ++++++++++++++----
>>  src/libcamera/software_isp/debayer_cpu.h      |  33 +-
>>  src/libcamera/software_isp/software_isp.cpp   |  12 +-
>>  src/libcamera/software_isp/swstats_cpu.cpp    |  54 ++--
>>  6 files changed, 320 insertions(+), 109 deletions(-)
>>
>> -- 
>> 2.53.0
>>