[0/5] software_isp: debayer_cpu: Add multi-threading support

Message ID 20260216190204.106922-1-johannes.goede@oss.qualcomm.com

Message

Hans de Goede Feb. 16, 2026, 7:01 p.m. UTC
Hi All,

The QCM2290 SoC used on the Arduino Uno-Q seems to have a very weak GPU(1),
so weak that it is barely faster than a single CPU core.

This prompted me to code up the long-envisioned multi-threading support
for the CPU softISP :)

Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464:

1 thread :  147ms / frame, ~6.5  fps
2 threads:   81ms / frame, ~12   fps
3 threads:   66ms / frame, ~14.5 fps
GPU:        130ms / frame, ~7.5  fps
GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)

Regards,

Hans

1) Whether the GPU really is this weak needs to be investigated further.

Hans de Goede (5):
  software_isp: swstats_cpu: Move accumulator storage out of the class
  software_isp: debayer_cpu: Add per render thread data
  software_isp: debayer_cpu: Group innerloop variables together
  software_isp: debayer_cpu: Select process inner loop by function
    pointer
  software_isp: debayer_cpu: Add multi-threading support

 .../internal/software_isp/swstats_cpu.h       |  29 ++--
 src/libcamera/software_isp/debayer_cpu.cpp    | 131 ++++++++++++------
 src/libcamera/software_isp/debayer_cpu.h      |  44 ++++--
 src/libcamera/software_isp/swstats_cpu.cpp    |  65 ++++++---
 4 files changed, 180 insertions(+), 89 deletions(-)

Comments

Kieran Bingham Feb. 17, 2026, 8:41 a.m. UTC | #1
Hi Hans,

Quoting Hans de Goede (2026-02-16 19:01:59)
> Hi All,
> 
> The QCM2290 SoC used on the Arduino Uno-Q seems to have a very weak GPU(1),
> so weak that it is barely faster than a single CPU core.
> 
> This prompted me to code up the long-envisioned multi-threading support
> for the CPU softISP :)
> 
> Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464:

I'm afraid there seem to be CI failures in this branch:

https://gitlab.freedesktop.org/camera/libcamera/-/jobs/93408936#L538

Milan Zamazal Feb. 17, 2026, 10 p.m. UTC | #2
Hans de Goede <johannes.goede@oss.qualcomm.com> writes:

> Hi All,
>
> The QCM2290 SoC used on the Arduino Uno-Q seems to have a very weak GPU(1),
> so weak that it is barely faster than a single CPU core.
>
> This prompted me to code up the long-envisioned multi-threading support
> for the CPU softISP :)

Is this a reason not to drop the CPU ISP in the future?

> Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464:
>
> 1 thread :  147ms / frame, ~6.5  fps
> 2 threads:   81ms / frame, ~12   fps
> 3 threads:   66ms / frame, ~14.5 fps
> GPU:        130ms / frame, ~7.5  fps
> GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
> GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)

Are the CPU measurements with or without CCM?

Laurent Pinchart Feb. 19, 2026, 2:12 p.m. UTC | #3
On Tue, Feb 17, 2026 at 11:00:06PM +0100, Milan Zamazal wrote:
> Hans de Goede writes:
> 
> > Hi All,
> >
> > The QCM2290 SoC used on the Arduino Uno-Q seems to have a very weak GPU(1),
> > so weak that it is barely faster than a single CPU core.
> >
> > This prompted me to code up the long-envisioned multi-threading support
> > for the CPU softISP :)
> 
> Is this a reason not to drop the CPU ISP in the future?

Note that I still think the CPU implementation should evolve to perform
the same computation as the GPU implementation, which will include LSC
and other algorithms. It will therefore slow down.

The right solution to this problem is to support the hardware ISP
included in the QCM2290 :-)

Hans de Goede Feb. 23, 2026, 3:44 p.m. UTC | #4
Hi,

On 17-Feb-26 11:00 PM, Milan Zamazal wrote:
> Hans de Goede <johannes.goede@oss.qualcomm.com> writes:
> 
>> Hi All,
>>
>> The QCM2290 SoC used on the Arduino Uno-Q seems to have a very weak GPU(1),
>> so weak that it is barely faster than a single CPU core.
>>
>> This prompted me to code up the long-envisioned multi-threading support
>> for the CPU softISP :)
> 
> Is this a reason not to drop the CPU ISP in the future?

Yes, that is one reason. I think it is good to keep it around anyway as
a lowest common denominator, e.g. for phones with older PowerVR
graphics, which will never get FOSS GPU support, and for other cases
where we may not be able to use a GPU for one reason or another.

>> Benchmark results for the Uno-Q + IMX219 running at 3280x2464 -> 3272x2464:
>>
>> 1 thread :  147ms / frame, ~6.5  fps
>> 2 threads:   81ms / frame, ~12   fps
>> 3 threads:   66ms / frame, ~14.5 fps
>> GPU:        130ms / frame, ~7.5  fps
>> GPU 0-copy: 110ms / frame, ~9.5  fps (requires pipeline + camss hacks)
>> GPU lite:    85ms / frame, ~12   fps (CCM, contrast and gamma disabled)
> 
> Are the CPU measurements with or without CCM?

Without CCM.

Regards,

Hans