Message ID | 20250824-b4-v0-5-2-gpuisp-v2-a-v2-0-96f4576c814e@linaro.org |
---|---|
On 24/08/2025 01:48, Bryan O'Donoghue wrote:
> v2:
>
> This version 2 is an incomplete update with-respect-to previous comment
> feedback, which ordinarily I would not publish however, given OSSEU is
> starting on Monday and we have a talk about this topic, in addition to some
> pretty good progress in the interregnum I thought a v2 would be
> appropriate.
>
> - V2 drops use of GBM surface in favour of generating a framebuffer from
>   the dma-buf handle, called render-to-texture.
>
>   The conversion from GBM surface + memcpy() including the associated cache
>   invalidate has a dramatic effect on GPUISP performance.
>
>   Some rough stats for a Qualcomm sm8250 "kona" device with an imx517
>   sensor @ 4048 x 3040 ABRG8888 - debug builds
>
>   CPUISP + CCM:
>   2 FPS CPU usage > 100% single core pulls about 9 watts
>
>   GPUISP v1 + CCM:
>   14 FPS - power not measured
>
>   GPUISP v2 + CCM:
>   30 FPS - sensor linerate - CPU usage ~ 70 % pulling 8 Watts.
>
>   Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
>
>   CPUISP 4 FPS
>   GPUISP v2 - 2 or 3 FPS
>   GPUISP v2 - 15 FPS == sensor linerate
>
>   In other words for these boards we can hit linerate with GPUISP + 3A +
>   CCM.

I should also mention:

- 24 bit output for the debayer phase is not planned.
  The reason is that gl_FragColor expects to output a vec4. Basically GPUs
  architecturally want to write a "word" of 32 bits for each bus
  transaction.

- I'd still propose for a merge that GPUISP becomes the default,
  because the performance uplift with CCM is night and day.
  It would be interesting to try out sensors with much higher resolution.
  At the moment on a debug build my test system has no problem getting
  linerate.

- I don't propose any large functional changes to this series.
  Basically taking and implementing review feedback.

- The only caveat to that is perhaps doing a denoise pass.
  We've already started to discuss exposing different controls CPU v GPU
  based on what the different implementations declare themselves capable
  of.
Actually one very hacky way to implement a simple noise filter would be to
resample the output texture applying GL_NEAREST filters.

The filters pertain to the sampler2D in the GLSL shader, not to the output,
so you'd have to upload your texture and have a shader that did a 1:1
read/write. The read phase would apply the filter and would then do a type
of hardware denoising.

It would be pretty hacky though...

---
bod
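As a rough illustration of what the 32-bit-only output means in memory terms, here is a minimal sketch using the sm8250 figures quoted above (4048 x 3040 at 30 FPS). The helper names frameBytes and bytesPerSecond are hypothetical and not part of the series, and the sketch assumes tightly packed rows with no stride padding.

```cpp
#include <cstdint>

// Hypothetical helper (not from the series): bytes needed for one tightly
// packed frame, ignoring any stride padding.
constexpr std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height,
				   std::uint32_t bytesPerPixel)
{
	return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Hypothetical helper: bytes written per second at a given frame rate.
constexpr std::uint64_t bytesPerSecond(std::uint64_t bytesPerFrame,
				       std::uint32_t fps)
{
	return bytesPerFrame * fps;
}

// Figures taken from the thread: 4048 x 3040 sensor output.
constexpr std::uint64_t kFrame32 = frameBytes(4048, 3040, 4); // vec4 output
constexpr std::uint64_t kFrame24 = frameBytes(4048, 3040, 3); // 24-bit, not planned

// A 24-bit frame would be exactly three quarters the size of a 32-bit one.
static_assert(kFrame24 * 4 == kFrame32 * 3, "24-bit is 3/4 of 32-bit");
```

With these assumptions a 32-bit frame is about 49 MB, so linerate at 30 FPS means the GPU writes roughly 1.5 GB/s of output; 24-bit output would save a quarter of that, but only if the shader could pack pixels across 32-bit words, which gl_FragColor does not offer.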
Bryan O'Donoghue <bryan.odonoghue@linaro.org> writes:

> v2:
>
> This version 2 is an incomplete update with-respect-to previous comment
> feedback, which ordinarily I would not publish however, given OSSEU is
> starting on Monday and we have a talk about this topic, in addition to some
> pretty good progress in the interregnum I thought a v2 would be
> appropriate.
>
> - V2 drops use of GBM surface in favour of generating a framebuffer from
>   the dma-buf handle, called render-to-texture.
>
>   The conversion from GBM surface + memcpy() including the associated cache
>   invalidate has a dramatic effect on GPUISP performance.
>
>   Some rough stats for a Qualcomm sm8250 "kona" device with an imx517
>   sensor @ 4048 x 3040 ABRG8888 - debug builds
>
>   CPUISP + CCM:
>   2 FPS CPU usage > 100% single core pulls about 9 watts
>
>   GPUISP v1 + CCM:
>   14 FPS - power not measured
>
>   GPUISP v2 + CCM:
>   30 FPS - sensor linerate - CPU usage ~ 70 % pulling 8 Watts.
>
>   Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution

The resolution is 3280x2464. I get full speed of 15 fps with GPU, 5-7.5
fps with CPU (with CCM).

Apparently with GPU only, I get occasional horizontal artefacts on my
display when there is movement in the image -- is this expected?

>   CPUISP 4 FPS
>   GPUISP v2 - 2 or 3 FPS
>   GPUISP v2 - 15 FPS == sensor linerate
>
>   In other words for these boards we can hit linerate with GPUISP + 3A +
>   CCM.
>
> - Drop GBM surface rendering
> - Drop swapbuffers
> - Use eglCreateImageKHR to directly render into the output dma-buf buffer
>   eglCreateImageKHR lets you specify the FOURCC of the texture which means
>   we can create the texture in the uncompressed target output pixel format
>   we want.
> - Fix stride calculation to 256 bytes
>   Laurent and Maxime explained to me about GPU stride alignments being
>   tribal wisdom and that 256 bytes is a good cross-platform value.
>   This helped to get the render-to-texture command right.
On 25/08/2025 16:34, Milan Zamazal wrote:
>> Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
> The resolution is 3280x2464. I get full speed of 15 fps with GPU, 5-7.5
> fps with CPU (with CCM).
>
> Apparently with GPU only, I get occasional horizontal artefacts on my
> display when there is movement in the image -- is this expected?

Not at all.

Can you capture a video of it?

---
bod
Bryan O'Donoghue <bryan.odonoghue@linaro.org> writes:

> On 25/08/2025 16:34, Milan Zamazal wrote:
>>> Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
>> The resolution is 3280x2464. I get full speed of 15 fps with GPU, 5-7.5
>> fps with CPU (with CCM).
>> Apparently with GPU only, I get occasional horizontal artefacts on my
>> display when there is movement in the image -- is this expected?
>
> Not at all.
>
> Can you capture a video of it ?

I haven't succeeded so far. But my guess is that it is related to
changing buffers, like an unfinished write to a buffer over its
previous contents. There is an occasional horizontal "line" near the
bottom of the image, regardless of the camera orientation. With a
static scene, there is no such (clearly visible) problem. But with a
moving scene, it looks to me like different images overlap and then
there is a visible disruption at the place where one image finishes and
the other one starts:

  +--------------+
  |              |
  |              |
  |              |
  |    -----     |
  +--------------+

Is there anything I can try in the code to confirm the source of the
problem?
On 27/08/2025 10:40, Milan Zamazal wrote:
> Bryan O'Donoghue <bryan.odonoghue@linaro.org> writes:
>
>> On 25/08/2025 16:34, Milan Zamazal wrote:
>>>> Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
>>> The resolution is 3280x2464. I get full speed of 15 fps with GPU, 5-7.5
>>> fps with CPU (with CCM).
>>> Apparently with GPU only, I get occasional horizontal artefacts on my
>>> display when there is movement in the image -- is this expected?
>>
>> Not at all.
>>
>> Can you capture a video of it ?
>
> I haven't succeeded so far. But my guess is that it is related to
> changing buffers, like an unfinished writing to a buffer over its
> previous contents. There is an occasional horizontal "line" near the
> bottom of the image, regardless of the camera orientation. With a
> static scene, there is no such (clearly visible) problem. But with a
> moving scene, it looks to me like different images overlap and then
> there is a visible disruption at the place where one image finishes and
> the other one starts:
>
>   +--------------+
>   |              |
>   |              |
>   |              |
>   |    -----     |
>   +--------------+
>
> Is there anything I can try in the code to confirm the source of the
> problem?
>

I did see temporal artifacts on the qcom system with the base-case
fragment shaders running inside of qcam, which then magically went away
with some of the rebasing we did.

I wonder, could you run the fragment shaders inside of qcam with the raw
stream and test again?

---
bod
Bryan O'Donoghue <bryan.odonoghue@linaro.org> writes:

> On 27/08/2025 10:40, Milan Zamazal wrote:
>> Bryan O'Donoghue <bryan.odonoghue@linaro.org> writes:
>>
>>> On 25/08/2025 16:34, Milan Zamazal wrote:
>>>>> Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
>>>> The resolution is 3280x2464. I get full speed of 15 fps with GPU, 5-7.5
>>>> fps with CPU (with CCM).
>>>> Apparently with GPU only, I get occasional horizontal artefacts on my
>>>> display when there is movement in the image -- is this expected?
>>>
>>> Not at all.
>>>
>>> Can you capture a video of it ?
>> I haven't succeeded so far. But my guess is that it is related to
>> changing buffers, like an unfinished writing to a buffer over its
>> previous contents. There is an occasional horizontal "line" near the
>> bottom of the image, regardless of the camera orientation. With a
>> static scene, there is no such (clearly visible) problem. But with a
>> moving scene, it looks to me like different images overlap and then
>> there is a visible disruption at the place where one image finishes and
>> the other one starts:
>>   +--------------+
>>   |              |
>>   |              |
>>   |              |
>>   |    -----     |
>>   +--------------+
>> Is there anything I can try in the code to confirm the source of the
>> problem?
>>
>
> I did see temporal artifacts on the qcom system with the base-case
> fragment shaders running inside of qcam, which then magically went away
> with some of the rebasing we did.
>
> I wonder could you run the fragment shaders inside of qcam with the raw
> stream and test again ?

Unfortunately not, Qt applications don't work with the TI drivers I use.
I tried running this series with Pipewire. Minus some issues that I'll
address in separate mails, it seems to work pretty well \o/

Here's a short howto in case anyone else wants to try:

1. Build PW against this series.

2. The LIBCAMERA_SOFTISP_MODE env var can be enabled via a service
   drop-in override:

     systemctl --user edit pipewire

   Add the lines:

     [Service]
     Environment=LIBCAMERA_SOFTISP_MODE=gpu
     SystemCallFilter=mincore

   Note that you can use the Environment variable several times, i.e. you
   could add

     Environment=LIBGL_ALWAYS_SOFTWARE=1

   in order to test llvmpipe.

   After saving, restart PW to make the change take effect:

     systemctl --user restart pipewire

3. The mincore syscall is needed for Mesa's EGL implementation, but not
   for Vulkan or OpenCL. I haven't looked deeper into it; however,
   assuming Mesa/EGL really needs it, we'll probably want to enable it by
   default in PW. Feedback welcome:
   https://gitlab.freedesktop.org/pipewire/pipewire/-/merge_requests/2530

Regards
On 29/08/2025 09:29, Robert Mader wrote:
> I tried running this series with Pipewire - minus some issues that I'll
> address in separate mails it seems to work pretty well \o/
>
> Here's a short howto in case anyone else wants to try:
>
> 1. Build PW against this series
>
> 2. The LIBCAMERA_SOFTISP_MODE env var can be enabled via a service
>    drop-in override:
>
>      systemctl --user edit pipewire
>
>    add the lines:
>
>      [Service]
>      Environment=LIBCAMERA_SOFTISP_MODE=gpu
>      SystemCallFilter=mincore
>
>    Note that you can use the Environment variables several times - i.e. you
>    could add
>
>      Environment=LIBGL_ALWAYS_SOFTWARE=1
>
>    it order to test llvmpipe.
>
>    After saving, restart PW to make the change take effect
>
>      systemctl --user restart pipewire
>
> 3. The mincore syscal is needed for Mesas EGL implementation - but not
>    for Vulkan or OpenCL. I haven't looked deeper into it, however assuming
>    Mesa/EGL really needs it we'll probably want to enable it by default in
>    PW - feedback welcome:
>    https://gitlab.freedesktop.org/pipewire/pipewire/-/merge_requests/2530
>
> Regards
>
> --
> Robert Mader
> Consultant Software Developer

Hey Robert.

Great news, thanks for testing.

One thing worth trying is toggling the CCM between GPU and CPU mode.

https://patchwork.libcamera.org/patch/24213/

This is when you start seeing the 4x, 15x or 20x type performance diff
between CPU and GPU.

That should be the difference between works "pretty well" and "works at
all" :)

---
bod
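For reference, ratios of this order fall straight out of the frame rates reported in the cover letter. The sketch below only restates that arithmetic; speedup is a hypothetical helper name, and the inputs are the FPS figures quoted in this thread, not new measurements.

```cpp
// Hypothetical helper (not from the series): how many times faster the GPU
// path runs than the CPU path, given their frame rates.
constexpr double speedup(double gpuFps, double cpuFps)
{
	return gpuFps / cpuFps;
}

// Cover-letter figures with CCM enabled.
constexpr double kSm8250 = speedup(30.0, 2.0); // Qualcomm sm8250, GPUISP v2 vs CPUISP
constexpr double kAm69 = speedup(15.0, 4.0);   // TI AM69, GPUISP v2 vs CPUISP
```

On these numbers the sm8250 comes out at 15x and the AM69 at a little under 4x, which matches the "4x, 15x" range mentioned above.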
v2:

This version 2 is an incomplete update with respect to previous review
feedback, which ordinarily I would not publish. However, given OSSEU is
starting on Monday and we have a talk about this topic, in addition to some
pretty good progress in the interregnum, I thought a v2 would be
appropriate.

- V2 drops use of GBM surface in favour of generating a framebuffer from
  the dma-buf handle, called render-to-texture.

  The conversion from GBM surface + memcpy() including the associated cache
  invalidate has a dramatic effect on GPUISP performance.

  Some rough stats for a Qualcomm sm8250 "kona" device with an imx517
  sensor @ 4048 x 3040 ABRG8888 - debug builds

  CPUISP + CCM:
  2 FPS, CPU usage > 100% single core, pulls about 9 watts

  GPUISP v1 + CCM:
  14 FPS - power not measured

  GPUISP v2 + CCM:
  30 FPS - sensor linerate - CPU usage ~ 70 % pulling 8 Watts.

  Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution

  CPUISP 4 FPS
  GPUISP v2 - 2 or 3 FPS
  GPUISP v2 - 15 FPS == sensor linerate

  In other words for these boards we can hit linerate with GPUISP + 3A +
  CCM.

- Drop GBM surface rendering
- Drop swapbuffers
- Use eglCreateImageKHR to directly render into the output dma-buf buffer
  eglCreateImageKHR lets you specify the FOURCC of the texture which means
  we can create the texture in the uncompressed target output pixel format
  we want.
- Fix stride calculation to 256 bytes
  Laurent and Maxime explained to me about GPU stride alignments being
  tribal wisdom and that 256 bytes is a good cross-platform value.
  This helped to get the render-to-texture command right.
- A synchronous blocking wait is used to ensure GPU operations have
  completed. Laurent wants this to be made async.
  At the moment it's not clear to me the eglWaitSyncKHR is really required
  and in any case it doesn't seem to have any performance impact.
  But this part is still TBD - I've included the sync wait for simplicity
  and safety.
- A Debayer::stop() method has been introduced to ensure we call
  eglDestroySyncKHR when the eGL context is valid, as opposed to in the
  callchain of destructors triggering eGL::~eGL();
- stats move constructor call chain dropped - Barnabás
- Incorporates Milan's area-of-interest constraint for Bayer stats
  i.e. squashes his v3 update into debayer_egl.cpp directly
- Moves ALIGN_TO into a common area to facilitate its reuse in
  egl.cpp
- Rebases on 0.5.2

- There are a number of known checks failing on the CI loop right now

Link to v1: https://lists.libcamera.org/pipermail/libcamera-devel/2025-June/050692.html

v1:
This series introduces a GLES 2.0 GPU ISP to libcamera.

We have had extensive discussions, meetings and collaborative discussions
about this topic over the last year or so.

As an overview we want to start to move as much processing of software_isp
into the GPU as possible. This is especially advantageous when we are
talking about processing a framebuffer's worth of pixels as quickly as
possible.

The decision to use GLES 2.0 instead of say Vulkan stems from a desire to
support as much in the way of older hardware as possible and the fact we
already have upstream GLES 2.0 fragment shaders to do debayer.

Generally the approach is

- Move the fragment shaders out of qcam and into a common location
- Update the existing SoftwareISP Debayer/DebayerCPU pair to facilitate
  addition of a new class DebayerEGL.
- Introduce that class
- Then do progressive change of the shaders and DebayerEGL class to make
  the modifications as transparent as possible in the git log.
- Reuse as much of the SoftIPA data-structures and logic as possible.
- Consume the data from SoftIPA in the Debayer Shaders so that CPUISP and
  GPUISP give similar - hopefully the same - results but with GPUISP going
  faster.

In order to get untiled and uncompressed pixel data out of the GPU
framebuffer we need to tell the GPU how to store the data it is writing to
that framebuffer.
GPUs can store their framebuffer data in tiled or even
compressed formats, which is why the naive approach of running your fragment
shader and then using glReadPixels(GL_RGBA); will be horrendously slow, as
glReadPixels must convert from the internal GPU format to the requested
output format - an operation that for me takes ~ 10 milliseconds per frame.

Instead we get the GPU to store its data as ARGB8888 swap buffers and
memcpy() from the swapped buffer to our output frame. Right now this series
supports 32 bit output formats only.

The memcpy() also entails flushing the cache of the target buffer as per
the terms of the dma-buf software contract.

This leads us onto the main outstanding TODOs

- 24 bit GBM buffer support leading
- 24 bit output framebuffer support
- Surfaceless GBM and eGL context with no swapbuffer
- Render to texture
  If we render directly to the output buffer provided to the GPU,
  we will not need to memcpy() to the output buffer
  nor will we need to invalidate the output buffer cache.
- eglCreateImageKHR for the texture upload.

This list is of the colour "make it go faster" not "make it work", which is
why we are moving to start to submit a v1 for discussion in the full
realisation it will have to go through several cycles of review, giving us
the opportunity to fix:

- Doxygen is missing for new classes and methods
- Some of the pipelines don't complete in gitlab
- 24 bit output seems doable before merge
- Render to texture perhaps even too

For me on my Qualcomm hardware GPUISP works very well. I get 30fps in qcam
with about 75% CPU usage versus > 100% - cam goes faster, which to me
implies a good bit of time is being consumed in qcam itself.

The series starts out with fixes and updates from Hans and finishes it out
with shader modifications from Milan, both of whom along with Kieran,
Laurent and Maxime I'd like to thank for being so helpful and patient.
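
The 256-byte stride alignment mentioned in the v2 notes can be sketched as follows. This is only an illustration: alignTo is a hypothetical stand-in for the ALIGN_TO helper the series moves into utils.h (whose exact definition is not shown in this thread), stride256 is likewise an invented name, and the widths are the two sensor resolutions reported in the discussion with 4 bytes per pixel assumed.

```cpp
#include <cstdint>

// Hypothetical stand-in for ALIGN_TO (the real macro is not shown in this
// thread). Rounds value up to the next multiple of alignment, which is
// assumed to be a power of two.
constexpr std::uint32_t alignTo(std::uint32_t value, std::uint32_t alignment)
{
	return (value + alignment - 1) & ~(alignment - 1);
}

// Bytes per row for a 32-bit output format, padded to the 256-byte
// alignment described as a good cross-platform value.
constexpr std::uint32_t stride256(std::uint32_t width,
				  std::uint32_t bytesPerPixel)
{
	return alignTo(width * bytesPerPixel, 256);
}

// 4048 pixels * 4 bytes = 16192 bytes, padded up to 16384.
constexpr std::uint32_t kStrideImx517 = stride256(4048, 4);
// 3280 pixels * 4 bytes = 13120 bytes, padded up to 13312.
constexpr std::uint32_t kStrideImx219 = stride256(3280, 4);
```

Neither width yields a row size that is already a multiple of 256 bytes, so under these assumptions a stride computed as width * 4 would not match what the GPU expects for either sensor.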

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
Bryan O'Donoghue (28):
      libcamera: MappedFrameBuffer: Add MappedFrameBuffer::getPlaneFD()
      libcamera: software_isp: Move useful items from DebayerCpu to Debayer base class
      libcamera: software_isp: Move Bayer params init from DebayerCpu to Debayer
      libcamera: software_isp: Move param select code to Debayer base class
      libcamera: software_isp: Move DMA Sync code to Debayer base class
      libcamera: software_isp: Move isStandardBayerOrder to base class
      libcamera: software_isp: Start the ISP thread in configure
      libcamera: software_isp: Move configure to worker thread
      libcamera: software_isp: debayer: Make the debayer_ object of type class Debayer not DebayerCpu
      libcamera: software_isp: debayer: Extend DebayerParams struct to hold a copy of per-frame CCM values
      libcamera: software_isp: debayer: Introduce a stop() callback to the debayer object
      libcamera: shaders: Move GL shader programs to src/libcamera/assets/shader
      utils: gen-shader-headers: Add a utility to generate headers from shaders
      meson: Automatically generate glsl_shaders.h from specified shader programs
      libcamera: software_isp: ccm: Populate CCM table to Debayer params structure
      libcamera: software_isp: lut: Make gain corrected CCM in lut.cpp available in debayer params
      libcamera: software_isp: gbm: Add in a GBM helper class for GPU surface access
      libcamera: utils: Move ALIGN_TO from camera_metadata.c to utils.h
      libcamera: software_isp: egl: Introduce an eGL base helper class
      libcamera: software_isp: debayer_egl: Add an eGL debayer class
      libcamera: software_isp: debayer_egl: Make DebayerEGL an environment option
      libcamera: shaders: Use highp not mediump for float precision
      libcamera: shaders: Extend debayer shaders to apply RGB gain values on output
      libcamera: software_isp: Switch on uncalibrated CCM to validate eGLDebayer
      libcamera: software_isp: Make isStandardBayerOrder static
      libcamera: software_isp: debayer_cpu: Make getInputConfig and getOutputConfig static
      libcamera: shaders: Extend bayer shaders to support swapping R and B on output
      libcamera: software_isp: Add a gpuisp todo list

Hans de Goede (5):
      libcamera: swstats_cpu: Update statsProcessFn() / processLine0() documentation
      libcamera: swstats_cpu: Drop patternSize_ documentation
      libcamera: swstats_cpu: Move header to libcamera/internal/software_isp
      libcamera: software_isp: Move benchmark code to its own class
      libcamera: swstats_cpu: Add processFrame() method

Milan Zamazal (4):
      libcamera: shaders: Fix neighbouring positions in 8-bit debayering
      libcamera: software_isp: GPU support for unpacked 10/12-bit formats
      libcamera: shaders: Rename bayer_8 to bayer_unpacked
      libcamera: software_isp: Reduce statistics image area

 include/libcamera/base/utils.h                     |   3 +
 include/libcamera/internal/egl.h                   | 133 +++++
 include/libcamera/internal/gbm.h                   |  39 ++
 include/libcamera/internal/mapped_framebuffer.h    |   4 +
 include/libcamera/internal/meson.build             |  11 +
 .../libcamera/internal/shaders}/RGB.frag           |   2 +-
 .../libcamera/internal/shaders}/YUV_2_planes.frag  |   2 +-
 .../libcamera/internal/shaders}/YUV_3_planes.frag  |   2 +-
 .../libcamera/internal/shaders}/YUV_packed.frag    |   2 +-
 .../internal/shaders}/bayer_1x_packed.frag         |  62 +-
 .../libcamera/internal/shaders/bayer_unpacked.frag |  78 ++-
 .../libcamera/internal/shaders/bayer_unpacked.vert |   8 +-
 .../libcamera/internal/shaders}/identity.vert      |   0
 include/libcamera/internal/shaders/meson.build     |  10 +
 .../libcamera/internal/software_isp/benchmark.h    |  36 ++
 .../internal/software_isp/debayer_params.h         |   7 +
 .../libcamera/internal/software_isp/meson.build    |   2 +
 .../libcamera/internal/software_isp/software_isp.h |   5 +-
 .../libcamera/internal}/software_isp/swstats_cpu.h |  12 +
 src/android/metadata/camera_metadata.c             |   4 +-
 src/apps/qcam/assets/shader/shaders.qrc            |  16 +-
 src/apps/qcam/viewfinder_gl.cpp                    |  70 +--
 src/ipa/simple/algorithms/ccm.cpp                  |   4 +-
 src/ipa/simple/algorithms/lut.cpp                  |   1 +
 src/ipa/simple/data/uncalibrated.yaml              |  12 +-
 src/libcamera/egl.cpp                              | 408 +++++++++++++
 src/libcamera/gbm.cpp                              |  61 ++
 src/libcamera/mapped_framebuffer.cpp               |  10 +
 src/libcamera/meson.build                          |  34 ++
 src/libcamera/software_isp/benchmark.cpp           |  93 +++
 src/libcamera/software_isp/debayer.cpp             |  61 ++
 src/libcamera/software_isp/debayer.h               |  42 +-
 src/libcamera/software_isp/debayer_cpu.cpp         |  88 +--
 src/libcamera/software_isp/debayer_cpu.h           |  44 +-
 src/libcamera/software_isp/debayer_egl.cpp         | 628 +++++++++++++++++++++
 src/libcamera/software_isp/debayer_egl.h           | 171 ++++++
 src/libcamera/software_isp/gpuisp-todo.txt         |  61 ++
 src/libcamera/software_isp/meson.build             |   9 +
 src/libcamera/software_isp/software_isp.cpp        |  40 +-
 src/libcamera/software_isp/swstats_cpu.cpp         |  89 ++-
 utils/gen-shader-header.py                         |  38 ++
 utils/gen-shader-headers.sh                        |  44 ++
 utils/meson.build                                  |   2 +
 43 files changed, 2236 insertions(+), 212 deletions(-)
---
base-commit: 1bd66f54a6bc928f99e321630f43d200df4d3579
change-id: 20250823-b4-v0-5-2-gpuisp-v2-a-d40b3b78d741

Best regards,