[00/35] Add GLES 2.0 GPUISP to libcamera

Message ID 20250611013245.133785-1-bryan.odonoghue@linaro.org

Bryan O'Donoghue June 11, 2025, 1:32 a.m. UTC
This series introduces a GLES 2.0 GPU ISP to libcamera.

We have had extensive discussions and meetings about this topic over the
last year or so.

As an overview we want to start to move as much processing of software_isp
into the GPU as possible. This is especially advantageous when we are
talking about processing a framebuffer's worth of pixels as quickly as
possible.

The decision to use GLES 2.0 instead of, say, Vulkan stems from a desire to
support as much older hardware as possible and from the fact that we
already have upstream GLES 2.0 fragment shaders to do debayer.

Generally the approach is

- Move the fragment shaders out of qcam and into a common location
- Update the existing SoftwareISP Debayer/DebayerCPU pair to facilitate
  addition of a new class DebayerEGL.
- Introduce that class
- Then do progressive change of the shaders and DebayerEGL class to make
  the modifications as transparent as possible in the git log.
- Reuse as much of the SoftIPA data-structures and logic as possible.
- Consume the data from SoftIPA in the Debayer Shaders so that CPUISP and
  GPUISP give similar, hopefully identical, results but with GPUISP going
  faster.
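
To make the last point concrete, here is a minimal CPU-side sketch of the
per-pixel colour correction that both paths have to agree on. It is not the
code from this series: the Ccm and Rgb types, the applyCcm() name and the
row-major 3x3 layout are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical types for illustration; not the libcamera definitions.
using Ccm = std::array<float, 9>; // assumed row-major 3x3 matrix

struct Rgb {
	uint8_t r, g, b;
};

// Multiply one RGB pixel by the 3x3 matrix, round, and clamp to 8 bits.
inline Rgb applyCcm(const Ccm &m, Rgb in)
{
	auto clamp8 = [](float v) {
		return static_cast<uint8_t>(std::clamp(v + 0.5f, 0.0f, 255.0f));
	};

	const float r = in.r, g = in.g, b = in.b;

	return {
		clamp8(m[0] * r + m[1] * g + m[2] * b),
		clamp8(m[3] * r + m[4] * g + m[5] * b),
		clamp8(m[6] * r + m[7] * g + m[8] * b),
	};
}
```

A fragment shader would do the same multiply per fragment, so feeding both
paths the same matrix is what should keep CPUISP and GPUISP output
comparable.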

In order to get untiled and uncompressed pixel data out of the GPU
framebuffer we need to tell the GPU how to store the data it is writing to
that framebuffer. GPUs can store their framebuffer data in tiled or even
compressed formats which is why the naive approach of running your fragment
shader and then using glReadPixels(GL_RGBA); will be horrendously slow as
glReadPixels must convert from the internal GPU format to the requested
output format - an operation that for me takes ~ 10 milliseconds per frame.

Instead we get the GPU to store its data as ARGB8888 swap buffers and
memcpy() from the swapped buffer to our output frame. Right now this series
supports 32 bit output formats only.
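
As a rough sketch of the copy step, assuming the swapped buffer and the
output frame are both CPU-mapped and may have different row strides, the
copy could look like the following. copyArgb8888() and its parameters are
hypothetical names, not functions from this series, and the dma-buf cache
maintenance around the copy is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy `height` rows of 32-bit pixels between two
// mapped buffers whose row strides (in bytes) may differ.
inline void copyArgb8888(uint8_t *dst, size_t dstStride,
			 const uint8_t *src, size_t srcStride,
			 size_t width, size_t height)
{
	const size_t rowBytes = width * 4; // 4 bytes per ARGB8888 pixel

	// Tightly packed on both sides: a single memcpy() is enough.
	if (dstStride == rowBytes && srcStride == rowBytes) {
		std::memcpy(dst, src, rowBytes * height);
		return;
	}

	// Otherwise copy row by row, skipping the padding in each stride.
	for (size_t y = 0; y < height; y++)
		std::memcpy(dst + y * dstStride, src + y * srcStride, rowBytes);
}
```

Because the source is linear and uncompressed, no per-pixel format
conversion is needed here, unlike the glReadPixels() path.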

The memcpy() also entails flushing the cache of the target buffer as per
the terms of the dma-buf software contract.

This leads us onto the main outstanding TODOs

- 24 bit GBM buffer support leading
- 24 bit output framebuffer support
- Surfaceless GBM and eGL context with no swapbuffer
- Render to texture
  If we render directly to an output buffer provided to the GPU we will
  not need to memcpy() to the output buffer, nor will we need to
  invalidate the output buffer cache.
- eglCreateImageKHR for the texture upload.

This list is of the colour "make it go faster" not "make it work", which is
why we are submitting a v1 for discussion now, in the full realisation
that it will have to go through several cycles of review, giving us the
opportunity to fix:

- Doxygen is missing for new classes and methods
- Some of the pipelines don't complete in gitlab
- 24 bit output seems doable before merge
- Render to texture perhaps even too

For me on my Qualcomm hardware GPUISP works very well: I get 30fps in qcam
with about 75% CPU usage versus > 100%. cam goes faster, which to me
implies a good bit of time is being consumed in qcam itself.

The series starts out with fixes and updates from Hans and finishes with
shader modifications from Milan, both of whom, along with Kieran, Laurent
and Maxime, I'd like to thank for being so helpful and patient.

Bryan O'Donoghue (27):
  libcamera: MappedFrameBuffer: Latch a pointer to the framebuffer
  libcamera: MappedFrameBuffer: Add MappedFrameBuffer::getPlaneFD()
  libcamera: software_isp: Move useful items from DebayerCpu to Debayer
    base class
  libcamera: software_isp: Move Bayer parans init from DebayerCpu to
    Debayer
  libcamera: software_isp: Move param select code to Debayer base class
  libcamera: software_isp: Move isStandardBayerOrder to base class
  libcamera: software_isp: Start the ISP thread in configure
  libcamera: software_isp: Move configure to worker thread
  libcamera: software_isp: debayer: Make the debayer_ object of type
    class Debayer not DebayerCpu
  libcamera: software_isp: debayer: Extend DebayerParams struct to hold
    a copy of per-frame CCM values
  libcamera: shaders: Move GL shader programs to
    src/libcamera/assets/shader
  utils: gen-shader-headers: Add a utility to generate headers from
    shaders
  meson: Automatically generate glsl_shaders.h from specified shader
    programs
  libcamera: software_isp: ccm: Populate CCM table to Debayer params
    structure
  libcamera: software_isp: lut: Make gain corrected CCM in lut.cpp
    available in debayer params
  libcamera: software_isp: gbm: Add in a GBM helper class for GPU
    surface access
  libcamera: software_isp: egl: Introduce an eGL base helper class
  libcamera: software_isp: debayer_egl: Add an eGL debayer class
  libcamera: software_isp: debayer_egl: Make DebayerEGL an environment
    option
  libcamera: shaders: Use highp not mediump for float precision
  libcamera: shaders: Extend debayer shaders to apply RGB gain values on
    output
  libcamera: software_isp: debayer_egl: Convert from identity CCM to CCM
    calculated by SoftIPA
  libcamera: software_isp: Switch on uncalibrated CCM to validate
    eGLDebayer
  libcamera: software_isp: Make isStandardBayerOrder static
  libcamera: software_isp: debayer_cpu: Make getInputConfig and
    getOutputConfig static
  libcamera: shaders: Extend bayer shaders to support swapping R and B
    on output
  libcamera: software_isp: Add a gpuisp todo list

Hans de Goede (5):
  libcamera: swstats_cpu: Update statsProcessFn() / processLine0()
    documentation
  libcamera: swstats_cpu: Drop patternSize_ documentation
  libcamera: swstats_cpu: Move header to libcamera/internal/software_isp
  libcamera: software_isp: Move benchmark code to its own class
  libcamera: swstats_cpu: Add processFrame() method

Milan Zamazal (3):
  libcamera: shaders: Fix neighbouring positions in 8-bit debayering
  libcamera: software_isp: GPU support for unpacked 10/12-bit formats
  libcamera: shaders: Rename bayer_8 to bayer_unpacked

 include/libcamera/internal/egl.h              | 110 +++
 include/libcamera/internal/gbm.h              |  55 ++
 .../libcamera/internal/mapped_framebuffer.h   |   4 +
 include/libcamera/internal/meson.build        |  11 +
 .../libcamera/internal/shaders}/RGB.frag      |   2 +-
 .../internal/shaders}/YUV_2_planes.frag       |   2 +-
 .../internal/shaders}/YUV_3_planes.frag       |   2 +-
 .../internal/shaders}/YUV_packed.frag         |   2 +-
 .../internal/shaders}/bayer_1x_packed.frag    |  62 +-
 .../internal/shaders/bayer_unpacked.frag      |  78 ++-
 .../internal/shaders/bayer_unpacked.vert      |   8 +-
 .../libcamera/internal/shaders}/identity.vert |   0
 .../libcamera/internal/shaders/meson.build    |  10 +
 .../internal/software_isp/benchmark.h         |  36 +
 .../internal/software_isp/debayer_params.h    |   7 +
 .../internal/software_isp/meson.build         |   2 +
 .../internal/software_isp/software_isp.h      |   5 +-
 .../internal}/software_isp/swstats_cpu.h      |  12 +
 src/apps/qcam/assets/shader/shaders.qrc       |  16 +-
 src/apps/qcam/viewfinder_gl.cpp               |  70 +-
 src/ipa/simple/algorithms/ccm.cpp             |   4 +-
 src/ipa/simple/algorithms/lut.cpp             |   1 +
 src/ipa/simple/data/uncalibrated.yaml         |  12 +-
 src/libcamera/egl.cpp                         | 369 ++++++++++
 src/libcamera/gbm.cpp                         | 137 ++++
 src/libcamera/mapped_framebuffer.cpp          |   7 +
 src/libcamera/meson.build                     |  34 +
 src/libcamera/software_isp/benchmark.cpp      |  93 +++
 src/libcamera/software_isp/debayer.cpp        |  61 ++
 src/libcamera/software_isp/debayer.h          |  41 +-
 src/libcamera/software_isp/debayer_cpu.cpp    |  81 +--
 src/libcamera/software_isp/debayer_cpu.h      |  44 +-
 src/libcamera/software_isp/debayer_egl.cpp    | 632 ++++++++++++++++++
 src/libcamera/software_isp/debayer_egl.h      | 171 +++++
 src/libcamera/software_isp/gpuisp-todo.txt    |  42 ++
 src/libcamera/software_isp/meson.build        |   9 +
 src/libcamera/software_isp/software_isp.cpp   |  37 +-
 src/libcamera/software_isp/swstats_cpu.cpp    |  89 ++-
 utils/gen-shader-header.py                    |  38 ++
 utils/gen-shader-headers.sh                   |  44 ++
 utils/meson.build                             |   2 +
 41 files changed, 2234 insertions(+), 208 deletions(-)
 create mode 100644 include/libcamera/internal/egl.h
 create mode 100644 include/libcamera/internal/gbm.h
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/RGB.frag (93%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_2_planes.frag (97%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_3_planes.frag (96%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_packed.frag (99%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/bayer_1x_packed.frag (76%)
 rename src/apps/qcam/assets/shader/bayer_8.frag => include/libcamera/internal/shaders/bayer_unpacked.frag (56%)
 rename src/apps/qcam/assets/shader/bayer_8.vert => include/libcamera/internal/shaders/bayer_unpacked.vert (85%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/identity.vert (100%)
 create mode 100644 include/libcamera/internal/shaders/meson.build
 create mode 100644 include/libcamera/internal/software_isp/benchmark.h
 rename {src/libcamera => include/libcamera/internal}/software_isp/swstats_cpu.h (86%)
 create mode 100644 src/libcamera/egl.cpp
 create mode 100644 src/libcamera/gbm.cpp
 create mode 100644 src/libcamera/software_isp/benchmark.cpp
 create mode 100644 src/libcamera/software_isp/debayer_egl.cpp
 create mode 100644 src/libcamera/software_isp/debayer_egl.h
 create mode 100644 src/libcamera/software_isp/gpuisp-todo.txt
 create mode 100755 utils/gen-shader-header.py
 create mode 100755 utils/gen-shader-headers.sh

Comments

Bryan O'Donoghue June 11, 2025, 1:41 a.m. UTC | #1
Oh I forgot to mention, you need an environment variable to run with 
GPUISP until we make it the default.

% LIBCAMERA_SOFTISP_MODE=gpu ./builds/build.master.dbg/src/apps/qcam/qcam
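
For illustration only, a runtime switch of this kind could be read roughly
as below. SoftIspMode and softIspModeFromEnv() are hypothetical names, and
treating every value other than "gpu" as CPU is an assumption; the series
may well parse the variable differently.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum for illustration; not a libcamera type.
enum class SoftIspMode { Cpu, Gpu };

// Read LIBCAMERA_SOFTISP_MODE and pick a backend. Anything other than
// the exact string "gpu" (including an unset variable) selects the CPU
// path; that fallback is an assumption of this sketch.
inline SoftIspMode softIspModeFromEnv()
{
	const char *mode = std::getenv("LIBCAMERA_SOFTISP_MODE");

	if (mode && std::string(mode) == "gpu")
		return SoftIspMode::Gpu;

	return SoftIspMode::Cpu;
}
```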

---
bod