Message ID: 20250611013245.133785-1-bryan.odonoghue@linaro.org
On 11/06/2025 02:32, Bryan O'Donoghue wrote:
> This series introduces a GLES 2.0 GPU ISP to libcamera.
>
> We have had extensive discussions, meetings and collaboration on this topic over the last year or so.
>
> As an overview, we want to start to move as much of the software_isp processing into the GPU as possible. This is especially advantageous when we are talking about processing a framebuffer's worth of pixels as quickly as possible.
>
> The decision to use GLES 2.0 instead of, say, Vulkan stems from a desire to support as much older hardware as possible, and from the fact that we already have upstream GLES 2.0 fragment shaders to do debayering.
>
> Generally the approach is:
>
> - Move the fragment shaders out of qcam and into a common location.
> - Update the existing SoftwareISP Debayer/DebayerCPU pair to facilitate addition of a new class DebayerEGL.
> - Introduce that class.
> - Then progressively change the shaders and the DebayerEGL class, keeping each modification as transparent as possible in the git log.
> - Reuse as much of the SoftIPA data structures and logic as possible.
> - Consume the data from SoftIPA in the debayer shaders so that CPUISP and GPUISP give similar - hopefully the same - results, but with GPUISP going faster.
>
> In order to get untiled and uncompressed pixel data out of the GPU framebuffer we need to tell the GPU how to store the data it is writing to that framebuffer. GPUs can store their framebuffer data in tiled or even compressed formats, which is why the naive approach of running your fragment shader and then calling glReadPixels(GL_RGBA) is horrendously slow: glReadPixels must convert from the internal GPU format to the requested output format, an operation that for me takes ~10 milliseconds per frame.
>
> Instead we get the GPU to store its data as ARGB8888 swap buffers and memcpy() from the swapped buffer to our output frame. Right now this series supports 32 bit output formats only.
>
> The memcpy() also entails flushing the cache of the target buffer as per the terms of the dma-buf software contract.
>
> This leads us onto the main outstanding TODOs:
>
> - 24 bit GBM buffer support, leading to
> - 24 bit output framebuffer support
> - Surfaceless GBM and EGL context with no swapbuffer
> - Render to texture
>   If we render directly to the output buffer provided to the GPU we will not need to memcpy() to the output buffer, nor will we need to invalidate the output buffer cache.
> - eglCreateImageKHR for the texture upload
>
> This list is of the colour "make it go faster", not "make it work", which is why we are submitting a v1 for discussion now, in the full realisation that it will have to go through several cycles of review, giving us the opportunity to fix:
>
> - Doxygen is missing for new classes and methods
> - Some of the pipelines don't complete in gitlab
> - 24 bit output seems doable before merge
> - Render to texture, perhaps, too
>
> For me, on my Qualcomm hardware, GPUISP works very well: I get 30fps in qcam with about 75% CPU usage versus >100%. cam goes faster, which to me implies a good bit of time is being consumed in qcam itself.
>
> The series starts out with fixes and updates from Hans and finishes with shader modifications from Milan, both of whom, along with Kieran, Laurent and Maxime, I'd like to thank for being so helpful and patient.
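To make the ARGB8888 swapbuffer readback described above more concrete, here is a minimal sketch of the copy out of the swapped GBM buffer. It assumes the GBM surface backing the EGL surface was created with GBM_FORMAT_ARGB8888 and GBM_BO_USE_LINEAR so the mapping is untiled; the function name and the missing error handling are illustrative only, not the actual DebayerEGL code in this series.

#include <cstdint>
#include <cstring>
#include <gbm.h>

/*
 * Illustrative sketch: after eglSwapBuffers() on a GBM-backed EGL surface,
 * lock the front buffer, map it linearly and memcpy() the pixels out.
 */
void readBackFrame(struct gbm_surface *surface, uint8_t *out, size_t outSize)
{
	struct gbm_bo *bo = gbm_surface_lock_front_buffer(surface);

	uint32_t stride = 0;
	void *mapData = nullptr;
	void *pixels = gbm_bo_map(bo, 0, 0, gbm_bo_get_width(bo),
				  gbm_bo_get_height(bo),
				  GBM_BO_TRANSFER_READ, &stride, &mapData);

	/* ARGB8888 is 4 bytes per pixel; a real copy would honour stride. */
	memcpy(out, pixels, outSize);

	gbm_bo_unmap(bo, mapData);
	gbm_surface_release_buffer(surface, bo);
}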
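The dma-buf cache maintenance mentioned above, in its most minimal form, looks roughly like the snippet below. The helper name and the mmap()-based mapping are assumptions made for illustration; the DMA_BUF_IOCTL_SYNC ioctl itself is the standard kernel interface for bracketing CPU access to a dma-buf.

#include <cstring>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Illustrative helper: copy into a dma-buf with the required sync bracket. */
void copyToDmabuf(int dmabufFd, size_t length, const void *src)
{
	void *dst = mmap(nullptr, length, PROT_READ | PROT_WRITE,
			 MAP_SHARED, dmabufFd, 0);
	if (dst == MAP_FAILED)
		return;

	struct dma_buf_sync sync = {};
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);	/* begin CPU access */

	memcpy(dst, src, length);			/* swapbuffer -> output frame */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);	/* end CPU access, flush caches */

	munmap(dst, length);
}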
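For the eglCreateImageKHR texture-upload TODO, the usual zero-copy import of a dma-buf into a GLES texture looks roughly like this. The function name and the single-plane assumption are mine for illustration, not something already implemented in the series; the extension entry points are the standard EGL_EXT_image_dma_buf_import / GL_OES_EGL_image ones.

#include <cstdint>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Illustrative sketch: import a single-plane dma-buf as a GLES texture. */
GLuint importDmabufTexture(EGLDisplay display, int fd, int width, int height,
			   uint32_t drmFourcc, int stride)
{
	auto createImage = reinterpret_cast<PFNEGLCREATEIMAGEKHRPROC>(
		eglGetProcAddress("eglCreateImageKHR"));
	auto imageTargetTexture =
		reinterpret_cast<PFNGLEGLIMAGETARGETTEXTURE2DOESPROC>(
			eglGetProcAddress("glEGLImageTargetTexture2DOES"));

	const EGLint attrs[] = {
		EGL_WIDTH, width,
		EGL_HEIGHT, height,
		EGL_LINUX_DRM_FOURCC_EXT, static_cast<EGLint>(drmFourcc),
		EGL_DMA_BUF_PLANE0_FD_EXT, fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
		EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
		EGL_NONE
	};

	EGLImageKHR image = createImage(display, EGL_NO_CONTEXT,
					EGL_LINUX_DMA_BUF_EXT, nullptr, attrs);

	GLuint tex;
	glGenTextures(1, &tex);
	glBindTexture(GL_TEXTURE_2D, tex);
	imageTargetTexture(GL_TEXTURE_2D, image);	/* no pixel copy needed */

	return tex;
}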
Oh, I forgot to mention: you need an environment variable to run with GPUISP until we make it the default.

% LIBCAMERA_SOFTISP_MODE=gpu ./builds/build.master.dbg/src/apps/qcam/qcam

---
bod