From patchwork Wed Oct 15 01:22:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Bryan O'Donoghue
X-Patchwork-Id: 24645
From: Bryan O'Donoghue
To: libcamera-devel@lists.libcamera.org
Cc: hdegoede@redhat.com, mzamazal@redhat.com, bryan.odonoghue@linaro.org,
 bod.linux@nxsw.ie
Subject: [PATCH v3 00/39] Add GLES 2.0 GPUISP to libcamera
Date: Wed, 15 Oct 2025 02:22:12 +0100
Message-ID: <20251015012251.17508-1-bryan.odonoghue@linaro.org>
X-Mailer: git-send-email 2.51.0

This version 3
- Adds AWB to the debayer routine as calculated by the IPA thread

- Implements ~ all of the feedback from Barnabas; it is quicker to mention
  what hasn't been done.

  a) A comment about member initialisation in eGL.cpp: code I wrote to
     make constructor init common seemed to negate that ask.
  b) meson dependency checks for egl.
     I remember struggling with this earlier on in development.
     I will certainly try to do this for a v4, so it's more pending a try
     as opposed to not intended to be done.

- Incorporates various fixes from Robert Mader
  When to sync, removing tearing for Milan
  Some error checking that, although Robert didn't mention it in his
  feedback, was in his patches, so I stole that code. Thanks.

- Also worth mentioning: Robert identified a permissions fix that pipewire
  would need for eGL to work in libcamera with pipewire, published that
  fix and got it merged too. Owe you a beer for that one.

- Is rebased on tip-of-tree

- Currently the documentation checks for the various classes don't pass
  but that is easy enough to fix in a V4.

- In line with our discussions gpuisp is now the default instead of cpuisp.
- Since it's only the documentation checks that are pending, I thought
  rather than delay further it was time to publish the series without them
  and see if anything major gets snagged.

v2:

This version 2 is an incomplete update with respect to previous comment
feedback, which ordinarily I would not publish. However, given OSSEU is
starting on Monday and we have a talk about this topic, in addition to some
pretty good progress in the interregnum, I thought a v2 would be
appropriate.

- V2 drops use of GBM surface in favour of generating a framebuffer from
  the dma-buf handle, called render-to-texture.

  Moving away from GBM surface + memcpy(), including the associated cache
  invalidate, has a dramatic effect on GPUISP performance.

  Some rough stats for a Qualcomm sm8250 "kona" device with an imx517
  sensor @ 4048 x 3040 ABRG8888 - debug builds

  CPUISP + CCM:     2 FPS - CPU usage > 100% single core, pulls about 9 watts
  GPUISP v1 + CCM: 14 FPS - power not measured
  GPUISP v2 + CCM: 30 FPS - sensor linerate - CPU usage ~ 70 %, pulling 8 watts

  Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution

  CPUISP      4 FPS
  GPUISP v2 - 2 or 3 FPS
  GPUISP v2 - 15 FPS == sensor linerate

  In other words for these boards we can hit linerate with
  GPUISP + 3A + CCM.

- Drop GBM surface rendering

- Drop swapbuffers

- Use eglCreateImageKHR to directly render into the output dma-buf buffer
  eglCreateImageKHR lets you specify the FOURCC of the texture, which means
  we can create the texture in the uncompressed target output pixel format
  we want.

- Fix stride calculation to 256 bytes
  Laurent and Maxime explained to me that GPU stride alignments are tribal
  wisdom and that 256 bytes is a good cross-platform value.
  This helped to get the render-to-texture command right.

- A synchronous blocking wait is used to ensure GPU operations have
  completed. Laurent wants this to be made async.
  At the moment it's not clear to me that eglWaitSyncKHR is really
  required, and in any case it doesn't seem to have any performance impact.
  But this part is still TBD - I've included the sync wait for simplicity
  and safety.

- A Debayer::stop() method has been introduced to ensure we call
  eglDestroySyncKHR when the eGL context is valid, as opposed to in the
  callchain of destructors triggering eGL::~eGL();

- stats move constructor call chain dropped - Barnabas

- Incorporates Milan's area-of-interest constraint for Bayer stats
  i.e. squashes his v3 update into debayer_egl.cpp directly

- Moves ALIGN_TO into a common area to facilitate its reuse in egl.cpp

- Rebases on 0.5.2

- There are a number of known checks failing on the CI loop right now

Link to v1: https://lists.libcamera.org/pipermail/libcamera-devel/2025-June/050692.html

v1:

This series introduces a GLES 2.0 GPU ISP to libcamera.

We have had extensive discussions, meetings and collaborative discussions
about this topic over the last year or so.

As an overview we want to start to move as much processing of software_isp
into the GPU as possible. This is especially advantageous when we are
talking about processing a framebuffer's worth of pixels as quickly as
possible.

The decision to use GLES 2.0 instead of say Vulkan stems from a desire to
support as much in the way of older hardware as possible and the fact we
already have upstream GLES 2.0 fragment shaders to do debayer.

Generally the approach is

- Move the fragment shaders out of qcam and into a common location
- Update the existing SoftwareISP Debayer/DebayerCPU pair to facilitate
  addition of a new class DebayerEGL.
- Introduce that class
- Then do progressive change of the shaders and DebayerEGL class to make
  the modifications as transparent as possible in the git log.
- Reuse as much of the SoftIPA data-structures and logic as possible.
- Consume the data from SoftIPA in the Debayer Shaders so that CPUISP and
  GPUISP give similar - hopefully the same - results but with GPUISP going
  faster.

In order to get untiled and uncompressed pixel data out of the GPU
framebuffer we need to tell the GPU how to store the data it is writing to
that framebuffer.

GPUs can store their framebuffer data in tiled or even compressed formats,
which is why the naive approach of running your fragment shader and then
using glReadPixels(GL_RGBA); will be horrendously slow, as glReadPixels must
convert from the internal GPU format to the requested output format - an
operation that for me takes ~ 10 milliseconds per frame.

Instead we get the GPU to store its data as ARGB8888 swap buffers and
memcpy() from the swapped buffer to our output frame. Right now this series
supports 32 bit output formats only.

The memcpy() also entails flushing the cache of the target buffer as per
the terms of the dma-buf software contract.

This leads us onto the main outstanding TODOs

- 24 bit GBM buffer support leading
- 24 bit output framebuffer support
- Surfaceless GBM and eGL context with no swapbuffer
- Render to texture
  If we render directly to a buffer provided to the GPU as the output
  buffer, we will not need to memcpy() to the output buffer, nor will we
  need to invalidate the output buffer cache.
- eglCreateImageKHR for the texture upload.
This list is of the colour "make it go faster" not "make it work", which is
why we are starting by submitting a v1 for discussion, in the full
realisation it will have to go through several cycles of review, giving us
the opportunity to fix:

- Doxygen is missing for new classes and methods
- Some of the pipelines don't complete in gitlab
- 24 bit output seems doable before merge
- Render to texture perhaps too

For me, on my Qualcomm hardware, GPUISP works very well: I get 30fps in qcam
with about 75% CPU usage versus > 100%. cam goes faster, which to me implies
a good bit of time is being consumed in qcam itself.

The series starts out with fixes and updates from Hans and finishes out
with shader modifications from Milan, both of whom, along with Kieran,
Laurent and Maxime, I'd like to thank for being so helpful and patient.

Bryan O'Donoghue (31):
  libcamera: shaders: Move GL shader programs to
    src/libcamera/assets/shader
  utils: gen-shader-headers: Add a utility to generate headers from
    shaders
  meson: Automatically generate glsl_shaders.h from specified shader
    programs
  libcamera: software_isp: Move useful items from DebayerCpu to Debayer
    base class
  libcamera: software_isp: Move Bayer params init from DebayerCpu to
    Debayer
  libcamera: software_isp: Move param select code to Debayer base class
  libcamera: software_isp: Move DMA Sync code to Debayer base class
  libcamera: software_isp: Make output DMA sync contingent
  libcamera: software_isp: Move isStandardBayerOrder to base class
  libcamera: software_isp: Start the ISP thread in configure
  libcamera: software_isp: Move configure to worker thread
  libcamera: software_isp: debayer: Make the debayer_ object of type
    class Debayer not DebayerCpu
  libcamera: software_isp: debayer: Extend DebayerParams struct to hold
    a copy of per-frame CCM values
  libcamera: software_isp: debayer: Extend DebayerParams to hold a copy
    of per-frame AWB values
  libcamera: software_isp: awb Populate AWB gains to Debayer params
    structure
  libcamera: software_isp: ccm: Populate CCM table to Debayer params
    structure
  libcamera: software_isp: debayer: Introduce a stop() callback to the
    debayer object
  libcamera: software_isp: lut: Make gain corrected CCM in lut.cpp
    available in debayer params
  libcamera: software_isp: gbm: Add in a GBM helper class for GPU
    surface access
  libcamera: software_isp: Make isStandardBayerOrder static
  libcamera: software_isp: egl: Introduce an eGL base helper class
  libcamera: shaders: Use highp not mediump for float precision
  libcamera: shaders: Extend debayer shaders to apply RGB gain values on
    output
  libcamera: shaders: Extend bayer shaders to support swapping R and B
    on output
  libcamera: shaders: Add support for Auto White Balance gains
  libcamera: software_isp: debayer_egl: Add an eGL debayer class
  libcamera: software_isp: debayer_egl: Make DebayerEGL an environment
    option
  libcamera: software_isp: debayer_egl: Make gpuisp default softisp mode
  libcamera: software_isp: debayer_cpu: Make getInputConfig and
    getOutputConfig static
  libcamera: software_isp: Switch on uncalibrated CCM to validate
    eGLDebayer
  libcamera: software_isp: Add a gpuisp todo list

Hans de Goede (5):
  libcamera: swstats_cpu: Update statsProcessFn() / processLine0()
    documentation
  libcamera: swstats_cpu: Drop patternSize_ documentation
  libcamera: swstats_cpu: Move header to libcamera/internal/software_isp
  libcamera: software_isp: Move benchmark code to its own class
  libcamera: swstats_cpu: Add processFrame() method

Milan Zamazal (3):
  libcamera: shaders: Rename bayer_8 to bayer_unpacked
  libcamera: shaders: Fix neighbouring positions in 8-bit debayering
  libcamera: software_isp: GPU support for unpacked 10/12-bit formats

 include/libcamera/internal/egl.h              | 162 +++++
 include/libcamera/internal/gbm.h              |  43 ++
 include/libcamera/internal/meson.build        |  11 +
 .../libcamera/internal/shaders}/RGB.frag      |   2 +-
 .../internal/shaders}/YUV_2_planes.frag       |   2 +-
 .../internal/shaders}/YUV_3_planes.frag       |   2 +-
 .../internal/shaders}/YUV_packed.frag         |   2 +-
 .../internal/shaders}/bayer_1x_packed.frag    |  68 +-
 .../internal/shaders/bayer_unpacked.frag      |  84 ++-
 .../internal/shaders/bayer_unpacked.vert      |   8 +-
 .../libcamera/internal/shaders}/identity.vert |   0
 .../libcamera/internal/shaders/meson.build    |  10 +
 .../internal/software_isp/benchmark.h         |  39 ++
 .../internal/software_isp/debayer_params.h    |  13 +
 .../internal/software_isp/meson.build         |   2 +
 .../internal/software_isp/software_isp.h      |   5 +-
 .../internal}/software_isp/swstats_cpu.h      |  15 +-
 src/apps/qcam/assets/shader/shaders.qrc       |  16 +-
 src/apps/qcam/meson.build                     |   3 +
 src/apps/qcam/viewfinder_gl.cpp               |  70 +-
 src/ipa/simple/algorithms/awb.cpp             |   4 +-
 src/ipa/simple/algorithms/ccm.cpp             |   4 +-
 src/ipa/simple/algorithms/lut.cpp             |   1 +
 src/ipa/simple/data/uncalibrated.yaml         |  12 +-
 src/libcamera/egl.cpp                         | 435 ++++++++++++
 src/libcamera/gbm.cpp                         |  61 ++
 src/libcamera/meson.build                     |  34 +
 src/libcamera/software_isp/benchmark.cpp      |  92 +++
 src/libcamera/software_isp/debayer.cpp        |  63 ++
 src/libcamera/software_isp/debayer.h          |  53 +-
 src/libcamera/software_isp/debayer_cpu.cpp    |  88 +--
 src/libcamera/software_isp/debayer_cpu.h      |  44 +-
 src/libcamera/software_isp/debayer_egl.cpp    | 648 ++++++++++++++++++
 src/libcamera/software_isp/debayer_egl.h      | 174 +++++
 src/libcamera/software_isp/gpuisp-todo.txt    |  83 +++
 src/libcamera/software_isp/meson.build        |   9 +
 src/libcamera/software_isp/software_isp.cpp   |  49 +-
 src/libcamera/software_isp/swstats_cpu.cpp    |  79 ++-
 utils/gen-shader-header.py                    |  38 +
 utils/gen-shader-headers.sh                   |  44 ++
 utils/meson.build                             |   2 +
 41 files changed, 2370 insertions(+), 204 deletions(-)
 create mode 100644 include/libcamera/internal/egl.h
 create mode 100644 include/libcamera/internal/gbm.h
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/RGB.frag (93%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_2_planes.frag (97%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_3_planes.frag (96%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/YUV_packed.frag (99%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/bayer_1x_packed.frag (75%)
 rename src/apps/qcam/assets/shader/bayer_8.frag => include/libcamera/internal/shaders/bayer_unpacked.frag (55%)
 rename src/apps/qcam/assets/shader/bayer_8.vert => include/libcamera/internal/shaders/bayer_unpacked.vert (85%)
 rename {src/apps/qcam/assets/shader => include/libcamera/internal/shaders}/identity.vert (100%)
 create mode 100644 include/libcamera/internal/shaders/meson.build
 create mode 100644 include/libcamera/internal/software_isp/benchmark.h
 rename {src/libcamera => include/libcamera/internal}/software_isp/swstats_cpu.h (84%)
 create mode 100644 src/libcamera/egl.cpp
 create mode 100644 src/libcamera/gbm.cpp
 create mode 100644 src/libcamera/software_isp/benchmark.cpp
 create mode 100644 src/libcamera/software_isp/debayer_egl.cpp
 create mode 100644 src/libcamera/software_isp/debayer_egl.h
 create mode 100644 src/libcamera/software_isp/gpuisp-todo.txt
 create mode 100755 utils/gen-shader-header.py
 create mode 100755 utils/gen-shader-headers.sh
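
For anyone unfamiliar with the 256 byte stride rule mentioned in the v2
notes, here is a minimal sketch of the arithmetic, assuming a 32 bit output
format and a power-of-two alignment. alignUp() and alignedStride() are
hypothetical names used for illustration only; they are not the series'
ALIGN_TO helper, whose definition may differ.

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `alignment`.
// Assumption: `alignment` is a non-zero power of two (e.g. 256).
// Hypothetical helper, not the ALIGN_TO macro from the series.
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment)
{
	return (value + alignment - 1) & ~(alignment - 1);
}

// Bytes per output line, padded so every row starts on an aligned address.
// The defaults assume a 32 bit (4 bytes per pixel) format and the 256 byte
// cross-platform alignment discussed in the cover letter.
constexpr std::size_t alignedStride(std::size_t widthPixels,
				    std::size_t bytesPerPixel = 4,
				    std::size_t alignment = 256)
{
	return alignUp(widthPixels * bytesPerPixel, alignment);
}
```

For the 4048 pixel wide frame quoted in the stats, 4048 * 4 = 16192 bytes is
not a multiple of 256, so alignedStride(4048) rounds it up to 16384 bytes.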