| Message ID | 20260618122245.946138-1-bryan.odonoghue@linaro.org |
|---|---|
| From | Bryan O'Donoghue <bryan.odonoghue@linaro.org> |
| To | libcamera-devel@lists.libcamera.org |
| Cc | bryan.odonoghue@linaro.org, pavel@ucw.cz |
| Subject | [PATCH 00/30] RFC/RFT: gpuisp: Multipass with speed optimisations on top |
| Date | Thu, 18 Jun 2026 13:22:13 +0100 |
Greetings.

This series implements multi-pass gpuisp as a two-phase thing.

- Some initial housekeeping to make the naming more logical: ShaderPass et al.
- Dragging the existing implementation through a progressive change set.
- Adding in shaders to "normalise", i.e. to take either a packed or unpacked
  CSI2 pixel stream, apply BLC to it and output a standard GL16F frame.
- This allows us to dispense with having two demosaic shaders and to use the
  logic from the existing unpacked demosaic for all streams. This is actually
  a nice change for the packed case as the unpacked algorithm is slightly
  better.
- Some benchmarking added to the GpuIspShaderPass shows that, on a reference
  frame costing 20ms, the two shaders consume about 5.5ms.
- In that light some caching is done at the end of the series to improve the
  current throughput.

On the slower system I tested the series on, the before case averages out at
about 22ms per frame. Post this series we get between 18ms and 20ms. I'd call
that a 10-15% win.

On the slow system, rb5-sm8250, the shaders are about 5.5ms of runtime.
On the fast hamoa-x1e system the whole process is about 6ms.

- Not tested:
  - Unpacked CSI2 input - I don't have easy access to this input right now.
  - DMABUF input caching - none of my hardware supports it.

- The next steps for this series are:
  - Converting the BLC normalise phase to a compute shader.
  - Having the BLC normalise phase produce an additional SSBO which is the
    histogram of the Bayer input.
  - Using that to no longer run CPU-side Bayer stats, including not having
    to map the buffer in the CPU.
  - Emitting the input buffer after the BLC shader completes. Possible only
    when generating stats in GPU.
  - The CPU stats on the slow system I'm targeting consume about 4.5ms of
    the original 20ms. Done right, the additional time in the shader should
    be low, though whatever syncing is needed around the SSBO before the
    buffer can subsequently be used may be a gotcha.
  - Since "we are where we are" on fencing, and glFinish() costs about 8ms
    in this reference slow case, reaping 4.5 additional ms seems like the
    next most logical thing to try to attack. Caveated on the fact this
    series is too large and messy right now :)

- Fencing/Deferred fencing.
  Right now libcamera, as Nicolas pointed out on IRC, doesn't require
  framebuffer recipients to dma-fence. This means we need to fence in
  libcamera.
  - Would the dma-buf ioctl for the output framebuffer produce better
    synchronous wait times than glFinish(), and if so could we really trust
    the result? Probably no and yes, I'd guess.
  - Could we use EGL fencing to achieve better wait times? No, almost
    certainly not: glFinish() is doing a real thing here, ensuring the GPU
    is finished.
  - Could we do an asynchronous wait, doing a fenceKHR at the top of
    processGPU() for the previous frame? This would let the CPU do
    productive work while the GPU completes.
  - Finally, can we "just" require users to dma-fence the framebuffer?
  - Other permutations on this theme are possible.

- Adding additional passes
  We would like to have shaders and objects that are "composable" so that,
  for example, you could run a GPU-based noise filter on any frame. This
  means having more granular GPUISP support is desirable. Since the time
  spent moving from one shader to the other appears quite low and not where
  we are burning time, a more granular number of passes seems achievable at
  the moment.
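
For readers unfamiliar with the "normalise" step described near the top of this letter, here is a rough CPU-side sketch of the idea: unpack a packed CSI2 stream, subtract the black level, and emit floats. It is not taken from the series, which does this work in fragment shaders. The function names unpackRaw10() and blcNormalise() are hypothetical, the input is assumed to be 10-bit MIPI CSI-2 packed data (4 pixels in 5 bytes), and the rescaling formula is an assumption about what "normalise" means here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack MIPI CSI-2 RAW10 data: every 5 bytes carry 4 pixels. Bytes 0..3
// hold the 8 most significant bits of pixels 0..3, byte 4 holds the two
// least significant bits of each pixel, with pixel 0 in bits [1:0].
// Trailing bytes that do not form a complete 5-byte group are ignored.
std::vector<uint16_t> unpackRaw10(const std::vector<uint8_t> &packed)
{
	std::vector<uint16_t> out;
	out.reserve(packed.size() / 5 * 4);

	for (std::size_t i = 0; i + 4 < packed.size(); i += 5) {
		const uint8_t lsbs = packed[i + 4];
		for (unsigned int p = 0; p < 4; p++) {
			const unsigned int msb = packed[i + p];
			const unsigned int lsb = (lsbs >> (2 * p)) & 0x3;
			out.push_back(static_cast<uint16_t>((msb << 2) | lsb));
		}
	}

	return out;
}

// Hypothetical normalisation (an assumption, not the series' shader code):
// subtract the black level, clamp at zero, and rescale so that maxValue
// maps to 1.0. Requires maxValue > blackLevel.
std::vector<float> blcNormalise(const std::vector<uint16_t> &raw,
				uint16_t blackLevel, uint16_t maxValue)
{
	const float range = static_cast<float>(maxValue - blackLevel);
	std::vector<float> out;
	out.reserve(raw.size());

	for (uint16_t v : raw) {
		const float corrected =
			static_cast<float>(std::max<int>(v - blackLevel, 0));
		out.push_back(std::min(corrected / range, 1.0f));
	}

	return out;
}
```

Once both packed and unpacked inputs have been reduced to the same float representation in this way, a single demosaic implementation can consume either, which is the simplification the series relies on.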

Bryan O'Donoghue (30):
  libcamera: software_isp: Rename Bayer classes to SoftwareIspPipeline
  libcamera: software_isp: gpu: Change the name of eglImageBayerOut_ to eglImageRGBAOut_
  libcamera: software_isp: gpu: rename debayerGPU to processGPU
  libcamera: software_isp: egl: Add new helper attachTextureToFBO
  libcamera: software_isp: gpu_pipeline_shader_pass: Add base class GpuPipelineShaderPass
  libcamera: software_isp: gpu_pipeline_shader_pass: Add GpuPipelineShaderPassDemosiac
  libcamera: software_isp: gpu: Switch to using GpuIspShaderPassDemosiac
  libcamera: software_isp: gpu: Drop unused method definitions
  libcamera: software_isp: gpu: Make Rectangle window_ a local variable in configure()
  libcamera: software_isp: gpu_pipeline_shader_pass: Move common attribute and uniform variables to base shader class
  libcamera: software_isp: gpu_pipeline_shader_pass: Move common shader selection logic into base class in new method initShaders()
  libcamera: shaders: Split packed and unpacked demosiac up
  libcamera: shaders: bayer_glr16_to_rgba.frag: Use bilinear filtering
  libcamera: software_isp: gpu: Add GpuIspShaderPassBlcNormalise
  libcamera: software_isp: egl: Extend eGL::createTexture2D to understand floats
  libcamera: software_isp: egl: Move to GLES 3.0
  libcamera: software_isp: egl: Rename createTexture2D to createInputTexture2D
  libcamera: software_isp: egl: Use Texture Unit 3 for final output texture
  libcamera: software_isp: egl: Add Ping/Pong buffers with start/stop bindings only
  libcamera: software_isp: gpu: Include GpuIspShaderPassBlcNormalise in init sequence
  libcamera: software_isp: egl: Add createOutputTexture2D
  libcamera: software_isp: gpu: Swtich to two pass logic
  libcamera: software_isp: egl: Add method lookups for GPU benchmark rountines
  libcamera: software_isp: egl: Add eglBenchMark
  libcamera: software_isp: gpu_pipeline_shader_pass: Add shader DEBUG time logging
  libcamera: software_isp: gpu: Do a synchronous BenchMark print after syncOutput
  libcamera: software_isp: egl: Add updateInputTexture2D
  libcamera: software_isp: gpu: Switch to using glTexSubImage2D on slow path upload
  libcamera: software_isp: gpu: Cache output framebuffers, only recreate when necessary
  libcamera: software_isp: gpu: Cache input framebuffers, only do texture creation when required

 include/libcamera/internal/egl.h              |  65 +-
 .../internal/software_isp/software_isp.h      |   4 +-
 src/libcamera/egl.cpp                         | 131 +++-
 .../bayer_1x_packed_to_blc_glr16f.frag        |  97 +++
 .../shaders/bayer_glr16_to_rgba.frag          | 155 ++++
 .../shaders/bayer_unpacked_to_blc_glr16f.frag |  46 ++
 src/libcamera/shaders/meson.build             |   3 +
 src/libcamera/software_isp/debayer_egl.cpp    | 671 ------------------
 .../software_isp/gpu_pipeline_shader_pass.cpp | 196 +++++
 .../software_isp/gpu_pipeline_shader_pass.h   | 109 +++
 ...gpu_pipeline_shader_pass_blc_normalise.cpp | 270 +++++++
 .../gpu_pipeline_shader_pass_blc_normalise.h  |  55 ++
 .../gpu_pipeline_shader_pass_demosiac.cpp     | 239 +++++++
 .../gpu_pipeline_shader_pass_demosiac.h       |  60 ++
 src/libcamera/software_isp/meson.build        |   9 +-
 src/libcamera/software_isp/software_isp.cpp   |  60 +-
 ...{debayer.cpp => software_isp_pipeline.cpp} |  77 +-
 .../{debayer.h => software_isp_pipeline.h}    |   8 +-
 ..._cpu.cpp => software_isp_pipeline_cpu.cpp} | 160 ++---
 ...ayer_cpu.h => software_isp_pipeline_cpu.h} |  10 +-
 .../software_isp_pipeline_gpu.cpp             | 433 +++++++++++
 ...ayer_egl.h => software_isp_pipeline_gpu.h} |  61 +-
 22 files changed, 2023 insertions(+), 896 deletions(-)
 create mode 100644 src/libcamera/shaders/bayer_1x_packed_to_blc_glr16f.frag
 create mode 100644 src/libcamera/shaders/bayer_glr16_to_rgba.frag
 create mode 100644 src/libcamera/shaders/bayer_unpacked_to_blc_glr16f.frag
 delete mode 100644 src/libcamera/software_isp/debayer_egl.cpp
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass.cpp
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass.h
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass_blc_normalise.cpp
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass_blc_normalise.h
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass_demosiac.cpp
 create mode 100644 src/libcamera/software_isp/gpu_pipeline_shader_pass_demosiac.h
 rename src/libcamera/software_isp/{debayer.cpp => software_isp_pipeline.cpp} (73%)
 rename src/libcamera/software_isp/{debayer.h => software_isp_pipeline.h} (93%)
 rename src/libcamera/software_isp/{debayer_cpu.cpp => software_isp_pipeline_cpu.cpp} (84%)
 rename src/libcamera/software_isp/{debayer_cpu.h => software_isp_pipeline_cpu.h} (95%)
 create mode 100644 src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
 rename src/libcamera/software_isp/{debayer_egl.h => software_isp_pipeline_gpu.h} (63%)