[{"id":38977,"web_url":"https://patchwork.libcamera.org/comment/38977/","msgid":"<7b53e1dc-617b-4923-8657-05efadbba484@nxsw.ie>","date":"2026-06-02T08:26:41","subject":"Re: [PATCH v5 5/5] debayer_egl: Implement dmabuf import for input\n\tbuffers","submitter":{"id":226,"url":"https://patchwork.libcamera.org/api/people/226/","name":"Bryan O'Donoghue","email":"bod.linux@nxsw.ie"},"content":"On 27/05/2026 09:15, Robert Mader wrote:\n> In many cases we can import the GPU-ISP input buffers, dmabufs from v4l2,\n> directly into EGL instead of mapping and uploading - i.e. copying - them.\n> \n> Doing so can have positive effects in multiple areas, including reducing\n> memory bandwidth and CPU usage, as well as avoiding expensive dmabuf syncs\n> and syscalls.\n> \n> The main reason direct imports may not work are the more demanding stride\n> alignment requirements many GPUs have - often 128 or 256 bytes - compared\n> to ISPs - apparently often closer to 32 bytes.\n> \n> Thus we first try to import buffers directly and - if that fails - fall back\n> to the previous upload path. Failing imports should come at low cost as\n> drivers know the limitations and can bail out early, without causing\n> additional IO or context switches.\n> \n> In the future we might be able to request buffers with a matching stride\n> from v4l2 drivers in many cases, making direct import the norm instead\n> of a hit-or-miss. An optional kernel API for that exists, but doesn't\n> seem to be implemented by any driver tested so far.\n> \n> Note that passing around MappedFrameBuffer and DmaSyncer variables ensures\n> we don't do unnecessary mappings and dmabuf syncs.\n> \n> Below are some benchmark results. All where done using postmarketOS edge\n> with updates from 21th May 2026 (Mesa 26.1.1). The mentioned pipelines\n> where run five times each, with the mean value included here, which should\n> be quite representive as the variance was rather small. 
All devices\n> were using the powersave governor.\n> \n> - FairPhone 5\n> \n> -- Back camera\n> cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@29 -s width=1920,height=1080 --capture=60\n> Before: 14027 us/frame\n> After: 12122 us/frame\n> \n> - OnePlus 6\n> \n> -- Back camera (imx519)\n> cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@10 -s width=1920,height=1080 --capture=60\n> Before: 30091 us/frame\n> After: 19878 us/frame\n> \n> - Librem 5\n> \n> -- Back camera\n> cam -c /base/soc@0/bus@30800000/i2c@30a50000/camera@2d -s width=1280,height=720 --capture=60\n> Before: 69092 us/frame\n> After: 41250 us/frame\n> \n> - PinePhone\n> \n> -- Front Camera\n> cam -c /base/i2c-csi/front-camera@3c -s width=1280,height=720 --capture=60\n> Before: 173769 us/frame\n> After: 143274 us/frame\n> \n> -- Back camera\n> cam -c /base/i2c-csi/rear-camera@4c -s width=1280,height=720 --capture=60\n> Before: 174833 us/frame\n> After: 144476 us/frame\n> \n> There is one case where performance regresses:\n> \n> - Pixel 3a\n> \n> -- Back camera\n> cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@1a -s width=1920,height=1080 --capture=60\n> Before: 14257 us/frame\n> After: 15161 us/frame\n> \n> To my knowledge this is likely caused by bad sampling performance from\n> linear buffers. IMO this is a driver issue - if a copy to tiled format\n> makes sampling faster, drivers should do so implicitly (like e.g. 
v3d\n> already does).\n> \n> Signed-off-by: Robert Mader <robert.mader@collabora.com>\n> ---\n>   src/libcamera/software_isp/debayer_egl.cpp | 58 ++++++++++++++++------\n>   src/libcamera/software_isp/debayer_egl.h   |  2 +-\n>   2 files changed, 43 insertions(+), 17 deletions(-)\n> \n> diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp\n> index d08634640..e83a68a97 100644\n> --- a/src/libcamera/software_isp/debayer_egl.cpp\n> +++ b/src/libcamera/software_isp/debayer_egl.cpp\n> @@ -10,6 +10,7 @@\n>   #include \"debayer_egl.h\"\n> \n>   #include <algorithm>\n> +#include <assert.h>\n>   #include <memory>\n>   #include <stdlib.h>\n>   #include <string>\n> @@ -500,16 +501,34 @@ void DebayerEGL::setShaderVariableValues(const DebayerParams &params)\n>   \treturn;\n>   }\n> \n> -int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams &params)\n> +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)\n>   {\n> +\tbool dmabuf_import_succeeded = false;\n> +\n>   \t/* eGL context switch */\n>   \tegl_.makeCurrent();\n> \n> -\t/* Create a standard texture input */\n> -\tegl_.createTexture2D(*eglImageBayerIn_, in.planes()[0].data());\n> +\t/* Try to create texture for input buffer via dmabuf import */\n> +\tif (!eglImageBayerIn_->dmabuf_import_failed_) {\n> +\t\tif (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, input->planes()[0].fd.get()) == 0)\n> +\t\t\tdmabuf_import_succeeded = true;\n> +\t\telse\n> +\t\t\tLOG(Debayer, Info) << \"Importing input buffer with DMABuf import failed, falling back to upload\";\n> +\t}\n> +\n> +\t/* Otherwise create texture for input buffer via upload from CPU */\n> +\tif (!dmabuf_import_succeeded) {\n> +\t\tinDmaSyncer->emplace(input->planes()[0].fd, DmaSyncer::SyncType::Read);\n> +\t\tinMapped->emplace(input, 
MappedFrameBuffer::MapFlag::Read);\n> +\t\tif (!inMapped->value().isValid()) {\n> +\t\t\tLOG(Debayer, Error) << \"mmap-ing buffer(s) failed\";\n> +\t\t\treturn -ENODEV;\n> +\t\t}\n> +\t\tegl_.createTexture2D(*eglImageBayerIn_, inMapped->value().planes()[0].data());\n> +\t}\n> \n>   \t/* Generate the output render framebuffer as render to texture */\n> -\tegl_.createOutputDMABufTexture2D(*eglImageBayerOut_, out_fd);\n> +\tegl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get());\n> \n>   \tsetShaderVariableValues(params);\n>   \tglViewport(0, 0, width_, height_);\n> @@ -531,23 +550,16 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output\n>   {\n>   \tbench_.startFrame();\n> \n> -\tstd::vector<DmaSyncer> dmaSyncers;\n> -\n> -\tdmaSyncBegin(dmaSyncers, input, nullptr);\n> -\n>   \t/* Copy metadata from the input buffer */\n>   \tFrameMetadata &metadata = output->_d()->metadata();\n>   \tmetadata.status = input->metadata().status;\n>   \tmetadata.sequence = input->metadata().sequence;\n>   \tmetadata.timestamp = input->metadata().timestamp;\n> \n> -\tMappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read);\n> -\tif (!in.isValid()) {\n> -\t\tLOG(Debayer, Error) << \"mmap-ing buffer(s) failed\";\n> -\t\tgoto error;\n> -\t}\n> +\tstd::optional<MappedFrameBuffer> inMapped;\n> +\tstd::optional<DmaSyncer> inDmaSyncer;\n> \n> -\tif (debayerGPU(in, output->planes()[0].fd.get(), params)) {\n> +\tif (debayerGPU(input, output, params, &inMapped, &inDmaSyncer)) {\n>   \t\tLOG(Debayer, Error) << \"debayerGPU failed\";\n>   \t\tgoto error;\n>   \t}\n> @@ -555,8 +567,22 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output\n>   \tmetadata.planes()[0].bytesused = output->planes()[0].length;\n> \n>   \t/* Calculate stats for the whole frame */\n> -\tstats_->processFrame(frame, 0, in);\n> -\tdmaSyncers.clear();\n> +\tif (frame % SwStatsCpu::kStatPerNumFrames) {\n> +\t\tstats_->finishFrame(frame, 
0);\n> +\t} else {\n> +\t\tif (!inMapped) {\n> +\t\t\t/*\n> +\t\t\t * The buffer was directly imported into EGL and thus\n> +\t\t\t * not mapped for texture upload. Do it now for the\n> +\t\t\t * CPU-based stats calculation.\n> +\t\t\t */\n> +\t\t\tassert(!inDmaSyncer);\n> +\t\t\tinDmaSyncer.emplace(input->planes()[0].fd, DmaSyncer::SyncType::Read);\n> +\t\t\tinMapped.emplace(input, MappedFrameBuffer::MapFlag::Read);\n> +\t\t}\n> +\t\tstats_->processFrame(frame, 0, inMapped.value());\n> +\t}\n> +\tinDmaSyncer.reset();\n> \n>   \tegl_.syncOutput();\n>   \tbench_.finishFrame();\n> diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h\n> index 141fb288f..875e7cfc5 100644\n> --- a/src/libcamera/software_isp/debayer_egl.h\n> +++ b/src/libcamera/software_isp/debayer_egl.h\n> @@ -65,7 +65,7 @@ private:\n>   \tint initBayerShaders(PixelFormat inputFormat, PixelFormat outputFormat);\n>   \tint getShaderVariableLocations();\n>   \tvoid setShaderVariableValues(const DebayerParams &params);\n> -\tint debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams &params);\n> +\tint debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *mappedInputBuffer, std::optional<DmaSyncer> *inputBufferDmaSyncer);\n> \n>   \t/* Shader program identifiers */\n>   \tGLuint vertexShaderId_ = 0;\n> --\n> 2.54.0\n> \n\nLGTM\n\nReviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>\n\n---\nbod","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id BA484BDCBC\n\tfor <parsemail@patchwork.libcamera.org>;\n\tTue,  2 Jun 2026 08:26:47 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby 
lancelot.ideasonboard.com (Postfix) with ESMTP id AD46E63038;\n\tTue,  2 Jun 2026 10:26:46 +0200 (CEST)","from tor.source.kernel.org (tor.source.kernel.org\n\t[IPv6:2600:3c04:e001:324:0:1991:8:25])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 37E2062E6A\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue,  2 Jun 2026 10:26:45 +0200 (CEST)","from smtp.kernel.org (quasi.space.kernel.org [100.103.45.18])\n\tby tor.source.kernel.org (Postfix) with ESMTP id F20266001A;\n\tTue,  2 Jun 2026 08:26:43 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id 136261F00893;\n\tTue,  2 Jun 2026 08:26:42 +0000 (UTC)"],"Message-ID":"<7b53e1dc-617b-4923-8657-05efadbba484@nxsw.ie>","Date":"Tue, 2 Jun 2026 09:26:41 +0100","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH v5 5/5] debayer_egl: Implement dmabuf import for input\n\tbuffers","To":"Robert Mader <robert.mader@collabora.com>,\n\tlibcamera-devel@lists.libcamera.org","References":"<20260527081534.20245-1-robert.mader@collabora.com>\n\t<8Vv7aI_42dXVz_PKHfb5_dyoylKhYD2yEqSCmuu_XB01oVjoogzJQBqy_m9B6QQ5nqegLgL8nNivjtwKvL8-sw==@protonmail.internalid>\n\t<20260527081534.20245-6-robert.mader@collabora.com>","From":"Bryan O'Donoghue <bod.linux@nxsw.ie>","Content-Language":"en-US","In-Reply-To":"<20260527081534.20245-6-robert.mader@collabora.com>","Content-Type":"text/plain; charset=UTF-8; 
format=flowed","Content-Transfer-Encoding":"7bit","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]