[29/30] libcamera: software_isp: gpu: Cache output framebuffers, only recreate when necessary

Message ID 20260618122245.946138-30-bryan.odonoghue@linaro.org
State New
Series
  • RFC/RFT: gpuisp: Multipass with speed optimisations on top

Commit Message

Bryan O'Donoghue June 18, 2026, 12:22 p.m. UTC
Once a texture has been created from a dma-buf handle, we can switch texture
units and ids in our GLSL program without re-creating the texture.

Since we are mapping pages instead of copying, the GPU simply takes the
mappings it needs and operates on those.

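Much faster: over the ten benchmark runs below, the mean debayer time drops
from roughly 24.5 ms/frame to roughly 20.7 ms/frame, about 15%.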

➜  libcamera git:(0.7.0-multipass-v4) ✗ grep Bench before.log
[15:07:08.009062165] [1195303]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 729270us, 24309 us/frame
[15:07:11.686143411] [1195334]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 733995us, 24466 us/frame
[15:07:14.980640685] [1195363]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 740157us, 24671 us/frame
[15:07:18.163299379] [1195393]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 720094us, 24003 us/frame
[15:07:21.366461990] [1195422]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 719166us, 23972 us/frame
[15:07:24.718877325] [1195451]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 725425us, 24180 us/frame
[15:07:28.924768220] [1195481]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 753400us, 25113 us/frame
[15:07:32.336224289] [1195513]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 727160us, 24238 us/frame
[15:07:35.638928194] [1195542]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 762408us, 25413 us/frame
[15:07:38.868084716] [1195579]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 728991us, 24299 us/frame

➜  libcamera git:(0.7.0-multipass-v4) ✗ grep Bench after.log
[16:26:07.109426223] [1202010]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 650120us, 21670 us/frame
[16:26:18.925748074] [1202048]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 611062us, 20368 us/frame
[16:26:22.712614967] [1202077]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 609333us, 20311 us/frame
[16:26:26.551615514] [1202107]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 609791us, 20326 us/frame
[16:26:30.085663553] [1202136]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 612838us, 20427 us/frame
[16:26:34.945255617] [1202165]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 681918us, 22730 us/frame
[16:26:39.031353171] [1202194]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 595551us, 19851 us/frame
[16:26:42.610503048] [1202227]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 663929us, 22130 us/frame
[16:26:46.100211690] [1202256]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 580685us, 19356 us/frame
[16:26:49.394640903] [1202286]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 595072us, 19835 us/frame

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
 .../software_isp/software_isp_pipeline_gpu.cpp  | 17 ++++++++++-------
 .../software_isp/software_isp_pipeline_gpu.h    |  2 +-
 2 files changed, 11 insertions(+), 8 deletions(-)
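
For reference, the benchmark logs in the commit message can be reduced to two means. The following is a minimal sketch and not part of the patch: the names `kBeforeUs`, `kAfterUs`, `meanUs` and `reductionPercent` are hypothetical, and the samples are the `us/frame` values copied from the before.log and after.log lines above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-frame debayer times in microseconds, copied from the "us/frame"
// field of the before.log and after.log lines in the commit message.
const std::vector<long> kBeforeUs = { 24309, 24466, 24671, 24003, 23972,
				      24180, 25113, 24238, 25413, 24299 };
const std::vector<long> kAfterUs = { 21670, 20368, 20311, 20326, 20427,
				     22730, 19851, 22130, 19356, 19835 };

// Arithmetic mean of a non-empty set of samples.
double meanUs(const std::vector<long> &samples)
{
	assert(!samples.empty());
	return std::accumulate(samples.begin(), samples.end(), 0.0) /
	       samples.size();
}

// Relative reduction of `after` with respect to `before`, in percent.
double reductionPercent(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

With these samples, `meanUs(kBeforeUs)` is 24466.4 us and `meanUs(kAfterUs)` is 20700.4 us, so `reductionPercent()` on the two means gives about 15.4%.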

Comments

Barnabás Pőcze June 18, 2026, 12:50 p.m. UTC | #1
Hi

On 2026-06-18 at 14:22, Bryan O'Donoghue wrote:
> Once a texture has been created using dma-buf handle, we can switch texture
> units and ids with our glsl program without re-creating textures.
> 
> Since we are mapping pages, instead of copying the GPU simply takes the
> maps it needs and operates on those.
> 
> Much faster.
> 
> ➜  libcamera git:(0.7.0-multipass-v4) ✗ grep Bench before.log
> [15:07:08.009062165] [1195303]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 729270us, 24309 us/frame
> [15:07:11.686143411] [1195334]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 733995us, 24466 us/frame
> [15:07:14.980640685] [1195363]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 740157us, 24671 us/frame
> [15:07:18.163299379] [1195393]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 720094us, 24003 us/frame
> [15:07:21.366461990] [1195422]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 719166us, 23972 us/frame
> [15:07:24.718877325] [1195451]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 725425us, 24180 us/frame
> [15:07:28.924768220] [1195481]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 753400us, 25113 us/frame
> [15:07:32.336224289] [1195513]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 727160us, 24238 us/frame
> [15:07:35.638928194] [1195542]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 762408us, 25413 us/frame
> [15:07:38.868084716] [1195579]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 728991us, 24299 us/frame
> 
> ➜  libcamera git:(0.7.0-multipass-v4) ✗ grep Bench after.log
> [16:26:07.109426223] [1202010]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 650120us, 21670 us/frame
> [16:26:18.925748074] [1202048]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 611062us, 20368 us/frame
> [16:26:22.712614967] [1202077]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 609333us, 20311 us/frame
> [16:26:26.551615514] [1202107]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 609791us, 20326 us/frame
> [16:26:30.085663553] [1202136]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 612838us, 20427 us/frame
> [16:26:34.945255617] [1202165]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 681918us, 22730 us/frame
> [16:26:39.031353171] [1202194]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 595551us, 19851 us/frame
> [16:26:42.610503048] [1202227]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 663929us, 22130 us/frame
> [16:26:46.100211690] [1202256]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 580685us, 19356 us/frame
> [16:26:49.394640903] [1202286]  INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 595072us, 19835 us/frame
> 
> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> ---
>   .../software_isp/software_isp_pipeline_gpu.cpp  | 17 ++++++++++-------
>   .../software_isp/software_isp_pipeline_gpu.h    |  2 +-
>   2 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp b/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
> index 2e5c0e40e..bc5d59575 100644
> --- a/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
> +++ b/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
> @@ -263,8 +263,14 @@ int SoftwareIspPipelineGpu::processGPU(FrameBuffer *input, FrameBuffer *output,
>   		egl_.updateInputTexture2D(*eglImageBayerIn_, inMapped->value().planes()[0].data());
>   	}
>   
> -	/* Generate the output render framebuffer as render to texture */
> -	egl_.createOutputDMABufTexture2D(*eglImageRGBAOut_, output->planes()[0].fd.get());
> +	/* Find an existing eglImage in the cache */
> +	auto [output_cache, output_miss] = eglImageRGBAOut_.try_emplace(output);
> +	if (output_miss) {
> +		/* Generate the output render framebuffer as render to texture */
> +		output_cache->second = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE3, 3);
> +		egl_.createOutputDMABufTexture2D(*output_cache->second, output->planes()[0].fd.get());
> +	}
> +	eGLImage &eglImageRGBAOut = *output_cache->second;
>   
>   	pipelineResult = gpuIspShaderPassBlcNormalise_.process(*eglImageBayerIn_, *eglImagePingPong_[0], width_, height_, params);
>   	if (pipelineResult) {
> @@ -272,7 +278,7 @@ int SoftwareIspPipelineGpu::processGPU(FrameBuffer *input, FrameBuffer *output,
>   		return pipelineResult;
>   	}
>   
> -	pipelineResult = gpuIspShaderPassDemosiac_.process(*eglImagePingPong_[0], *eglImageRGBAOut_, width_, height_, params);
> +	pipelineResult = gpuIspShaderPassDemosiac_.process(*eglImagePingPong_[0], eglImageRGBAOut, width_, height_, params);
>   	if (pipelineResult) {
>   		LOG(Debayer, Error) << "Demosiac fail";
>   		return pipelineResult;
> @@ -371,9 +377,6 @@ int SoftwareIspPipelineGpu::start()
>   	eglImagePingPong_[0] = std::make_unique<eGLImage>(gpuIspShaderPassDemosiac_.glFormat_, width_, height_, outputConfig_.stride, GL_TEXTURE1, 1);
>   	eglImagePingPong_[1] = std::make_unique<eGLImage>(gpuIspShaderPassDemosiac_.glFormat_, width_, height_, outputConfig_.stride, GL_TEXTURE2, 2);
>   
> -	/* Texture we will render to */
> -	eglImageRGBAOut_ = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE3, 3);
> -
>   	egl_.createInputTexture2D(*eglImageBayerIn_, NULL);
>   	egl_.createOutputTexture2D(*eglImagePingPong_[0]);
>   	egl_.createOutputTexture2D(*eglImagePingPong_[1]);
> @@ -383,7 +386,7 @@ int SoftwareIspPipelineGpu::start()
>   
>   void SoftwareIspPipelineGpu::stop()
>   {
> -	eglImageRGBAOut_.reset();
> +	eglImageRGBAOut_.clear();
>   	eglImagePingPong_[1].reset();
>   	eglImagePingPong_[0].reset();
>   	eglImageBayerIn_.reset();
> diff --git a/src/libcamera/software_isp/software_isp_pipeline_gpu.h b/src/libcamera/software_isp/software_isp_pipeline_gpu.h
> index b32d4cad3..995e84295 100644
> --- a/src/libcamera/software_isp/software_isp_pipeline_gpu.h
> +++ b/src/libcamera/software_isp/software_isp_pipeline_gpu.h
> @@ -69,7 +69,7 @@ private:
>   	/* Pointer to object representing input texture */
>   	std::unique_ptr<eGLImage> eglImageBayerIn_;
>   	std::unique_ptr<eGLImage> eglImagePingPong_[2];
> -	std::unique_ptr<eGLImage> eglImageRGBAOut_;
> +	std::unordered_map<FrameBuffer *, std::unique_ptr<eGLImage>> eglImageRGBAOut_;

This needs a hard limit on the number of entries: technically each request
may use a different set of buffers, so the map could otherwise grow without bound.

It must also correctly handle the case where a `FrameBuffer` is destroyed after
request completion and a new one, created for a later request, happens to have
the same address. Otherwise the lookup would hit the `eGLImage` created for the old buffer.

I think something like `V4L2BufferCache` is needed if you want to do caching.
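
To illustrate the two requirements, here is a minimal sketch of a size-limited cache with least-recently-used eviction. It is not the `V4L2BufferCache` implementation and not a proposed patch: `BufferKey` and `BoundedImageCache` are hypothetical names, and keying on the first plane's fd, offset and length instead of the `FrameBuffer` address is an assumption. A real implementation would still have to deal with fd numbers being reused after a buffer is closed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical key: identifies a buffer by its first plane instead of by
// the address of the FrameBuffer object, which may be reused.
struct BufferKey {
	int fd;
	unsigned int offset;
	unsigned int length;

	bool operator<(const BufferKey &other) const
	{
		return std::tie(fd, offset, length) <
		       std::tie(other.fd, other.offset, other.length);
	}
};

// Hypothetical fixed-capacity cache with least-recently-used eviction.
template<typename Value>
class BoundedImageCache
{
public:
	explicit BoundedImageCache(std::size_t capacity)
		: capacity_(capacity)
	{
		assert(capacity_ > 0);
	}

	// Return the cached value for key, or nullptr on a miss.
	// A hit marks the entry as most recently used.
	Value *find(const BufferKey &key)
	{
		auto it = index_.find(key);
		if (it == index_.end())
			return nullptr;

		entries_.splice(entries_.begin(), entries_, it->second);
		return it->second->second.get();
	}

	// Insert a value, replacing any entry with the same key and
	// evicting the least recently used entry when the cache is full.
	Value *insert(const BufferKey &key, std::unique_ptr<Value> value)
	{
		auto it = index_.find(key);
		if (it != index_.end()) {
			entries_.erase(it->second);
			index_.erase(it);
		}

		if (entries_.size() >= capacity_) {
			index_.erase(entries_.back().first);
			entries_.pop_back();
		}

		entries_.emplace_front(key, std::move(value));
		index_[key] = entries_.begin();
		return entries_.front().second.get();
	}

	std::size_t size() const { return entries_.size(); }

	void clear()
	{
		index_.clear();
		entries_.clear();
	}

private:
	using Entry = std::pair<BufferKey, std::unique_ptr<Value>>;

	std::size_t capacity_;
	std::list<Entry> entries_; /* front is the most recently used */
	std::map<BufferKey, typename std::list<Entry>::iterator> index_;
};
```

In `processGPU()` the idea would be to call `find()` first and only construct the `eGLImage` and call `createOutputDMABufTexture2D()` on a miss, passing the result to `insert()`. Evicting an entry destroys its `std::unique_ptr`, so the entry count stays bounded however many distinct buffers the application queues.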


Regards,
Barnabás Pőcze

>   
>   	std::unique_ptr<SwStatsCpu> stats_;
>   	eGL egl_;

Patch

diff --git a/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp b/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
index 2e5c0e40e..bc5d59575 100644
--- a/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
+++ b/src/libcamera/software_isp/software_isp_pipeline_gpu.cpp
@@ -263,8 +263,14 @@  int SoftwareIspPipelineGpu::processGPU(FrameBuffer *input, FrameBuffer *output,
 		egl_.updateInputTexture2D(*eglImageBayerIn_, inMapped->value().planes()[0].data());
 	}
 
-	/* Generate the output render framebuffer as render to texture */
-	egl_.createOutputDMABufTexture2D(*eglImageRGBAOut_, output->planes()[0].fd.get());
+	/* Find an existing eglImage in the cache */
+	auto [output_cache, output_miss] = eglImageRGBAOut_.try_emplace(output);
+	if (output_miss) {
+		/* Generate the output render framebuffer as render to texture */
+		output_cache->second = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE3, 3);
+		egl_.createOutputDMABufTexture2D(*output_cache->second, output->planes()[0].fd.get());
+	}
+	eGLImage &eglImageRGBAOut = *output_cache->second;
 
 	pipelineResult = gpuIspShaderPassBlcNormalise_.process(*eglImageBayerIn_, *eglImagePingPong_[0], width_, height_, params);
 	if (pipelineResult) {
@@ -272,7 +278,7 @@  int SoftwareIspPipelineGpu::processGPU(FrameBuffer *input, FrameBuffer *output,
 		return pipelineResult;
 	}
 
-	pipelineResult = gpuIspShaderPassDemosiac_.process(*eglImagePingPong_[0], *eglImageRGBAOut_, width_, height_, params);
+	pipelineResult = gpuIspShaderPassDemosiac_.process(*eglImagePingPong_[0], eglImageRGBAOut, width_, height_, params);
 	if (pipelineResult) {
 		LOG(Debayer, Error) << "Demosiac fail";
 		return pipelineResult;
@@ -371,9 +377,6 @@  int SoftwareIspPipelineGpu::start()
 	eglImagePingPong_[0] = std::make_unique<eGLImage>(gpuIspShaderPassDemosiac_.glFormat_, width_, height_, outputConfig_.stride, GL_TEXTURE1, 1);
 	eglImagePingPong_[1] = std::make_unique<eGLImage>(gpuIspShaderPassDemosiac_.glFormat_, width_, height_, outputConfig_.stride, GL_TEXTURE2, 2);
 
-	/* Texture we will render to */
-	eglImageRGBAOut_ = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE3, 3);
-
 	egl_.createInputTexture2D(*eglImageBayerIn_, NULL);
 	egl_.createOutputTexture2D(*eglImagePingPong_[0]);
 	egl_.createOutputTexture2D(*eglImagePingPong_[1]);
@@ -383,7 +386,7 @@  int SoftwareIspPipelineGpu::start()
 
 void SoftwareIspPipelineGpu::stop()
 {
-	eglImageRGBAOut_.reset();
+	eglImageRGBAOut_.clear();
 	eglImagePingPong_[1].reset();
 	eglImagePingPong_[0].reset();
 	eglImageBayerIn_.reset();
diff --git a/src/libcamera/software_isp/software_isp_pipeline_gpu.h b/src/libcamera/software_isp/software_isp_pipeline_gpu.h
index b32d4cad3..995e84295 100644
--- a/src/libcamera/software_isp/software_isp_pipeline_gpu.h
+++ b/src/libcamera/software_isp/software_isp_pipeline_gpu.h
@@ -69,7 +69,7 @@  private:
 	/* Pointer to object representing input texture */
 	std::unique_ptr<eGLImage> eglImageBayerIn_;
 	std::unique_ptr<eGLImage> eglImagePingPong_[2];
-	std::unique_ptr<eGLImage> eglImageRGBAOut_;
+	std::unordered_map<FrameBuffer *, std::unique_ptr<eGLImage>> eglImageRGBAOut_;
 
 	std::unique_ptr<SwStatsCpu> stats_;
 	eGL egl_;