[v2,2/2] debayer_egl: Sync output buffers after processing stats
diff mbox series

Message ID 20260526080639.70173-3-robert.mader@collabora.com
State New
Headers show
Series
  • debayer_egl: Sync output buffers after processing stats
Related show

Commit Message

Robert Mader May 26, 2026, 8:06 a.m. UTC
Instead of waiting for the GPU to finish output buffers *before* computing
stats, do so afterwards. This allows work to happen in parallel on the GPU
and CPU, improving throughput and reducing latency on various devices as
shown below.

On order for this to work well we need to flush all GL commands to the
GPU before doing heavy CPU work - thus add a corresponding function.

Below are some benchmark results. All where done using postmarketOS edge
with updates from 21th May 2026 (Mesa 26.1.1). The mentioned pipelines
where run five times each, with the mean value included here, which should
be quite representive as the variance was rather small. All devices
where using the powersave governor.

Notes:
1. We only expect changes for frames where stats get computed, currently
   every fourth frame (see kStatPerNumFrames) - and the improvements indeed
   have been observed to increase when computing stats more often.
2. At least the qcom/freedreno devices have been found to be affected by a
   performance issue *without* this patch. This issue can be worked
   around by calling `glFlush()` directly before `glFinish()` - as done
   in v1 of this series - or by running with `GALLIUM_THREAD=0`. The
   benchmarks below show the performance gains *with* those workarounds
   applied in order to not inflate the impact of this patch. See
   https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15516 for more
   context.

cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@1a -s width=1920,height=1080 --capture=60
Before: 33596 us/frame
After: 30179 us/frame

cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@1a -s width=1920,height=1080 --capture=60
Before: 14922 us/frame
After: 14304 us/frame

cam -c /base/soc@0/cci@ac4b000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
Before: 26106 us/frame
After: 23312 us/frame

cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@29 -s width=1920,height=1080 --capture=60
Before: 15897 us/frame
After: 14791 us/frame

cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
Before: 25721 us/frame
After: 23625 us/frame

cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@10 -s width=1920,height=1080 --capture=60
Before: 34124 us/frame
After: 29471 us/frame

cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@1a -s width=1920,height=1080 --capture=60
Before: 23707 us/frame
After: 21890 us/frame

cam -c /base/soc@0/bus@30800000/i2c@30a40000/camera@20 -s width=1280,height=720 --capture=60
Before: 91649 us/frame
After: 83233 us/frame

cam -c /base/soc@0/bus@30800000/i2c@30a50000/camera@2d -s width=1280,height=720 --capture=60
Before: 76956 us/frame
After: 69569 us/frame

cam -c /base/i2c-csi/front-camera@3c -s width=1280,height=720 --capture=60
Before: 188500 us/frame
After: 173764 us/frame

cam -c /base/i2c-csi/rear-camera@4c -s width=1280,height=720 --capture=60
Before: 190222 us/frame
After: 177251 us/frame

Signed-off-by: Robert Mader <robert.mader@collabora.com>
---
 include/libcamera/internal/egl.h           |  1 +
 src/libcamera/egl.cpp                      | 13 +++++++++++++
 src/libcamera/software_isp/debayer_egl.cpp |  3 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Bryan O'Donoghue May 26, 2026, 5:22 p.m. UTC | #1
On 26/05/2026 09:06, Robert Mader wrote:
> Instead of waiting for the GPU to finish output buffers *before* computing
> stats, do so afterwards. This allows work to happen in parallel on the GPU
> and CPU, improving throughput and reducing latency on various devices as
> shown below.
> 
> On order for this to work well we need to flush all GL commands to the
> GPU before doing heavy CPU work - thus add a corresponding function.
> 
> Below are some benchmark results. All where done using postmarketOS edge
> with updates from 21th May 2026 (Mesa 26.1.1). The mentioned pipelines
> where run five times each, with the mean value included here, which should
> be quite representive as the variance was rather small. All devices
> where using the powersave governor.
> 
> Notes:
> 1. We only expect changes for frames where stats get computed, currently
>     every fourth frame (see kStatPerNumFrames) - and the improvements indeed
>     have been observed to increase when computing stats more often.
> 2. At least the qcom/freedreno devices have been found to be affected by a
>     performance issue *without* this patch. This issue can be worked
>     around by calling `glFlush()` directly before `glFinish()` - as done
>     in v1 of this series - or by running with `GALLIUM_THREAD=0`. The
>     benchmarks below show the performance gains *with* those workarounds
>     applied in order to not inflate the impact of this patch. See
>     https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15516 for more
>     context.
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@1a -s width=1920,height=1080 --capture=60
> Before: 33596 us/frame
> After: 30179 us/frame
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@1a -s width=1920,height=1080 --capture=60
> Before: 14922 us/frame
> After: 14304 us/frame
> 
> cam -c /base/soc@0/cci@ac4b000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
> Before: 26106 us/frame
> After: 23312 us/frame
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@29 -s width=1920,height=1080 --capture=60
> Before: 15897 us/frame
> After: 14791 us/frame
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
> Before: 25721 us/frame
> After: 23625 us/frame
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@10 -s width=1920,height=1080 --capture=60
> Before: 34124 us/frame
> After: 29471 us/frame
> 
> cam -c /base/soc@0/cci@ac4a000/i2c-bus@0/camera@1a -s width=1920,height=1080 --capture=60
> Before: 23707 us/frame
> After: 21890 us/frame
> 
> cam -c /base/soc@0/bus@30800000/i2c@30a40000/camera@20 -s width=1280,height=720 --capture=60
> Before: 91649 us/frame
> After: 83233 us/frame
> 
> cam -c /base/soc@0/bus@30800000/i2c@30a50000/camera@2d -s width=1280,height=720 --capture=60
> Before: 76956 us/frame
> After: 69569 us/frame
> 
> cam -c /base/i2c-csi/front-camera@3c -s width=1280,height=720 --capture=60
> Before: 188500 us/frame
> After: 173764 us/frame
> 
> cam -c /base/i2c-csi/rear-camera@4c -s width=1280,height=720 --capture=60
> Before: 190222 us/frame
> After: 177251 us/frame
> 
> Signed-off-by: Robert Mader <robert.mader@collabora.com>
> ---
>   include/libcamera/internal/egl.h           |  1 +
>   src/libcamera/egl.cpp                      | 13 +++++++++++++
>   src/libcamera/software_isp/debayer_egl.cpp |  3 ++-
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/libcamera/internal/egl.h b/include/libcamera/internal/egl.h
> index 0ad2320b1..0ec8ea6ec 100644
> --- a/include/libcamera/internal/egl.h
> +++ b/include/libcamera/internal/egl.h
> @@ -119,6 +119,7 @@ public:
>   	void useProgram(GLuint programId);
>   	void deleteProgram(GLuint programId);
>   	void syncOutput();
> +	void flushOutput();
> 
>   private:
>   	LIBCAMERA_DISABLE_COPY_AND_MOVE(eGL)
> diff --git a/src/libcamera/egl.cpp b/src/libcamera/egl.cpp
> index 357918711..19ae92305 100644
> --- a/src/libcamera/egl.cpp
> +++ b/src/libcamera/egl.cpp
> @@ -97,6 +97,19 @@ void eGL::syncOutput()
>   	glFinish();
>   }
> 
> +/**
> + * \brief Flush the rendering pipeline
> + *
> + * Calls glFlush().
> + *
> + */
> +void eGL::flushOutput()
> +{
> +	ASSERT(tid_ == Thread::currentId());
> +
> +	glFlush();
> +}
> +
>   /**
>    * \brief Create a DMA-BUF backed 2D texture
>    * \param[in,out] eglImage EGL image to associate with the DMA-BUF
> diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp
> index ed9a68013..7b9e02d90 100644
> --- a/src/libcamera/software_isp/debayer_egl.cpp
> +++ b/src/libcamera/software_isp/debayer_egl.cpp
> @@ -521,7 +521,7 @@ int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParam
>   		LOG(eGL, Error) << "Drawing scene fail " << err;
>   		return -ENODEV;
>   	} else {
> -		egl_.syncOutput();
> +		egl_.flushOutput();
>   	}
> 
>   	return 0;
> @@ -558,6 +558,7 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output
>   	stats_->processFrame(frame, 0, input);
>   	dmaSyncers.clear();
> 
> +	egl_.syncOutput();
>   	bench_.finishFrame();
> 
>   	outputBufferReady.emit(output);
> --
> 2.54.0
> 

Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>

---
bod

Patch
diff mbox series

diff --git a/include/libcamera/internal/egl.h b/include/libcamera/internal/egl.h
index 0ad2320b1..0ec8ea6ec 100644
--- a/include/libcamera/internal/egl.h
+++ b/include/libcamera/internal/egl.h
@@ -119,6 +119,7 @@  public:
 	void useProgram(GLuint programId);
 	void deleteProgram(GLuint programId);
 	void syncOutput();
+	void flushOutput();
 
 private:
 	LIBCAMERA_DISABLE_COPY_AND_MOVE(eGL)
diff --git a/src/libcamera/egl.cpp b/src/libcamera/egl.cpp
index 357918711..19ae92305 100644
--- a/src/libcamera/egl.cpp
+++ b/src/libcamera/egl.cpp
@@ -97,6 +97,19 @@  void eGL::syncOutput()
 	glFinish();
 }
 
+/**
+ * \brief Flush the rendering pipeline
+ *
+ * Calls glFlush().
+ *
+ */
+void eGL::flushOutput()
+{
+	ASSERT(tid_ == Thread::currentId());
+
+	glFlush();
+}
+
 /**
  * \brief Create a DMA-BUF backed 2D texture
  * \param[in,out] eglImage EGL image to associate with the DMA-BUF
diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp
index ed9a68013..7b9e02d90 100644
--- a/src/libcamera/software_isp/debayer_egl.cpp
+++ b/src/libcamera/software_isp/debayer_egl.cpp
@@ -521,7 +521,7 @@  int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParam
 		LOG(eGL, Error) << "Drawing scene fail " << err;
 		return -ENODEV;
 	} else {
-		egl_.syncOutput();
+		egl_.flushOutput();
 	}
 
 	return 0;
@@ -558,6 +558,7 @@  void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output
 	stats_->processFrame(frame, 0, input);
 	dmaSyncers.clear();
 
+	egl_.syncOutput();
 	bench_.finishFrame();
 
 	outputBufferReady.emit(output);