Message ID | 20240831190511.128151-1-robert.mader@collabora.com |
---|---|
State | New |
Headers | show |
Series |
|
Related | show |
CC'ing Hans, Milan and Bryan. On Sat, Aug 31, 2024 at 09:05:11PM +0200, Robert Mader wrote: > Using `DMA_BUF_IOCTL_SYNC` is required for dmabufs in order to ensure > correct results. Not doing so currently results in occasional tearing > and/or backlashes in GL/VK clients that use the buffers directly for > rendering. > > Signed-off-by: Robert Mader <robert.mader@collabora.com> > --- > src/libcamera/software_isp/debayer_cpu.cpp | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > index 077f7f4b..6c953b03 100644 > --- a/src/libcamera/software_isp/debayer_cpu.cpp > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > @@ -11,7 +11,9 @@ > > #include "debayer_cpu.h" > > +#include <linux/dma-buf.h> > #include <stdlib.h> > +#include <sys/ioctl.h> > #include <time.h> > > #include <libcamera/formats.h> > @@ -733,6 +735,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > clock_gettime(CLOCK_MONOTONIC_RAW, &frameStartTime); > } > > + for (const FrameBuffer::Plane &plane : output->planes()) { > + const int fd = plane.fd.get(); > + struct dma_buf_sync sync = { DMA_BUF_SYNC_START }; > + > + sync.flags |= DMA_BUF_SYNC_WRITE; > + > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > + } > + > green_ = params.green; > red_ = swapRedBlueGains_ ? params.blue : params.red; > blue_ = swapRedBlueGains_ ? params.red : params.blue; > @@ -760,6 +772,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > > metadata.planes()[0].bytesused = out.planes()[0].size(); > > + for (const FrameBuffer::Plane &plane : output->planes()) { > + const int fd = plane.fd.get(); > + struct dma_buf_sync sync = { DMA_BUF_SYNC_END }; > + > + sync.flags |= DMA_BUF_SYNC_WRITE; > + > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > + } > + > /* Measure before emitting signals */ > if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure && > ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
Hi Robert, On 8/31/24 9:05 PM, Robert Mader wrote: > Using `DMA_BUF_IOCTL_SYNC` is required for dmabufs in order to ensure > correct results. Not doing so currently results in occasional tearing > and/or backlashes in GL/VK clients that use the buffers directly for > rendering. > > Signed-off-by: Robert Mader <robert.mader@collabora.com> Thank you for your patch. I'm not a dmabuf expert, with that said the suggested change looks reasonable to me. One small code-style remark inline below. > --- > src/libcamera/software_isp/debayer_cpu.cpp | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > index 077f7f4b..6c953b03 100644 > --- a/src/libcamera/software_isp/debayer_cpu.cpp > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > @@ -11,7 +11,9 @@ > > #include "debayer_cpu.h" > > +#include <linux/dma-buf.h> > #include <stdlib.h> > +#include <sys/ioctl.h> > #include <time.h> > > #include <libcamera/formats.h> > @@ -733,6 +735,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > clock_gettime(CLOCK_MONOTONIC_RAW, &frameStartTime); > } > > + for (const FrameBuffer::Plane &plane : output->planes()) { > + const int fd = plane.fd.get(); > + struct dma_buf_sync sync = { DMA_BUF_SYNC_START }; > + > + sync.flags |= DMA_BUF_SYNC_WRITE; These 2 lines are a weird way to write: struct dma_buf_sync sync = { DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE }; ? > + > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > + } > + > green_ = params.green; > red_ = swapRedBlueGains_ ? params.blue : params.red; > blue_ = swapRedBlueGains_ ? params.red : params.blue; > @@ -760,6 +772,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > > metadata.planes()[0].bytesused = out.planes()[0].size(); > > + for (const FrameBuffer::Plane &plane : output->planes()) { > + const int fd = plane.fd.get(); > + struct dma_buf_sync sync = { DMA_BUF_SYNC_END }; > + > + sync.flags |= DMA_BUF_SYNC_WRITE; Same here, why not use: struct dma_buf_sync sync = { DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE }; ? > + > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > + } > + > /* Measure before emitting signals */ > if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure && > ++measuredFrames_ > DebayerCpu::kFramesToSkip) { Regards, Hans
On Sun, Sep 01, 2024 at 12:56:10PM +0200, Hans de Goede wrote: > Hi Robert, > > On 8/31/24 9:05 PM, Robert Mader wrote: > > Using `DMA_BUF_IOCTL_SYNC` is required for dmabufs in order to ensure > > correct results. Not doing so currently results in occasional tearing > > and/or backlashes in GL/VK clients that use the buffers directly for > > rendering. > > > > Signed-off-by: Robert Mader <robert.mader@collabora.com> > > Thank you for your patch. > > I'm not a dmabuf expert, with that said the suggested change looks > reasonable to me. Hans, would you be able to test this on an IPU6-based device, and check the performance impact ? I don't expect expensive cache management operations on an x86 device. Bryan, could you do the same with camss ? > One small code-style remark inline below. > > > --- > > src/libcamera/software_isp/debayer_cpu.cpp | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp > > index 077f7f4b..6c953b03 100644 > > --- a/src/libcamera/software_isp/debayer_cpu.cpp > > +++ b/src/libcamera/software_isp/debayer_cpu.cpp > > @@ -11,7 +11,9 @@ > > > > #include "debayer_cpu.h" > > > > +#include <linux/dma-buf.h> > > #include <stdlib.h> > > +#include <sys/ioctl.h> > > #include <time.h> > > > > #include <libcamera/formats.h> > > @@ -733,6 +735,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > > clock_gettime(CLOCK_MONOTONIC_RAW, &frameStartTime); > > } > > > > + for (const FrameBuffer::Plane &plane : output->planes()) { > > + const int fd = plane.fd.get(); > > + struct dma_buf_sync sync = { DMA_BUF_SYNC_START }; > > + > > + sync.flags |= DMA_BUF_SYNC_WRITE; > > These 2 lines are a weird way to write: > > struct dma_buf_sync sync = { DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE }; > > ? > > > + > > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > > + } > > + > > green_ = params.green; > > red_ = swapRedBlueGains_ ? params.blue : params.red; > > blue_ = swapRedBlueGains_ ? params.red : params.blue; > > @@ -760,6 +772,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams > > > > metadata.planes()[0].bytesused = out.planes()[0].size(); > > > > + for (const FrameBuffer::Plane &plane : output->planes()) { > > + const int fd = plane.fd.get(); > > + struct dma_buf_sync sync = { DMA_BUF_SYNC_END }; > > + > > + sync.flags |= DMA_BUF_SYNC_WRITE; > > Same here, why not use: > > struct dma_buf_sync sync = { DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE }; > > ? > > > + > > + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) > > + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; > > + } > > + > > /* Measure before emitting signals */ > > if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure && > > ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
On 01.09.24 13:07, Laurent Pinchart wrote: > On Sun, Sep 01, 2024 at 12:56:10PM +0200, Hans de Goede wrote: >> Hi Robert, >> >> On 8/31/24 9:05 PM, Robert Mader wrote: >>> Using `DMA_BUF_IOCTL_SYNC` is required for dmabufs in order to ensure >>> correct results. Not doing so currently results in occasional tearing >>> and/or backlashes in GL/VK clients that use the buffers directly for >>> rendering. >>> >>> Signed-off-by: Robert Mader <robert.mader@collabora.com> >> Thank you for your patch. >> >> I'm not a dmabuf expert, with that said the suggested change looks >> reasonable to me. > Hans, would you be able to test this on an IPU6-based device, and check > the performance impact ? I don't expect expensive cache management > operations on an x86 device. > > Bryan, could you do the same with camss ? Just to let you know: I directly submitted this yesterday to start a discussion, however after a bit more thinking I figured we can probably do even better by moving these calls to MappedFrameBuffer. I wrote https://gitlab.freedesktop.org/rmader/libcamera/-/commit/6b7d280634accd9c214d052df34565eb208652e7 for that if you want to have look (will submit early next week). >> One small code-style remark inline below. >> >>> --- >>> src/libcamera/software_isp/debayer_cpu.cpp | 22 ++++++++++++++++++++++ >>> 1 file changed, 22 insertions(+) >>> >>> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp >>> index 077f7f4b..6c953b03 100644 >>> --- a/src/libcamera/software_isp/debayer_cpu.cpp >>> +++ b/src/libcamera/software_isp/debayer_cpu.cpp >>> @@ -11,7 +11,9 @@ >>> >>> #include "debayer_cpu.h" >>> >>> +#include <linux/dma-buf.h> >>> #include <stdlib.h> >>> +#include <sys/ioctl.h> >>> #include <time.h> >>> >>> #include <libcamera/formats.h> >>> @@ -733,6 +735,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams >>> clock_gettime(CLOCK_MONOTONIC_RAW, &frameStartTime); >>> } >>> >>> + for (const FrameBuffer::Plane &plane : output->planes()) { >>> + const int fd = plane.fd.get(); >>> + struct dma_buf_sync sync = { DMA_BUF_SYNC_START }; >>> + >>> + sync.flags |= DMA_BUF_SYNC_WRITE; >> These 2 lines are a weird way to write: >> >> struct dma_buf_sync sync = { DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE }; >> >> ? >> >>> + >>> + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) >>> + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; >>> + } >>> + >>> green_ = params.green; >>> red_ = swapRedBlueGains_ ? params.blue : params.red; >>> blue_ = swapRedBlueGains_ ? params.red : params.blue; >>> @@ -760,6 +772,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams >>> >>> metadata.planes()[0].bytesused = out.planes()[0].size(); >>> >>> + for (const FrameBuffer::Plane &plane : output->planes()) { >>> + const int fd = plane.fd.get(); >>> + struct dma_buf_sync sync = { DMA_BUF_SYNC_END }; >>> + >>> + sync.flags |= DMA_BUF_SYNC_WRITE; >> Same here, why not use: >> >> struct dma_buf_sync sync = { DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE }; >> >> ? >> >>> + >>> + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) >>> + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; >>> + } >>> + >>> /* Measure before emitting signals */ >>> if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure && >>> ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
On 01.09.24 13:39, Robert Mader wrote: > > On 01.09.24 13:07, Laurent Pinchart wrote: >> Hans, would you be able to test this on an IPU6-based device, and check >> the performance impact ? I don't expect expensive cache management >> operations on an x86 device. >> >> Bryan, could you do the same with camss ? Heads up that in my initial testing around different Gstreamer pipelines on arm64 I saw mixed results: 1. Cases involving successful dmabuf import to the GPU are (much) less prone to glitches while not seeming to regress much in terms of frame rates. This includes running Gnome-Snapshot or waylandsink on devices like the Librem5, PinePhone or Pixel 3a (generally qcom). 2. Cases where Gst mmaps the buffers seem to get a noticeable performance hit. Crucially this applies to common fallback paths like in following example: - glupload tries to import the buffer as dmabuf - fails due to stride requirements... - uses the "raw" importer that mmap the buffer This case is almost tragic IMO. The buffer data ends up only getting accessed by the CPU but we flush the catches/sync to the GPU *twice* - just to upload a copy in the end. And while I see potential to improve this scenario in the other parts of the stack, I don't see anything we can about it in libcamera right now (apart from not landing a patch like this). Regards, Robert
On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote: > On 01.09.24 13:39, Robert Mader wrote: > > > > On 01.09.24 13:07, Laurent Pinchart wrote: > >> Hans, would you be able to test this on an IPU6-based device, and check > >> the performance impact ? I don't expect expensive cache management > >> operations on an x86 device. > >> > >> Bryan, could you do the same with camss ? > > Heads up that in my initial testing around different Gstreamer pipelines > on arm64 I saw mixed results: > > 1. Cases involving successful dmabuf import to the GPU are (much) less > prone to glitches while not seeming to regress much in terms of frame > rates. This includes running Gnome-Snapshot or waylandsink on devices > like the Librem5, PinePhone or Pixel 3a (generally qcom). > > 2. Cases where Gst mmaps the buffers seem to get a noticeable > performance hit. > > Crucially this applies to common fallback paths like in following example: > > - glupload tries to import the buffer as dmabuf > > - fails due to stride requirements... > > - uses the "raw" importer that mmap the buffer > > This case is almost tragic IMO. The buffer data ends up only getting > accessed by the CPU but we flush the catches/sync to the GPU *twice* - > just to upload a copy in the end. > > And while I see potential to improve this scenario in the other parts of > the stack, I don't see anything we can about it in libcamera right now > (apart from not landing a patch like this). It's a bit late, but maybe there's a possibility to submit a lightning talk/BoF topic for LPC in two weeks ? Cache handling is a topic that crosses many subsystem boundaries, and I think we'll have quite a few people with relevant expertise in Vienna.
On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote: > > On 01.09.24 13:39, Robert Mader wrote: > > > > > > On 01.09.24 13:07, Laurent Pinchart wrote: > > >> Hans, would you be able to test this on an IPU6-based device, and check > > >> the performance impact ? I don't expect expensive cache management > > >> operations on an x86 device. > > >> > > >> Bryan, could you do the same with camss ? > > > > Heads up that in my initial testing around different Gstreamer pipelines > > on arm64 I saw mixed results: > > > > 1. Cases involving successful dmabuf import to the GPU are (much) less > > prone to glitches while not seeming to regress much in terms of frame > > rates. This includes running Gnome-Snapshot or waylandsink on devices > > like the Librem5, PinePhone or Pixel 3a (generally qcom). > > > > 2. Cases where Gst mmaps the buffers seem to get a noticeable > > performance hit. > > > > Crucially this applies to common fallback paths like in following example: > > > > - glupload tries to import the buffer as dmabuf > > > > - fails due to stride requirements... > > > > - uses the "raw" importer that mmap the buffer > > > > This case is almost tragic IMO. The buffer data ends up only getting > > accessed by the CPU but we flush the catches/sync to the GPU *twice* - > > just to upload a copy in the end. > > > > And while I see potential to improve this scenario in the other parts of > > the stack, I don't see anything we can about it in libcamera right now > > (apart from not landing a patch like this). > > It's a bit late, but maybe there's a possibility to submit a lightning > talk/BoF topic for LPC in two weeks ? Cache handling is a topic that > crosses many subsystem boundaries, and I think we'll have quite a few > people with relevant expertise in Vienna. > This is quite a complicated topic indeed. The RPi camera stack switched to using cacheable dma bufs for performance reasons (> 10% uplift in certain use cases) and we had to be very careful with how to handle the DMA_BUF_IOCTL_SYNC calls at the application level. However, I don't think handling this in MappedFrameBuffer is the right thing for hardware based ISPs because of unexpected stale data flushing/invalidation. I can expand on this during our F2F in Vienna. Naush > -- > Regards, > > Laurent Pinchart
On Tue, Sep 03, 2024 at 08:21:54AM +0100, Naushir Patuck wrote: > On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart wrote: > > On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote: > > > On 01.09.24 13:39, Robert Mader wrote: > > > > On 01.09.24 13:07, Laurent Pinchart wrote: > > > >> Hans, would you be able to test this on an IPU6-based device, and check > > > >> the performance impact ? I don't expect expensive cache management > > > >> operations on an x86 device. > > > >> > > > >> Bryan, could you do the same with camss ? > > > > > > Heads up that in my initial testing around different Gstreamer pipelines > > > on arm64 I saw mixed results: > > > > > > 1. Cases involving successful dmabuf import to the GPU are (much) less > > > prone to glitches while not seeming to regress much in terms of frame > > > rates. This includes running Gnome-Snapshot or waylandsink on devices > > > like the Librem5, PinePhone or Pixel 3a (generally qcom). > > > > > > 2. Cases where Gst mmaps the buffers seem to get a noticeable > > > performance hit. > > > > > > Crucially this applies to common fallback paths like in following example: > > > > > > - glupload tries to import the buffer as dmabuf > > > > > > - fails due to stride requirements... > > > > > > - uses the "raw" importer that mmap the buffer > > > > > > This case is almost tragic IMO. The buffer data ends up only getting > > > accessed by the CPU but we flush the catches/sync to the GPU *twice* - > > > just to upload a copy in the end. > > > > > > And while I see potential to improve this scenario in the other parts of > > > the stack, I don't see anything we can about it in libcamera right now > > > (apart from not landing a patch like this). > > > > It's a bit late, but maybe there's a possibility to submit a lightning > > talk/BoF topic for LPC in two weeks ? Cache handling is a topic that > > crosses many subsystem boundaries, and I think we'll have quite a few > > people with relevant expertise in Vienna. > > This is quite a complicated topic indeed. The RPi camera stack > switched to using cacheable dma bufs for performance reasons (> 10% > uplift in certain use cases) and we had to be very careful with how to > handle the DMA_BUF_IOCTL_SYNC calls at the application level. > However, I don't think handling this in MappedFrameBuffer is the right > thing for hardware based ISPs because of unexpected stale data > flushing/invalidation. I can expand on this during our F2F in Vienna. That seems a good discussion topic to me.
On 03.09.24 15:27, Laurent Pinchart wrote: > On Tue, Sep 03, 2024 at 08:21:54AM +0100, Naushir Patuck wrote: >> On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart wrote: >>> On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote: >>>> On 01.09.24 13:39, Robert Mader wrote: >>>>> On 01.09.24 13:07, Laurent Pinchart wrote: >>>>>> Hans, would you be able to test this on an IPU6-based device, and check >>>>>> the performance impact ? I don't expect expensive cache management >>>>>> operations on an x86 device. >>>>>> >>>>>> Bryan, could you do the same with camss ? >>>> Heads up that in my initial testing around different Gstreamer pipelines >>>> on arm64 I saw mixed results: >>>> >>>> 1. Cases involving successful dmabuf import to the GPU are (much) less >>>> prone to glitches while not seeming to regress much in terms of frame >>>> rates. This includes running Gnome-Snapshot or waylandsink on devices >>>> like the Librem5, PinePhone or Pixel 3a (generally qcom). >>>> >>>> 2. Cases where Gst mmaps the buffers seem to get a noticeable >>>> performance hit. >>>> >>>> Crucially this applies to common fallback paths like in following example: >>>> >>>> - glupload tries to import the buffer as dmabuf >>>> >>>> - fails due to stride requirements... >>>> >>>> - uses the "raw" importer that mmap the buffer >>>> >>>> This case is almost tragic IMO. The buffer data ends up only getting >>>> accessed by the CPU but we flush the catches/sync to the GPU *twice* - >>>> just to upload a copy in the end. >>>> >>>> And while I see potential to improve this scenario in the other parts of >>>> the stack, I don't see anything we can about it in libcamera right now >>>> (apart from not landing a patch like this). >>> It's a bit late, but maybe there's a possibility to submit a lightning >>> talk/BoF topic for LPC in two weeks ? Cache handling is a topic that >>> crosses many subsystem boundaries, and I think we'll have quite a few >>> people with relevant expertise in Vienna. >> This is quite a complicated topic indeed. The RPi camera stack >> switched to using cacheable dma bufs for performance reasons (> 10% >> uplift in certain use cases) and we had to be very careful with how to >> handle the DMA_BUF_IOCTL_SYNC calls at the application level. >> However, I don't think handling this in MappedFrameBuffer is the right >> thing for hardware based ISPs because of unexpected stale data >> flushing/invalidation. I can expand on this during our F2F in Vienna. > That seems a good discussion topic to me. > Unfortunately I won't be able to join that, but I'd be really interested to hear about ideas what could be done to improve the situation. :/
diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp index 077f7f4b..6c953b03 100644 --- a/src/libcamera/software_isp/debayer_cpu.cpp +++ b/src/libcamera/software_isp/debayer_cpu.cpp @@ -11,7 +11,9 @@ #include "debayer_cpu.h" +#include <linux/dma-buf.h> #include <stdlib.h> +#include <sys/ioctl.h> #include <time.h> #include <libcamera/formats.h> @@ -733,6 +735,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams clock_gettime(CLOCK_MONOTONIC_RAW, &frameStartTime); } + for (const FrameBuffer::Plane &plane : output->planes()) { + const int fd = plane.fd.get(); + struct dma_buf_sync sync = { DMA_BUF_SYNC_START }; + + sync.flags |= DMA_BUF_SYNC_WRITE; + + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; + } + green_ = params.green; red_ = swapRedBlueGains_ ? params.blue : params.red; blue_ = swapRedBlueGains_ ? params.red : params.blue; @@ -760,6 +772,16 @@ void DebayerCpu::process(FrameBuffer *input, FrameBuffer *output, DebayerParams metadata.planes()[0].bytesused = out.planes()[0].size(); + for (const FrameBuffer::Plane &plane : output->planes()) { + const int fd = plane.fd.get(); + struct dma_buf_sync sync = { DMA_BUF_SYNC_END }; + + sync.flags |= DMA_BUF_SYNC_WRITE; + + if (ioctl (fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) + LOG(Debayer, Error) << "Syncing buffer FD " << fd << "failed: " << errno; + } + /* Measure before emitting signals */ if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure && ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
Using `DMA_BUF_IOCTL_SYNC` is required for dmabufs in order to ensure correct results. Not doing so currently results in occasional tearing and/or backlashes in GL/VK clients that use the buffers directly for rendering. Signed-off-by: Robert Mader <robert.mader@collabora.com> --- src/libcamera/software_isp/debayer_cpu.cpp | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)