[03/27] libcamera: dma_buf_allocator: Favour udmabuf over cma heap allocations

Message ID: 20250422215920.4297-4-bryan.odonoghue@linaro.org
State: RFC
Series: RFC: Add in a eGL based GPUISP in libcamera

Commit Message

Bryan O'Donoghue April 22, 2025, 9:58 p.m. UTC
When /dev/dma_heap/linux,cma or /dev/dma_heap/system exists, we currently
favour allocation from that heap over /dev/udmabuf.

We ought to favour udmabuf instead:

- udmabuf is the preferred method by various distros for security reasons
- Contiguous memory is a scarce resource

Change the ordering of the allocator lookup so that the udmabuf lookup
comes first.

Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
 src/libcamera/dma_buf_allocator.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Laurent Pinchart April 22, 2025, 10:22 p.m. UTC | #1
Hi Bryan,

Thank you for the patch.

On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> When /dev/dma_heap/linux,cma or /dev/dma_heap/system exist currently we
> favour allocation from this type of heap over /dev/udmabuf.
> 
> We ought to favour udmabuf though
> 
> - udmabuf is the preferred method by various distros for security reasons
> - Contiguous memory is a scarce resource
> 
> Change the ordering of the allocator lookup so that the udmabuf lookup
> comes first.

This means that on a system where CMA allocation is possible from
userspace, the buffers allocated by libcamera for the virtual pipeline
handler and ISP won't be shareable without copies with a consumer that
requires contiguous memory (e.g. most KMS devices on Arm platforms).
Isn't that an issue ? It seems to even count as a regression.

> Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> ---
>  src/libcamera/dma_buf_allocator.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> index d8c62dd6..722ffd46 100644
> --- a/src/libcamera/dma_buf_allocator.cpp
> +++ b/src/libcamera/dma_buf_allocator.cpp
> @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
>  	 * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
>  	 * specified on the kernel command line, this gets renamed to "reserved".
>  	 */
> +	{ DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
>  	{ DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
>  	{ DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
>  	{ DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> -	{ DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
>  } };
>  
>  LOG_DEFINE_CATEGORY(DmaBufAllocator)
Kieran Bingham April 23, 2025, 9:36 a.m. UTC | #2
Quoting Laurent Pinchart (2025-04-22 23:22:31)
> Hi Bryan,
> 
> Thank you for the patch.
> 
> On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> > When /dev/dma_heap/linux,cma or /dev/dma_heap/system exist currently we
> > favour allocation from this type of heap over /dev/udmabuf.
> > 
> > We ought to favour udmabuf though
> > 
> > - udmabuf is the preferred method by various distros for security reasons
> > - Contiguous memory is a scarce resource
> > 
> > Change the ordering of the allocator lookup so that the udmabuf lookup
> > comes first.
> 
> This means that on a system where CMA allocation is possible from
> userspace, the buffers allocated by libcamera for the virtual pipeline
> handler and ISP won't be shareable without copies with a consumer that
> requires contiguous memory (e.g. most KMS devices on Arm platforms).
> Isn't that an issue ? It seems to even count as a regression.
> 

On my X13s, everything works better for me on udmabuf. If we try to use
the cma allocator things fail.

That /could/ be a separate topic/issue - but at least might explain some
of the rationale behind this.

I usually end up with something like this on my system.


> > Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> > ---
> >  src/libcamera/dma_buf_allocator.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> > index d8c62dd6..722ffd46 100644
> > --- a/src/libcamera/dma_buf_allocator.cpp
> > +++ b/src/libcamera/dma_buf_allocator.cpp
> > @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
> >        * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
> >        * specified on the kernel command line, this gets renamed to "reserved".
> >        */
> > +     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
> >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
> >       { DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> > -     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> >  } };
> >  
> >  LOG_DEFINE_CATEGORY(DmaBufAllocator)
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Nicolas Dufresne April 23, 2025, 12:50 p.m. UTC | #3
On Wednesday 23 April 2025 at 10:36 +0100, Kieran Bingham wrote:
> Quoting Laurent Pinchart (2025-04-22 23:22:31)
> > Hi Bryan,
> > 
> > Thank you for the patch.
> > 
> > On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> > > When /dev/dma_heap/linux,cma or /dev/dma_heap/system exist currently we
> > > favour allocation from this type of heap over /dev/udmabuf.
> > > 
> > > We ought to favour udmabuf though
> > > 
> > > - udmabuf is the preferred method by various distros for security reasons
> > > - Contiguous memory is a scarce resource
> > > 
> > > Change the ordering of the allocator lookup so that the udmabuf lookup
> > > comes first.
> > 
> > This means that on a system where CMA allocation is possible from
> > userspace, the buffers allocated by libcamera for the virtual pipeline
> > handler and ISP won't be shareable without copies with a consumer that
> > requires contiguous memory (e.g. most KMS devices on Arm platforms).
> > Isn't that an issue ? It seems to even count as a regression.
> > 
> 
> On my X13s, everything works better for me on udmabuf. If we try to use
> the cma allocator things fail.
> 
> That /could/ be a separate topic/issue - but at least might explain some
> of the rationale behind this.
> 
> I usually end up with something like this on my system.

Did you mean to add a link? I'm quite curious, since here on the Meteor
Lake GPU, passing over anything that has been touched by the CPU
results in a big mess. This used to only happen when passing these to
the display controller; now the GPU also does not handle CPU-coherent
memory properly.

Nicolas

> 
> 
> > > Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> > > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> > > ---
> > >  src/libcamera/dma_buf_allocator.cpp | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> > > index d8c62dd6..722ffd46 100644
> > > --- a/src/libcamera/dma_buf_allocator.cpp
> > > +++ b/src/libcamera/dma_buf_allocator.cpp
> > > @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
> > >        * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
> > >        * specified on the kernel command line, this gets renamed to "reserved".
> > >        */
> > > +     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
> > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
> > >       { DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> > > -     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > >  } };
> > >  
> > >  LOG_DEFINE_CATEGORY(DmaBufAllocator)
> > 
> > -- 
> > Regards,
> > 
> > Laurent Pinchart
Kieran Bingham April 23, 2025, 1:07 p.m. UTC | #4
Quoting Nicolas Dufresne (2025-04-23 13:50:03)
> Le mercredi 23 avril 2025 à 10:36 +0100, Kieran Bingham a écrit :
> > Quoting Laurent Pinchart (2025-04-22 23:22:31)
> > > Hi Bryan,
> > > 
> > > Thank you for the patch.
> > > 
> > > On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> > > > When /dev/dma_heap/linux,cma or /dev/dma_heap/system exist currently we
> > > > favour allocation from this type of heap over /dev/udmabuf.
> > > > 
> > > > We ought to favour udmabuf though
> > > > 
> > > > - udmabuf is the preferred method by various distros for security reasons
> > > > - Contiguous memory is a scarce resource
> > > > 
> > > > Change the ordering of the allocator lookup so that the udmabuf lookup
> > > > comes first.
> > > 
> > > This means that on a system where CMA allocation is possible from
> > > userspace, the buffers allocated by libcamera for the virtual pipeline
> > > handler and ISP won't be shareable without copies with a consumer that
> > > requires contiguous memory (e.g. most KMS devices on Arm platforms).
> > > Isn't that an issue ? It seems to even count as a regression.
> > > 
> > 
> > On my X13s, everything works better for me on udmabuf. If we try to use
> > the cma allocator things fail.
> > 
> > That /could/ be a separate topic/issue - but at least might explain some
> > of the rationale behind this.
> > 
> > I usually end up with something like this on my system.
> 
> Did you mean to add a link ? I'm quite curious, since here on Meteor

No, sorry - In the text above, I meant "I usually end up with something
like ... 'this patch' ... on my system". Meaning I configure my x13s to
prefer udmabuf over cma because cma allocations frequently fail for me.

It's not clear if it's a resource leak, or indeterminate usage by other
components - (hence might be something else to look into) but udmabuf
always works on softisp for x13s so far, so just use it.


> Lake GPU, passing over anything that has been touched by the CPU
> results in a big mess. This used to only happen when passing these to
> the display controller, now the GPU also does not handle the CPU
> coherent memory properly.

even if the CPU only 'reads' the data ?

For GPU ISP - the only access the CPU should do is read the data to
generate some statistics. And even that - I would hope in the future
will be turned into operations performed by the GPU ...

--
Kieran


> 
> Nicolas
> 
> > 
> > 
> > > > Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> > > > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> > > > ---
> > > >  src/libcamera/dma_buf_allocator.cpp | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> > > > index d8c62dd6..722ffd46 100644
> > > > --- a/src/libcamera/dma_buf_allocator.cpp
> > > > +++ b/src/libcamera/dma_buf_allocator.cpp
> > > > @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
> > > >        * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
> > > >        * specified on the kernel command line, this gets renamed to "reserved".
> > > >        */
> > > > +     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
> > > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
> > > >       { DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> > > > -     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > >  } };
> > > >  
> > > >  LOG_DEFINE_CATEGORY(DmaBufAllocator)
> > > 
> > > -- 
> > > Regards,
> > > 
> > > Laurent Pinchart
Nicolas Dufresne April 24, 2025, 1:17 p.m. UTC | #5
On Wednesday 23 April 2025 at 14:07 +0100, Kieran Bingham wrote:
> Quoting Nicolas Dufresne (2025-04-23 13:50:03)
> > Le mercredi 23 avril 2025 à 10:36 +0100, Kieran Bingham a écrit :
> > > Quoting Laurent Pinchart (2025-04-22 23:22:31)
> > > > Hi Bryan,
> > > > 
> > > > Thank you for the patch.
> > > > 
> > > > On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> > > > > When /dev/dma_heap/linux,cma or /dev/dma_heap/system exist currently we
> > > > > favour allocation from this type of heap over /dev/udmabuf.
> > > > > 
> > > > > We ought to favour udmabuf though
> > > > > 
> > > > > - udmabuf is the preferred method by various distros for security reasons
> > > > > - Contiguous memory is a scarce resource
> > > > > 
> > > > > Change the ordering of the allocator lookup so that the udmabuf lookup
> > > > > comes first.
> > > > 
> > > > This means that on a system where CMA allocation is possible from
> > > > userspace, the buffers allocated by libcamera for the virtual pipeline
> > > > handler and ISP won't be shareable without copies with a consumer that
> > > > requires contiguous memory (e.g. most KMS devices on Arm platforms).
> > > > Isn't that an issue ? It seems to even count as a regression.
> > > > 
> > > 
> > > On my X13s, everything works better for me on udmabuf. If we try to use
> > > the cma allocator things fail.
> > > 
> > > That /could/ be a separate topic/issue - but at least might explain some
> > > of the rationale behind this.
> > > 
> > > I usually end up with something like this on my system.
> > 
> > Did you mean to add a link ? I'm quite curious, since here on Meteor
> 
> No, sorry - In the text above, I meant "I usually end up with something
> like ... 'this patch' ... on my system". Meaning I configure my x13s to
> prefer udmabuf over cma because cma allocations frequently fail for me.
> 
> It's not clear if it's a resource leak, or indeterminate usage by other
> components - (hence might be something else to look into) but udmabuf
> always works on softisp for x13s so far, so just use it.
> 
> 
> > Lake GPU, passing over anything that has been touched by the CPU
> > results in a big mess. This used to only happen when passing these to
> > the display controller, now the GPU also does not handle the CPU
> > coherent memory properly.
> 
> even if the CPU only 'reads' the data ?

It's any CPU write -> GPU path that is in serious trouble on recent Intel.
That being said, if nothing either flushes or invalidates the cache, and you
read before, write with the GPU, and read again after that same buffer, the
data you'll read may come from a stale cache. In practice, GPUs don't do
anything in-place, so that is a non-issue here.

> 
> For GPU ISP - the only access the CPU should do is read the data to
> generate some statistics. And even that - I would hope in the future
> will be turned into operations performed by the GPU ...

So read Bayer, process in the GPU, output YUV should just work. Thanks for
the clarification. Running out of CMA is pretty common, as the default CMA
reservation is usually very small. Apart from the buggy drivers, virtual
memory is a much better choice, and that is mostly what this patch is about
here.

Nicolas

> 
> --
> Kiran
> 
> 
> > 
> > Nicolas
> > 
> > > 
> > > 
> > > > > Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> > > > > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> > > > > ---
> > > > >  src/libcamera/dma_buf_allocator.cpp | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> > > > > index d8c62dd6..722ffd46 100644
> > > > > --- a/src/libcamera/dma_buf_allocator.cpp
> > > > > +++ b/src/libcamera/dma_buf_allocator.cpp
> > > > > @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
> > > > >        * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
> > > > >        * specified on the kernel command line, this gets renamed to "reserved".
> > > > >        */
> > > > > +     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
> > > > >       { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
> > > > >       { DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> > > > > -     { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > > >  } };
> > > > >  
> > > > >  LOG_DEFINE_CATEGORY(DmaBufAllocator)
> > > > 
> > > > -- 
> > > > Regards,
> > > > 
> > > > Laurent Pinchart
Bryan O'Donoghue April 24, 2025, 1:51 p.m. UTC | #6
On 24/04/2025 14:17, Nicolas Dufresne wrote:
>> even if the CPU only 'reads' the data ?
> Its any CPU write -> GPU that in serious trouble on recent Intel. That
> being said, if nothing either flush or invalidate the cache, if you
> read before, write with GPU, and read again after that same buffer, the
> data you'll read may be from old cache. In practice, GPU don't do
> anything in-place, so that is non-issue here.

Well, we have the GPU write to its own frame buffer, and currently we then
get a GBM pointer to that surface and memcpy() out into the libcamera
buffer, be that CMA or udmabuf.

+	gbm_surface_lock_front_buffer(gbm_surface_);
+
+	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
+	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
+
+	if (data_len > framesize_) {
+		LOG(GBM, Error) << "Invalid read size " << data_len << " max is " << framesize_;
+		return -EINVAL;
+	}
+
+	memcpy(data, map_, data_len);
+
+	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
+	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
+
+	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
+

That I believe should be cache-coherent for both CMA and UDMAbuf across 
architectures.

We had an interesting discussion on how render-to-texture would work - 
if you created the render texture using eglCreateImageKHR.

// render to texture

dmabuf_handle = fd of a CMA or UDMABuf buffer.

EGLint attrs[] = {
	/* EGL_WIDTH, EGL_HEIGHT, EGL_LINUX_DRM_FOURCC_EXT, offsets/pitches ... */
	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle,
	EGL_NONE
};

myimage_output = eglCreateImageKHR(display, EGL_NO_CONTEXT,
                                   EGL_LINUX_DMA_BUF_EXT,
                                   NULL, attrs);

/* mytexture_output: a GL texture name from glGenTextures() */
glBindTexture(GL_TEXTURE_2D, mytexture_output);
glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, myimage_output);

glBindFramebuffer(GL_FRAMEBUFFER, someframebuffer);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, mytexture_output, 0);

If dmabuf_handle points to the CMA heap - who is responsible for flushing the
cache before the CPU accesses the content of the dma-buf handle..

Actually, as I type that, I think the answer is that it is always the
responsibility of the CPU side to manage its own cache, so..

glDraw();

sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);

/* ... CPU reads the buffer contents here ... */

sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);

something like that ..

>> For GPU ISP - the only access the CPU should do is read the data to
>> generate some statistics. And even that - I would hope in the future
>> will be turned into operations performed by the GPU ...
> So read bayer, process in GPU, output YUV should just work. Thanks for
> clarification, running out of CMA is pretty common, defaults CMA
> reservation is usually very small. Appart from the buggy drivers,
> virtual memory is a much better choice, and this is mostly what this
> patch is about here.
Thinking about this patch.

A user on a system such as imx/hantro is free to configure the system to 
support CMA and UDMABuf.

If you never pass that buffer to the video encoder - where the encoder 
requires phys contig/cma heap memory - then you probably want udmabuf.

The reverse is also true.

It's almost use-case specific. If you want the encoder you need CMA, and
if you just want to, say, display the output in Cheese, you want udmabuf.

Probably the right thing to do is to leave CMA heap as default and leave 
it to the system designer to configure the system as they wish.

In my case, switching off the CMA heap or perhaps a libcamera config/env 
variable to specify which to use...
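
Something concrete, to be clear about what I mean - LIBCAMERA_DMABUF_PROVIDER
below is not an existing libcamera option, just a hypothetical sketch of how
the provider table walk in dma_buf_allocator.cpp could honour a preference:

#include <cstdlib>
#include <string>

/* Hypothetical helper - not current libcamera behaviour. */
static bool providerAllowed(const std::string &path)
{
	/* e.g. LIBCAMERA_DMABUF_PROVIDER=udmabuf or =cma */
	const char *pref = std::getenv("LIBCAMERA_DMABUF_PROVIDER");
	if (!pref)
		return true; /* no preference, keep the built-in order */

	if (std::string(pref) == "udmabuf")
		return path == "/dev/udmabuf";

	return path.rfind("/dev/dma_heap/", 0) == 0;
}

The constructor loop over providerInfos would then just skip any entry for
which providerAllowed() returns false.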

---
bod
Nicolas Dufresne April 25, 2025, 1:01 p.m. UTC | #7
On Thursday 24 April 2025 at 17:14 +0300, Laurent Pinchart wrote:
> On Thu, Apr 24, 2025 at 02:51:42PM +0100, Bryan O'Donoghue wrote:
> > On 24/04/2025 14:17, Nicolas Dufresne wrote:
> > > > even if the CPU only 'reads' the data ?
> > > Its any CPU write -> GPU that in serious trouble on recent Intel. That
> > > being said, if nothing either flush or invalidate the cache, if you
> > > read before, write with GPU, and read again after that same buffer, the
> > > data you'll read may be from old cache. In practice, GPU don't do
> > > anything in-place, so that is non-issue here.
> > 
> > Well we have the GPU write to its own frame buffer and currently then 
> > get a GBM pointer to that surface and memcpy() out into the libcamera 
> > buffer be that CMA or UDMABuf.
> > 
> > +	gbm_surface_lock_front_buffer(gbm_surface_);
> > +
> > +	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > +
> > +	if (data_len > framesize_) {
> > +		LOG(GBM, Error) << "Invalid read size " << data_len << " max is " << framesize_;
> > +		return -EINVAL;
> > +	}
> > +
> > +	memcpy(data, map_, data_len);
> > +
> > +	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > +
> > +	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
> > +
> > 
> > That I believe should be cache-coherent for both CMA and UDMAbuf across 
> > architectures.
> > 
> > We had an interesting discussion on how render-to-texture would work - 
> > if you created the render texture using eglCreateImageKHR.
> > 
> > // render to texture
> > 
> > dmabuf_handle = handle to CMA or UDMABuf buffer.
> > 
> > EGLint attrs[] = {
> > 	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle
> > };
> > 
> > mytexture-output = glCreateImageKHR(display, context,
> >                                      EGL_LINUX_DMA_BUF_EXT,
> >                                      NULL, attrs);
> > 
> > glBindFramebuffer(GL_FRAMBUFFER, someframebuffer);
> > glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
> >                         GL_TEXTURE_2D, mytexture-output);
> > 
> > 
> > if dmabuf_handle points to CMA heap - who is responsible to flush the 
> > cache before the CPU accessess the content of the DMA buf handle..
> > 
> > actually as I type that, I think the answer is that it is always the 
> > responsibility of the CPU side to manage its own cache so..
> > 
> > glDraw();
> > 
> > sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > 
> > sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > 
> > something like that ..
> > 
> > > > For GPU ISP - the only access the CPU should do is read the data to
> > > > generate some statistics. And even that - I would hope in the future
> > > > will be turned into operations performed by the GPU ...
> > > So read bayer, process in GPU, output YUV should just work. Thanks for
> > > clarification, running out of CMA is pretty common, defaults CMA
> > > reservation is usually very small. Appart from the buggy drivers,
> > > virtual memory is a much better choice, and this is mostly what this
> > > patch is about here.
> > 
> > Thinking about this patch.
> > 
> > A user on a system such as imx/hantro is free to configure the system to 
> > support CMA and UDMABuf.
> > 
> > If you never pass that buffer to the video encoder - where the encoder 
> > requires phys contig/cma heap memory - then you probably want udmabuf.
> 
> Add the display controller here, it's not just the video encoder.
> 
> Have you tried displaying images with cam (using the -D flag) on a
> system without an IOMMU, with this patch applied ? Does it still work ?

In this case eglCreateImageKHR() is expected to fail once it discovers
that the memory is scattered. In my experience, this part is well
handled. What's missing sometimes is verification that the memory
allocation has happened with CPU caching enabled. Without the MMU,
there is nothing to handle cached pages. Recent Intel graphics happily
succeeds, resulting in a funny mess on screen.

I assume the code here is just an example, and that there is failure
handling and a fallback in place.
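
To illustrate what I mean by fallback - assuming "attrs" is the dma-buf
attribute list from the earlier sketch, and fallback_to_cpu_copy() stands in
for whatever copy-based path the caller already has:

EGLImageKHR image = eglCreateImageKHR(display, EGL_NO_CONTEXT,
                                      EGL_LINUX_DMA_BUF_EXT, NULL, attrs);
if (image == EGL_NO_IMAGE_KHR) {
	/* e.g. scattered system pages on a GPU without an IOMMU,
	 * or an unsupported format/modifier */
	return fallback_to_cpu_copy();
}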

> 
> > The reverse is also true.
> > 
> > Its almost use-case specific. If you want the encoder you need CMA and 
> > if you just want to say - display the output in cheese you want UDMAbuf.
> > 
> > Probably the right thing to do is to leave CMA heap as default and leave 
> > it to the system designer to configure the system as they wish.
> > 
> > In my case, switching off the CMA heap or perhaps a libcamera config/env 
> > variable to specify which to use...
Nicolas Dufresne April 25, 2025, 1:07 p.m. UTC | #8
On Thursday 24 April 2025 at 14:51 +0100, Bryan O'Donoghue wrote:
> Thinking about this patch.
> 
> A user on a system such as imx/hantro is free to configure the system to 
> support CMA and UDMABuf.
> 
> If you never pass that buffer to the video encoder - where the encoder 
> requires phys contig/cma heap memory - then you probably want udmabuf.
> 
> The reverse is also true.
> 
> Its almost use-case specific. If you want the encoder you need CMA and 
> if you just want to say - display the output in cheese you want UDMAbuf.

That is hardware specific. There are some encoders with an IOMMU that
can emit page faults on CPU-cached pages, and so can handle udmabuf
fully in hardware. The i.MX8 series, not so much.

> 
> Probably the right thing to do is to leave CMA heap as default and leave 
> it to the system designer to configure the system as they wish.
> 
> In my case, switching off the CMA heap or perhaps a libcamera config/env 
> variable to specify which to use...

As we target "desktop like" systems with the software ISP, there is no system
designer in that process, unless you consider the user to be one. That being
said, you can quirk it based on which CSI receiver brand is in use, if you
don't want to try one and cascade to the other if one fails.
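
The cascade could be as simple as this sketch - it only assumes the existing
DmaBufAllocator flags and its isValid() check, and it encodes the
udmabuf-first preference this patch argues for, not the current default order:

#include <memory>

#include "libcamera/internal/dma_buf_allocator.h"

using namespace libcamera;

/* Prefer udmabuf, fall back to the CMA heap when it is not available. */
static std::unique_ptr<DmaBufAllocator> makeAllocator()
{
	auto allocator = std::make_unique<DmaBufAllocator>(
		DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf);
	if (allocator->isValid())
		return allocator;

	return std::make_unique<DmaBufAllocator>(
		DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap);
}

A real implementation would probably also want to retry on allocation failure,
not just when the provider node is missing.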

Nicolas
Laurent Pinchart April 25, 2025, 2:37 p.m. UTC | #9
On Fri, Apr 25, 2025 at 09:07:46AM -0400, Nicolas Dufresne wrote:
> Le jeudi 24 avril 2025 à 14:51 +0100, Bryan O'Donoghue a écrit :
> > Thinking about this patch.
> > 
> > A user on a system such as imx/hantro is free to configure the system to 
> > support CMA and UDMABuf.
> > 
> > If you never pass that buffer to the video encoder - where the encoder 
> > requires phys contig/cma heap memory - then you probably want udmabuf.
> > 
> > The reverse is also true.
> > 
> > Its almost use-case specific. If you want the encoder you need CMA and 
> > if you just want to say - display the output in cheese you want UDMAbuf.
> 
> That is hardware specific. There is some encoders with an iommu that
> can emit page fault on CPU cached pages, and so can handle UDMABuf
> fully in hardware. i.MX8 series, not so much.

Is that an IOMMU that can clean dirty cache entries (with support from
the OS, through a fault) ? I wasn't aware of such hardware. It's
interesting (at least in theory, I wonder what performance you can get
out of that). Is there any example you can give ?

> > Probably the right thing to do is to leave CMA heap as default and leave 
> > it to the system designer to configure the system as they wish.
> > 
> > In my case, switching off the CMA heap or perhaps a libcamera config/env 
> > variable to specify which to use...
> 
> As we target "Desktop like" with the software ISP, there is no system
> designer in that process. Unless you consider the user to be as such.
> That being said, you can quirk it based on which CSI receiver brand in
> use if you don't want to try one and cascade to the other if one fails.
Laurent Pinchart April 25, 2025, 3:07 p.m. UTC | #10
On Fri, Apr 25, 2025 at 09:01:33AM -0400, Nicolas Dufresne wrote:
> Le jeudi 24 avril 2025 à 17:14 +0300, Laurent Pinchart a écrit :
> > On Thu, Apr 24, 2025 at 02:51:42PM +0100, Bryan O'Donoghue wrote:
> > > On 24/04/2025 14:17, Nicolas Dufresne wrote:
> > > > > even if the CPU only 'reads' the data ?
> > > > Its any CPU write -> GPU that in serious trouble on recent Intel. That
> > > > being said, if nothing either flush or invalidate the cache, if you
> > > > read before, write with GPU, and read again after that same buffer, the
> > > > data you'll read may be from old cache. In practice, GPU don't do
> > > > anything in-place, so that is non-issue here.
> > > 
> > > Well we have the GPU write to its own frame buffer and currently then 
> > > get a GBM pointer to that surface and memcpy() out into the libcamera 
> > > buffer be that CMA or UDMABuf.
> > > 
> > > +	gbm_surface_lock_front_buffer(gbm_surface_);
> > > +
> > > +	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > > +
> > > +	if (data_len > framesize_) {
> > > +		LOG(GBM, Error) << "Invalid read size " << data_len << " max is " << framesize_;
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	memcpy(data, map_, data_len);
> > > +
> > > +	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > > +
> > > +	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
> > > +
> > > 
> > > That I believe should be cache-coherent for both CMA and UDMAbuf across 
> > > architectures.
> > > 
> > > We had an interesting discussion on how render-to-texture would work - 
> > > if you created the render texture using eglCreateImageKHR.
> > > 
> > > // render to texture
> > > 
> > > dmabuf_handle = handle to CMA or UDMABuf buffer.
> > > 
> > > EGLint attrs[] = {
> > > 	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle
> > > };
> > > 
> > > mytexture-output = glCreateImageKHR(display, context,
> > >                                      EGL_LINUX_DMA_BUF_EXT,
> > >                                      NULL, attrs);
> > > 
> > > glBindFramebuffer(GL_FRAMBUFFER, someframebuffer);
> > > glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
> > >                         GL_TEXTURE_2D, mytexture-output);
> > > 
> > > 
> > > if dmabuf_handle points to CMA heap - who is responsible to flush the 
> > > cache before the CPU accessess the content of the DMA buf handle..
> > > 
> > > actually as I type that, I think the answer is that it is always the 
> > > responsibility of the CPU side to manage its own cache so..
> > > 
> > > glDraw();
> > > 
> > > sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > > 
> > > sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > > 
> > > something like that ..
> > > 
> > > > > For GPU ISP - the only access the CPU should do is read the data to
> > > > > generate some statistics. And even that - I would hope in the future
> > > > > will be turned into operations performed by the GPU ...
> > > > So read bayer, process in GPU, output YUV should just work. Thanks for
> > > > clarification, running out of CMA is pretty common, defaults CMA
> > > > reservation is usually very small. Appart from the buggy drivers,
> > > > virtual memory is a much better choice, and this is mostly what this
> > > > patch is about here.
> > > 
> > > Thinking about this patch.
> > > 
> > > A user on a system such as imx/hantro is free to configure the system to 
> > > support CMA and UDMABuf.
> > > 
> > > If you never pass that buffer to the video encoder - where the encoder 
> > > requires phys contig/cma heap memory - then you probably want udmabuf.
> > 
> > Add the display controller here, it's not just the video encoder.
> > 
> > Have you tried displaying images with cam (using the -D flag) on a
> > system without an IOMMU, with this patch applied ? Does it still work ?
> 
> In this case glCreateImageKHR() is expected to fail once it discover
> that the memory is scattered. In my experience, this part is well
> handled.

It will only fail if the GPU has no IOMMU, right ? What if the GPU has
an IOMMU, but the display controller doesn't ?

> What's missing sometimes is verification that memory
> allocation have happened with CPU cached enabled. Without the mmu,
> there is nothing to handle cached pages. Recent Intel graphics happily
> succeed, resulting in a funny mess on screen.
> 
> I assume the code here is just for example, and that there is failure
> and fallback in place.
> 
> > > The reverse is also true.
> > > 
> > > Its almost use-case specific. If you want the encoder you need CMA and 
> > > if you just want to say - display the output in cheese you want UDMAbuf.
> > > 
> > > Probably the right thing to do is to leave CMA heap as default and leave 
> > > it to the system designer to configure the system as they wish.
> > > 
> > > In my case, switching off the CMA heap or perhaps a libcamera config/env 
> > > variable to specify which to use...
Robert Mader April 25, 2025, 3:40 p.m. UTC | #11
On 25.04.25 17:07, Laurent Pinchart wrote:
> On Fri, Apr 25, 2025 at 09:01:33AM -0400, Nicolas Dufresne wrote:
>> Le jeudi 24 avril 2025 à 17:14 +0300, Laurent Pinchart a écrit :
>>> On Thu, Apr 24, 2025 at 02:51:42PM +0100, Bryan O'Donoghue wrote:
>>>> On 24/04/2025 14:17, Nicolas Dufresne wrote:
>>>>>> even if the CPU only 'reads' the data ?
>>>>> Its any CPU write -> GPU that in serious trouble on recent Intel. That
>>>>> being said, if nothing either flush or invalidate the cache, if you
>>>>> read before, write with GPU, and read again after that same buffer, the
>>>>> data you'll read may be from old cache. In practice, GPU don't do
>>>>> anything in-place, so that is non-issue here.
>>>> Well we have the GPU write to its own frame buffer and currently then
>>>> get a GBM pointer to that surface and memcpy() out into the libcamera
>>>> buffer be that CMA or UDMABuf.
>>>>
>>>> +	gbm_surface_lock_front_buffer(gbm_surface_);
>>>> +
>>>> +	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
>>>> +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
>>>> +
>>>> +	if (data_len > framesize_) {
>>>> +		LOG(GBM, Error) << "Invalid read size " << data_len << " max is " << framesize_;
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	memcpy(data, map_, data_len);
>>>> +
>>>> +	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
>>>> +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
>>>> +
>>>> +	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
>>>> +
>>>>
>>>> That I believe should be cache-coherent for both CMA and UDMAbuf across
>>>> architectures.
>>>>
>>>> We had an interesting discussion on how render-to-texture would work -
>>>> if you created the render texture using eglCreateImageKHR.
>>>>
>>>> // render to texture
>>>>
>>>> dmabuf_handle = handle to CMA or UDMABuf buffer.
>>>>
>>>> EGLint attrs[] = {
>>>> 	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle
>>>> };
>>>>
>>>> mytexture-output = glCreateImageKHR(display, context,
>>>>                                       EGL_LINUX_DMA_BUF_EXT,
>>>>                                       NULL, attrs);
>>>>
>>>> glBindFramebuffer(GL_FRAMBUFFER, someframebuffer);
>>>> glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
>>>>                          GL_TEXTURE_2D, mytexture-output);
>>>>
>>>>
>>>> if dmabuf_handle points to CMA heap - who is responsible to flush the
>>>> cache before the CPU accessess the content of the DMA buf handle..
>>>>
>>>> actually as I type that, I think the answer is that it is always the
>>>> responsibility of the CPU side to manage its own cache so..
>>>>
>>>> glDraw();
>>>>
>>>> sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
>>>> ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
>>>>
>>>> sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
>>>> ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
>>>>
>>>> something like that ..
>>>>
>>>>>> For GPU ISP - the only access the CPU should do is read the data to
>>>>>> generate some statistics. And even that - I would hope in the future
>>>>>> will be turned into operations performed by the GPU ...
>>>>> So read bayer, process in GPU, output YUV should just work. Thanks for
>>>>> clarification, running out of CMA is pretty common, defaults CMA
>>>>> reservation is usually very small. Appart from the buggy drivers,
>>>>> virtual memory is a much better choice, and this is mostly what this
>>>>> patch is about here.
>>>> Thinking about this patch.
>>>>
>>>> A user on a system such as imx/hantro is free to configure the system to
>>>> support CMA and UDMABuf.
>>>>
>>>> If you never pass that buffer to the video encoder - where the encoder
>>>> requires phys contig/cma heap memory - then you probably want udmabuf.
>>> Add the display controller here, it's not just the video encoder.
>>>
>>> Have you tried displaying images with cam (using the -D flag) on a
>>> system without an IOMMU, with this patch applied ? Does it still work ?
>> In this case glCreateImageKHR() is expected to fail once it discover
>> that the memory is scattered. In my experience, this part is well
>> handled.
> It will only fail if the GPU has no IOMMU, right ? What if the GPU has
> an IOMMU, but the display controller doesn't ?

This would be the RPi4 case (on the RPi5 the DC has an IOMMU). I've been 
testing this quite a bit in the context of 
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8540 
- using udmabuf with software video decoders in order to reuse dmabuf 
fastpaths and avoid copies.
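
For reference, creating such a buffer boils down to roughly the following - a
minimal sketch with error handling omitted, and also more or less what
libcamera's UDmaBuf provider does internally:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Wrap anonymous, sealed memfd pages in a dma-buf via /dev/udmabuf. */
int alloc_udmabuf(size_t size) /* size must be page-aligned */
{
	struct udmabuf_create create;
	int memfd, dev, dmabuf_fd;

	memfd = memfd_create("frame", MFD_ALLOW_SEALING);
	ftruncate(memfd, size);
	/* udmabuf requires the memfd to be sealed against shrinking */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	dev = open("/dev/udmabuf", O_RDWR);
	memset(&create, 0, sizeof(create));
	create.memfd = memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;
	dmabuf_fd = ioctl(dev, UDMABUF_CREATE, &create);

	close(dev);
	close(memfd);

	return dmabuf_fd; /* < 0 on failure */
}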

The DC in question supports DRM_FORMAT_YUV420, which is the native 
format used by most SW decoders (dav1d, ffmpeg, vpx, openh264, ...) for
8-bit 420 content. So far the case has been handled perfectly as 
expected - import to GPU (including in Wayland compositors) works but 
KMS offloading fails in the test commit on the RPi4, while succeeding on 
RPi5.

I've also tested several other devices and have yet to hit a case where 
things blow up because some component requiring CMA tries to import a
udmabuf. The concern that it could happen has been raised by RPi devs as
well though 
(https://github.com/raspberrypi/linux/issues/6706#issuecomment-2706283848).

---

Either way, do we agree that the commit in question should have its own 
series for further discussion as it's only semi-related to the gpuISP?

>> What's missing sometimes is verification that memory
>> allocation have happened with CPU cached enabled. Without the mmu,
>> there is nothing to handle cached pages. Recent Intel graphics happily
>> succeed, resulting in a funny mess on screen.
>>
>> I assume the code here is just for example, and that there is failure
>> and fallback in place.
>>
>>>> The reverse is also true.
>>>>
>>>> Its almost use-case specific. If you want the encoder you need CMA and
>>>> if you just want to say - display the output in cheese you want UDMAbuf.
>>>>
>>>> Probably the right thing to do is to leave CMA heap as default and leave
>>>> it to the system designer to configure the system as they wish.
>>>>
>>>> In my case, switching off the CMA heap or perhaps a libcamera config/env
>>>> variable to specify which to use...
Nicolas Dufresne April 25, 2025, 7:16 p.m. UTC | #12
On Friday 25 April 2025 at 17:37 +0300, Laurent Pinchart wrote:
> On Fri, Apr 25, 2025 at 09:07:46AM -0400, Nicolas Dufresne wrote:
> > Le jeudi 24 avril 2025 à 14:51 +0100, Bryan O'Donoghue a écrit :
> > > Thinking about this patch.
> > > 
> > > A user on a system such as imx/hantro is free to configure the system to 
> > > support CMA and UDMABuf.
> > > 
> > > If you never pass that buffer to the video encoder - where the encoder 
> > > requires phys contig/cma heap memory - then you probably want udmabuf.
> > > 
> > > The reverse is also true.
> > > 
> > > Its almost use-case specific. If you want the encoder you need CMA and 
> > > if you just want to say - display the output in cheese you want UDMAbuf.
> > 
> > That is hardware specific. There is some encoders with an iommu that
> > can emit page fault on CPU cached pages, and so can handle UDMABuf
> > fully in hardware. i.MX8 series, not so much.
> 
> Is that on IOMMU that can clean dirty cache entries (with support from
> the OS, through a fault) ? I wasn't aware of such hardware. It's
> interesting (at least in theory, I wonder what performance you can get
> out of that). Is there any example you can give ?

If it's a page fault, it's not fully HW, right? That is my understanding,
hopefully not too far off though. In short, this works with integrated GPU
encoders. The performance would be atrocious if it was simply flushing
ahead of time.

> 
> > > Probably the right thing to do is to leave CMA heap as default and leave 
> > > it to the system designer to configure the system as they wish.
> > > 
> > > In my case, switching off the CMA heap or perhaps a libcamera config/env 
> > > variable to specify which to use...
> > 
> > As we target "Desktop like" with the software ISP, there is no system
> > designer in that process. Unless you consider the user to be as such.
> > That being said, you can quirk it based on which CSI receiver brand in
> > use if you don't want to try one and cascade to the other if one fails.

Patch

diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
index d8c62dd6..722ffd46 100644
--- a/src/libcamera/dma_buf_allocator.cpp
+++ b/src/libcamera/dma_buf_allocator.cpp
@@ -45,10 +45,10 @@  static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
 	 * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
 	 * specified on the kernel command line, this gets renamed to "reserved".
 	 */
+	{ DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
 	{ DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
 	{ DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
 	{ DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
-	{ DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
 } };
 
 LOG_DEFINE_CATEGORY(DmaBufAllocator)