| Message ID | 20260503114002.139255-3-robert.mader@collabora.com |
|---|---|
| State | Superseded |
| Headers | show |
| Series |
|
| Related | show |
2026. 05. 03. 13:40 keltezéssel, Robert Mader írta: > In many cases we can import the GPU-ISP input buffers, dmabufs from v4l2, > directly into EGL instead of mapping and uploading - i.e. copying - them. > > Doing so can have positive effects in multiple areas, including reducing > memory bandwidth and CPU usage, as well as avoiding expensive dmabuf syncs > and syscalls. > > The main reason direct imports may not work are the more demanding stride > alignment requirements many GPUs have - often 128 or 256 bytes - compared > to ISPs - apparently often closer to 32 bytes. > > Thus we first try to import buffers directly and - if that fails - fall back > to the previous upload path. Failing imports should come at low cost as > drivers know the limitations and can bail out early, without causing > additional IO or context switches. > > In the future we might be able to request buffers with a matching stride > from v4l2 drivers in many cases, making direct import the norm instead > of a hit-or-miss. An optional kernel API for that exists, but doesn't > seem to be implemented by any driver tested so far. > > While on it, add some minor code adjustments to make it easier to > follow. > > Signed-off-by: Robert Mader <robert.mader@collabora.com> > --- > src/libcamera/software_isp/debayer_egl.cpp | 28 ++++++++++++---------- > src/libcamera/software_isp/debayer_egl.h | 2 +- > 2 files changed, 17 insertions(+), 13 deletions(-) > > diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp > index 8f0c229fd..624469947 100644 > --- a/src/libcamera/software_isp/debayer_egl.cpp > +++ b/src/libcamera/software_isp/debayer_egl.cpp > @@ -495,16 +495,26 @@ void DebayerEGL::setShaderVariableValues(const DebayerParams ¶ms) > return; > } > > -int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms) > +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms) > { > /* eGL context switch */ > egl_.makeCurrent(); > > /* Create a standard texture input */ > - egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); > + if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, input->planes()[0].fd.get()) != 0) { > + LOG(Debayer, Debug) << "Importing input buffer with DMABuf import failed, falling back to upload"; > + > + dmaSyncBegin(dmaSyncers, input, nullptr); > + MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); > + if (!in.isValid()) { > + LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; > + return -ENODEV; > + } > + egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); > + } > > /* Generate the output render framebuffer as render to texture */ > - egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, out_fd); > + egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get()); > > setShaderVariableValues(params); > glViewport(0, 0, width_, height_); > @@ -528,21 +538,13 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > > std::vector<DmaSyncer> dmaSyncers; > > - dmaSyncBegin(dmaSyncers, input, nullptr); > - > /* Copy metadata from the input buffer */ > FrameMetadata &metadata = output->_d()->metadata(); > metadata.status = input->metadata().status; > metadata.sequence = input->metadata().sequence; > metadata.timestamp = input->metadata().timestamp; > > - MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); > - if (!in.isValid()) { > - LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; > - goto error; > - } > - > - if (debayerGPU(in, output->planes()[0].fd.get(), params)) { > + if (debayerGPU(input, output, dmaSyncers, params)) { > LOG(Debayer, Error) << "debayerGPU failed"; > goto error; > } > @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > metadata.planes()[0].bytesused = output->planes()[0].length; > > /* Calculate stats for the whole frame */ > + if (dmaSyncers.empty() && (frame % SwStatsCpu::kStatPerNumFrames) == 0) Maybe there should be a dedicated method in `SwStatsCpu` that decides when to skip a frame. > + dmaSyncBegin(dmaSyncers, input, nullptr); This means the same fd can be in `dmaSyncers` multiple times. That means there might be SYNC_START -> SYNC_START -> SYNC_END -> SYNC_END. Is this allowed by the kernel? Does it handle it as expected? But in any case, I think there should just be an `std::optional<DmaSyncer> inputPlane0Syncer`, which is populated accordingly in the slow path and (if needed) before statistics. > stats_->processFrame(frame, 0, input); Maybe I'm missing why it wasn't like that in the first place, but I think this function should take either a `const MappedFrameBuffer&` or just the data pointer. It seems entirely unnecessary to map it again there as well. Although that will now have less of an effect with this fast path. > dmaSyncers.clear(); > > diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h > index fcd281f4c..62cb4f7f1 100644 > --- a/src/libcamera/software_isp/debayer_egl.h > +++ b/src/libcamera/software_isp/debayer_egl.h > @@ -66,7 +66,7 @@ private: > int initBayerShaders(PixelFormat inputFormat, PixelFormat outputFormat); > int getShaderVariableLocations(); > void setShaderVariableValues(const DebayerParams ¶ms); > - int debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms); > + int debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); > > /* Shader program identifiers */ > GLuint vertexShaderId_ = 0;
Thanks for the review! On 04.05.26 11:45, Barnabás Pőcze wrote: >> @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, >> FrameBuffer *input, FrameBuffer *output >> metadata.planes()[0].bytesused = output->planes()[0].length; >> /* Calculate stats for the whole frame */ >> + if (dmaSyncers.empty() && (frame % >> SwStatsCpu::kStatPerNumFrames) == 0) > > Maybe there should be a dedicated method in `SwStatsCpu` that decides > when to skip a frame. Agreed, however as this is a stylistic question resulting in more changes in SwStatsCpu I'd would love to have opinions from Hans and Bryan. My suggestion would be: SwStatsCpu::shouldProcessFrame() > > >> + dmaSyncBegin(dmaSyncers, input, nullptr); > > This means the same fd can be in `dmaSyncers` multiple times. That > means there might > be SYNC_START -> SYNC_START -> SYNC_END -> SYNC_END. Is this allowed > by the kernel? > Does it handle it as expected? That's actually avoided by the dmaSyncers.empty() check. Whether the kernel would handle it correctly - I *think* it does, at least for this use-case, however IIUC it would still cause multiple cache flushes on ARM. > > But in any case, I think there should just be an > `std::optional<DmaSyncer> inputPlane0Syncer`, > which is populated accordingly in the slow path and (if needed) before > statistics. Can do, if that makes things clearer / easier to read. > > >> stats_->processFrame(frame, 0, input); > > Maybe I'm missing why it wasn't like that in the first place, but I > think this > function should take either a `const MappedFrameBuffer&` or just the > data pointer. > It seems entirely unnecessary to map it again there as well. Although > that will now > have less of an effect with this fast path. Re-using the map if already present is certainly desirable. I thought about that before, but it requires more invasive changes - and I was hoping to keep the scope of the series small to land it quickly. In fact I'm wondering if we should just pull `DebayerEGL::debayerGPU()` into `DebayerEGL::process()`, which would make that easier to follow. Given that this is about pre-existing behavior, would you mind deferring that discussion to a follow-up, together with other clean-ups (for e.g. egl.cpp)? > >> dmaSyncers.clear(); >> diff --git a/src/libcamera/software_isp/debayer_egl.h >> b/src/libcamera/software_isp/debayer_egl.h >> index fcd281f4c..62cb4f7f1 100644 >> --- a/src/libcamera/software_isp/debayer_egl.h >> +++ b/src/libcamera/software_isp/debayer_egl.h >> @@ -66,7 +66,7 @@ private: >> int initBayerShaders(PixelFormat inputFormat, PixelFormat >> outputFormat); >> int getShaderVariableLocations(); >> void setShaderVariableValues(const DebayerParams ¶ms); >> - int debayerGPU(MappedFrameBuffer &in, int out_fd, const >> DebayerParams ¶ms); >> + int debayerGPU(FrameBuffer *input, FrameBuffer *output, >> std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); >> /* Shader program identifiers */ >> GLuint vertexShaderId_ = 0; >
2026. 05. 04. 12:15 keltezéssel, Robert Mader írta: > Thanks for the review! > > On 04.05.26 11:45, Barnabás Pőcze wrote: >>> @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output >>> metadata.planes()[0].bytesused = output->planes()[0].length; >>> /* Calculate stats for the whole frame */ >>> + if (dmaSyncers.empty() && (frame % SwStatsCpu::kStatPerNumFrames) == 0) >> >> Maybe there should be a dedicated method in `SwStatsCpu` that decides when to skip a frame. > Agreed, however as this is a stylistic question resulting in more changes in SwStatsCpu I'd would love to have opinions from Hans and Bryan. My suggestion would be: SwStatsCpu::shouldProcessFrame() >> >> >>> + dmaSyncBegin(dmaSyncers, input, nullptr); >> >> This means the same fd can be in `dmaSyncers` multiple times. That means there might >> be SYNC_START -> SYNC_START -> SYNC_END -> SYNC_END. Is this allowed by the kernel? >> Does it handle it as expected? > > That's actually avoided by the dmaSyncers.empty() check. Ahh, you're right, and I am blind. > > Whether the kernel would handle it correctly - I *think* it does, at least for this use-case, however IIUC it would still cause multiple cache flushes on ARM. > >> >> But in any case, I think there should just be an `std::optional<DmaSyncer> inputPlane0Syncer`, >> which is populated accordingly in the slow path and (if needed) before statistics. > Can do, if that makes things clearer / easier to read. >> >> >>> stats_->processFrame(frame, 0, input); >> >> Maybe I'm missing why it wasn't like that in the first place, but I think this >> function should take either a `const MappedFrameBuffer&` or just the data pointer. >> It seems entirely unnecessary to map it again there as well. Although that will now >> have less of an effect with this fast path. > > Re-using the map if already present is certainly desirable. I thought about that before, but it requires more invasive changes - and I was hoping to keep the scope of the series small to land it quickly. > > In fact I'm wondering if we should just pull `DebayerEGL::debayerGPU()` into `DebayerEGL::process()`, which would make that easier to follow. I agree. > Given that this is about pre-existing behavior, would you mind deferring that discussion to a follow-up, together with other clean-ups (for e.g. egl.cpp)? OK > >> >>> dmaSyncers.clear(); >>> diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h >>> index fcd281f4c..62cb4f7f1 100644 >>> --- a/src/libcamera/software_isp/debayer_egl.h >>> +++ b/src/libcamera/software_isp/debayer_egl.h >>> @@ -66,7 +66,7 @@ private: >>> int initBayerShaders(PixelFormat inputFormat, PixelFormat outputFormat); >>> int getShaderVariableLocations(); >>> void setShaderVariableValues(const DebayerParams ¶ms); >>> - int debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms); >>> + int debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); >>> /* Shader program identifiers */ >>> GLuint vertexShaderId_ = 0; >>
2026. 05. 03. 13:40 keltezéssel, Robert Mader írta: > In many cases we can import the GPU-ISP input buffers, dmabufs from v4l2, > directly into EGL instead of mapping and uploading - i.e. copying - them. > > Doing so can have positive effects in multiple areas, including reducing > memory bandwidth and CPU usage, as well as avoiding expensive dmabuf syncs > and syscalls. > > The main reason direct imports may not work are the more demanding stride > alignment requirements many GPUs have - often 128 or 256 bytes - compared > to ISPs - apparently often closer to 32 bytes. > > Thus we first try to import buffers directly and - if that fails - fall back > to the previous upload path. Failing imports should come at low cost as > drivers know the limitations and can bail out early, without causing > additional IO or context switches. > > In the future we might be able to request buffers with a matching stride > from v4l2 drivers in many cases, making direct import the norm instead > of a hit-or-miss. An optional kernel API for that exists, but doesn't > seem to be implemented by any driver tested so far. > > While on it, add some minor code adjustments to make it easier to > follow. > > Signed-off-by: Robert Mader <robert.mader@collabora.com> > --- > src/libcamera/software_isp/debayer_egl.cpp | 28 ++++++++++++---------- > src/libcamera/software_isp/debayer_egl.h | 2 +- > 2 files changed, 17 insertions(+), 13 deletions(-) > > diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp > index 8f0c229fd..624469947 100644 > --- a/src/libcamera/software_isp/debayer_egl.cpp > +++ b/src/libcamera/software_isp/debayer_egl.cpp > @@ -495,16 +495,26 @@ void DebayerEGL::setShaderVariableValues(const DebayerParams ¶ms) > return; > } > > -int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms) > +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms) > { > /* eGL context switch */ > egl_.makeCurrent(); > > /* Create a standard texture input */ > - egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); > + if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, input->planes()[0].fd.get()) != 0) { I'm looking at the first patch, but seeing how it is used makes me wonder: why not construct `eglImageBayerIn_` with a width of `inputConfig_.stride / bytesPerPixel_` in the first place? And use width/height in all `create*Texture2D()` functions? Only `createDMABufTexture2D()` uses the width/height members, which is not used for `eglImageBayerIn_` before this change, so I think it should be fine? > + LOG(Debayer, Debug) << "Importing input buffer with DMABuf import failed, falling back to upload"; > + > + dmaSyncBegin(dmaSyncers, input, nullptr); > + MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); > + if (!in.isValid()) { > + LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; > + return -ENODEV; > + } > + egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); > + } > > /* Generate the output render framebuffer as render to texture */ > - egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, out_fd); > + egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get()); > > setShaderVariableValues(params); > glViewport(0, 0, width_, height_); > @@ -528,21 +538,13 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > > std::vector<DmaSyncer> dmaSyncers; > > - dmaSyncBegin(dmaSyncers, input, nullptr); > - > /* Copy metadata from the input buffer */ > FrameMetadata &metadata = output->_d()->metadata(); > metadata.status = input->metadata().status; > metadata.sequence = input->metadata().sequence; > metadata.timestamp = input->metadata().timestamp; > > - MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); > - if (!in.isValid()) { > - LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; > - goto error; > - } > - > - if (debayerGPU(in, output->planes()[0].fd.get(), params)) { > + if (debayerGPU(input, output, dmaSyncers, params)) { > LOG(Debayer, Error) << "debayerGPU failed"; > goto error; > } > @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output > metadata.planes()[0].bytesused = output->planes()[0].length; > > /* Calculate stats for the whole frame */ > + if (dmaSyncers.empty() && (frame % SwStatsCpu::kStatPerNumFrames) == 0) > + dmaSyncBegin(dmaSyncers, input, nullptr); > stats_->processFrame(frame, 0, input); > dmaSyncers.clear(); > > diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h > index fcd281f4c..62cb4f7f1 100644 > --- a/src/libcamera/software_isp/debayer_egl.h > +++ b/src/libcamera/software_isp/debayer_egl.h > @@ -66,7 +66,7 @@ private: > int initBayerShaders(PixelFormat inputFormat, PixelFormat outputFormat); > int getShaderVariableLocations(); > void setShaderVariableValues(const DebayerParams ¶ms); > - int debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms); > + int debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); > > /* Shader program identifiers */ > GLuint vertexShaderId_ = 0;
On 04.05.26 12:49, Barnabás Pőcze wrote: > 2026. 05. 03. 13:40 keltezéssel, Robert Mader írta: >> In many cases we can import the GPU-ISP input buffers, dmabufs from >> v4l2, >> directly into EGL instead of mapping and uploading - i.e. copying - >> them. >> >> Doing so can have positive effects in multiple areas, including reducing >> memory bandwidth and CPU usage, as well as avoiding expensive dmabuf >> syncs >> and syscalls. >> >> The main reason direct imports may not work are the more demanding >> stride >> alignment requirements many GPUs have - often 128 or 256 bytes - >> compared >> to ISPs - apparently often closer to 32 bytes. >> >> Thus we first try to import buffers directly and - if that fails - >> fall back >> to the previous upload path. Failing imports should come at low cost as >> drivers know the limitations and can bail out early, without causing >> additional IO or context switches. >> >> In the future we might be able to request buffers with a matching stride >> from v4l2 drivers in many cases, making direct import the norm instead >> of a hit-or-miss. An optional kernel API for that exists, but doesn't >> seem to be implemented by any driver tested so far. >> >> While on it, add some minor code adjustments to make it easier to >> follow. >> >> Signed-off-by: Robert Mader <robert.mader@collabora.com> >> --- >> src/libcamera/software_isp/debayer_egl.cpp | 28 ++++++++++++---------- >> src/libcamera/software_isp/debayer_egl.h | 2 +- >> 2 files changed, 17 insertions(+), 13 deletions(-) >> >> diff --git a/src/libcamera/software_isp/debayer_egl.cpp >> b/src/libcamera/software_isp/debayer_egl.cpp >> index 8f0c229fd..624469947 100644 >> --- a/src/libcamera/software_isp/debayer_egl.cpp >> +++ b/src/libcamera/software_isp/debayer_egl.cpp >> @@ -495,16 +495,26 @@ void DebayerEGL::setShaderVariableValues(const >> DebayerParams ¶ms) >> return; >> } >> -int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, >> const DebayerParams ¶ms) >> +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, >> std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms) >> { >> /* eGL context switch */ >> egl_.makeCurrent(); >> /* Create a standard texture input */ >> - egl_.createTexture2D(*eglImageBayerIn_, glFormat_, >> inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); >> + if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, >> glFormat_, inputConfig_.stride / bytesPerPixel_, height_, >> input->planes()[0].fd.get()) != 0) { > > I'm looking at the first patch, but seeing how it is used makes me > wonder: > why not construct `eglImageBayerIn_` with a width of > `inputConfig_.stride / bytesPerPixel_` > in the first place? And use width/height in all `create*Texture2D()` > functions? > > Only `createDMABufTexture2D()` uses the width/height members, which is > not used for > `eglImageBayerIn_` before this change, so I think it should be fine? True, we could do that by calling `initBayerShaders()` before initializing eglImageBayerIn_ and eglImageBayerOut_ - which would also allow us to simplify createTexture2D() accordingly. Can try that for v2. >> + LOG(Debayer, Debug) << "Importing input buffer with DMABuf >> import failed, falling back to upload"; >> + >> + dmaSyncBegin(dmaSyncers, input, nullptr); >> + MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); >> + if (!in.isValid()) { >> + LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; >> + return -ENODEV; >> + } >> + egl_.createTexture2D(*eglImageBayerIn_, glFormat_, >> inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); >> + } >> /* Generate the output render framebuffer as render to >> texture */ >> - egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, out_fd); >> + egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, >> output->planes()[0].fd.get()); >> setShaderVariableValues(params); >> glViewport(0, 0, width_, height_); >> @@ -528,21 +538,13 @@ void DebayerEGL::process(uint32_t frame, >> FrameBuffer *input, FrameBuffer *output >> std::vector<DmaSyncer> dmaSyncers; >> - dmaSyncBegin(dmaSyncers, input, nullptr); >> - >> /* Copy metadata from the input buffer */ >> FrameMetadata &metadata = output->_d()->metadata(); >> metadata.status = input->metadata().status; >> metadata.sequence = input->metadata().sequence; >> metadata.timestamp = input->metadata().timestamp; >> - MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); >> - if (!in.isValid()) { >> - LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; >> - goto error; >> - } >> - >> - if (debayerGPU(in, output->planes()[0].fd.get(), params)) { >> + if (debayerGPU(input, output, dmaSyncers, params)) { >> LOG(Debayer, Error) << "debayerGPU failed"; >> goto error; >> } >> @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, >> FrameBuffer *input, FrameBuffer *output >> metadata.planes()[0].bytesused = output->planes()[0].length; >> /* Calculate stats for the whole frame */ >> + if (dmaSyncers.empty() && (frame % >> SwStatsCpu::kStatPerNumFrames) == 0) >> + dmaSyncBegin(dmaSyncers, input, nullptr); >> stats_->processFrame(frame, 0, input); >> dmaSyncers.clear(); >> diff --git a/src/libcamera/software_isp/debayer_egl.h >> b/src/libcamera/software_isp/debayer_egl.h >> index fcd281f4c..62cb4f7f1 100644 >> --- a/src/libcamera/software_isp/debayer_egl.h >> +++ b/src/libcamera/software_isp/debayer_egl.h >> @@ -66,7 +66,7 @@ private: >> int initBayerShaders(PixelFormat inputFormat, PixelFormat >> outputFormat); >> int getShaderVariableLocations(); >> void setShaderVariableValues(const DebayerParams ¶ms); >> - int debayerGPU(MappedFrameBuffer &in, int out_fd, const >> DebayerParams ¶ms); >> + int debayerGPU(FrameBuffer *input, FrameBuffer *output, >> std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); >> /* Shader program identifiers */ >> GLuint vertexShaderId_ = 0; >
diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp index 8f0c229fd..624469947 100644 --- a/src/libcamera/software_isp/debayer_egl.cpp +++ b/src/libcamera/software_isp/debayer_egl.cpp @@ -495,16 +495,26 @@ void DebayerEGL::setShaderVariableValues(const DebayerParams ¶ms) return; } -int DebayerEGL::debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms) +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms) { /* eGL context switch */ egl_.makeCurrent(); /* Create a standard texture input */ - egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); + if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, input->planes()[0].fd.get()) != 0) { + LOG(Debayer, Debug) << "Importing input buffer with DMABuf import failed, falling back to upload"; + + dmaSyncBegin(dmaSyncers, input, nullptr); + MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); + if (!in.isValid()) { + LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; + return -ENODEV; + } + egl_.createTexture2D(*eglImageBayerIn_, glFormat_, inputConfig_.stride / bytesPerPixel_, height_, in.planes()[0].data()); + } /* Generate the output render framebuffer as render to texture */ - egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, out_fd); + egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get()); setShaderVariableValues(params); glViewport(0, 0, width_, height_); @@ -528,21 +538,13 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output std::vector<DmaSyncer> dmaSyncers; - dmaSyncBegin(dmaSyncers, input, nullptr); - /* Copy metadata from the input buffer */ FrameMetadata &metadata = output->_d()->metadata(); metadata.status = input->metadata().status; metadata.sequence = input->metadata().sequence; metadata.timestamp = input->metadata().timestamp; - MappedFrameBuffer in(input, MappedFrameBuffer::MapFlag::Read); - if (!in.isValid()) { - LOG(Debayer, Error) << "mmap-ing buffer(s) failed"; - goto error; - } - - if (debayerGPU(in, output->planes()[0].fd.get(), params)) { + if (debayerGPU(input, output, dmaSyncers, params)) { LOG(Debayer, Error) << "debayerGPU failed"; goto error; } @@ -552,6 +554,8 @@ void DebayerEGL::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output metadata.planes()[0].bytesused = output->planes()[0].length; /* Calculate stats for the whole frame */ + if (dmaSyncers.empty() && (frame % SwStatsCpu::kStatPerNumFrames) == 0) + dmaSyncBegin(dmaSyncers, input, nullptr); stats_->processFrame(frame, 0, input); dmaSyncers.clear(); diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h index fcd281f4c..62cb4f7f1 100644 --- a/src/libcamera/software_isp/debayer_egl.h +++ b/src/libcamera/software_isp/debayer_egl.h @@ -66,7 +66,7 @@ private: int initBayerShaders(PixelFormat inputFormat, PixelFormat outputFormat); int getShaderVariableLocations(); void setShaderVariableValues(const DebayerParams ¶ms); - int debayerGPU(MappedFrameBuffer &in, int out_fd, const DebayerParams ¶ms); + int debayerGPU(FrameBuffer *input, FrameBuffer *output, std::vector<DmaSyncer> &dmaSyncers, const DebayerParams ¶ms); /* Shader program identifiers */ GLuint vertexShaderId_ = 0;
In many cases we can import the GPU-ISP input buffers, dmabufs from v4l2, directly into EGL instead of mapping and uploading - i.e. copying - them. Doing so can have positive effects in multiple areas, including reducing memory bandwidth and CPU usage, as well as avoiding expensive dmabuf syncs and syscalls. The main reason direct imports may not work are the more demanding stride alignment requirements many GPUs have - often 128 or 256 bytes - compared to ISPs - apparently often closer to 32 bytes. Thus we first try to import buffers directly and - if that fails - fall back to the previous upload path. Failing imports should come at low cost as drivers know the limitations and can bail out early, without causing additional IO or context switches. In the future we might be able to request buffers with a matching stride from v4l2 drivers in many cases, making direct import the norm instead of a hit-or-miss. An optional kernel API for that exists, but doesn't seem to be implemented by any driver tested so far. While on it, add some minor code adjustments to make it easier to follow. Signed-off-by: Robert Mader <robert.mader@collabora.com> --- src/libcamera/software_isp/debayer_egl.cpp | 28 ++++++++++++---------- src/libcamera/software_isp/debayer_egl.h | 2 +- 2 files changed, 17 insertions(+), 13 deletions(-)