| Message ID | 20260624085849.873784-11-bryan.odonoghue@linaro.org |
|---|---|
| State | New |
|
Thanks a lot! I tested this on a FP5 and a Pixel 3a and can not only confirm the improvements on the timings but also observed lower CPU usage. In particular in combination with dmabuf imports the CPU usage seems to essentially collapse when eyeballing htop, see below.

So for the whole series I can already give:

Tested-by: Robert Mader <robert.mader@collabora.com>

Will now go over the individual commits.

---

FairPhone 5

cam -c 1 -s width=1920,height=1080 --capture=60

INFO SoftwareIsp software_isp.cpp:299 Input 2048x1536-RGGB-10-CSI2P stride 2560

dmabuf import succeeds

before:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 365691us, 12189 us/frame
~39% CPU time in Wireplumber/Pipewire when running Snapshot

after:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 204924us, 6830 us/frame
~13% CPU time in Wireplumber/Pipewire when running Snapshot

-

cam -c 2 -s width=1920,height=1080 --capture=60

INFO SoftwareIsp software_isp.cpp:299 Input 4080x3072-GRBG-10-CSI2P stride 5104

dmabuf import fails

before:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 687332us, 22911 us/frame
~63% CPU time in Wireplumber/Pipewire when running Snapshot

after:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 579945us, 19331 us/frame
~53% CPU time in Wireplumber/Pipewire when running Snapshot

---

Pixel 3a

cam -c 1 -s width=1920,height=1080 --capture=60

INFO SoftwareIsp software_isp.cpp:299 Input 1936x1096-RGGB-10-CSI2P stride 2432

dmabuf import succeeds

before:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 489368us, 16312 us/frame
~59% CPU time in Wireplumber/Pipewire when running Snapshot

after:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 353912us, 11797 us/frame
~17% CPU time in Wireplumber/Pipewire when running Snapshot

-

cam -c 2 -s width=1920,height=1080 --capture=60

INFO SoftwareIsp software_isp.cpp:299 Input 4032x3024-RGGB-10-CSI2P stride 5040

dmabuf import fails

before:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 916702us, 30556 us/frame
~57% CPU time in Wireplumber/Pipewire when running Snapshot

after:
INFO Benchmark benchmark.cpp:89 Debayer processed 30 frames in 771620us, 25720 us/frame
~47% CPU time in Wireplumber/Pipewire when running Snapshot

On 24.06.26 10:58, Bryan O'Donoghue wrote:
> Implement a texture caching mechanism for both input and output frames and
> for both types of input frame (dmabuf import and CPU upload).
>
> The before/after on a Qualcomm x1e is:
>
> 9.737ms per frame
> 5.691ms per frame
>
> The before/after on a Qualcomm sm8250 is:
>
> 21.710ms per frame
> 17.336ms per frame
>
> for i in {1..20}; do
> 	cam -c /base/soc@0/cci@ac16000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
> done
>
> Interestingly, the uplift appears to be an absolute ~4.x ms per frame rather
> than the proportional one intuition might suggest.
>
> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
> ---
>  src/libcamera/software_isp/debayer_egl.cpp | 108 +++++++++++++++++----
>  src/libcamera/software_isp/debayer_egl.h   |  12 ++-
>  2 files changed, 100 insertions(+), 20 deletions(-)
>
> diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp
> index 0568c413b..8ac5cb76f 100644
> --- a/src/libcamera/software_isp/debayer_egl.cpp
> +++ b/src/libcamera/software_isp/debayer_egl.cpp
> @@ -355,6 +355,12 @@ int DebayerEGL::configure(const StreamConfiguration &inputCfg,
>  	 */
>  	stats_->setWindow(Rectangle(window_.size()));
>
> +	inputBufferCache_ = std::make_unique<V4L2BufferCache>(inputCfg.bufferCount);
> +	outputBufferCache_ = std::make_unique<V4L2BufferCache>(outputCfg.bufferCount);
> +
> +	eglImageBayerIn_.resize(inputCfg.bufferCount);
> +	eglImageBayerOut_.resize(outputCfg.bufferCount);
> +
>  	return 0;
>  }
>
> @@ -514,34 +520,106 @@ void DebayerEGL::setShaderVariableValues(eGLImage &eglImageIn, const DebayerPara
>  	return;
>  }
>
> -int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
> +int DebayerEGL::getBufferCache(V4L2BufferCache &cache, FrameBuffer *framebuffer, bool &cache_hit)

Maybe something like lookupFramebufferFromBufferCache()? getBufferCache() sounds like it would return a V4L2BufferCache.

>  {
> -	/* eGL context switch */
> -	egl_.makeCurrent();
> +	int cache_idx;
> +
> +	cache_idx = cache.get(*framebuffer, cache_hit);
> +	if (cache_idx < 0) {
> +		LOG(Debayer, Error) << "buffer exceeds configured cache size";
> +		return -ENODEV;
> +	}
> +	cache.put(cache_idx);
> +
> +	return cache_idx;
> +}
> +
> +eGLImage *DebayerEGL::getCachedInputFrameBuffer(FrameBuffer *input, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
> +{
> +	eGLImage *eglImageIn;
> +	bool cache_hit;
> +	int cache_idx;
> +
> +	cache_idx = getBufferCache(*inputBufferCache_, input, cache_hit);
> +	if (cache_idx < 0)
> +		return nullptr;
> +
> +	if (!cache_hit) {
> +		eglImageBayerIn_[cache_idx] = std::make_unique<eGLImage>(glFormat_, inputConfig_.stride / bytesPerPixel_,
> +									 height_, inputConfig_.stride, GL_TEXTURE0, 0);
> +	}
> +
> +	eglImageIn = eglImageBayerIn_[cache_idx].get();
>
>  	/* Try to create texture for input buffer via dmabuf import */
> -	if (use_dmabuf_) {
> -		if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, input->planes()[0].fd.get()) != 0) {
> +	if (use_dmabuf_ && !cache_hit) {
> +		if (egl_.createInputDMABufTexture2D(*eglImageIn, input->planes()[0].fd.get()) != 0) {
>  			use_dmabuf_ = false;
>  			LOG(Debayer, Info) << "Importing input buffer with DMABuf import failed, falling back to upload";
>  		}
>  	}
>
> +	/* Cache hit using dmabuf activate and bind */
> +	if (use_dmabuf_ && cache_hit) {
> +		egl_.activateBindTexture(*eglImageIn);
> +	}
> +
>  	/* Otherwise create texture for input buffer via upload from CPU */
>  	if (!use_dmabuf_) {
>  		inDmaSyncer->emplace(input->planes()[0].fd, DmaSyncer::SyncType::Read);
>  		inMapped->emplace(input, MappedFrameBuffer::MapFlag::Read);
>  		if (!inMapped->value().isValid()) {
>  			LOG(Debayer, Error) << "mmap-ing buffer(s) failed";
> -			return -ENODEV;
> +			return nullptr;
>  		}
> -		egl_.createInputTexture2D(*eglImageBayerIn_, inMapped->value().planes()[0].data());
> +		if (cache_hit)
> +			egl_.updateInputTexture2D(*eglImageIn, inMapped->value().planes()[0].data());
> +		else
> +			egl_.createInputTexture2D(*eglImageIn, inMapped->value().planes()[0].data());
>  	}
>
> -	/* Generate the output render framebuffer as render to texture */
> -	egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get());
> +	return eglImageIn;
> +}
> +
> +eGLImage *DebayerEGL::getCachedOutputFrameBuffer(FrameBuffer *output)
> +{
> +	eGLImage *eglImageOut;
> +	bool cache_hit;
> +	int cache_idx;
> +
> +	cache_idx = getBufferCache(*outputBufferCache_, output, cache_hit);
> +	if (cache_idx < 0)
> +		return nullptr;
> +
> +	if (!cache_hit) {
> +		eglImageBayerOut_[cache_idx] = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width,
> +									  outputSize_.height, outputConfig_.stride, GL_TEXTURE1, 1);
> +		egl_.createOutputDMABufTexture2D(*eglImageBayerOut_[cache_idx], output->planes()[0].fd.get());
> +	}

Don't we need to call egl_.activateBindTexture(*eglImageOut); in the else-case here? It's called in createDMABufTexture2D(). It seems to work for me either way, but I'm confused why. Is it somehow implicit via the attachTextureToFBO() below?

> +	eglImageOut = eglImageBayerOut_[cache_idx].get();
> +
> +	return eglImageOut;
> +}
> +
> +int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
> +{
> +	eGLImage *eglImageIn;
> +	eGLImage *eglImageOut;
> +
> +	/* eGL context switch */
> +	egl_.makeCurrent();
> +
> +	eglImageIn = getCachedInputFrameBuffer(input, inMapped, inDmaSyncer);
> +	if (!eglImageIn)
> +		return -ENOMEM;
> +
> +	eglImageOut = getCachedOutputFrameBuffer(output);
> +	if (!eglImageOut)
> +		return -ENOMEM;
> +
> +	egl_.attachTextureToFBO(*eglImageOut);
> +	setShaderVariableValues(*eglImageIn, params);
>
> -	setShaderVariableValues(*eglImageBayerIn_, params);
>  	glViewport(0, 0, width_, height_);
>  	glClear(GL_COLOR_BUFFER_BIT);
>  	glDrawArrays(GL_TRIANGLE_FAN, 0, DEBAYER_OPENGL_COORDS);
> @@ -623,19 +701,13 @@ int DebayerEGL::start()
>  	if (initBayerShaders(inputPixelFormat_, outputPixelFormat_))
>  		return -EINVAL;
>
> -	/* Raw bayer input as texture */
> -	eglImageBayerIn_ = std::make_unique<eGLImage>(glFormat_, inputConfig_.stride / bytesPerPixel_, height_, inputConfig_.stride, GL_TEXTURE0, 0);
> -
> -	/* Texture we will render to */
> -	eglImageBayerOut_ = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE1, 1);
> -
>  	return 0;
>  }
>
>  void DebayerEGL::stop()
>  {
> -	eglImageBayerOut_.reset();
> -	eglImageBayerIn_.reset();
> +	eglImageBayerOut_.clear();
> +	eglImageBayerIn_.clear();
>
>  	if (programId_)
>  		glDeleteProgram(programId_);
> diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h
> index d8509e9f2..238fe7345 100644
> --- a/src/libcamera/software_isp/debayer_egl.h
> +++ b/src/libcamera/software_isp/debayer_egl.h
> @@ -22,6 +22,7 @@
>  #include "libcamera/internal/mapped_framebuffer.h"
>  #include "libcamera/internal/software_isp/benchmark.h"
>  #include "libcamera/internal/software_isp/swstats_cpu.h"
> +#include "libcamera/internal/v4l2_videodevice.h"
>
>  #include <EGL/egl.h>
>  #include <EGL/eglext.h>
> @@ -70,14 +71,21 @@ private:
>
>  	bool use_dmabuf_;
>
> +	int getBufferCache(V4L2BufferCache &buffercache, FrameBuffer *framebuffer, bool &hit);
> +	eGLImage *getCachedInputFrameBuffer(FrameBuffer *input, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer);
> +	eGLImage *getCachedOutputFrameBuffer(FrameBuffer *output);
> +
> +	std::unique_ptr<V4L2BufferCache> inputBufferCache_;
> +	std::unique_ptr<V4L2BufferCache> outputBufferCache_;
> +
>  	/* Shader program identifiers */
>  	GLuint vertexShaderId_ = 0;
>  	GLuint fragmentShaderId_ = 0;
>  	GLuint programId_ = 0;
>
>  	/* Pointer to object representing input texture */
> -	std::unique_ptr<eGLImage> eglImageBayerIn_;
> -	std::unique_ptr<eGLImage> eglImageBayerOut_;
> +	std::vector<std::unique_ptr<eGLImage>> eglImageBayerIn_;
> +	std::vector<std::unique_ptr<eGLImage>> eglImageBayerOut_;
>
>  	/* Shader parameters */
>  	float firstRed_x_;

Two small comments, otherwise LGTM - and works great!

Reviewed-by: Robert Mader <robert.mader@collabora.com>

diff --git a/src/libcamera/software_isp/debayer_egl.cpp b/src/libcamera/software_isp/debayer_egl.cpp
index 0568c413b..8ac5cb76f 100644
--- a/src/libcamera/software_isp/debayer_egl.cpp
+++ b/src/libcamera/software_isp/debayer_egl.cpp
@@ -355,6 +355,12 @@ int DebayerEGL::configure(const StreamConfiguration &inputCfg,
 	 */
 	stats_->setWindow(Rectangle(window_.size()));
 
+	inputBufferCache_ = std::make_unique<V4L2BufferCache>(inputCfg.bufferCount);
+	outputBufferCache_ = std::make_unique<V4L2BufferCache>(outputCfg.bufferCount);
+
+	eglImageBayerIn_.resize(inputCfg.bufferCount);
+	eglImageBayerOut_.resize(outputCfg.bufferCount);
+
 	return 0;
 }
 
@@ -514,34 +520,106 @@ void DebayerEGL::setShaderVariableValues(eGLImage &eglImageIn, const DebayerPara
 	return;
 }
 
-int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
+int DebayerEGL::getBufferCache(V4L2BufferCache &cache, FrameBuffer *framebuffer, bool &cache_hit)
 {
-	/* eGL context switch */
-	egl_.makeCurrent();
+	int cache_idx;
+
+	cache_idx = cache.get(*framebuffer, cache_hit);
+	if (cache_idx < 0) {
+		LOG(Debayer, Error) << "buffer exceeds configured cache size";
+		return -ENODEV;
+	}
+	cache.put(cache_idx);
+
+	return cache_idx;
+}
+
+eGLImage *DebayerEGL::getCachedInputFrameBuffer(FrameBuffer *input, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
+{
+	eGLImage *eglImageIn;
+	bool cache_hit;
+	int cache_idx;
+
+	cache_idx = getBufferCache(*inputBufferCache_, input, cache_hit);
+	if (cache_idx < 0)
+		return nullptr;
+
+	if (!cache_hit) {
+		eglImageBayerIn_[cache_idx] = std::make_unique<eGLImage>(glFormat_, inputConfig_.stride / bytesPerPixel_,
+									 height_, inputConfig_.stride, GL_TEXTURE0, 0);
+	}
+
+	eglImageIn = eglImageBayerIn_[cache_idx].get();
 
 	/* Try to create texture for input buffer via dmabuf import */
-	if (use_dmabuf_) {
-		if (egl_.createInputDMABufTexture2D(*eglImageBayerIn_, input->planes()[0].fd.get()) != 0) {
+	if (use_dmabuf_ && !cache_hit) {
+		if (egl_.createInputDMABufTexture2D(*eglImageIn, input->planes()[0].fd.get()) != 0) {
 			use_dmabuf_ = false;
 			LOG(Debayer, Info) << "Importing input buffer with DMABuf import failed, falling back to upload";
 		}
 	}
 
+	/* Cache hit using dmabuf activate and bind */
+	if (use_dmabuf_ && cache_hit) {
+		egl_.activateBindTexture(*eglImageIn);
+	}
+
 	/* Otherwise create texture for input buffer via upload from CPU */
 	if (!use_dmabuf_) {
 		inDmaSyncer->emplace(input->planes()[0].fd, DmaSyncer::SyncType::Read);
 		inMapped->emplace(input, MappedFrameBuffer::MapFlag::Read);
 		if (!inMapped->value().isValid()) {
 			LOG(Debayer, Error) << "mmap-ing buffer(s) failed";
-			return -ENODEV;
+			return nullptr;
 		}
-		egl_.createInputTexture2D(*eglImageBayerIn_, inMapped->value().planes()[0].data());
+		if (cache_hit)
+			egl_.updateInputTexture2D(*eglImageIn, inMapped->value().planes()[0].data());
+		else
+			egl_.createInputTexture2D(*eglImageIn, inMapped->value().planes()[0].data());
 	}
 
-	/* Generate the output render framebuffer as render to texture */
-	egl_.createOutputDMABufTexture2D(*eglImageBayerOut_, output->planes()[0].fd.get());
+	return eglImageIn;
+}
+
+eGLImage *DebayerEGL::getCachedOutputFrameBuffer(FrameBuffer *output)
+{
+	eGLImage *eglImageOut;
+	bool cache_hit;
+	int cache_idx;
+
+	cache_idx = getBufferCache(*outputBufferCache_, output, cache_hit);
+	if (cache_idx < 0)
+		return nullptr;
+
+	if (!cache_hit) {
+		eglImageBayerOut_[cache_idx] = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width,
+									  outputSize_.height, outputConfig_.stride, GL_TEXTURE1, 1);
+		egl_.createOutputDMABufTexture2D(*eglImageBayerOut_[cache_idx], output->planes()[0].fd.get());
+	}
+	eglImageOut = eglImageBayerOut_[cache_idx].get();
+
+	return eglImageOut;
+}
+
+int DebayerEGL::debayerGPU(FrameBuffer *input, FrameBuffer *output, const DebayerParams &params, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer)
+{
+	eGLImage *eglImageIn;
+	eGLImage *eglImageOut;
+
+	/* eGL context switch */
+	egl_.makeCurrent();
+
+	eglImageIn = getCachedInputFrameBuffer(input, inMapped, inDmaSyncer);
+	if (!eglImageIn)
+		return -ENOMEM;
+
+	eglImageOut = getCachedOutputFrameBuffer(output);
+	if (!eglImageOut)
+		return -ENOMEM;
+
+	egl_.attachTextureToFBO(*eglImageOut);
+	setShaderVariableValues(*eglImageIn, params);
 
-	setShaderVariableValues(*eglImageBayerIn_, params);
 	glViewport(0, 0, width_, height_);
 	glClear(GL_COLOR_BUFFER_BIT);
 	glDrawArrays(GL_TRIANGLE_FAN, 0, DEBAYER_OPENGL_COORDS);
@@ -623,19 +701,13 @@ int DebayerEGL::start()
 	if (initBayerShaders(inputPixelFormat_, outputPixelFormat_))
 		return -EINVAL;
 
-	/* Raw bayer input as texture */
-	eglImageBayerIn_ = std::make_unique<eGLImage>(glFormat_, inputConfig_.stride / bytesPerPixel_, height_, inputConfig_.stride, GL_TEXTURE0, 0);
-
-	/* Texture we will render to */
-	eglImageBayerOut_ = std::make_unique<eGLImage>(GL_RGBA, outputSize_.width, outputSize_.height, outputConfig_.stride, GL_TEXTURE1, 1);
-
 	return 0;
 }
 
 void DebayerEGL::stop()
 {
-	eglImageBayerOut_.reset();
-	eglImageBayerIn_.reset();
+	eglImageBayerOut_.clear();
+	eglImageBayerIn_.clear();
 
 	if (programId_)
 		glDeleteProgram(programId_);
diff --git a/src/libcamera/software_isp/debayer_egl.h b/src/libcamera/software_isp/debayer_egl.h
index d8509e9f2..238fe7345 100644
--- a/src/libcamera/software_isp/debayer_egl.h
+++ b/src/libcamera/software_isp/debayer_egl.h
@@ -22,6 +22,7 @@
 #include "libcamera/internal/mapped_framebuffer.h"
 #include "libcamera/internal/software_isp/benchmark.h"
 #include "libcamera/internal/software_isp/swstats_cpu.h"
+#include "libcamera/internal/v4l2_videodevice.h"
 
 #include <EGL/egl.h>
 #include <EGL/eglext.h>
@@ -70,14 +71,21 @@ private:
 
 	bool use_dmabuf_;
 
+	int getBufferCache(V4L2BufferCache &buffercache, FrameBuffer *framebuffer, bool &hit);
+	eGLImage *getCachedInputFrameBuffer(FrameBuffer *input, std::optional<MappedFrameBuffer> *inMapped, std::optional<DmaSyncer> *inDmaSyncer);
+	eGLImage *getCachedOutputFrameBuffer(FrameBuffer *output);
+
+	std::unique_ptr<V4L2BufferCache> inputBufferCache_;
+	std::unique_ptr<V4L2BufferCache> outputBufferCache_;
+
 	/* Shader program identifiers */
 	GLuint vertexShaderId_ = 0;
 	GLuint fragmentShaderId_ = 0;
 	GLuint programId_ = 0;
 
 	/* Pointer to object representing input texture */
-	std::unique_ptr<eGLImage> eglImageBayerIn_;
-	std::unique_ptr<eGLImage> eglImageBayerOut_;
+	std::vector<std::unique_ptr<eGLImage>> eglImageBayerIn_;
+	std::vector<std::unique_ptr<eGLImage>> eglImageBayerOut_;
 
 	/* Shader parameters */
 	float firstRed_x_;

Implement a texture caching mechanism for both input and output frames and
for both types of input frame (dmabuf import and CPU upload).

The before/after on a Qualcomm x1e is:

9.737ms per frame
5.691ms per frame

The before/after on a Qualcomm sm8250 is:

21.710ms per frame
17.336ms per frame

for i in {1..20}; do
	cam -c /base/soc@0/cci@ac16000/i2c-bus@1/camera@10 -s width=1920,height=1080 --capture=60
done

Interestingly, the uplift appears to be an absolute ~4.x ms per frame rather
than the proportional one intuition might suggest.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
 src/libcamera/software_isp/debayer_egl.cpp | 108 +++++++++++++++++----
 src/libcamera/software_isp/debayer_egl.h   |  12 ++-
 2 files changed, 100 insertions(+), 20 deletions(-)