[v1,3/3] libcamera: virtual: Speed up test pattern animation
diff mbox series

Message ID 20241211152542.1095857-3-pobrn@protonmail.com
State Superseded
Headers show
Series
  • [v1,1/3] libcamera: virtual: Avoid some copies
Related show

Commit Message

Barnabás Pőcze Dec. 11, 2024, 3:25 p.m. UTC
After the initial generation, when a frame is requested,
the test pattern generator rotates the image left by 1 column.

The current approach has two shortcomings:
  (1) it allocates a temporary buffer to hold one column;
  (2) it swaps two columns at a time.

The test patterns are simple ARGB images, in row-major order,
so doing (2) works against memory prefetching. This can be
addressed by doing the rotation one row at a time as that way
the image is addressed in a purely linear fashion. Doing so also
eliminates the need for a dynamically allocated temporary buffer,
as the required buffer now only needs to hold one sample,
which is 4 bytes in this case.

In an optimized build, this results in about a 2x increase in the
number of frames per second as reported by `cam`. In an unoptimized,
ASAN and UBSAN intrumented build, the difference is even bigger,
which is useful for running lc-compliance in CI in a reasonable time.

Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
---
 .../virtual/test_pattern_generator.cpp        | 68 +++++++++----------
 .../pipeline/virtual/test_pattern_generator.h |  4 --
 2 files changed, 34 insertions(+), 38 deletions(-)

Comments

Kieran Bingham Dec. 17, 2024, 4:19 p.m. UTC | #1
Quoting Barnabás Pőcze (2024-12-11 15:25:56)
> After the initial generation, when a frame is requested,
> the test pattern generator rotates the image left by 1 column.
> 
> The current approach has two shortcomings:
>   (1) it allocates a temporary buffer to hold one column;
>   (2) it swaps two columns at a time.
> 
> The test patterns are simple ARGB images, in row-major order,
> so doing (2) works against memory prefetching. This can be
> addressed by doing the rotation one row at a time as that way
> the image is addressed in a purely linear fashion. Doing so also
> eliminates the need for a dynamically allocated temporary buffer,
> as the required buffer now only needs to hold one sample,
> which is 4 bytes in this case.
> 
> In an optimized build, this results in about a 2x increase in the
> number of frames per second as reported by `cam`. In an unoptimized,
> ASAN and UBSAN intrumented build, the difference is even bigger,
> which is useful for running lc-compliance in CI in a reasonable time.
> 
> Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>

Hrm, rabbit holes a plenty trying to test this.

First qcam isn't working on my main development PC as Qt6 does not
supply a pkg-config file in the version available on Ubuntu 22.04 LTS.

Next up, I'm running cam with the SDL output which works:

./build/gcc/src/apps/cam/cam -c2 -S -C

136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0


but the virtual pipeline handler is clearly not handling sequence
numbers, timetamps, bytesused - or presumably any of the required
metadata correctly.

But none of that is a fault of your patch which is working so:


Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

> ---
>  .../virtual/test_pattern_generator.cpp        | 68 +++++++++----------
>  .../pipeline/virtual/test_pattern_generator.h |  4 --
>  2 files changed, 34 insertions(+), 38 deletions(-)
> 
> diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> index 47d341919..eeadc1646 100644
> --- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> @@ -5,6 +5,8 @@
>   * Derived class of FrameGenerator for generating test patterns
>   */
>  
> +#include <string.h>
> +
>  #include "test_pattern_generator.h"
>  
>  #include <libcamera/base/log.h>
> @@ -13,6 +15,37 @@
>  
>  #include <libyuv/convert_from_argb.h>
>  
> +namespace {
> +
> +template<size_t SampleSize>
> +void rotateLeft1Sample(uint8_t *samples, size_t width)
> +{
> +       if (width <= 0)
> +               return;
> +
> +       const size_t stride = width * SampleSize;
> +       uint8_t first[SampleSize];
> +
> +       memcpy(first, &samples[0], SampleSize);
> +       for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
> +               memcpy(&samples[i], &samples[i + SampleSize], SampleSize);
> +       memcpy(&samples[stride - SampleSize], first, SampleSize);
> +}
> +
> +template<size_t SampleSize>
> +void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
> +{
> +       if (size.width < 2)
> +               return;
> +
> +       const size_t stride = size.width * SampleSize;
> +
> +       for (size_t i = 0; i < size.height; i++, image += stride)
> +               rotateLeft1Sample<SampleSize>(image, size.width);
> +}
> +
> +}
> +
>  namespace libcamera {
>  
>  LOG_DECLARE_CATEGORY(Virtual)
> @@ -27,7 +60,7 @@ int TestPatternGenerator::generateFrame(const Size &size,
>  
>         const auto &planes = mappedFrameBuffer.planes();
>  
> -       shiftLeft(size);
> +       rotateLeft1Column<kARGBSize>(size, template_.get());
>  
>         /* Convert the template_ to the frame buffer */
>         int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
> @@ -40,39 +73,6 @@ int TestPatternGenerator::generateFrame(const Size &size,
>         return ret;
>  }
>  
> -void TestPatternGenerator::shiftLeft(const Size &size)
> -{
> -       /* Store the first column temporarily */
> -       auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
> -       for (size_t h = 0; h < size.height; h++) {
> -               unsigned int index = h * size.width * kARGBSize;
> -               unsigned int index1 = h * kARGBSize;
> -               firstColumn[index1] = template_[index];
> -               firstColumn[index1 + 1] = template_[index + 1];
> -               firstColumn[index1 + 2] = template_[index + 2];
> -               firstColumn[index1 + 3] = 0x00;
> -       }
> -
> -       /* Overwrite template_ */
> -       uint8_t *buf = template_.get();
> -       for (size_t h = 0; h < size.height; h++) {
> -               for (size_t w = 0; w < size.width - 1; w++) {
> -                       /* Overwrite with the pixel on the right */
> -                       unsigned int index = (h * size.width + w + 1) * kARGBSize;
> -                       *buf++ = template_[index]; /* B */
> -                       *buf++ = template_[index + 1]; /* G */
> -                       *buf++ = template_[index + 2]; /* R */
> -                       *buf++ = 0x00; /* A */
> -               }
> -               /* Overwrite the new last column with the original first column */
> -               unsigned int index1 = h * kARGBSize;
> -               *buf++ = firstColumn[index1]; /* B */
> -               *buf++ = firstColumn[index1 + 1]; /* G */
> -               *buf++ = firstColumn[index1 + 2]; /* R */
> -               *buf++ = 0x00; /* A */
> -       }
> -}
> -
>  void ColorBarsGenerator::configure(const Size &size)
>  {
>         constexpr uint8_t kColorBar[8][3] = {
> diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> index 05f4ab7a7..2a51bd31a 100644
> --- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
> +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> @@ -29,10 +29,6 @@ public:
>  protected:
>         /* Buffer of test pattern template */
>         std::unique_ptr<uint8_t[]> template_;
> -
> -private:
> -       /* Shift the buffer by 1 pixel left each frame */
> -       void shiftLeft(const Size &size);
>  };
>  
>  class ColorBarsGenerator : public TestPatternGenerator
> -- 
> 2.47.1
> 
>
Barnabás Pőcze Dec. 17, 2024, 4:22 p.m. UTC | #2
Regards,
Barnabás Pőcze


2024. december 17., kedd 17:19 keltezéssel, Kieran Bingham <kieran.bingham@ideasonboard.com> írta:

> Quoting Barnabás Pőcze (2024-12-11 15:25:56)
> 
> > After the initial generation, when a frame is requested,
> > the test pattern generator rotates the image left by 1 column.
> > 
> > The current approach has two shortcomings:
> > (1) it allocates a temporary buffer to hold one column;
> > (2) it swaps two columns at a time.
> > 
> > The test patterns are simple ARGB images, in row-major order,
> > so doing (2) works against memory prefetching. This can be
> > addressed by doing the rotation one row at a time as that way
> > the image is addressed in a purely linear fashion. Doing so also
> > eliminates the need for a dynamically allocated temporary buffer,
> > as the required buffer now only needs to hold one sample,
> > which is 4 bytes in this case.
> > 
> > In an optimized build, this results in about a 2x increase in the
> > number of frames per second as reported by `cam`. In an unoptimized,
> > ASAN and UBSAN intrumented build, the difference is even bigger,
> > which is useful for running lc-compliance in CI in a reasonable time.
> > 
> > Signed-off-by: Barnabás Pőcze pobrn@protonmail.com
> 
> 
> Hrm, rabbit holes a plenty trying to test this.
> 
> First qcam isn't working on my main development PC as Qt6 does not
> supply a pkg-config file in the version available on Ubuntu 22.04 LTS.
> 
> Next up, I'm running cam with the SDL output which works:
> 
> ./build/gcc/src/apps/cam/cam -c2 -S -C
> 
> 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> 
> 
> but the virtual pipeline handler is clearly not handling sequence
> numbers, timetamps, bytesused - or presumably any of the required
> metadata correctly.
> 
> But none of that is a fault of your patch which is working so:

I forgot to mention that I used this to test:
https://gitlab.freedesktop.org/pobrn/libcamera/-/commit/cd537fa855bb68c1faeab8fb5b94c4d423899b04

There is also a corresponding bug:
https://bugs.libcamera.org/show_bug.cgi?id=245


Regards,
Barnabás Pőcze

> 
> 
> Tested-by: Kieran Bingham kieran.bingham@ideasonboard.com
> 
> Reviewed-by: Kieran Bingham kieran.bingham@ideasonboard.com
> 
> > ---
> > .../virtual/test_pattern_generator.cpp | 68 +++++++++----------
> > .../pipeline/virtual/test_pattern_generator.h | 4 --
> > 2 files changed, 34 insertions(+), 38 deletions(-)
> > 
> > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > index 47d341919..eeadc1646 100644
> > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > @@ -5,6 +5,8 @@
> > * Derived class of FrameGenerator for generating test patterns
> > */
> > 
> > +#include <string.h>
> > +
> > #include "test_pattern_generator.h"
> > 
> > #include <libcamera/base/log.h>
> > @@ -13,6 +15,37 @@
> > 
> > #include <libyuv/convert_from_argb.h>
> > 
> > +namespace {
> > +
> > +template<size_t SampleSize>
> > +void rotateLeft1Sample(uint8_t *samples, size_t width)
> > +{
> > + if (width <= 0)
> > + return;
> > +
> > + const size_t stride = width * SampleSize;
> > + uint8_t first[SampleSize];
> > +
> > + memcpy(first, &samples[0], SampleSize);
> > + for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
> > + memcpy(&samples[i], &samples[i + SampleSize], SampleSize);
> > + memcpy(&samples[stride - SampleSize], first, SampleSize);
> > +}
> > +
> > +template<size_t SampleSize>
> > +void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
> > +{
> > + if (size.width < 2)
> > + return;
> > +
> > + const size_t stride = size.width * SampleSize;
> > +
> > + for (size_t i = 0; i < size.height; i++, image += stride)
> > + rotateLeft1Sample<SampleSize>(image, size.width);
> > +}
> > +
> > +}
> > +
> > namespace libcamera {
> > 
> > LOG_DECLARE_CATEGORY(Virtual)
> > @@ -27,7 +60,7 @@ int TestPatternGenerator::generateFrame(const Size &size,
> > 
> > const auto &planes = mappedFrameBuffer.planes();
> > 
> > - shiftLeft(size);
> > + rotateLeft1Column<kARGBSize>(size, template_.get());
> > 
> > /* Convert the template_ to the frame buffer */
> > int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
> > @@ -40,39 +73,6 @@ int TestPatternGenerator::generateFrame(const Size &size,
> > return ret;
> > }
> > 
> > -void TestPatternGenerator::shiftLeft(const Size &size)
> > -{
> > - /* Store the first column temporarily /
> > - auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
> > - for (size_t h = 0; h < size.height; h++) {
> > - unsigned int index = h * size.width * kARGBSize;
> > - unsigned int index1 = h * kARGBSize;
> > - firstColumn[index1] = template_[index];
> > - firstColumn[index1 + 1] = template_[index + 1];
> > - firstColumn[index1 + 2] = template_[index + 2];
> > - firstColumn[index1 + 3] = 0x00;
> > - }
> > -
> > - / Overwrite template_ */
> > - uint8_t buf = template_.get();
> > - for (size_t h = 0; h < size.height; h++) {
> > - for (size_t w = 0; w < size.width - 1; w++) {
> > - / Overwrite with the pixel on the right */
> > - unsigned int index = (h * size.width + w + 1) * kARGBSize;
> > - buf++ = template_[index]; / B */
> > - buf++ = template_[index + 1]; / G */
> > - buf++ = template_[index + 2]; / R */
> > - buf++ = 0x00; / A /
> > - }
> > - / Overwrite the new last column with the original first column */
> > - unsigned int index1 = h * kARGBSize;
> > - buf++ = firstColumn[index1]; / B */
> > - buf++ = firstColumn[index1 + 1]; / G */
> > - buf++ = firstColumn[index1 + 2]; / R */
> > - buf++ = 0x00; / A /
> > - }
> > -}
> > -
> > void ColorBarsGenerator::configure(const Size &size)
> > {
> > constexpr uint8_t kColorBar[8][3] = {
> > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > index 05f4ab7a7..2a51bd31a 100644
> > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > @@ -29,10 +29,6 @@ public:
> > protected:
> > / Buffer of test pattern template /
> > std::unique_ptr<uint8_t[]> template_;
> > -
> > -private:
> > - / Shift the buffer by 1 pixel left each frame */
> > - void shiftLeft(const Size &size);
> > };
> > 
> > class ColorBarsGenerator : public TestPatternGenerator
> > --
> > 2.47.1
Laurent Pinchart Dec. 17, 2024, 4:39 p.m. UTC | #3
Hi Barnabás,

Thank you for the patch.

On Wed, Dec 11, 2024 at 03:25:56PM +0000, Barnabás Pőcze wrote:
> After the initial generation, when a frame is requested,
> the test pattern generator rotates the image left by 1 column.
> 
> The current approach has two shortcomings:
>   (1) it allocates a temporary buffer to hold one column;
>   (2) it swaps two columns at a time.
> 
> The test patterns are simple ARGB images, in row-major order,
> so doing (2) works against memory prefetching. This can be
> addressed by doing the rotation one row at a time as that way
> the image is addressed in a purely linear fashion. Doing so also
> eliminates the need for a dynamically allocated temporary buffer,
> as the required buffer now only needs to hold one sample,
> which is 4 bytes in this case.
> 
> In an optimized build, this results in about a 2x increase in the
> number of frames per second as reported by `cam`. In an unoptimized,
> ASAN and UBSAN intrumented build, the difference is even bigger,
> which is useful for running lc-compliance in CI in a reasonable time.
> 
> Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
> ---
>  .../virtual/test_pattern_generator.cpp        | 68 +++++++++----------
>  .../pipeline/virtual/test_pattern_generator.h |  4 --
>  2 files changed, 34 insertions(+), 38 deletions(-)
> 
> diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> index 47d341919..eeadc1646 100644
> --- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> @@ -5,6 +5,8 @@
>   * Derived class of FrameGenerator for generating test patterns
>   */
>  
> +#include <string.h>
> +
>  #include "test_pattern_generator.h"
>  
>  #include <libcamera/base/log.h>
> @@ -13,6 +15,37 @@
>  
>  #include <libyuv/convert_from_argb.h>
>  
> +namespace {
> +
> +template<size_t SampleSize>
> +void rotateLeft1Sample(uint8_t *samples, size_t width)
> +{
> +	if (width <= 0)

width is unsigned, so it will never be smaller than 0. As the caller
already ensures that the width is >= 2, I'd drop this check.

> +		return;
> +
> +	const size_t stride = width * SampleSize;

You could pass the stride value to this function, instead of recomputing
it for each line.

> +	uint8_t first[SampleSize];
> +
> +	memcpy(first, &samples[0], SampleSize);
> +	for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
> +		memcpy(&samples[i], &samples[i + SampleSize], SampleSize);

Have you considered replacing this with

	memmove(&samples[0], &samples[SampleSize], stride - SampleSize);

?

This patch is a good improvement, so

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

with the above comments taken into account (regardless of whether the
suggestions are accepted or rejected).

I think the test pattern generator could do with a whole rewrite. I
would compute the test pattern image at configure time, and store it in
NV12 and in the output resolution, and handle the rotation when copying
it to the frame buffer. That should significantly speed up operation.

> +	memcpy(&samples[stride - SampleSize], first, SampleSize);
> +}
> +
> +template<size_t SampleSize>
> +void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
> +{
> +	if (size.width < 2)
> +		return;
> +
> +	const size_t stride = size.width * SampleSize;
> +
> +	for (size_t i = 0; i < size.height; i++, image += stride)
> +		rotateLeft1Sample<SampleSize>(image, size.width);
> +}
> +
> +}
> +
>  namespace libcamera {
>  
>  LOG_DECLARE_CATEGORY(Virtual)
> @@ -27,7 +60,7 @@ int TestPatternGenerator::generateFrame(const Size &size,
>  
>  	const auto &planes = mappedFrameBuffer.planes();
>  
> -	shiftLeft(size);
> +	rotateLeft1Column<kARGBSize>(size, template_.get());
>  
>  	/* Convert the template_ to the frame buffer */
>  	int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
> @@ -40,39 +73,6 @@ int TestPatternGenerator::generateFrame(const Size &size,
>  	return ret;
>  }
>  
> -void TestPatternGenerator::shiftLeft(const Size &size)
> -{
> -	/* Store the first column temporarily */
> -	auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
> -	for (size_t h = 0; h < size.height; h++) {
> -		unsigned int index = h * size.width * kARGBSize;
> -		unsigned int index1 = h * kARGBSize;
> -		firstColumn[index1] = template_[index];
> -		firstColumn[index1 + 1] = template_[index + 1];
> -		firstColumn[index1 + 2] = template_[index + 2];
> -		firstColumn[index1 + 3] = 0x00;
> -	}
> -
> -	/* Overwrite template_ */
> -	uint8_t *buf = template_.get();
> -	for (size_t h = 0; h < size.height; h++) {
> -		for (size_t w = 0; w < size.width - 1; w++) {
> -			/* Overwrite with the pixel on the right */
> -			unsigned int index = (h * size.width + w + 1) * kARGBSize;
> -			*buf++ = template_[index]; /* B */
> -			*buf++ = template_[index + 1]; /* G */
> -			*buf++ = template_[index + 2]; /* R */
> -			*buf++ = 0x00; /* A */
> -		}
> -		/* Overwrite the new last column with the original first column */
> -		unsigned int index1 = h * kARGBSize;
> -		*buf++ = firstColumn[index1]; /* B */
> -		*buf++ = firstColumn[index1 + 1]; /* G */
> -		*buf++ = firstColumn[index1 + 2]; /* R */
> -		*buf++ = 0x00; /* A */
> -	}
> -}
> -
>  void ColorBarsGenerator::configure(const Size &size)
>  {
>  	constexpr uint8_t kColorBar[8][3] = {
> diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> index 05f4ab7a7..2a51bd31a 100644
> --- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
> +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> @@ -29,10 +29,6 @@ public:
>  protected:
>  	/* Buffer of test pattern template */
>  	std::unique_ptr<uint8_t[]> template_;
> -
> -private:
> -	/* Shift the buffer by 1 pixel left each frame */
> -	void shiftLeft(const Size &size);
>  };
>  
>  class ColorBarsGenerator : public TestPatternGenerator
Laurent Pinchart Dec. 17, 2024, 4:57 p.m. UTC | #4
CC'ing Harvey.

On Tue, Dec 17, 2024 at 04:22:42PM +0000, Barnabás Pőcze wrote:
> 2024. december 17., kedd 17:19 keltezéssel, Kieran Bingham írta:
> > Quoting Barnabás Pőcze (2024-12-11 15:25:56)
> > 
> > > After the initial generation, when a frame is requested,
> > > the test pattern generator rotates the image left by 1 column.
> > > 
> > > The current approach has two shortcomings:
> > > (1) it allocates a temporary buffer to hold one column;
> > > (2) it swaps two columns at a time.
> > > 
> > > The test patterns are simple ARGB images, in row-major order,
> > > so doing (2) works against memory prefetching. This can be
> > > addressed by doing the rotation one row at a time as that way
> > > the image is addressed in a purely linear fashion. Doing so also
> > > eliminates the need for a dynamically allocated temporary buffer,
> > > as the required buffer now only needs to hold one sample,
> > > which is 4 bytes in this case.
> > > 
> > > In an optimized build, this results in about a 2x increase in the
> > > number of frames per second as reported by `cam`. In an unoptimized,
> > > ASAN and UBSAN intrumented build, the difference is even bigger,
> > > which is useful for running lc-compliance in CI in a reasonable time.
> > > 
> > > Signed-off-by: Barnabás Pőcze pobrn@protonmail.com
> > 
> > 
> > Hrm, rabbit holes a plenty trying to test this.
> > 
> > First qcam isn't working on my main development PC as Qt6 does not
> > supply a pkg-config file in the version available on Ubuntu 22.04 LTS.
> > 
> > Next up, I'm running cam with the SDL output which works:
> > 
> > ./build/gcc/src/apps/cam/cam -c2 -S -C
> > 
> > 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 0.000000 (0.00 fps) cam0-stream0 seq: 000000 bytesused: 0/0
> > 136838.194958 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 136838.194951 (0.00 fps) cam0-stream0 seq: 031860 bytesused: 0/0
> > 
> > 
> > but the virtual pipeline handler is clearly not handling sequence
> > numbers, timetamps, bytesused - or presumably any of the required
> > metadata correctly.
> > 
> > But none of that is a fault of your patch which is working so:
> 
> I forgot to mention that I used this to test:
> https://gitlab.freedesktop.org/pobrn/libcamera/-/commit/cd537fa855bb68c1faeab8fb5b94c4d423899b04
> 
> There is also a corresponding bug:
> https://bugs.libcamera.org/show_bug.cgi?id=245

That's not great, it should be fixed sooner than later as I expect it
will impact CI. Harvey, when do you think you could have a look at that
? Could you also create an account on bugs.libcamera.org, so that I can
assign bugs for the virtual pipeline handler to you in the future ?

> > Tested-by: Kieran Bingham kieran.bingham@ideasonboard.com
> > 
> > Reviewed-by: Kieran Bingham kieran.bingham@ideasonboard.com
> > 
> > > ---
> > > .../virtual/test_pattern_generator.cpp | 68 +++++++++----------
> > > .../pipeline/virtual/test_pattern_generator.h | 4 --
> > > 2 files changed, 34 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > > index 47d341919..eeadc1646 100644
> > > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > > @@ -5,6 +5,8 @@
> > > * Derived class of FrameGenerator for generating test patterns
> > > */
> > > 
> > > +#include <string.h>
> > > +
> > > #include "test_pattern_generator.h"
> > > 
> > > #include <libcamera/base/log.h>
> > > @@ -13,6 +15,37 @@
> > > 
> > > #include <libyuv/convert_from_argb.h>
> > > 
> > > +namespace {
> > > +
> > > +template<size_t SampleSize>
> > > +void rotateLeft1Sample(uint8_t *samples, size_t width)
> > > +{
> > > + if (width <= 0)
> > > + return;
> > > +
> > > + const size_t stride = width * SampleSize;
> > > + uint8_t first[SampleSize];
> > > +
> > > + memcpy(first, &samples[0], SampleSize);
> > > + for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
> > > + memcpy(&samples[i], &samples[i + SampleSize], SampleSize);
> > > + memcpy(&samples[stride - SampleSize], first, SampleSize);
> > > +}
> > > +
> > > +template<size_t SampleSize>
> > > +void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
> > > +{
> > > + if (size.width < 2)
> > > + return;
> > > +
> > > + const size_t stride = size.width * SampleSize;
> > > +
> > > + for (size_t i = 0; i < size.height; i++, image += stride)
> > > + rotateLeft1Sample<SampleSize>(image, size.width);
> > > +}
> > > +
> > > +}
> > > +
> > > namespace libcamera {
> > > 
> > > LOG_DECLARE_CATEGORY(Virtual)
> > > @@ -27,7 +60,7 @@ int TestPatternGenerator::generateFrame(const Size &size,
> > > 
> > > const auto &planes = mappedFrameBuffer.planes();
> > > 
> > > - shiftLeft(size);
> > > + rotateLeft1Column<kARGBSize>(size, template_.get());
> > > 
> > > /* Convert the template_ to the frame buffer */
> > > int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
> > > @@ -40,39 +73,6 @@ int TestPatternGenerator::generateFrame(const Size &size,
> > > return ret;
> > > }
> > > 
> > > -void TestPatternGenerator::shiftLeft(const Size &size)
> > > -{
> > > - /* Store the first column temporarily /
> > > - auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
> > > - for (size_t h = 0; h < size.height; h++) {
> > > - unsigned int index = h * size.width * kARGBSize;
> > > - unsigned int index1 = h * kARGBSize;
> > > - firstColumn[index1] = template_[index];
> > > - firstColumn[index1 + 1] = template_[index + 1];
> > > - firstColumn[index1 + 2] = template_[index + 2];
> > > - firstColumn[index1 + 3] = 0x00;
> > > - }
> > > -
> > > - / Overwrite template_ */
> > > - uint8_t buf = template_.get();
> > > - for (size_t h = 0; h < size.height; h++) {
> > > - for (size_t w = 0; w < size.width - 1; w++) {
> > > - / Overwrite with the pixel on the right */
> > > - unsigned int index = (h * size.width + w + 1) * kARGBSize;
> > > - buf++ = template_[index]; / B */
> > > - buf++ = template_[index + 1]; / G */
> > > - buf++ = template_[index + 2]; / R */
> > > - buf++ = 0x00; / A /
> > > - }
> > > - / Overwrite the new last column with the original first column */
> > > - unsigned int index1 = h * kARGBSize;
> > > - buf++ = firstColumn[index1]; / B */
> > > - buf++ = firstColumn[index1 + 1]; / G */
> > > - buf++ = firstColumn[index1 + 2]; / R */
> > > - buf++ = 0x00; / A /
> > > - }
> > > -}
> > > -
> > > void ColorBarsGenerator::configure(const Size &size)
> > > {
> > > constexpr uint8_t kColorBar[8][3] = {
> > > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > > index 05f4ab7a7..2a51bd31a 100644
> > > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > > @@ -29,10 +29,6 @@ public:
> > > protected:
> > > / Buffer of test pattern template /
> > > std::unique_ptr<uint8_t[]> template_;
> > > -
> > > -private:
> > > - / Shift the buffer by 1 pixel left each frame */
> > > - void shiftLeft(const Size &size);
> > > };
> > > 
> > > class ColorBarsGenerator : public TestPatternGenerator
Barnabás Pőcze Dec. 17, 2024, 5:38 p.m. UTC | #5
2024. december 17., kedd 17:39 keltezéssel, Laurent Pinchart <laurent.pinchart@ideasonboard.com> írta:

> Hi Barnabás,
> 
> Thank you for the patch.
> 
> On Wed, Dec 11, 2024 at 03:25:56PM +0000, Barnabás Pőcze wrote:
> > After the initial generation, when a frame is requested,
> > the test pattern generator rotates the image left by 1 column.
> >
> > The current approach has two shortcomings:
> >   (1) it allocates a temporary buffer to hold one column;
> >   (2) it swaps two columns at a time.
> >
> > The test patterns are simple ARGB images, in row-major order,
> > so doing (2) works against memory prefetching. This can be
> > addressed by doing the rotation one row at a time as that way
> > the image is addressed in a purely linear fashion. Doing so also
> > eliminates the need for a dynamically allocated temporary buffer,
> > as the required buffer now only needs to hold one sample,
> > which is 4 bytes in this case.
> >
> > In an optimized build, this results in about a 2x increase in the
> > number of frames per second as reported by `cam`. In an unoptimized,
> > ASAN and UBSAN intrumented build, the difference is even bigger,
> > which is useful for running lc-compliance in CI in a reasonable time.
> >
> > Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
> > ---
> >  .../virtual/test_pattern_generator.cpp        | 68 +++++++++----------
> >  .../pipeline/virtual/test_pattern_generator.h |  4 --
> >  2 files changed, 34 insertions(+), 38 deletions(-)
> >
> > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > index 47d341919..eeadc1646 100644
> > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
> > @@ -5,6 +5,8 @@
> >   * Derived class of FrameGenerator for generating test patterns
> >   */
> >
> > +#include <string.h>
> > +
> >  #include "test_pattern_generator.h"
> >
> >  #include <libcamera/base/log.h>
> > @@ -13,6 +15,37 @@
> >
> >  #include <libyuv/convert_from_argb.h>
> >
> > +namespace {
> > +
> > +template<size_t SampleSize>
> > +void rotateLeft1Sample(uint8_t *samples, size_t width)
> > +{
> > +	if (width <= 0)
> 
> width is unsigned, so it will never be smaller than 0. As the caller
> already ensures that the width is >= 2, I'd drop this check.
> 
> > +		return;
> > +
> > +	const size_t stride = width * SampleSize;
> 
> You could pass the stride value to this function, instead of recomputing
> it for each line.
> 
> > +	uint8_t first[SampleSize];
> > +
> > +	memcpy(first, &samples[0], SampleSize);
> > +	for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
> > +		memcpy(&samples[i], &samples[i + SampleSize], SampleSize);
> 
> Have you considered replacing this with
> 
> 	memmove(&samples[0], &samples[SampleSize], stride - SampleSize);
> 
> ?

No... but that's a good idea. I'll apply this in the next version.


Regards,
Barnabás Pőcze

> 
> This patch is a good improvement, so
> 
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> 
> with the above comments taken into account (regardless of whether the
> suggestions are accepted or rejected).
> 
> I think the test pattern generator could do with a whole rewrite. I
> would compute the test pattern image at configure time, and store it in
> NV12 and in the output resolution, and handle the rotation when copying
> it to the frame buffer. That should significantly speed up operation.
> 
> > +	memcpy(&samples[stride - SampleSize], first, SampleSize);
> > +}
> > +
> > +template<size_t SampleSize>
> > +void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
> > +{
> > +	if (size.width < 2)
> > +		return;
> > +
> > +	const size_t stride = size.width * SampleSize;
> > +
> > +	for (size_t i = 0; i < size.height; i++, image += stride)
> > +		rotateLeft1Sample<SampleSize>(image, size.width);
> > +}
> > +
> > +}
> > +
> >  namespace libcamera {
> >
> >  LOG_DECLARE_CATEGORY(Virtual)
> > @@ -27,7 +60,7 @@ int TestPatternGenerator::generateFrame(const Size &size,
> >
> >  	const auto &planes = mappedFrameBuffer.planes();
> >
> > -	shiftLeft(size);
> > +	rotateLeft1Column<kARGBSize>(size, template_.get());
> >
> >  	/* Convert the template_ to the frame buffer */
> >  	int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
> > @@ -40,39 +73,6 @@ int TestPatternGenerator::generateFrame(const Size &size,
> >  	return ret;
> >  }
> >
> > -void TestPatternGenerator::shiftLeft(const Size &size)
> > -{
> > -	/* Store the first column temporarily */
> > -	auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
> > -	for (size_t h = 0; h < size.height; h++) {
> > -		unsigned int index = h * size.width * kARGBSize;
> > -		unsigned int index1 = h * kARGBSize;
> > -		firstColumn[index1] = template_[index];
> > -		firstColumn[index1 + 1] = template_[index + 1];
> > -		firstColumn[index1 + 2] = template_[index + 2];
> > -		firstColumn[index1 + 3] = 0x00;
> > -	}
> > -
> > -	/* Overwrite template_ */
> > -	uint8_t *buf = template_.get();
> > -	for (size_t h = 0; h < size.height; h++) {
> > -		for (size_t w = 0; w < size.width - 1; w++) {
> > -			/* Overwrite with the pixel on the right */
> > -			unsigned int index = (h * size.width + w + 1) * kARGBSize;
> > -			*buf++ = template_[index]; /* B */
> > -			*buf++ = template_[index + 1]; /* G */
> > -			*buf++ = template_[index + 2]; /* R */
> > -			*buf++ = 0x00; /* A */
> > -		}
> > -		/* Overwrite the new last column with the original first column */
> > -		unsigned int index1 = h * kARGBSize;
> > -		*buf++ = firstColumn[index1]; /* B */
> > -		*buf++ = firstColumn[index1 + 1]; /* G */
> > -		*buf++ = firstColumn[index1 + 2]; /* R */
> > -		*buf++ = 0x00; /* A */
> > -	}
> > -}
> > -
> >  void ColorBarsGenerator::configure(const Size &size)
> >  {
> >  	constexpr uint8_t kColorBar[8][3] = {
> > diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > index 05f4ab7a7..2a51bd31a 100644
> > --- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > +++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
> > @@ -29,10 +29,6 @@ public:
> >  protected:
> >  	/* Buffer of test pattern template */
> >  	std::unique_ptr<uint8_t[]> template_;
> > -
> > -private:
> > -	/* Shift the buffer by 1 pixel left each frame */
> > -	void shiftLeft(const Size &size);
> >  };
> >
> >  class ColorBarsGenerator : public TestPatternGenerator
> 
> --
> Regards,
> 
> Laurent Pinchart
>

Patch
diff mbox series

diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
index 47d341919..eeadc1646 100644
--- a/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
+++ b/src/libcamera/pipeline/virtual/test_pattern_generator.cpp
@@ -5,6 +5,8 @@ 
  * Derived class of FrameGenerator for generating test patterns
  */
 
+#include <string.h>
+
 #include "test_pattern_generator.h"
 
 #include <libcamera/base/log.h>
@@ -13,6 +15,37 @@ 
 
 #include <libyuv/convert_from_argb.h>
 
+namespace {
+
+template<size_t SampleSize>
+void rotateLeft1Sample(uint8_t *samples, size_t width)
+{
+	if (width <= 0)
+		return;
+
+	const size_t stride = width * SampleSize;
+	uint8_t first[SampleSize];
+
+	memcpy(first, &samples[0], SampleSize);
+	for (size_t i = 0; i < stride - SampleSize; i += SampleSize)
+		memcpy(&samples[i], &samples[i + SampleSize], SampleSize);
+	memcpy(&samples[stride - SampleSize], first, SampleSize);
+}
+
+template<size_t SampleSize>
+void rotateLeft1Column(const libcamera::Size &size, uint8_t *image)
+{
+	if (size.width < 2)
+		return;
+
+	const size_t stride = size.width * SampleSize;
+
+	for (size_t i = 0; i < size.height; i++, image += stride)
+		rotateLeft1Sample<SampleSize>(image, size.width);
+}
+
+}
+
 namespace libcamera {
 
 LOG_DECLARE_CATEGORY(Virtual)
@@ -27,7 +60,7 @@  int TestPatternGenerator::generateFrame(const Size &size,
 
 	const auto &planes = mappedFrameBuffer.planes();
 
-	shiftLeft(size);
+	rotateLeft1Column<kARGBSize>(size, template_.get());
 
 	/* Convert the template_ to the frame buffer */
 	int ret = libyuv::ARGBToNV12(template_.get(), size.width * kARGBSize,
@@ -40,39 +73,6 @@  int TestPatternGenerator::generateFrame(const Size &size,
 	return ret;
 }
 
-void TestPatternGenerator::shiftLeft(const Size &size)
-{
-	/* Store the first column temporarily */
-	auto firstColumn = std::make_unique<uint8_t[]>(size.height * kARGBSize);
-	for (size_t h = 0; h < size.height; h++) {
-		unsigned int index = h * size.width * kARGBSize;
-		unsigned int index1 = h * kARGBSize;
-		firstColumn[index1] = template_[index];
-		firstColumn[index1 + 1] = template_[index + 1];
-		firstColumn[index1 + 2] = template_[index + 2];
-		firstColumn[index1 + 3] = 0x00;
-	}
-
-	/* Overwrite template_ */
-	uint8_t *buf = template_.get();
-	for (size_t h = 0; h < size.height; h++) {
-		for (size_t w = 0; w < size.width - 1; w++) {
-			/* Overwrite with the pixel on the right */
-			unsigned int index = (h * size.width + w + 1) * kARGBSize;
-			*buf++ = template_[index]; /* B */
-			*buf++ = template_[index + 1]; /* G */
-			*buf++ = template_[index + 2]; /* R */
-			*buf++ = 0x00; /* A */
-		}
-		/* Overwrite the new last column with the original first column */
-		unsigned int index1 = h * kARGBSize;
-		*buf++ = firstColumn[index1]; /* B */
-		*buf++ = firstColumn[index1 + 1]; /* G */
-		*buf++ = firstColumn[index1 + 2]; /* R */
-		*buf++ = 0x00; /* A */
-	}
-}
-
 void ColorBarsGenerator::configure(const Size &size)
 {
 	constexpr uint8_t kColorBar[8][3] = {
diff --git a/src/libcamera/pipeline/virtual/test_pattern_generator.h b/src/libcamera/pipeline/virtual/test_pattern_generator.h
index 05f4ab7a7..2a51bd31a 100644
--- a/src/libcamera/pipeline/virtual/test_pattern_generator.h
+++ b/src/libcamera/pipeline/virtual/test_pattern_generator.h
@@ -29,10 +29,6 @@  public:
 protected:
 	/* Buffer of test pattern template */
 	std::unique_ptr<uint8_t[]> template_;
-
-private:
-	/* Shift the buffer by 1 pixel left each frame */
-	void shiftLeft(const Size &size);
 };
 
 class ColorBarsGenerator : public TestPatternGenerator