[{"id":29837,"web_url":"https://patchwork.libcamera.org/comment/29837/","msgid":"<878qzbeeo9.fsf@redhat.com>","date":"2024-06-11T14:29:10","subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","submitter":{"id":177,"url":"https://patchwork.libcamera.org/api/people/177/","name":"Milan Zamazal","email":"mzamazal@redhat.com"},"content":"Hi Robert,\n\nthank you for the patch.\n\nRobert Mader <robert.mader@collabora.com> writes:\n\n> In order to be more compatible with modern hardware and APIs. This\n> notably allows GL implementations to directly import the buffers more\n> often and seems to be required for Wayland.\n>\n> Further more, as we already enforce a 8 byte stride, these formats work\n> better for clients that don't support padding - such as libwebrtc at the\n> time of writing.\n>\n> Tested on the Librem5 and PinePhone.\n>\n> Signed-off-by: Robert Mader <robert.mader@collabora.com>\n> ---\n>  src/libcamera/software_isp/debayer_cpu.cpp | 244 +++++++++++++++++++--\n>  src/libcamera/software_isp/debayer_cpu.h   |  10 +\n>  2 files changed, 238 insertions(+), 16 deletions(-)\n>\n> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp\n> index c038eed4..73c66a88 100644\n> --- a/src/libcamera/software_isp/debayer_cpu.cpp\n> +++ b/src/libcamera/software_isp/debayer_cpu.cpp\n> @@ -76,6 +76,13 @@ DebayerCpu::~DebayerCpu()\n>  \t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n>  \tx++;\n>  \n> +#define BGGR_XBGR8888(p, n, div)                                                              \\\n> +\t*dst++ = blue_[curr[x] / (div)];                                                      \\\n> +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];       \\\n> +\t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> +\t*dst++ = 255;                                                                         \\\n> +\tx++;\n> +\n\nThe level of code duplication here starts to exceed reasonable limits.\nMaybe adding an argument to the macro deciding whether to append the\nlast `dst' assignment or not would have a significant performance\nimpact.  But even then I guess there must be a way to let the compiler\ngenerate the duplicate code (an inline function template?) rather than\ndoing it manually.  What do the C++ experts around think?\n\nIf nothing better is possible then `*dst++ = 255' can be applied in the\ncallers rather than defining the alternative macro versions.\n\n>  /*\n>   * GBG\n>   * RGR\n> @@ -87,6 +94,13 @@ DebayerCpu::~DebayerCpu()\n>  \t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n>  \tx++;\n>  \n> +#define GRBG_XBGR8888(p, n, div)                                  \\\n> +\t*dst++ = blue_[(prev[x] + next[x]) / (2 * (div))];        \\\n> +\t*dst++ = green_[curr[x] / (div)];                         \\\n> +\t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> +\t*dst++ = 255;                                             \\\n> +\tx++;\n> +\n>  /*\n>   * GRG\n>   * BGB\n> @@ -98,6 +112,13 @@ DebayerCpu::~DebayerCpu()\n>  \t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n>  \tx++;\n>  \n> +#define GBRG_XBGR8888(p, n, div)                                   \\\n> +\t*dst++ = blue_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> +\t*dst++ = green_[curr[x] / (div)];                          \\\n> +\t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n> +\t*dst++ = 255;                                              \\\n> +\tx++;\n> +\n>  /*\n>   * BGB\n>   * GRG\n> @@ -109,6 +130,13 @@ DebayerCpu::~DebayerCpu()\n>  \t*dst++ = red_[curr[x] / (div)];                                                        \\\n>  \tx++;\n>  \n> +#define RGGB_XBGR8888(p, n, div)                                                               \\\n> +\t*dst++ = blue_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];        \\\n> +\t*dst++ = red_[curr[x] / (div)];                                                        \\\n> +\t*dst++ = 255;                                                                          \\\n> +\tx++;\n> +\n>  void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint8_t)\n> @@ -119,6 +147,16 @@ void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint8_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\tBGGR_XBGR8888(1, 1, 1)\n> +\t\tGBRG_XBGR8888(1, 1, 1)\n> +\t}\n> +}\n> +\n\n... and then X and non-X versions of these methods could be merged too.\nUsing a template for the purpose should have no performance impact, I\nsuppose.\n\n>  void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint8_t)\n> @@ -129,6 +167,16 @@ void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint8_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\tGRBG_XBGR8888(1, 1, 1)\n> +\t\tRGGB_XBGR8888(1, 1, 1)\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint16_t)\n> @@ -140,6 +188,17 @@ void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint16_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n> +\t\tBGGR_XBGR8888(1, 1, 4)\n> +\t\tGBRG_XBGR8888(1, 1, 4)\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint16_t)\n> @@ -151,6 +210,17 @@ void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint16_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n> +\t\tGRBG_XBGR8888(1, 1, 4)\n> +\t\tRGGB_XBGR8888(1, 1, 4)\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint16_t)\n> @@ -162,6 +232,17 @@ void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint16_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n> +\t\tBGGR_XBGR8888(1, 1, 16)\n> +\t\tGBRG_XBGR8888(1, 1, 16)\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tDECLARE_SRC_POINTERS(uint16_t)\n> @@ -173,6 +254,17 @@ void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tDECLARE_SRC_POINTERS(uint16_t)\n> +\n> +\tfor (int x = 0; x < (int)window_.width;) {\n> +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n> +\t\tGRBG_XBGR8888(1, 1, 16)\n> +\t\tRGGB_XBGR8888(1, 1, 16)\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tconst int widthInBytes = window_.width * 5 / 4;\n> @@ -198,6 +290,31 @@ void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tconst int widthInBytes = window_.width * 5 / 4;\n> +\tconst uint8_t *prev = src[0];\n> +\tconst uint8_t *curr = src[1];\n> +\tconst uint8_t *next = src[2];\n> +\n> +\t/*\n> +\t * For the first pixel getting a pixel from the previous column uses\n> +\t * x - 2 to skip the 5th byte with least-significant bits for 4 pixels.\n> +\t * Same for last pixel (uses x + 2) and looking at the next column.\n> +\t */\n> +\tfor (int x = 0; x < widthInBytes;) {\n> +\t\t/* First pixel */\n> +\t\tBGGR_XBGR8888(2, 1, 1)\n> +\t\t/* Second pixel BGGR -> GBRG */\n> +\t\tGBRG_XBGR8888(1, 1, 1)\n> +\t\t/* Same thing for third and fourth pixels */\n> +\t\tBGGR_XBGR8888(1, 1, 1)\n> +\t\tGBRG_XBGR8888(1, 2, 1)\n> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +\t\tx++;\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tconst int widthInBytes = window_.width * 5 / 4;\n> @@ -218,6 +335,26 @@ void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tconst int widthInBytes = window_.width * 5 / 4;\n> +\tconst uint8_t *prev = src[0];\n> +\tconst uint8_t *curr = src[1];\n> +\tconst uint8_t *next = src[2];\n> +\n> +\tfor (int x = 0; x < widthInBytes;) {\n> +\t\t/* First pixel */\n> +\t\tGRBG_XBGR8888(2, 1, 1)\n> +\t\t/* Second pixel GRBG -> RGGB */\n> +\t\tRGGB_XBGR8888(1, 1, 1)\n> +\t\t/* Same thing for third and fourth pixels */\n> +\t\tGRBG_XBGR8888(1, 1, 1)\n> +\t\tRGGB_XBGR8888(1, 2, 1)\n> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +\t\tx++;\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tconst int widthInBytes = window_.width * 5 / 4;\n> @@ -238,6 +375,26 @@ void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tconst int widthInBytes = window_.width * 5 / 4;\n> +\tconst uint8_t *prev = src[0];\n> +\tconst uint8_t *curr = src[1];\n> +\tconst uint8_t *next = src[2];\n> +\n> +\tfor (int x = 0; x < widthInBytes;) {\n> +\t\t/* Even pixel */\n> +\t\tGBRG_XBGR8888(2, 1, 1)\n> +\t\t/* Odd pixel GBGR -> BGGR */\n> +\t\tBGGR_XBGR8888(1, 1, 1)\n> +\t\t/* Same thing for next 2 pixels */\n> +\t\tGBRG_XBGR8888(1, 1, 1)\n> +\t\tBGGR_XBGR8888(1, 2, 1)\n> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +\t\tx++;\n> +\t}\n> +}\n> +\n>  void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>  \tconst int widthInBytes = window_.width * 5 / 4;\n> @@ -258,6 +415,26 @@ void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  \t}\n>  }\n>  \n> +void DebayerCpu::debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +\tconst int widthInBytes = window_.width * 5 / 4;\n> +\tconst uint8_t *prev = src[0];\n> +\tconst uint8_t *curr = src[1];\n> +\tconst uint8_t *next = src[2];\n> +\n> +\tfor (int x = 0; x < widthInBytes;) {\n> +\t\t/* Even pixel */\n> +\t\tRGGB_XBGR8888(2, 1, 1)\n> +\t\t/* Odd pixel RGGB -> GRBG */\n> +\t\tGRBG_XBGR8888(1, 1, 1)\n> +\t\t/* Same thing for next 2 pixels */\n> +\t\tRGGB_XBGR8888(1, 1, 1)\n> +\t\tGRBG_XBGR8888(1, 2, 1)\n> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +\t\tx++;\n> +\t}\n> +}\n> +\n>  static bool isStandardBayerOrder(BayerFormat::Order order)\n>  {\n>  \treturn order == BayerFormat::BGGR || order == BayerFormat::GBRG ||\n> @@ -280,7 +457,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>  \t\tconfig.bpp = (bayerFormat.bitDepth + 7) & ~7;\n>  \t\tconfig.patternSize.width = 2;\n>  \t\tconfig.patternSize.height = 2;\n> -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n> +\t\t\tformats::RGB888,\n> +\t\t\tformats::XRGB8888,\n> +\t\t\tformats::ARGB8888,\n> +\t\t\tformats::BGR888,\n> +\t\t\tformats::XBGR8888,\n> +\t\t\tformats::ABGR8888\n> +\t\t});\n>  \t\treturn 0;\n>  \t}\n>  \n> @@ -290,7 +474,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>  \t\tconfig.bpp = 10;\n>  \t\tconfig.patternSize.width = 4; /* 5 bytes per *4* pixels */\n>  \t\tconfig.patternSize.height = 2;\n> -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n> +\t\t\tformats::RGB888,\n> +\t\t\tformats::XRGB8888,\n> +\t\t\tformats::ARGB8888,\n> +\t\t\tformats::BGR888,\n> +\t\t\tformats::XBGR8888,\n> +\t\t\tformats::ABGR8888\n> +\t\t});\n>  \t\treturn 0;\n>  \t}\n>  \n> @@ -306,6 +497,12 @@ int DebayerCpu::getOutputConfig(PixelFormat outputFormat, DebayerOutputConfig &c\n>  \t\treturn 0;\n>  \t}\n>  \n> +\tif (outputFormat == formats::XRGB8888 || outputFormat == formats::ARGB8888 ||\n> +\t    outputFormat == formats::XBGR8888 || outputFormat == formats::ABGR8888) {\n> +\t\tconfig.bpp = 32;\n> +\t\treturn 0;\n> +\t}\n> +\n>  \tLOG(Debayer, Info)\n>  \t\t<< \"Unsupported output format \" << outputFormat.toString();\n>  \treturn -EINVAL;\n> @@ -341,6 +538,7 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  {\n>  \tBayerFormat bayerFormat =\n>  \t\tBayerFormat::fromPixelFormat(inputFormat);\n> +\tbool is_aligned = false;\n\ncamelCase should be used for variable names.\n\n>  \txShift_ = 0;\n>  \tswapRedBlueGains_ = false;\n> @@ -351,8 +549,16 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  \t};\n>  \n>  \tswitch (outputFormat) {\n> +\tcase formats::XRGB8888:\n> +\tcase formats::ARGB8888:\n> +\t  is_aligned = true;\n> +\t  [[fallthrough]];\n>  \tcase formats::RGB888:\n>  \t\tbreak;\n> +\tcase formats::XBGR8888:\n> +\tcase formats::ABGR8888:\n> +\t  is_aligned = true;\n> +\t  [[fallthrough]];\n>  \tcase formats::BGR888:\n>  \t\t/* Swap R and B in bayer order to generate BGR888 instead of RGB888 */\n>  \t\tswapRedBlueGains_ = true;\n> @@ -383,16 +589,19 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  \t    isStandardBayerOrder(bayerFormat.order)) {\n>  \t\tswitch (bayerFormat.bitDepth) {\n>  \t\tcase 8:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer8_BGBG_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer8_GRGR_BGR888;\n> +\t\t  LOG(Debayer, Warning) << \"8bit no packing\";\n\nIs this a debugging leftover or do you really mean to log something\nhere?  If the latter, it should probably be Debug rather than Warning\nand the message should be improved.  The same applies to the other LOGs\nbelow. \n\n> +\t\t  debayer0_ = is_aligned ? &DebayerCpu::debayer8_BGBG_XBGR8888 : &DebayerCpu::debayer8_BGBG_BGR888;\n\nIf the two methods were unified and extended with isAligned argument,\nI'm not sure what would be the best way to pass the argument to them.\nlambda, a template parameter or anything else?\n\n> +\t\t  debayer1_ = is_aligned ? &DebayerCpu::debayer8_GRGR_XBGR8888 : &DebayerCpu::debayer8_GRGR_BGR888;\n>  \t\t\tbreak;\n>  \t\tcase 10:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer10_BGBG_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer10_GRGR_BGR888;\n> +\t\t  LOG(Debayer, Warning) << \"10bit no packing\";\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10_BGBG_XBGR8888 : &DebayerCpu::debayer10_BGBG_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10_GRGR_XBGR8888 : &DebayerCpu::debayer10_GRGR_BGR888;\n>  \t\t\tbreak;\n>  \t\tcase 12:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer12_BGBG_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer12_GRGR_BGR888;\n> +\t\t  LOG(Debayer, Warning) << \"12bit no packing\";\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer12_BGBG_XBGR8888 : &DebayerCpu::debayer12_BGBG_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer12_GRGR_XBGR8888 : &DebayerCpu::debayer12_GRGR_BGR888;\n>  \t\t\tbreak;\n>  \t\t}\n>  \t\tsetupStandardBayerOrder(bayerFormat.order);\n> @@ -401,22 +610,23 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  \n>  \tif (bayerFormat.bitDepth == 10 &&\n>  \t    bayerFormat.packing == BayerFormat::Packing::CSI2) {\n> +\t  LOG(Debayer, Warning) << \"10bit csi2\";\n>  \t\tswitch (bayerFormat.order) {\n>  \t\tcase BayerFormat::BGGR:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n>  \t\t\treturn 0;\n>  \t\tcase BayerFormat::GBRG:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n>  \t\t\treturn 0;\n>  \t\tcase BayerFormat::GRBG:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n>  \t\t\treturn 0;\n>  \t\tcase BayerFormat::RGGB:\n> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n>  \t\t\treturn 0;\n>  \t\tdefault:\n>  \t\t\tbreak;\n> @@ -533,6 +743,8 @@ DebayerCpu::strideAndFrameSize(const PixelFormat &outputFormat, const Size &size\n>  \t/* round up to multiple of 8 for 64 bits alignment */\n>  \tunsigned int stride = (size.width * config.bpp / 8 + 7) & ~7;\n>  \n> +\tLOG(Debayer, Warning) << outputFormat.toString() << \" \" << size.width << \" \" << size.height << \" \" << config.bpp << \" \" << stride << \" \" << stride * size.height;\n> +\n>  \treturn std::make_tuple(stride, stride * size.height);\n>  }\n>  \n> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h\n> index be7dcdca..c30f44aa 100644\n> --- a/src/libcamera/software_isp/debayer_cpu.h\n> +++ b/src/libcamera/software_isp/debayer_cpu.h\n> @@ -86,18 +86,28 @@ private:\n>  \n>  \t/* 8-bit raw bayer format */\n>  \tvoid debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \t/* unpacked 10-bit raw bayer format */\n>  \tvoid debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \t/* unpacked 12-bit raw bayer format */\n>  \tvoid debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \t/* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */\n>  \tvoid debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \tvoid debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +\tvoid debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \n>  \tstruct DebayerInputConfig {\n>  \t\tSize patternSize;","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id B0F7ABD87C\n\tfor <parsemail@patchwork.libcamera.org>;\n\tTue, 11 Jun 2024 14:29:20 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 9AD3D65466;\n\tTue, 11 Jun 2024 16:29:19 +0200 (CEST)","from us-smtp-delivery-124.mimecast.com\n\t(us-smtp-delivery-124.mimecast.com [170.10.133.124])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id CC62B65455\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue, 11 Jun 2024 16:29:17 +0200 (CEST)","from mail-wm1-f70.google.com (mail-wm1-f70.google.com\n\t[209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS\n\t(version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n\tus-mta-668-lgSBa7lnNmuHxcLcCJYI6Q-1; Tue, 11 Jun 2024 10:29:14 -0400","by mail-wm1-f70.google.com with SMTP id\n\t5b1f17b1804b1-4212a20c447so6658035e9.3\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue, 11 Jun 2024 07:29:14 -0700 (PDT)","from nuthatch (nat-pool-brq-t.redhat.com. [213.175.37.10])\n\tby smtp.gmail.com with ESMTPSA id\n\tffacd0b85a97d-35f10c62f0esm9701000f8f.25.2024.06.11.07.29.11\n\t(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n\tTue, 11 Jun 2024 07:29:11 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=redhat.com header.i=@redhat.com\n\theader.b=\"V0VtMMMq\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n\ts=mimecast20190719; t=1718116156;\n\th=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n\tto:to:cc:cc:mime-version:mime-version:content-type:content-type:\n\tin-reply-to:in-reply-to:references:references;\n\tbh=chPQYQ/9tS32c/hGyD9h9a2/UT7wnRIm5bty9x47JGA=;\n\tb=V0VtMMMq6/4KbWQgcs0BpG//6eTqgmpwapNkqV3s3fKu4O3SUxNNrhLEr9jW3zHXaNpn4X\n\tQb97hDdiyybWEtxlbCFQI+zgpDZL4j0aImz5NwaOMVYWnt+KLrUyCZOnR/y4q4+i51mvII\n\tZbGXod1aZgttj16gIrfGnyA4/jsUeG8=","X-MC-Unique":"lgSBa7lnNmuHxcLcCJYI6Q-1","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20230601; t=1718116153; x=1718720953;\n\th=mime-version:user-agent:message-id:date:references:in-reply-to\n\t:subject:cc:to:from:x-gm-message-state:from:to:cc:subject:date\n\t:message-id:reply-to;\n\tbh=chPQYQ/9tS32c/hGyD9h9a2/UT7wnRIm5bty9x47JGA=;\n\tb=AXpc6PraRKZMGcwKI8Rde2GArgmvDtXIGhr53C+FyTUz6UnkawAr9PYVLgIdRbVrWK\n\tSqBZALvcZHy7PgqfQpsE3W3Xkk4kH/dEnev7YqwazTe/f7BYQhWTx5qQ7TCZaBasJN9n\n\tEDIEA1FGeen63gepc/ntxF0jegVogd2vhSN7gzH+qLMnB5oK34SSfn0TMfqpL2+5vLIT\n\tV6ua5Ymg57lao0eyhn1248wnDkkZTb3EuAmhxBjqacyf67K2bSIpi7gXzl0VXkOTmL1W\n\tNeII46iE2edrurr9uMbbD/MdxOpphPhVs2NOaLeV3qyBUi8rgZJg+KAPRF0WYIivJ5g5\n\tnDxA==","X-Gm-Message-State":"AOJu0YyiSjesF90mOKsofogb6nncfWpefSQQEB2jJjNbOuty7geX5s9L\n\tseevK7E5dPjH2c17vvOjWH/FYBoE0VP8b6Wl3X5Nq6qxl6heJr8iME7uDG5Sm1HQcEqPyeThlmp\n\t39KMp5Dlnoo/owRtsvBYk1lRdqLjQn9mnGOiobZimDRGspH/Vy/mfghsJVOMPNRxx+OY8mYiJLh\n\tSG0AABN38/X6FbWHkX1ZQMujIAaSBSXSJLl5NUqWejtGb9klte9GyZTCk=","X-Received":["by 2002:a05:600c:1909:b0:421:7ee4:bbeb with SMTP id\n\t5b1f17b1804b1-4217ee4be43mr64812765e9.0.1718116152676; \n\tTue, 11 Jun 2024 07:29:12 -0700 (PDT)","by 2002:a05:600c:1909:b0:421:7ee4:bbeb with SMTP id\n\t5b1f17b1804b1-4217ee4be43mr64812445e9.0.1718116152045; \n\tTue, 11 Jun 2024 07:29:12 -0700 (PDT)"],"X-Google-Smtp-Source":"AGHT+IF6HibPK7KPMlJeUo38UQkdohIpha0Jb5+fYiR5tbzJLX5X/aDhUoSDqEvh6X+R3SYSEcihOA==","From":"Milan Zamazal <mzamazal@redhat.com>","To":"Robert Mader <robert.mader@collabora.com>","Cc":"libcamera-devel@lists.libcamera.org","Subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","In-Reply-To":"<20240611110721.10690-1-robert.mader@collabora.com> (Robert\n\tMader's message of \"Tue, 11 Jun 2024 13:07:01 +0200\")","References":"<20240611110721.10690-1-robert.mader@collabora.com>","Date":"Tue, 11 Jun 2024 16:29:10 +0200","Message-ID":"<878qzbeeo9.fsf@redhat.com>","User-Agent":"Gnus/5.13 (Gnus v5.13)","MIME-Version":"1.0","X-Mimecast-Spam-Score":"0","X-Mimecast-Originator":"redhat.com","Content-Type":"text/plain","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":29838,"web_url":"https://patchwork.libcamera.org/comment/29838/","msgid":"<20240611143108.GB10397@pendragon.ideasonboard.com>","date":"2024-06-11T14:31:08","subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"On Tue, Jun 11, 2024 at 04:29:10PM +0200, Milan Zamazal wrote:\n> Hi Robert,\n> \n> thank you for the patch.\n> \n> Robert Mader <robert.mader@collabora.com> writes:\n> \n> > In order to be more compatible with modern hardware and APIs. This\n> > notably allows GL implementations to directly import the buffers more\n> > often and seems to be required for Wayland.\n> >\n> > Further more, as we already enforce a 8 byte stride, these formats work\n> > better for clients that don't support padding - such as libwebrtc at the\n> > time of writing.\n> >\n> > Tested on the Librem5 and PinePhone.\n> >\n> > Signed-off-by: Robert Mader <robert.mader@collabora.com>\n> > ---\n> >  src/libcamera/software_isp/debayer_cpu.cpp | 244 +++++++++++++++++++--\n> >  src/libcamera/software_isp/debayer_cpu.h   |  10 +\n> >  2 files changed, 238 insertions(+), 16 deletions(-)\n> >\n> > diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp\n> > index c038eed4..73c66a88 100644\n> > --- a/src/libcamera/software_isp/debayer_cpu.cpp\n> > +++ b/src/libcamera/software_isp/debayer_cpu.cpp\n> > @@ -76,6 +76,13 @@ DebayerCpu::~DebayerCpu()\n> >  \t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> >  \tx++;\n> >  \n> > +#define BGGR_XBGR8888(p, n, div)                                                              \\\n> > +\t*dst++ = blue_[curr[x] / (div)];                                                      \\\n> > +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];       \\\n> > +\t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> > +\t*dst++ = 255;                                                                         \\\n> > +\tx++;\n> > +\n> \n> The level of code duplication here starts to exceed reasonable limits.\n> Maybe adding an argument to the macro deciding whether to append the\n> last `dst' assignment or not would have a significant performance\n> impact.  But even then I guess there must be a way to let the compiler\n> generate the duplicate code (an inline function template?) rather than\n> doing it manually.  What do the C++ experts around think?\n\nC++ templates are the C++ replacement of the C code-generation macros.\nThis should at least be considered as an option.\n\n> If nothing better is possible then `*dst++ = 255' can be applied in the\n> callers rather than defining the alternative macro versions.\n> \n> >  /*\n> >   * GBG\n> >   * RGR\n> > @@ -87,6 +94,13 @@ DebayerCpu::~DebayerCpu()\n> >  \t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> >  \tx++;\n> >  \n> > +#define GRBG_XBGR8888(p, n, div)                                  \\\n> > +\t*dst++ = blue_[(prev[x] + next[x]) / (2 * (div))];        \\\n> > +\t*dst++ = green_[curr[x] / (div)];                         \\\n> > +\t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> > +\t*dst++ = 255;                                             \\\n> > +\tx++;\n> > +\n> >  /*\n> >   * GRG\n> >   * BGB\n> > @@ -98,6 +112,13 @@ DebayerCpu::~DebayerCpu()\n> >  \t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n> >  \tx++;\n> >  \n> > +#define GBRG_XBGR8888(p, n, div)                                   \\\n> > +\t*dst++ = blue_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> > +\t*dst++ = green_[curr[x] / (div)];                          \\\n> > +\t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n> > +\t*dst++ = 255;                                              \\\n> > +\tx++;\n> > +\n> >  /*\n> >   * BGB\n> >   * GRG\n> > @@ -109,6 +130,13 @@ DebayerCpu::~DebayerCpu()\n> >  \t*dst++ = red_[curr[x] / (div)];                                                        \\\n> >  \tx++;\n> >  \n> > +#define RGGB_XBGR8888(p, n, div)                                                               \\\n> > +\t*dst++ = blue_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> > +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];        \\\n> > +\t*dst++ = red_[curr[x] / (div)];                                                        \\\n> > +\t*dst++ = 255;                                                                          \\\n> > +\tx++;\n> > +\n> >  void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint8_t)\n> > @@ -119,6 +147,16 @@ void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint8_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\tBGGR_XBGR8888(1, 1, 1)\n> > +\t\tGBRG_XBGR8888(1, 1, 1)\n> > +\t}\n> > +}\n> > +\n> \n> ... and then X and non-X versions of these methods could be merged too.\n> Using a template for the purpose should have no performance impact, I\n> suppose.\n> \n> >  void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint8_t)\n> > @@ -129,6 +167,16 @@ void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint8_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\tGRBG_XBGR8888(1, 1, 1)\n> > +\t\tRGGB_XBGR8888(1, 1, 1)\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint16_t)\n> > @@ -140,6 +188,17 @@ void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint16_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n> > +\t\tBGGR_XBGR8888(1, 1, 4)\n> > +\t\tGBRG_XBGR8888(1, 1, 4)\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint16_t)\n> > @@ -151,6 +210,17 @@ void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint16_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n> > +\t\tGRBG_XBGR8888(1, 1, 4)\n> > +\t\tRGGB_XBGR8888(1, 1, 4)\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint16_t)\n> > @@ -162,6 +232,17 @@ void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint16_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n> > +\t\tBGGR_XBGR8888(1, 1, 16)\n> > +\t\tGBRG_XBGR8888(1, 1, 16)\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tDECLARE_SRC_POINTERS(uint16_t)\n> > @@ -173,6 +254,17 @@ void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tDECLARE_SRC_POINTERS(uint16_t)\n> > +\n> > +\tfor (int x = 0; x < (int)window_.width;) {\n> > +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n> > +\t\tGRBG_XBGR8888(1, 1, 16)\n> > +\t\tRGGB_XBGR8888(1, 1, 16)\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tconst int widthInBytes = window_.width * 5 / 4;\n> > @@ -198,6 +290,31 @@ void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tconst int widthInBytes = window_.width * 5 / 4;\n> > +\tconst uint8_t *prev = src[0];\n> > +\tconst uint8_t *curr = src[1];\n> > +\tconst uint8_t *next = src[2];\n> > +\n> > +\t/*\n> > +\t * For the first pixel getting a pixel from the previous column uses\n> > +\t * x - 2 to skip the 5th byte with least-significant bits for 4 pixels.\n> > +\t * Same for last pixel (uses x + 2) and looking at the next column.\n> > +\t */\n> > +\tfor (int x = 0; x < widthInBytes;) {\n> > +\t\t/* First pixel */\n> > +\t\tBGGR_XBGR8888(2, 1, 1)\n> > +\t\t/* Second pixel BGGR -> GBRG */\n> > +\t\tGBRG_XBGR8888(1, 1, 1)\n> > +\t\t/* Same thing for third and fourth pixels */\n> > +\t\tBGGR_XBGR8888(1, 1, 1)\n> > +\t\tGBRG_XBGR8888(1, 2, 1)\n> > +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> > +\t\tx++;\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tconst int widthInBytes = window_.width * 5 / 4;\n> > @@ -218,6 +335,26 @@ void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tconst int widthInBytes = window_.width * 5 / 4;\n> > +\tconst uint8_t *prev = src[0];\n> > +\tconst uint8_t *curr = src[1];\n> > +\tconst uint8_t *next = src[2];\n> > +\n> > +\tfor (int x = 0; x < widthInBytes;) {\n> > +\t\t/* First pixel */\n> > +\t\tGRBG_XBGR8888(2, 1, 1)\n> > +\t\t/* Second pixel GRBG -> RGGB */\n> > +\t\tRGGB_XBGR8888(1, 1, 1)\n> > +\t\t/* Same thing for third and fourth pixels */\n> > +\t\tGRBG_XBGR8888(1, 1, 1)\n> > +\t\tRGGB_XBGR8888(1, 2, 1)\n> > +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> > +\t\tx++;\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tconst int widthInBytes = window_.width * 5 / 4;\n> > @@ -238,6 +375,26 @@ void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tconst int widthInBytes = window_.width * 5 / 4;\n> > +\tconst uint8_t *prev = src[0];\n> > +\tconst uint8_t *curr = src[1];\n> > +\tconst uint8_t *next = src[2];\n> > +\n> > +\tfor (int x = 0; x < widthInBytes;) {\n> > +\t\t/* Even pixel */\n> > +\t\tGBRG_XBGR8888(2, 1, 1)\n> > +\t\t/* Odd pixel GBGR -> BGGR */\n> > +\t\tBGGR_XBGR8888(1, 1, 1)\n> > +\t\t/* Same thing for next 2 pixels */\n> > +\t\tGBRG_XBGR8888(1, 1, 1)\n> > +\t\tBGGR_XBGR8888(1, 2, 1)\n> > +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> > +\t\tx++;\n> > +\t}\n> > +}\n> > +\n> >  void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  {\n> >  \tconst int widthInBytes = window_.width * 5 / 4;\n> > @@ -258,6 +415,26 @@ void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n> >  \t}\n> >  }\n> >  \n> > +void DebayerCpu::debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> > +{\n> > +\tconst int widthInBytes = window_.width * 5 / 4;\n> > +\tconst uint8_t *prev = src[0];\n> > +\tconst uint8_t *curr = src[1];\n> > +\tconst uint8_t *next = src[2];\n> > +\n> > +\tfor (int x = 0; x < widthInBytes;) {\n> > +\t\t/* Even pixel */\n> > +\t\tRGGB_XBGR8888(2, 1, 1)\n> > +\t\t/* Odd pixel RGGB -> GRBG */\n> > +\t\tGRBG_XBGR8888(1, 1, 1)\n> > +\t\t/* Same thing for next 2 pixels */\n> > +\t\tRGGB_XBGR8888(1, 1, 1)\n> > +\t\tGRBG_XBGR8888(1, 2, 1)\n> > +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n> > +\t\tx++;\n> > +\t}\n> > +}\n> > +\n> >  static bool isStandardBayerOrder(BayerFormat::Order order)\n> >  {\n> >  \treturn order == BayerFormat::BGGR || order == BayerFormat::GBRG ||\n> > @@ -280,7 +457,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n> >  \t\tconfig.bpp = (bayerFormat.bitDepth + 7) & ~7;\n> >  \t\tconfig.patternSize.width = 2;\n> >  \t\tconfig.patternSize.height = 2;\n> > -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> > +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n> > +\t\t\tformats::RGB888,\n> > +\t\t\tformats::XRGB8888,\n> > +\t\t\tformats::ARGB8888,\n> > +\t\t\tformats::BGR888,\n> > +\t\t\tformats::XBGR8888,\n> > +\t\t\tformats::ABGR8888\n> > +\t\t});\n> >  \t\treturn 0;\n> >  \t}\n> >  \n> > @@ -290,7 +474,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n> >  \t\tconfig.bpp = 10;\n> >  \t\tconfig.patternSize.width = 4; /* 5 bytes per *4* pixels */\n> >  \t\tconfig.patternSize.height = 2;\n> > -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> > +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n> > +\t\t\tformats::RGB888,\n> > +\t\t\tformats::XRGB8888,\n> > +\t\t\tformats::ARGB8888,\n> > +\t\t\tformats::BGR888,\n> > +\t\t\tformats::XBGR8888,\n> > +\t\t\tformats::ABGR8888\n> > +\t\t});\n> >  \t\treturn 0;\n> >  \t}\n> >  \n> > @@ -306,6 +497,12 @@ int DebayerCpu::getOutputConfig(PixelFormat outputFormat, DebayerOutputConfig &c\n> >  \t\treturn 0;\n> >  \t}\n> >  \n> > +\tif (outputFormat == formats::XRGB8888 || outputFormat == formats::ARGB8888 ||\n> > +\t    outputFormat == formats::XBGR8888 || outputFormat == formats::ABGR8888) {\n> > +\t\tconfig.bpp = 32;\n> > +\t\treturn 0;\n> > +\t}\n> > +\n> >  \tLOG(Debayer, Info)\n> >  \t\t<< \"Unsupported output format \" << outputFormat.toString();\n> >  \treturn -EINVAL;\n> > @@ -341,6 +538,7 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n> >  {\n> >  \tBayerFormat bayerFormat =\n> >  \t\tBayerFormat::fromPixelFormat(inputFormat);\n> > +\tbool is_aligned = false;\n> \n> camelCase should be used for variable names.\n> \n> >  \txShift_ = 0;\n> >  \tswapRedBlueGains_ = false;\n> > @@ -351,8 +549,16 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n> >  \t};\n> >  \n> >  \tswitch (outputFormat) {\n> > +\tcase formats::XRGB8888:\n> > +\tcase formats::ARGB8888:\n> > +\t  is_aligned = true;\n> > +\t  [[fallthrough]];\n> >  \tcase formats::RGB888:\n> >  \t\tbreak;\n> > +\tcase formats::XBGR8888:\n> > +\tcase formats::ABGR8888:\n> > +\t  is_aligned = true;\n> > +\t  [[fallthrough]];\n> >  \tcase formats::BGR888:\n> >  \t\t/* Swap R and B in bayer order to generate BGR888 instead of RGB888 */\n> >  \t\tswapRedBlueGains_ = true;\n> > @@ -383,16 +589,19 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n> >  \t    isStandardBayerOrder(bayerFormat.order)) {\n> >  \t\tswitch (bayerFormat.bitDepth) {\n> >  \t\tcase 8:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer8_BGBG_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer8_GRGR_BGR888;\n> > +\t\t  LOG(Debayer, Warning) << \"8bit no packing\";\n> \n> Is this a debugging leftover or do you really mean to log something\n> here?  If the latter, it should probably be Debug rather than Warning\n> and the message should be improved.  The same applies to the other LOGs\n> below. \n> \n> > +\t\t  debayer0_ = is_aligned ? &DebayerCpu::debayer8_BGBG_XBGR8888 : &DebayerCpu::debayer8_BGBG_BGR888;\n> \n> If the two methods were unified and extended with isAligned argument,\n> I'm not sure what would be the best way to pass the argument to them.\n> lambda, a template parameter or anything else?\n> \n> > +\t\t  debayer1_ = is_aligned ? &DebayerCpu::debayer8_GRGR_XBGR8888 : &DebayerCpu::debayer8_GRGR_BGR888;\n> >  \t\t\tbreak;\n> >  \t\tcase 10:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer10_BGBG_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer10_GRGR_BGR888;\n> > +\t\t  LOG(Debayer, Warning) << \"10bit no packing\";\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10_BGBG_XBGR8888 : &DebayerCpu::debayer10_BGBG_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10_GRGR_XBGR8888 : &DebayerCpu::debayer10_GRGR_BGR888;\n> >  \t\t\tbreak;\n> >  \t\tcase 12:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer12_BGBG_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer12_GRGR_BGR888;\n> > +\t\t  LOG(Debayer, Warning) << \"12bit no packing\";\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer12_BGBG_XBGR8888 : &DebayerCpu::debayer12_BGBG_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer12_GRGR_XBGR8888 : &DebayerCpu::debayer12_GRGR_BGR888;\n> >  \t\t\tbreak;\n> >  \t\t}\n> >  \t\tsetupStandardBayerOrder(bayerFormat.order);\n> > @@ -401,22 +610,23 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n> >  \n> >  \tif (bayerFormat.bitDepth == 10 &&\n> >  \t    bayerFormat.packing == BayerFormat::Packing::CSI2) {\n> > +\t  LOG(Debayer, Warning) << \"10bit csi2\";\n> >  \t\tswitch (bayerFormat.order) {\n> >  \t\tcase BayerFormat::BGGR:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n> >  \t\t\treturn 0;\n> >  \t\tcase BayerFormat::GBRG:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n> >  \t\t\treturn 0;\n> >  \t\tcase BayerFormat::GRBG:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n> >  \t\t\treturn 0;\n> >  \t\tcase BayerFormat::RGGB:\n> > -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> > -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> > +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n> > +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n> >  \t\t\treturn 0;\n> >  \t\tdefault:\n> >  \t\t\tbreak;\n> > @@ -533,6 +743,8 @@ DebayerCpu::strideAndFrameSize(const PixelFormat &outputFormat, const Size &size\n> >  \t/* round up to multiple of 8 for 64 bits alignment */\n> >  \tunsigned int stride = (size.width * config.bpp / 8 + 7) & ~7;\n> >  \n> > +\tLOG(Debayer, Warning) << outputFormat.toString() << \" \" << size.width << \" \" << size.height << \" \" << config.bpp << \" \" << stride << \" \" << stride * size.height;\n> > +\n> >  \treturn std::make_tuple(stride, stride * size.height);\n> >  }\n> >  \n> > diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h\n> > index be7dcdca..c30f44aa 100644\n> > --- a/src/libcamera/software_isp/debayer_cpu.h\n> > +++ b/src/libcamera/software_isp/debayer_cpu.h\n> > @@ -86,18 +86,28 @@ private:\n> >  \n> >  \t/* 8-bit raw bayer format */\n> >  \tvoid debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \t/* unpacked 10-bit raw bayer format */\n> >  \tvoid debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \t/* unpacked 12-bit raw bayer format */\n> >  \tvoid debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \t/* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */\n> >  \tvoid debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \tvoid debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> > +\tvoid debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n> >  \n> >  \tstruct DebayerInputConfig {\n> >  \t\tSize patternSize;","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id C0815BD87C\n\tfor <parsemail@patchwork.libcamera.org>;\n\tTue, 11 Jun 2024 14:31:30 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id E2F3A65466;\n\tTue, 11 Jun 2024 16:31:29 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 6AE1865455\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue, 11 Jun 2024 16:31:28 +0200 (CEST)","from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi\n\t[81.175.209.231])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 7217429A;\n\tTue, 11 Jun 2024 16:31:15 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"mi1AcYwK\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1718116275;\n\tbh=8hvkoZXtHMEzFceJTF+lEHiNnkaQnXgEtIACFkWOh7o=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=mi1AcYwK7DPlxpxPmuzD/v/nP3I/rCnq8YyfS6uHWb8bGmIWFbEFNSIBfux94EWme\n\tVLTjas0t4omLXygF1OCnrpyL24crLf7B0buEjipi4tvEVa0pkjl7JXLiHiP0VzJ80a\n\tdVL5/WhG+X3E4VN6nPHXghOn0PSHYsRfyBdX816M=","Date":"Tue, 11 Jun 2024 17:31:08 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Milan Zamazal <mzamazal@redhat.com>","Cc":"Robert Mader <robert.mader@collabora.com>,\n\tlibcamera-devel@lists.libcamera.org","Subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","Message-ID":"<20240611143108.GB10397@pendragon.ideasonboard.com>","References":"<20240611110721.10690-1-robert.mader@collabora.com>\n\t<878qzbeeo9.fsf@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<878qzbeeo9.fsf@redhat.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":29878,"web_url":"https://patchwork.libcamera.org/comment/29878/","msgid":"<171823221702.2248009.3589184439568817380@ping.linuxembedded.co.uk>","date":"2024-06-12T22:43:37","subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","submitter":{"id":4,"url":"https://patchwork.libcamera.org/api/people/4/","name":"Kieran Bingham","email":"kieran.bingham@ideasonboard.com"},"content":"Quoting Robert Mader (2024-06-11 12:07:01)\n> In order to be more compatible with modern hardware and APIs. This\n> notably allows GL implementations to directly import the buffers more\n> often and seems to be required for Wayland.\n> \n> Further more, as we already enforce a 8 byte stride, these formats work\n> better for clients that don't support padding - such as libwebrtc at the\n> time of writing.\n> \n> Tested on the Librem5 and PinePhone.\n> \n> Signed-off-by: Robert Mader <robert.mader@collabora.com>\n\nSome quite minor white space issues which must be tabs/spaces mixed up\nreported in :\n\n - https://gitlab.freedesktop.org/camera/libcamera/-/jobs/59751572\n\nBut otherwise the CI for this is green for the compiler matrix:\n\n - https://gitlab.freedesktop.org/camera/libcamera/-/pipelines/1198765\n\nI agree, there's a lot of duplication here, but I think ... I can also\nadd this already:\n\nTested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n\nAs I've used this on the x13s with meet.jit.si / firefox / pipewire.\n\n\n> ---\n>  src/libcamera/software_isp/debayer_cpu.cpp | 244 +++++++++++++++++++--\n>  src/libcamera/software_isp/debayer_cpu.h   |  10 +\n>  2 files changed, 238 insertions(+), 16 deletions(-)\n> \n> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp\n> index c038eed4..73c66a88 100644\n> --- a/src/libcamera/software_isp/debayer_cpu.cpp\n> +++ b/src/libcamera/software_isp/debayer_cpu.cpp\n> @@ -76,6 +76,13 @@ DebayerCpu::~DebayerCpu()\n>         *dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n>         x++;\n>  \n> +#define BGGR_XBGR8888(p, n, div)                                                              \\\n> +       *dst++ = blue_[curr[x] / (div)];                                                      \\\n> +       *dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];       \\\n> +       *dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> +       *dst++ = 255;                                                                         \\\n> +       x++;\n> +\n>  /*\n>   * GBG\n>   * RGR\n> @@ -87,6 +94,13 @@ DebayerCpu::~DebayerCpu()\n>         *dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n>         x++;\n>  \n> +#define GRBG_XBGR8888(p, n, div)                                  \\\n> +       *dst++ = blue_[(prev[x] + next[x]) / (2 * (div))];        \\\n> +       *dst++ = green_[curr[x] / (div)];                         \\\n> +       *dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> +       *dst++ = 255;                                             \\\n> +       x++;\n> +\n>  /*\n>   * GRG\n>   * BGB\n> @@ -98,6 +112,13 @@ DebayerCpu::~DebayerCpu()\n>         *dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n>         x++;\n>  \n> +#define GBRG_XBGR8888(p, n, div)                                   \\\n> +       *dst++ = blue_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n> +       *dst++ = green_[curr[x] / (div)];                          \\\n> +       *dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n> +       *dst++ = 255;                                              \\\n> +       x++;\n> +\n>  /*\n>   * BGB\n>   * GRG\n> @@ -109,6 +130,13 @@ DebayerCpu::~DebayerCpu()\n>         *dst++ = red_[curr[x] / (div)];                                                        \\\n>         x++;\n>  \n> +#define RGGB_XBGR8888(p, n, div)                                                               \\\n> +       *dst++ = blue_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n> +       *dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];        \\\n> +       *dst++ = red_[curr[x] / (div)];                                                        \\\n> +       *dst++ = 255;                                                                          \\\n> +       x++;\n> +\n>  void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint8_t)\n> @@ -119,6 +147,16 @@ void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint8_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               BGGR_XBGR8888(1, 1, 1)\n> +               GBRG_XBGR8888(1, 1, 1)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint8_t)\n> @@ -129,6 +167,16 @@ void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint8_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               GRBG_XBGR8888(1, 1, 1)\n> +               RGGB_XBGR8888(1, 1, 1)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint16_t)\n> @@ -140,6 +188,17 @@ void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint16_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               /* divide values by 4 for 10 -> 8 bpp value */\n> +               BGGR_XBGR8888(1, 1, 4)\n> +               GBRG_XBGR8888(1, 1, 4)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint16_t)\n> @@ -151,6 +210,17 @@ void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint16_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               /* divide values by 4 for 10 -> 8 bpp value */\n> +               GRBG_XBGR8888(1, 1, 4)\n> +               RGGB_XBGR8888(1, 1, 4)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint16_t)\n> @@ -162,6 +232,17 @@ void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint16_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               /* divide values by 16 for 12 -> 8 bpp value */\n> +               BGGR_XBGR8888(1, 1, 16)\n> +               GBRG_XBGR8888(1, 1, 16)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         DECLARE_SRC_POINTERS(uint16_t)\n> @@ -173,6 +254,17 @@ void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       DECLARE_SRC_POINTERS(uint16_t)\n> +\n> +       for (int x = 0; x < (int)window_.width;) {\n> +               /* divide values by 16 for 12 -> 8 bpp value */\n> +               GRBG_XBGR8888(1, 1, 16)\n> +               RGGB_XBGR8888(1, 1, 16)\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         const int widthInBytes = window_.width * 5 / 4;\n> @@ -198,6 +290,31 @@ void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       const int widthInBytes = window_.width * 5 / 4;\n> +       const uint8_t *prev = src[0];\n> +       const uint8_t *curr = src[1];\n> +       const uint8_t *next = src[2];\n> +\n> +       /*\n> +        * For the first pixel getting a pixel from the previous column uses\n> +        * x - 2 to skip the 5th byte with least-significant bits for 4 pixels.\n> +        * Same for last pixel (uses x + 2) and looking at the next column.\n> +        */\n> +       for (int x = 0; x < widthInBytes;) {\n> +               /* First pixel */\n> +               BGGR_XBGR8888(2, 1, 1)\n> +               /* Second pixel BGGR -> GBRG */\n> +               GBRG_XBGR8888(1, 1, 1)\n> +               /* Same thing for third and fourth pixels */\n> +               BGGR_XBGR8888(1, 1, 1)\n> +               GBRG_XBGR8888(1, 2, 1)\n> +               /* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +               x++;\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         const int widthInBytes = window_.width * 5 / 4;\n> @@ -218,6 +335,26 @@ void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       const int widthInBytes = window_.width * 5 / 4;\n> +       const uint8_t *prev = src[0];\n> +       const uint8_t *curr = src[1];\n> +       const uint8_t *next = src[2];\n> +\n> +       for (int x = 0; x < widthInBytes;) {\n> +               /* First pixel */\n> +               GRBG_XBGR8888(2, 1, 1)\n> +               /* Second pixel GRBG -> RGGB */\n> +               RGGB_XBGR8888(1, 1, 1)\n> +               /* Same thing for third and fourth pixels */\n> +               GRBG_XBGR8888(1, 1, 1)\n> +               RGGB_XBGR8888(1, 2, 1)\n> +               /* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +               x++;\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         const int widthInBytes = window_.width * 5 / 4;\n> @@ -238,6 +375,26 @@ void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       const int widthInBytes = window_.width * 5 / 4;\n> +       const uint8_t *prev = src[0];\n> +       const uint8_t *curr = src[1];\n> +       const uint8_t *next = src[2];\n> +\n> +       for (int x = 0; x < widthInBytes;) {\n> +               /* Even pixel */\n> +               GBRG_XBGR8888(2, 1, 1)\n> +               /* Odd pixel GBGR -> BGGR */\n> +               BGGR_XBGR8888(1, 1, 1)\n> +               /* Same thing for next 2 pixels */\n> +               GBRG_XBGR8888(1, 1, 1)\n> +               BGGR_XBGR8888(1, 2, 1)\n> +               /* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +               x++;\n> +       }\n> +}\n> +\n>  void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>  {\n>         const int widthInBytes = window_.width * 5 / 4;\n> @@ -258,6 +415,26 @@ void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>         }\n>  }\n>  \n> +void DebayerCpu::debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n> +{\n> +       const int widthInBytes = window_.width * 5 / 4;\n> +       const uint8_t *prev = src[0];\n> +       const uint8_t *curr = src[1];\n> +       const uint8_t *next = src[2];\n> +\n> +       for (int x = 0; x < widthInBytes;) {\n> +               /* Even pixel */\n> +               RGGB_XBGR8888(2, 1, 1)\n> +               /* Odd pixel RGGB -> GRBG */\n> +               GRBG_XBGR8888(1, 1, 1)\n> +               /* Same thing for next 2 pixels */\n> +               RGGB_XBGR8888(1, 1, 1)\n> +               GRBG_XBGR8888(1, 2, 1)\n> +               /* Skip 5th src byte with 4 x 2 least-significant-bits */\n> +               x++;\n> +       }\n> +}\n> +\n>  static bool isStandardBayerOrder(BayerFormat::Order order)\n>  {\n>         return order == BayerFormat::BGGR || order == BayerFormat::GBRG ||\n> @@ -280,7 +457,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>                 config.bpp = (bayerFormat.bitDepth + 7) & ~7;\n>                 config.patternSize.width = 2;\n>                 config.patternSize.height = 2;\n> -               config.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> +               config.outputFormats = std::vector<PixelFormat>({\n> +                       formats::RGB888,\n> +                       formats::XRGB8888,\n> +                       formats::ARGB8888,\n> +                       formats::BGR888,\n> +                       formats::XBGR8888,\n> +                       formats::ABGR8888\n> +               });\n>                 return 0;\n>         }\n>  \n> @@ -290,7 +474,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>                 config.bpp = 10;\n>                 config.patternSize.width = 4; /* 5 bytes per *4* pixels */\n>                 config.patternSize.height = 2;\n> -               config.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n> +               config.outputFormats = std::vector<PixelFormat>({\n> +                       formats::RGB888,\n> +                       formats::XRGB8888,\n> +                       formats::ARGB8888,\n> +                       formats::BGR888,\n> +                       formats::XBGR8888,\n> +                       formats::ABGR8888\n> +               });\n>                 return 0;\n>         }\n>  \n> @@ -306,6 +497,12 @@ int DebayerCpu::getOutputConfig(PixelFormat outputFormat, DebayerOutputConfig &c\n>                 return 0;\n>         }\n>  \n> +       if (outputFormat == formats::XRGB8888 || outputFormat == formats::ARGB8888 ||\n> +           outputFormat == formats::XBGR8888 || outputFormat == formats::ABGR8888) {\n> +               config.bpp = 32;\n> +               return 0;\n> +       }\n> +\n>         LOG(Debayer, Info)\n>                 << \"Unsupported output format \" << outputFormat.toString();\n>         return -EINVAL;\n> @@ -341,6 +538,7 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  {\n>         BayerFormat bayerFormat =\n>                 BayerFormat::fromPixelFormat(inputFormat);\n> +       bool is_aligned = false;\n\nWhy are we calling this 'is_aligned' rather than just differentiation on\nthe bits per pixel. I don't think the alignment is the key thing here -\nit's producing a different output pixel format! (which ... may be more\ndesirable for 32 bit alignments?)\n\nUltimately - this code path is going to be more and more tricky the more\ninput and output formats we add ...\n\n>  \n>         xShift_ = 0;\n>         swapRedBlueGains_ = false;\n> @@ -351,8 +549,16 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>         };\n>  \n>         switch (outputFormat) {\n> +       case formats::XRGB8888:\n> +       case formats::ARGB8888:\n> +         is_aligned = true;\n> +         [[fallthrough]];\n>         case formats::RGB888:\n>                 break;\n> +       case formats::XBGR8888:\n> +       case formats::ABGR8888:\n> +         is_aligned = true;\n> +         [[fallthrough]];\n>         case formats::BGR888:\n>                 /* Swap R and B in bayer order to generate BGR888 instead of RGB888 */\n>                 swapRedBlueGains_ = true;\n> @@ -383,16 +589,19 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>             isStandardBayerOrder(bayerFormat.order)) {\n>                 switch (bayerFormat.bitDepth) {\n>                 case 8:\n> -                       debayer0_ = &DebayerCpu::debayer8_BGBG_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer8_GRGR_BGR888;\n> +                 LOG(Debayer, Warning) << \"8bit no packing\";\n> +                 debayer0_ = is_aligned ? &DebayerCpu::debayer8_BGBG_XBGR8888 : &DebayerCpu::debayer8_BGBG_BGR888;\n> +                 debayer1_ = is_aligned ? &DebayerCpu::debayer8_GRGR_XBGR8888 : &DebayerCpu::debayer8_GRGR_BGR888;\n\nWhitespace issues here as reported by the linter.\n\n>                         break;\n>                 case 10:\n> -                       debayer0_ = &DebayerCpu::debayer10_BGBG_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer10_GRGR_BGR888;\n> +                 LOG(Debayer, Warning) << \"10bit no packing\";\n\nand here..\n\nBut we shouldn't merge code that adds these 'Warning's. Perhaps debug\nbut likely we should just remove the debug prints here.\n\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer10_BGBG_XBGR8888 : &DebayerCpu::debayer10_BGBG_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer10_GRGR_XBGR8888 : &DebayerCpu::debayer10_GRGR_BGR888;\n>                         break;\n>                 case 12:\n> -                       debayer0_ = &DebayerCpu::debayer12_BGBG_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer12_GRGR_BGR888;\n> +                 LOG(Debayer, Warning) << \"12bit no packing\";\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer12_BGBG_XBGR8888 : &DebayerCpu::debayer12_BGBG_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer12_GRGR_XBGR8888 : &DebayerCpu::debayer12_GRGR_BGR888;\n\n>                         break;\n>                 }\n>                 setupStandardBayerOrder(bayerFormat.order);\n> @@ -401,22 +610,23 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>  \n>         if (bayerFormat.bitDepth == 10 &&\n>             bayerFormat.packing == BayerFormat::Packing::CSI2) {\n> +         LOG(Debayer, Warning) << \"10bit csi2\";\n>                 switch (bayerFormat.order) {\n>                 case BayerFormat::BGGR:\n> -                       debayer0_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n>                         return 0;\n>                 case BayerFormat::GBRG:\n> -                       debayer0_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n>                         return 0;\n>                 case BayerFormat::GRBG:\n> -                       debayer0_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n>                         return 0;\n>                 case BayerFormat::RGGB:\n> -                       debayer0_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n> -                       debayer1_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n> +                       debayer0_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n> +                       debayer1_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n>                         return 0;\n>                 default:\n>                         break;\n> @@ -533,6 +743,8 @@ DebayerCpu::strideAndFrameSize(const PixelFormat &outputFormat, const Size &size\n>         /* round up to multiple of 8 for 64 bits alignment */\n>         unsigned int stride = (size.width * config.bpp / 8 + 7) & ~7;\n>  \n> +       LOG(Debayer, Warning) << outputFormat.toString() << \" \" << size.width << \" \" << size.height << \" \" << config.bpp << \" \" << stride << \" \" << stride * size.height;\n> +\n>         return std::make_tuple(stride, stride * size.height);\n>  }\n>  \n> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h\n> index be7dcdca..c30f44aa 100644\n> --- a/src/libcamera/software_isp/debayer_cpu.h\n> +++ b/src/libcamera/software_isp/debayer_cpu.h\n> @@ -86,18 +86,28 @@ private:\n>  \n>         /* 8-bit raw bayer format */\n>         void debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         /* unpacked 10-bit raw bayer format */\n>         void debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         /* unpacked 12-bit raw bayer format */\n>         void debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         /* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */\n>         void debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>         void debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[]);\n> +       void debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>  \n>         struct DebayerInputConfig {\n>                 Size patternSize;\n> -- \n> 2.45.2\n>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 19C21BD87C\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 12 Jun 2024 22:43:42 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 456F965446;\n\tThu, 13 Jun 2024 00:43:41 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id D1B3565446\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 13 Jun 2024 00:43:39 +0200 (CEST)","from pendragon.ideasonboard.com\n\t(cpc89244-aztw30-2-0-cust6594.18-1.cable.virginm.net [86.31.185.195])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 1714D4CF;\n\tThu, 13 Jun 2024 00:43:26 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"kIM7iHs/\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1718232206;\n\tbh=0t+z95TLoQG4gh4F39vuyTCBxUH1AetA83wY2/M4bYU=;\n\th=In-Reply-To:References:Subject:From:Cc:To:Date:From;\n\tb=kIM7iHs/DbwvulVtbTkuWV3xzVYOHpMmB/DDTcQWe+NbY6IH7NIwhY8N/cEZjoSi2\n\t40+e8/wF2H5jh+jRTpMITSdnzAKnpNyLBk1yinH7+HX+ZYP/RbZYt9tLqLIJhVMs0E\n\tFOxhSBSdk/1Q67fPWoNj/UvHXmGCd1S4+Pihrg5g=","Content-Type":"text/plain; charset=\"utf-8\"","MIME-Version":"1.0","Content-Transfer-Encoding":"quoted-printable","In-Reply-To":"<20240611110721.10690-1-robert.mader@collabora.com>","References":"<20240611110721.10690-1-robert.mader@collabora.com>","Subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","From":"Kieran Bingham <kieran.bingham@ideasonboard.com>","Cc":"Robert Mader <robert.mader@collabora.com>","To":"Robert Mader <robert.mader@collabora.com>,\n\tlibcamera-devel@lists.libcamera.org","Date":"Wed, 12 Jun 2024 23:43:37 +0100","Message-ID":"<171823221702.2248009.3589184439568817380@ping.linuxembedded.co.uk>","User-Agent":"alot/0.10","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":30008,"web_url":"https://patchwork.libcamera.org/comment/30008/","msgid":"<9482ce5d-d694-48a2-844e-54895e86fa2d@collabora.com>","date":"2024-06-17T17:41:04","subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","submitter":{"id":140,"url":"https://patchwork.libcamera.org/api/people/140/","name":"Robert Mader","email":"robert.mader@collabora.com"},"content":"Hi!\n\nOn 11.06.24 16:29, Milan Zamazal wrote:\n> Hi Robert,\n>\n> thank you for the patch.\n>\n> Robert Mader<robert.mader@collabora.com>  writes:\n>\n>> In order to be more compatible with modern hardware and APIs. This\n>> notably allows GL implementations to directly import the buffers more\n>> often and seems to be required for Wayland.\n>>\n>> Further more, as we already enforce a 8 byte stride, these formats work\n>> better for clients that don't support padding - such as libwebrtc at the\n>> time of writing.\n>>\n>> Tested on the Librem5 and PinePhone.\n>>\n>> Signed-off-by: Robert Mader<robert.mader@collabora.com>\n>> ---\n>>   src/libcamera/software_isp/debayer_cpu.cpp | 244 +++++++++++++++++++--\n>>   src/libcamera/software_isp/debayer_cpu.h   |  10 +\n>>   2 files changed, 238 insertions(+), 16 deletions(-)\n>>\n>> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp\n>> index c038eed4..73c66a88 100644\n>> --- a/src/libcamera/software_isp/debayer_cpu.cpp\n>> +++ b/src/libcamera/software_isp/debayer_cpu.cpp\n>> @@ -76,6 +76,13 @@ DebayerCpu::~DebayerCpu()\n>>   \t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n>>   \tx++;\n>>   \n>> +#define BGGR_XBGR8888(p, n, div)                                                              \\\n>> +\t*dst++ = blue_[curr[x] / (div)];                                                      \\\n>> +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];       \\\n>> +\t*dst++ = red_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n>> +\t*dst++ = 255;                                                                         \\\n>> +\tx++;\n>> +\n> The level of code duplication here starts to exceed reasonable limits.\n> Maybe adding an argument to the macro deciding whether to append the\n> last `dst' assignment or not would have a significant performance\n> impact.  But even then I guess there must be a way to let the compiler\n> generate the duplicate code (an inline function template?) rather than\n> doing it manually.  What do the C++ experts around think?\n>\n> If nothing better is possible then `*dst++ = 255' can be applied in the\n> callers rather than defining the alternative macro versions.\nYeah, this was exactly I hoped to get feedback about. I'm going with the \nsimple solution of adding an additional variable in v2, assuming that \nthe performance impact will be small with modern branch prediction and \nnot seeing one when quickly testing.\n>>   /*\n>>    * GBG\n>>    * RGR\n>> @@ -87,6 +94,13 @@ DebayerCpu::~DebayerCpu()\n>>   \t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n>>   \tx++;\n>>   \n>> +#define GRBG_XBGR8888(p, n, div)                                  \\\n>> +\t*dst++ = blue_[(prev[x] + next[x]) / (2 * (div))];        \\\n>> +\t*dst++ = green_[curr[x] / (div)];                         \\\n>> +\t*dst++ = red_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n>> +\t*dst++ = 255;                                             \\\n>> +\tx++;\n>> +\n>>   /*\n>>    * GRG\n>>    * BGB\n>> @@ -98,6 +112,13 @@ DebayerCpu::~DebayerCpu()\n>>   \t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n>>   \tx++;\n>>   \n>> +#define GBRG_XBGR8888(p, n, div)                                   \\\n>> +\t*dst++ = blue_[(curr[x - p] + curr[x + n]) / (2 * (div))]; \\\n>> +\t*dst++ = green_[curr[x] / (div)];                          \\\n>> +\t*dst++ = red_[(prev[x] + next[x]) / (2 * (div))];          \\\n>> +\t*dst++ = 255;                                              \\\n>> +\tx++;\n>> +\n>>   /*\n>>    * BGB\n>>    * GRG\n>> @@ -109,6 +130,13 @@ DebayerCpu::~DebayerCpu()\n>>   \t*dst++ = red_[curr[x] / (div)];                                                        \\\n>>   \tx++;\n>>   \n>> +#define RGGB_XBGR8888(p, n, div)                                                               \\\n>> +\t*dst++ = blue_[(prev[x - p] + prev[x + n] + next[x - p] + next[x + n]) / (4 * (div))]; \\\n>> +\t*dst++ = green_[(prev[x] + curr[x - p] + curr[x + n] + next[x]) / (4 * (div))];        \\\n>> +\t*dst++ = red_[curr[x] / (div)];                                                        \\\n>> +\t*dst++ = 255;                                                                          \\\n>> +\tx++;\n>> +\n>>   void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint8_t)\n>> @@ -119,6 +147,16 @@ void DebayerCpu::debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint8_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\tBGGR_XBGR8888(1, 1, 1)\n>> +\t\tGBRG_XBGR8888(1, 1, 1)\n>> +\t}\n>> +}\n>> +\n> ... and then X and non-X versions of these methods could be merged too.\n> Using a template for the purpose should have no performance impact, I\n> suppose.\n>\n>>   void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint8_t)\n>> @@ -129,6 +167,16 @@ void DebayerCpu::debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint8_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\tGRBG_XBGR8888(1, 1, 1)\n>> +\t\tRGGB_XBGR8888(1, 1, 1)\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint16_t)\n>> @@ -140,6 +188,17 @@ void DebayerCpu::debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint16_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n>> +\t\tBGGR_XBGR8888(1, 1, 4)\n>> +\t\tGBRG_XBGR8888(1, 1, 4)\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint16_t)\n>> @@ -151,6 +210,17 @@ void DebayerCpu::debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint16_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\t/* divide values by 4 for 10 -> 8 bpp value */\n>> +\t\tGRBG_XBGR8888(1, 1, 4)\n>> +\t\tRGGB_XBGR8888(1, 1, 4)\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint16_t)\n>> @@ -162,6 +232,17 @@ void DebayerCpu::debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint16_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n>> +\t\tBGGR_XBGR8888(1, 1, 16)\n>> +\t\tGBRG_XBGR8888(1, 1, 16)\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tDECLARE_SRC_POINTERS(uint16_t)\n>> @@ -173,6 +254,17 @@ void DebayerCpu::debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tDECLARE_SRC_POINTERS(uint16_t)\n>> +\n>> +\tfor (int x = 0; x < (int)window_.width;) {\n>> +\t\t/* divide values by 16 for 12 -> 8 bpp value */\n>> +\t\tGRBG_XBGR8888(1, 1, 16)\n>> +\t\tRGGB_XBGR8888(1, 1, 16)\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tconst int widthInBytes = window_.width * 5 / 4;\n>> @@ -198,6 +290,31 @@ void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tconst int widthInBytes = window_.width * 5 / 4;\n>> +\tconst uint8_t *prev = src[0];\n>> +\tconst uint8_t *curr = src[1];\n>> +\tconst uint8_t *next = src[2];\n>> +\n>> +\t/*\n>> +\t * For the first pixel getting a pixel from the previous column uses\n>> +\t * x - 2 to skip the 5th byte with least-significant bits for 4 pixels.\n>> +\t * Same for last pixel (uses x + 2) and looking at the next column.\n>> +\t */\n>> +\tfor (int x = 0; x < widthInBytes;) {\n>> +\t\t/* First pixel */\n>> +\t\tBGGR_XBGR8888(2, 1, 1)\n>> +\t\t/* Second pixel BGGR -> GBRG */\n>> +\t\tGBRG_XBGR8888(1, 1, 1)\n>> +\t\t/* Same thing for third and fourth pixels */\n>> +\t\tBGGR_XBGR8888(1, 1, 1)\n>> +\t\tGBRG_XBGR8888(1, 2, 1)\n>> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n>> +\t\tx++;\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tconst int widthInBytes = window_.width * 5 / 4;\n>> @@ -218,6 +335,26 @@ void DebayerCpu::debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tconst int widthInBytes = window_.width * 5 / 4;\n>> +\tconst uint8_t *prev = src[0];\n>> +\tconst uint8_t *curr = src[1];\n>> +\tconst uint8_t *next = src[2];\n>> +\n>> +\tfor (int x = 0; x < widthInBytes;) {\n>> +\t\t/* First pixel */\n>> +\t\tGRBG_XBGR8888(2, 1, 1)\n>> +\t\t/* Second pixel GRBG -> RGGB */\n>> +\t\tRGGB_XBGR8888(1, 1, 1)\n>> +\t\t/* Same thing for third and fourth pixels */\n>> +\t\tGRBG_XBGR8888(1, 1, 1)\n>> +\t\tRGGB_XBGR8888(1, 2, 1)\n>> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n>> +\t\tx++;\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tconst int widthInBytes = window_.width * 5 / 4;\n>> @@ -238,6 +375,26 @@ void DebayerCpu::debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tconst int widthInBytes = window_.width * 5 / 4;\n>> +\tconst uint8_t *prev = src[0];\n>> +\tconst uint8_t *curr = src[1];\n>> +\tconst uint8_t *next = src[2];\n>> +\n>> +\tfor (int x = 0; x < widthInBytes;) {\n>> +\t\t/* Even pixel */\n>> +\t\tGBRG_XBGR8888(2, 1, 1)\n>> +\t\t/* Odd pixel GBGR -> BGGR */\n>> +\t\tBGGR_XBGR8888(1, 1, 1)\n>> +\t\t/* Same thing for next 2 pixels */\n>> +\t\tGBRG_XBGR8888(1, 1, 1)\n>> +\t\tBGGR_XBGR8888(1, 2, 1)\n>> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n>> +\t\tx++;\n>> +\t}\n>> +}\n>> +\n>>   void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   {\n>>   \tconst int widthInBytes = window_.width * 5 / 4;\n>> @@ -258,6 +415,26 @@ void DebayerCpu::debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[])\n>>   \t}\n>>   }\n>>   \n>> +void DebayerCpu::debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[])\n>> +{\n>> +\tconst int widthInBytes = window_.width * 5 / 4;\n>> +\tconst uint8_t *prev = src[0];\n>> +\tconst uint8_t *curr = src[1];\n>> +\tconst uint8_t *next = src[2];\n>> +\n>> +\tfor (int x = 0; x < widthInBytes;) {\n>> +\t\t/* Even pixel */\n>> +\t\tRGGB_XBGR8888(2, 1, 1)\n>> +\t\t/* Odd pixel RGGB -> GRBG */\n>> +\t\tGRBG_XBGR8888(1, 1, 1)\n>> +\t\t/* Same thing for next 2 pixels */\n>> +\t\tRGGB_XBGR8888(1, 1, 1)\n>> +\t\tGRBG_XBGR8888(1, 2, 1)\n>> +\t\t/* Skip 5th src byte with 4 x 2 least-significant-bits */\n>> +\t\tx++;\n>> +\t}\n>> +}\n>> +\n>>   static bool isStandardBayerOrder(BayerFormat::Order order)\n>>   {\n>>   \treturn order == BayerFormat::BGGR || order == BayerFormat::GBRG ||\n>> @@ -280,7 +457,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>>   \t\tconfig.bpp = (bayerFormat.bitDepth + 7) & ~7;\n>>   \t\tconfig.patternSize.width = 2;\n>>   \t\tconfig.patternSize.height = 2;\n>> -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n>> +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n>> +\t\t\tformats::RGB888,\n>> +\t\t\tformats::XRGB8888,\n>> +\t\t\tformats::ARGB8888,\n>> +\t\t\tformats::BGR888,\n>> +\t\t\tformats::XBGR8888,\n>> +\t\t\tformats::ABGR8888\n>> +\t\t});\n>>   \t\treturn 0;\n>>   \t}\n>>   \n>> @@ -290,7 +474,14 @@ int DebayerCpu::getInputConfig(PixelFormat inputFormat, DebayerInputConfig &conf\n>>   \t\tconfig.bpp = 10;\n>>   \t\tconfig.patternSize.width = 4; /* 5 bytes per *4* pixels */\n>>   \t\tconfig.patternSize.height = 2;\n>> -\t\tconfig.outputFormats = std::vector<PixelFormat>({ formats::RGB888, formats::BGR888 });\n>> +\t\tconfig.outputFormats = std::vector<PixelFormat>({\n>> +\t\t\tformats::RGB888,\n>> +\t\t\tformats::XRGB8888,\n>> +\t\t\tformats::ARGB8888,\n>> +\t\t\tformats::BGR888,\n>> +\t\t\tformats::XBGR8888,\n>> +\t\t\tformats::ABGR8888\n>> +\t\t});\n>>   \t\treturn 0;\n>>   \t}\n>>   \n>> @@ -306,6 +497,12 @@ int DebayerCpu::getOutputConfig(PixelFormat outputFormat, DebayerOutputConfig &c\n>>   \t\treturn 0;\n>>   \t}\n>>   \n>> +\tif (outputFormat == formats::XRGB8888 || outputFormat == formats::ARGB8888 ||\n>> +\t    outputFormat == formats::XBGR8888 || outputFormat == formats::ABGR8888) {\n>> +\t\tconfig.bpp = 32;\n>> +\t\treturn 0;\n>> +\t}\n>> +\n>>   \tLOG(Debayer, Info)\n>>   \t\t<< \"Unsupported output format \" << outputFormat.toString();\n>>   \treturn -EINVAL;\n>> @@ -341,6 +538,7 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>>   {\n>>   \tBayerFormat bayerFormat =\n>>   \t\tBayerFormat::fromPixelFormat(inputFormat);\n>> +\tbool is_aligned = false;\n> camelCase should be used for variable names.\nRight, fixed!\n>\n>>   \txShift_ = 0;\n>>   \tswapRedBlueGains_ = false;\n>> @@ -351,8 +549,16 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>>   \t};\n>>   \n>>   \tswitch (outputFormat) {\n>> +\tcase formats::XRGB8888:\n>> +\tcase formats::ARGB8888:\n>> +\t  is_aligned = true;\n>> +\t  [[fallthrough]];\n>>   \tcase formats::RGB888:\n>>   \t\tbreak;\n>> +\tcase formats::XBGR8888:\n>> +\tcase formats::ABGR8888:\n>> +\t  is_aligned = true;\n>> +\t  [[fallthrough]];\n>>   \tcase formats::BGR888:\n>>   \t\t/* Swap R and B in bayer order to generate BGR888 instead of RGB888 */\n>>   \t\tswapRedBlueGains_ = true;\n>> @@ -383,16 +589,19 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>>   \t    isStandardBayerOrder(bayerFormat.order)) {\n>>   \t\tswitch (bayerFormat.bitDepth) {\n>>   \t\tcase 8:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer8_BGBG_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer8_GRGR_BGR888;\n>> +\t\t  LOG(Debayer, Warning) << \"8bit no packing\";\n> Is this a debugging leftover or do you really mean to log something\n> here?  If the latter, it should probably be Debug rather than Warning\n> and the message should be improved.  The same applies to the other LOGs\n> below.\nUrgh, that was a left-over - removed!\n>\n>> +\t\t  debayer0_ = is_aligned ? &DebayerCpu::debayer8_BGBG_XBGR8888 : &DebayerCpu::debayer8_BGBG_BGR888;\n> If the two methods were unified and extended with isAligned argument,\n> I'm not sure what would be the best way to pass the argument to them.\n> lambda, a template parameter or anything else?\n>\n>> +\t\t  debayer1_ = is_aligned ? &DebayerCpu::debayer8_GRGR_XBGR8888 : &DebayerCpu::debayer8_GRGR_BGR888;\n>>   \t\t\tbreak;\n>>   \t\tcase 10:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer10_BGBG_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer10_GRGR_BGR888;\n>> +\t\t  LOG(Debayer, Warning) << \"10bit no packing\";\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10_BGBG_XBGR8888 : &DebayerCpu::debayer10_BGBG_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10_GRGR_XBGR8888 : &DebayerCpu::debayer10_GRGR_BGR888;\n>>   \t\t\tbreak;\n>>   \t\tcase 12:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer12_BGBG_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer12_GRGR_BGR888;\n>> +\t\t  LOG(Debayer, Warning) << \"12bit no packing\";\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer12_BGBG_XBGR8888 : &DebayerCpu::debayer12_BGBG_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer12_GRGR_XBGR8888 : &DebayerCpu::debayer12_GRGR_BGR888;\n>>   \t\t\tbreak;\n>>   \t\t}\n>>   \t\tsetupStandardBayerOrder(bayerFormat.order);\n>> @@ -401,22 +610,23 @@ int DebayerCpu::setDebayerFunctions(PixelFormat inputFormat, PixelFormat outputF\n>>   \n>>   \tif (bayerFormat.bitDepth == 10 &&\n>>   \t    bayerFormat.packing == BayerFormat::Packing::CSI2) {\n>> +\t  LOG(Debayer, Warning) << \"10bit csi2\";\n>>   \t\tswitch (bayerFormat.order) {\n>>   \t\tcase BayerFormat::BGGR:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n>>   \t\t\treturn 0;\n>>   \t\tcase BayerFormat::GBRG:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n>>   \t\t\treturn 0;\n>>   \t\tcase BayerFormat::GRBG:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_GRGR_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_BGBG_BGR888;\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_GRGR_XBGR8888 : &DebayerCpu::debayer10P_GRGR_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_BGBG_XBGR8888 : &DebayerCpu::debayer10P_BGBG_BGR888;\n>>   \t\t\treturn 0;\n>>   \t\tcase BayerFormat::RGGB:\n>> -\t\t\tdebayer0_ = &DebayerCpu::debayer10P_RGRG_BGR888;\n>> -\t\t\tdebayer1_ = &DebayerCpu::debayer10P_GBGB_BGR888;\n>> +\t\t\tdebayer0_ = is_aligned ? &DebayerCpu::debayer10P_RGRG_XBGR8888 : &DebayerCpu::debayer10P_RGRG_BGR888;\n>> +\t\t\tdebayer1_ = is_aligned ? &DebayerCpu::debayer10P_GBGB_XBGR8888 : &DebayerCpu::debayer10P_GBGB_BGR888;\n>>   \t\t\treturn 0;\n>>   \t\tdefault:\n>>   \t\t\tbreak;\n>> @@ -533,6 +743,8 @@ DebayerCpu::strideAndFrameSize(const PixelFormat &outputFormat, const Size &size\n>>   \t/* round up to multiple of 8 for 64 bits alignment */\n>>   \tunsigned int stride = (size.width * config.bpp / 8 + 7) & ~7;\n>>   \n>> +\tLOG(Debayer, Warning) << outputFormat.toString() << \" \" << size.width << \" \" << size.height << \" \" << config.bpp << \" \" << stride << \" \" << stride * size.height;\n>> +\n>>   \treturn std::make_tuple(stride, stride * size.height);\n>>   }\n>>   \n>> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h\n>> index be7dcdca..c30f44aa 100644\n>> --- a/src/libcamera/software_isp/debayer_cpu.h\n>> +++ b/src/libcamera/software_isp/debayer_cpu.h\n>> @@ -86,18 +86,28 @@ private:\n>>   \n>>   \t/* 8-bit raw bayer format */\n>>   \tvoid debayer8_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer8_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer8_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer8_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \t/* unpacked 10-bit raw bayer format */\n>>   \tvoid debayer10_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer10_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \t/* unpacked 12-bit raw bayer format */\n>>   \tvoid debayer12_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer12_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer12_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer12_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \t/* CSI-2 packed 10-bit raw bayer format (all the 4 orders) */\n>>   \tvoid debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10P_BGBG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer10P_GRGR_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10P_GRGR_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer10P_GBGB_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10P_GBGB_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \tvoid debayer10P_RGRG_BGR888(uint8_t *dst, const uint8_t *src[]);\n>> +\tvoid debayer10P_RGRG_XBGR8888(uint8_t *dst, const uint8_t *src[]);\n>>   \n>>   \tstruct DebayerInputConfig {\n>>   \t\tSize patternSize;","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id B6407C3237\n\tfor <parsemail@patchwork.libcamera.org>;\n\tMon, 17 Jun 2024 17:41:09 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 8F37F6548B;\n\tMon, 17 Jun 2024 19:41:08 +0200 (CEST)","from madrid.collaboradmins.com (madrid.collaboradmins.com\n\t[IPv6:2a00:1098:ed:100::25])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 46A9F61A1C\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tMon, 17 Jun 2024 19:41:06 +0200 (CEST)","from [IPV6:fd00::22:e5c5] (cola.collaboradmins.com\n\t[IPv6:2a01:4f8:1c1c:5717::1])\n\t(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)\n\tkey-exchange X25519 server-signature RSA-PSS (4096 bits)\n\tserver-digest SHA256)\n\t(No client certificate requested) (Authenticated sender: rmader)\n\tby madrid.collaboradmins.com (Postfix) with ESMTPSA id 348F2378213C; \n\tMon, 17 Jun 2024 17:41:05 +0000 (UTC)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (2048-bit key;\n\tunprotected) header.d=collabora.com header.i=@collabora.com\n\theader.b=\"tXlN+vQa\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com;\n\ts=mail; t=1718646065;\n\tbh=llZpajJcHNGyTrO+ga+F3c3NG5epJmHTKYC+l1vOrvs=;\n\th=Date:Subject:To:References:From:Cc:In-Reply-To:From;\n\tb=tXlN+vQaCoCF3vnbHAMRf0R3lSQ7bAGizfoteEducQcUsS+sFjKz3a2aR5MKInkFH\n\tClRE0GAtQ43wkir0IqjdphY+XVgehu1yYWEL3R9t/Q8iLb9IDK/fDp+Y+HMKoxapmA\n\tnL6HjhSZ80r4vmS8noJVMSzCfkUGrYRsc2GOr1PbasEBU8l7yMnDEklT/+14iFMKqf\n\t8RYjC5W5XFkJfD7v47NbMYOXjyor/LQ5HGdi0BWXWk46Tf5pDZT2QC3eLcSUvA/OWJ\n\tFY559BS/uPmvcGdKKkfuCK+4VxG92FY2Vu7OCACf2xK4MCrLPE51XqLtJ8cFB7BLjA\n\tAvw3I0drZcMhQ==","Content-Type":"multipart/alternative;\n\tboundary=\"------------7F7ndB1iT87h0v8eHhvIYfsT\"","Message-ID":"<9482ce5d-d694-48a2-844e-54895e86fa2d@collabora.com>","Date":"Mon, 17 Jun 2024 19:41:04 +0200","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH] libcamera: debayer_cpu: Add 32bits/aligned output\n\tformats","To":"Milan Zamazal <mzamazal@redhat.com>","References":"<20240611110721.10690-1-robert.mader@collabora.com>\n\t<878qzbeeo9.fsf@redhat.com>","Content-Language":"en-US, de-DE","From":"Robert Mader <robert.mader@collabora.com>","Cc":"libcamera-devel@lists.libcamera.org","In-Reply-To":"<878qzbeeo9.fsf@redhat.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]