[{"id":37682,"web_url":"https://patchwork.libcamera.org/comment/37682/","msgid":"<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>","date":"2026-01-15T16:28:45","subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","submitter":{"id":216,"url":"https://patchwork.libcamera.org/api/people/216/","name":"Barnabás Pőcze","email":"barnabas.pocze@ideasonboard.com"},"content":"2026. 01. 14. 18:39 keltezéssel, Kieran Bingham írta:\n> Extend the new Quantized type infrastructure by providing a\n> FixedPointQTraits template.\n> \n> This allows construction of fixed point types with a Quantized storage\n> that allows easy reading of both the underlying quantized type value and\n> a floating point representation of that same value.\n> \n> Reviewed-by: Isaac Scott <isaac.scott@ideasonboard.com>\n> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> \n> ---\n> v4:\n> - Assert that the given type has enough bits for the usage\n> - Use unsigned types for calculating qmin/qmax\n> - Reorder toFloat/fromFloat and min/max for future inlining\n> - Make toFloat and fromFloat constexpr\n> \n> v5:\n> - Make UT, Bits and Bitmask private (and remove doxygen)\n> - Remove constexpr from fromFloat which uses std::round (only constexpr\n>    in C++23)\n> - static_assert that min<max when converted\n> - Provide new Q and UQ automatic width types (Thanks Barnabás)\n> - Convert types to shortened Q/UQ automatic widths\n> - Use automatic width Q/UQ for 12,4\n> - change qmin->qMin qmax->qMax Bits->bits BitMask->bitMask\n> - Remove typedefs for Q1_7 etc\n> \n> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> ---\n>   src/ipa/libipa/fixedpoint.cpp | 89 +++++++++++++++++++++++++++++++++++\n>   src/ipa/libipa/fixedpoint.h   | 69 +++++++++++++++++++++++++++\n>   2 files changed, 158 insertions(+)\n> \n> diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp\n> index 6b698fc5d680..43d76f745d8a 100644\n> --- a/src/ipa/libipa/fixedpoint.cpp\n> +++ b/src/ipa/libipa/fixedpoint.cpp\n> @@ -37,6 +37,95 @@ namespace ipa {\n>    * \\return The converted value\n>    */\n>   \n> +/**\n> + * \\struct libcamera::ipa::FixedPointQTraits\n> + * \\brief Traits type implementing fixed-point quantisation conversions\n\nIn a different commit you used \"quanti[z]...\", I think consistency would be better.\n\n\n> + *\n> + * The FixedPointQTraits structure defines a policy for mapping floating-point\n> + * values to and from fixed-point integer representations. It is parameterised\n> + * by the number of integer bits \\a I, fractional bits \\a F, and the integral\n> + * storage type \\a T. The traits are used with Quantized<Traits> to create a\n> + * quantised type that stores both the fixed-point representation and the\n> + * corresponding floating-point value.\n> + *\n> + * The trait exposes compile-time constants describing the bit layout, limits,\n> + * and scaling factors used in the fixed-point representation.\n\n   \"The sign of the value is determined by the sign of \\a T.\"\n\nI would add something like this.\n\n\n> + *\n> + * \\tparam I Number of integer bits\n> + * \\tparam F Number of fractional bits\n> + * \\tparam T Integral type used to store the quantised value\n> + */\n> +\n> +/**\n> + * \\typedef FixedPointQTraits::QuantizedType\n> + * \\brief The integral storage type used for the fixed-point representation\n> + */\n> +\n> +/**\n> + * \\var FixedPointQTraits::qMin\n> + * \\brief Minimum representable quantised integer value\n> + *\n> + * This corresponds to the most negative value for signed formats or zero for\n> + * unsigned formats.\n> + */\n> +\n> +/**\n> + * \\var FixedPointQTraits::qMax\n> + * \\brief Maximum representable quantised integer value\n> + */\n> +\n> +/**\n> + * \\var FixedPointQTraits::min\n> + * \\brief Minimum representable floating-point value corresponding to qMin\n> + */\n> +\n> +/**\n> + * \\var FixedPointQTraits::max\n> + * \\brief Maximum representable floating-point value corresponding to qMax\n> + */\n> +\n> +/**\n> + * \\fn FixedPointQTraits::fromFloat(float v)\n> + * \\brief Convert a floating-point value to a fixed-point integer\n> + * \\param[in] v The floating-point value to be converted\n> + * \\return The quantised fixed-point integer representation\n> + *\n> + * The conversion rounds the floating-point input \\a v to the nearest integer\n> + * according to the scaling factor defined by the number of fractional bits F.\n\nI think it's worth mentioning that `v` is clamped first.\n\n\n> + */\n> +\n> +/**\n> + * \\fn FixedPointQTraits::toFloat(QuantizedType q)\n> + * \\brief Convert a fixed-point integer to a floating-point value\n> + * \\param[in] q The fixed-point integer value to be converted\n> + * \\return The corresponding floating-point value\n> + *\n> + * The conversion sign-extends the integer value if required and divides by the\n> + * scaling factor defined by the number of fractional bits F.\n> + */\n> +\n> +/**\n> + * \\typedef Q\n> + * \\brief Define a signed fixed-point quantised type with automatic storage width\n> + * \\tparam I The number of integer bits\n> + * \\tparam F The number of fractional bits\n> + *\n> + * This alias defines a signed fixed-point quantised type using the\n> + * \\ref FixedPointQTraits trait and a suitable signed integer storage type\n> + * automatically selected based on the total number of bits \\a (I + F).\n> + */\n> +\n> +/**\n> + * \\typedef UQ\n> + * \\brief Define an unsigned fixed-point quantised type with automatic storage width\n> + * \\tparam I The number of integer bits\n> + * \\tparam F The number of fractional bits\n> + *\n> + * This alias defines an unsigned fixed-point quantised type using the\n> + * \\ref FixedPointQTraits trait and a suitable unsigned integer storage type\n> + * automatically selected based on the total number of bits \\a (I + F).\n> + */\n> +\n>   } /* namespace ipa */\n>   \n>   } /* namespace libcamera */\n> diff --git a/src/ipa/libipa/fixedpoint.h b/src/ipa/libipa/fixedpoint.h\n> index b4a7fa5e0ecd..4f6ee081604b 100644\n> --- a/src/ipa/libipa/fixedpoint.h\n> +++ b/src/ipa/libipa/fixedpoint.h\n> @@ -10,6 +10,8 @@\n>   #include <cmath>\n>   #include <type_traits>\n>   \n> +#include \"quantized.h\"\n> +\n>   namespace libcamera {\n>   \n>   namespace ipa {\n> @@ -63,6 +65,73 @@ constexpr R fixedToFloatingPoint(T number)\n>   \treturn static_cast<R>(t) / static_cast<R>(1 << F);\n>   }\n>   \n> +template<unsigned int I, unsigned int F, typename T>\n> +struct FixedPointQTraits {\n> +private:\n> +\tstatic_assert(std::is_integral_v<T>, \"FixedPointQTraits: T must be integral\");\n> +\tusing UT = std::make_unsigned_t<T>;\n> +\n> +\tstatic constexpr unsigned int bits = I + F;\n> +\tstatic_assert(bits <= sizeof(T) * 8, \"FixedPointQTraits: too many bits for type T\");\n> +\n> +\tstatic constexpr T bitMask = (bits < sizeof(T) * 8)\n> +\t\t\t\t   ? static_cast<T>((UT{1} << bits) - 1)\n> +\t\t\t\t   : static_cast<T>(~UT{0});\n\nI think `static_cast<T>((UT{1} << bits) - 1)` should work in every case.\nIf `I+F` is the full width, then `(UT{1} << bits) == 0`, subtracting one\nyields `~UT{0}`. (Unless the usual integer promotions apply, but then `bits`\nis less than the width of promoted-to type, so no overflow/wraparound is possible\nas far as I can tell.)\n\n\n> +\n> +public:\n> +\tusing QuantizedType = T;\n> +\n> +\tstatic constexpr T qMin = std::is_signed_v<T>\n> +\t\t\t\t? static_cast<T>(-(UT{1} << (bits - 1)))\n> +\t\t\t\t: static_cast<T>(0);\n> +\n> +\tstatic constexpr T qMax = std::is_signed_v<T>\n> +\t\t\t\t? static_cast<T>((UT{1} << (bits - 1)) - 1)\n> +\t\t\t\t: static_cast<T>((UT{1} << bits) - 1);\n> +\n> +\tstatic constexpr float toFloat(QuantizedType q)\n> +\t{\n> +\t\treturn fixedToFloatingPoint<I, F, float, QuantizedType>(q);\n> +\t}\n> +\n> +\tstatic constexpr float min = fixedToFloatingPoint<I, F, float>(qMin);\n> +\tstatic constexpr float max = fixedToFloatingPoint<I, F, float>(qMax);\n> +\n> +\tstatic_assert(min < max, \"FixedPointQTraits: Minimum must be less than maximum\");\n> +\n> +\t/* Conversion functions required by Quantized<Traits> */\n> +\tstatic QuantizedType fromFloat(float v)\n> +\t{\n> +\t\tv = std::clamp(v, min, max);\n> +\t\treturn floatingToFixedPoint<I, F, QuantizedType, float>(v);\n> +\t}\n> +};\n> +\n> +namespace details {\n> +\n> +template<unsigned int Bits>\n> +constexpr auto qtype()\n> +{\n> +\tstatic_assert(Bits <= 64);\n> +\n> +\tif constexpr (Bits <= 8)\n> +\t\treturn int8_t();\n> +\telse if constexpr (Bits <= 16)\n> +\t\treturn int16_t();\n> +\telse if constexpr (Bits <= 32)\n> +\t\treturn int32_t();\n> +\telse if constexpr (Bits <= 64)\n> +\t\treturn int64_t();\n> +}\n> +\n> +} /* namespace details */\n> +\n> +template<unsigned int I, unsigned int F>\n> +using Q = Quantized<FixedPointQTraits<I, F, decltype(details::qtype<I + F>())>>;\n> +\n> +template<unsigned int I, unsigned int F>\n> +using UQ = Quantized<FixedPointQTraits<I, F, std::make_unsigned_t<decltype(details::qtype<I + F>())>>>;\n> +\n>   } /* namespace ipa */\n>   \n>   } /* namespace libcamera */","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 523D5C3226\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 15 Jan 2026 16:28:49 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 0A72061FC0;\n\tThu, 15 Jan 2026 17:28:49 +0100 (CET)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 3D31561F84\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 15 Jan 2026 17:28:48 +0100 (CET)","from [192.168.33.20] (185.221.143.114.nat.pool.zt.hu\n\t[185.221.143.114])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id A22CB229;\n\tThu, 15 Jan 2026 17:28:20 +0100 (CET)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"dy79P3kv\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1768494500;\n\tbh=mierYUKmUh7RUPWRQuATdKx55HrXwlkkSntB5av0/dE=;\n\th=Date:Subject:To:Cc:References:From:In-Reply-To:From;\n\tb=dy79P3kvS3t5RKfcKmajAx6KcgETrDxIXfH45XMjG9TwdRliGA0mIeolHIBUmP1W8\n\tF/qZFmx2pGB0exuU9qJQfnKIvpHW+1wZU3u3fKGRtg5KIS63PSOo/ipov8D8hOCqJK\n\tdLSXM2XjJ2AB4c8XAvD7W+zhJ7FU8doC5kQAiaOw=","Message-ID":"<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>","Date":"Thu, 15 Jan 2026 17:28:45 +0100","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","To":"Kieran Bingham <kieran.bingham@ideasonboard.com>,\n\tlibcamera devel <libcamera-devel@lists.libcamera.org>","Cc":"Isaac Scott <isaac.scott@ideasonboard.com>","References":"<20260114173918.1744023-1-kieran.bingham@ideasonboard.com>\n\t<20260114173918.1744023-5-kieran.bingham@ideasonboard.com>","From":"=?utf-8?q?Barnab=C3=A1s_P=C5=91cze?= <barnabas.pocze@ideasonboard.com>","Content-Language":"en-US, hu-HU","In-Reply-To":"<20260114173918.1744023-5-kieran.bingham@ideasonboard.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":37790,"web_url":"https://patchwork.libcamera.org/comment/37790/","msgid":"<176899994091.1693075.17653379271066445227@ping.linuxembedded.co.uk>","date":"2026-01-21T12:52:20","subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","submitter":{"id":4,"url":"https://patchwork.libcamera.org/api/people/4/","name":"Kieran Bingham","email":"kieran.bingham@ideasonboard.com"},"content":"Quoting Barnabás Pőcze (2026-01-15 16:28:45)\n> 2026. 01. 14. 18:39 keltezéssel, Kieran Bingham írta:\n> > Extend the new Quantized type infrastructure by providing a\n> > FixedPointQTraits template.\n> > \n> > This allows construction of fixed point types with a Quantized storage\n> > that allows easy reading of both the underlying quantized type value and\n> > a floating point representation of that same value.\n> > \n> > Reviewed-by: Isaac Scott <isaac.scott@ideasonboard.com>\n> > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> > \n> > ---\n> > v4:\n> > - Assert that the given type has enough bits for the usage\n> > - Use unsigned types for calculating qmin/qmax\n> > - Reorder toFloat/fromFloat and min/max for future inlining\n> > - Make toFloat and fromFloat constexpr\n> > \n> > v5:\n> > - Make UT, Bits and Bitmask private (and remove doxygen)\n> > - Remove constexpr from fromFloat which uses std::round (only constexpr\n> >    in C++23)\n> > - static_assert that min<max when converted\n> > - Provide new Q and UQ automatic width types (Thanks Barnabás)\n> > - Convert types to shortened Q/UQ automatic widths\n> > - Use automatic width Q/UQ for 12,4\n> > - change qmin->qMin qmax->qMax Bits->bits BitMask->bitMask\n> > - Remove typedefs for Q1_7 etc\n> > \n> > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> > ---\n> >   src/ipa/libipa/fixedpoint.cpp | 89 +++++++++++++++++++++++++++++++++++\n> >   src/ipa/libipa/fixedpoint.h   | 69 +++++++++++++++++++++++++++\n> >   2 files changed, 158 insertions(+)\n> > \n> > diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp\n> > index 6b698fc5d680..43d76f745d8a 100644\n> > --- a/src/ipa/libipa/fixedpoint.cpp\n> > +++ b/src/ipa/libipa/fixedpoint.cpp\n> > @@ -37,6 +37,95 @@ namespace ipa {\n> >    * \\return The converted value\n> >    */\n> >   \n> > +/**\n> > + * \\struct libcamera::ipa::FixedPointQTraits\n> > + * \\brief Traits type implementing fixed-point quantisation conversions\n> \n> In a different commit you used \"quanti[z]...\", I think consistency would be better.\n> \n> \n> > + *\n> > + * The FixedPointQTraits structure defines a policy for mapping floating-point\n> > + * values to and from fixed-point integer representations. It is parameterised\n> > + * by the number of integer bits \\a I, fractional bits \\a F, and the integral\n> > + * storage type \\a T. The traits are used with Quantized<Traits> to create a\n> > + * quantised type that stores both the fixed-point representation and the\n> > + * corresponding floating-point value.\n> > + *\n> > + * The trait exposes compile-time constants describing the bit layout, limits,\n> > + * and scaling factors used in the fixed-point representation.\n> \n>    \"The sign of the value is determined by the sign of \\a T.\"\n> \n> I would add something like this.\n> \n\nAlso confirmed that the I contains the sign bit for signed types:\n\n+ * The sign of the value is determined by the sign of \\a T. For signed types,\n+ * the number of integer bits in \\a I includes the sign bit.\n\n\n\n> > + *\n> > + * \\tparam I Number of integer bits\n> > + * \\tparam F Number of fractional bits\n> > + * \\tparam T Integral type used to store the quantised value\n> > + */\n> > +\n> > +/**\n> > + * \\typedef FixedPointQTraits::QuantizedType\n> > + * \\brief The integral storage type used for the fixed-point representation\n> > + */\n> > +\n> > +/**\n> > + * \\var FixedPointQTraits::qMin\n> > + * \\brief Minimum representable quantised integer value\n> > + *\n> > + * This corresponds to the most negative value for signed formats or zero for\n> > + * unsigned formats.\n> > + */\n> > +\n> > +/**\n> > + * \\var FixedPointQTraits::qMax\n> > + * \\brief Maximum representable quantised integer value\n> > + */\n> > +\n> > +/**\n> > + * \\var FixedPointQTraits::min\n> > + * \\brief Minimum representable floating-point value corresponding to qMin\n> > + */\n> > +\n> > +/**\n> > + * \\var FixedPointQTraits::max\n> > + * \\brief Maximum representable floating-point value corresponding to qMax\n> > + */\n> > +\n> > +/**\n> > + * \\fn FixedPointQTraits::fromFloat(float v)\n> > + * \\brief Convert a floating-point value to a fixed-point integer\n> > + * \\param[in] v The floating-point value to be converted\n> > + * \\return The quantised fixed-point integer representation\n> > + *\n> > + * The conversion rounds the floating-point input \\a v to the nearest integer\n> > + * according to the scaling factor defined by the number of fractional bits F.\n> \n> I think it's worth mentioning that `v` is clamped first.\n\nAdded.\n\n\n- * The conversion rounds the floating-point input \\a v to the nearest integer\n- * according to the scaling factor defined by the number of fractional bits F.\n+ * The conversion first clamps the floating-point input \\a v to the range [min,\n+ * max] and then rounds it to the nearest integer according to the scaling\n+ * factor defined by the number of fractional bits F.\n\n\n> \n> \n> > + */\n> > +\n> > +/**\n> > + * \\fn FixedPointQTraits::toFloat(QuantizedType q)\n> > + * \\brief Convert a fixed-point integer to a floating-point value\n> > + * \\param[in] q The fixed-point integer value to be converted\n> > + * \\return The corresponding floating-point value\n> > + *\n> > + * The conversion sign-extends the integer value if required and divides by the\n> > + * scaling factor defined by the number of fractional bits F.\n> > + */\n> > +\n> > +/**\n> > + * \\typedef Q\n> > + * \\brief Define a signed fixed-point quantised type with automatic storage width\n> > + * \\tparam I The number of integer bits\n> > + * \\tparam F The number of fractional bits\n> > + *\n> > + * This alias defines a signed fixed-point quantised type using the\n> > + * \\ref FixedPointQTraits trait and a suitable signed integer storage type\n> > + * automatically selected based on the total number of bits \\a (I + F).\n> > + */\n> > +\n> > +/**\n> > + * \\typedef UQ\n> > + * \\brief Define an unsigned fixed-point quantised type with automatic storage width\n> > + * \\tparam I The number of integer bits\n> > + * \\tparam F The number of fractional bits\n> > + *\n> > + * This alias defines an unsigned fixed-point quantised type using the\n> > + * \\ref FixedPointQTraits trait and a suitable unsigned integer storage type\n> > + * automatically selected based on the total number of bits \\a (I + F).\n> > + */\n> > +\n> >   } /* namespace ipa */\n> >   \n> >   } /* namespace libcamera */\n> > diff --git a/src/ipa/libipa/fixedpoint.h b/src/ipa/libipa/fixedpoint.h\n> > index b4a7fa5e0ecd..4f6ee081604b 100644\n> > --- a/src/ipa/libipa/fixedpoint.h\n> > +++ b/src/ipa/libipa/fixedpoint.h\n> > @@ -10,6 +10,8 @@\n> >   #include <cmath>\n> >   #include <type_traits>\n> >   \n> > +#include \"quantized.h\"\n> > +\n> >   namespace libcamera {\n> >   \n> >   namespace ipa {\n> > @@ -63,6 +65,73 @@ constexpr R fixedToFloatingPoint(T number)\n> >       return static_cast<R>(t) / static_cast<R>(1 << F);\n> >   }\n> >   \n> > +template<unsigned int I, unsigned int F, typename T>\n> > +struct FixedPointQTraits {\n> > +private:\n> > +     static_assert(std::is_integral_v<T>, \"FixedPointQTraits: T must be integral\");\n> > +     using UT = std::make_unsigned_t<T>;\n> > +\n> > +     static constexpr unsigned int bits = I + F;\n> > +     static_assert(bits <= sizeof(T) * 8, \"FixedPointQTraits: too many bits for type T\");\n> > +\n> > +     static constexpr T bitMask = (bits < sizeof(T) * 8)\n> > +                                ? static_cast<T>((UT{1} << bits) - 1)\n> > +                                : static_cast<T>(~UT{0});\n> \n> I think `static_cast<T>((UT{1} << bits) - 1)` should work in every case.\n> If `I+F` is the full width, then `(UT{1} << bits) == 0`, subtracting one\n> yields `~UT{0}`. (Unless the usual integer promotions apply, but then `bits`\n> is less than the width of promoted-to type, so no overflow/wraparound is possible\n> as far as I can tell.)\n\nIt was so long ago - but I went through so much compiler matrix pain\nhere with compiler warnings with different things I tried.\n\nBut ultimately - this works around/supports using the maximum size\nwithout hitting this:\n\n\n https://godbolt.org/z/W7sE4s8oc\n\n#include <cstdint>\n#include <iostream>\n\nint main()\n{\n    uint32_t bits = 32;\n\n    uint32_t x = uint32_t{1} << bits;\n\n    std::cout << x << \"\\n\";\n}\n\nwith -std=c++17 -Wall -Werror -fsanitize=undefined\n\nProgram returned: 0\n/app/example.cpp:8:30: runtime error: shift exponent 32 is too large for\n32-bit type 'uint32_t' (aka 'unsigned int')\nSUMMARY: UndefinedBehaviorSanitizer: undefined-behavior\n/app/example.cpp:8:30 \n1\n\nSo I don't think I can simplify here.\n\n\n> \n> \n> > +\n> > +public:\n> > +     using QuantizedType = T;\n> > +\n> > +     static constexpr T qMin = std::is_signed_v<T>\n> > +                             ? static_cast<T>(-(UT{1} << (bits - 1)))\n> > +                             : static_cast<T>(0);\n> > +\n> > +     static constexpr T qMax = std::is_signed_v<T>\n> > +                             ? static_cast<T>((UT{1} << (bits - 1)) - 1)\n> > +                             : static_cast<T>((UT{1} << bits) - 1);\n> > +\n> > +     static constexpr float toFloat(QuantizedType q)\n> > +     {\n> > +             return fixedToFloatingPoint<I, F, float, QuantizedType>(q);\n> > +     }\n> > +\n> > +     static constexpr float min = fixedToFloatingPoint<I, F, float>(qMin);\n> > +     static constexpr float max = fixedToFloatingPoint<I, F, float>(qMax);\n> > +\n> > +     static_assert(min < max, \"FixedPointQTraits: Minimum must be less than maximum\");\n> > +\n> > +     /* Conversion functions required by Quantized<Traits> */\n> > +     static QuantizedType fromFloat(float v)\n> > +     {\n> > +             v = std::clamp(v, min, max);\n> > +             return floatingToFixedPoint<I, F, QuantizedType, float>(v);\n> > +     }\n> > +};\n> > +\n> > +namespace details {\n> > +\n> > +template<unsigned int Bits>\n> > +constexpr auto qtype()\n> > +{\n> > +     static_assert(Bits <= 64);\n> > +\n> > +     if constexpr (Bits <= 8)\n> > +             return int8_t();\n> > +     else if constexpr (Bits <= 16)\n> > +             return int16_t();\n> > +     else if constexpr (Bits <= 32)\n> > +             return int32_t();\n> > +     else if constexpr (Bits <= 64)\n> > +             return int64_t();\n> > +}\n> > +\n> > +} /* namespace details */\n> > +\n> > +template<unsigned int I, unsigned int F>\n> > +using Q = Quantized<FixedPointQTraits<I, F, decltype(details::qtype<I + F>())>>;\n> > +\n> > +template<unsigned int I, unsigned int F>\n> > +using UQ = Quantized<FixedPointQTraits<I, F, std::make_unsigned_t<decltype(details::qtype<I + F>())>>>;\n> > +\n> >   } /* namespace ipa */\n> >   \n> >   } /* namespace libcamera */\n>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id BB28FC3220\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 21 Jan 2026 12:52:26 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id E6EBD61FC9;\n\tWed, 21 Jan 2026 13:52:25 +0100 (CET)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 6498761F9F\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 21 Jan 2026 13:52:24 +0100 (CET)","from pendragon.ideasonboard.com\n\t(cpc89244-aztw30-2-0-cust6594.18-1.cable.virginm.net [86.31.185.195])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 68BAE741;\n\tWed, 21 Jan 2026 13:51:52 +0100 (CET)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"gDrfcoic\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1768999912;\n\tbh=lRIKyMXuYvVwg2j7TCGRLa7PT2Dz1cK+c+qJORqbc80=;\n\th=In-Reply-To:References:Subject:From:Cc:To:Date:From;\n\tb=gDrfcoicOZe8UOq2Kvqpf23sVKT+NbsJqbmPPeJ0C8kGQ8OE4VgnLFCxyxrOZb4Ns\n\t5Xxnl9KJL7yBjgLwJ97kCNtpzFSPJnNNuMTCTBuvVgqVJVheoMG1QGdzsxdN3iMee1\n\tu6MVY+QjiYsdeUsbJI3emwywbseoYcQv02jh+4Go=","Content-Type":"text/plain; charset=\"utf-8\"","MIME-Version":"1.0","Content-Transfer-Encoding":"quoted-printable","In-Reply-To":"<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>","References":"<20260114173918.1744023-1-kieran.bingham@ideasonboard.com>\n\t<20260114173918.1744023-5-kieran.bingham@ideasonboard.com>\n\t<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>","Subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","From":"Kieran Bingham <kieran.bingham@ideasonboard.com>","Cc":"Isaac Scott <isaac.scott@ideasonboard.com>","To":"=?utf-8?q?Barnab=C3=A1s_P=C5=91cze?= <barnabas.pocze@ideasonboard.com>,\n\tlibcamera devel <libcamera-devel@lists.libcamera.org>","Date":"Wed, 21 Jan 2026 12:52:20 +0000","Message-ID":"<176899994091.1693075.17653379271066445227@ping.linuxembedded.co.uk>","User-Agent":"alot/0.9.1","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":37795,"web_url":"https://patchwork.libcamera.org/comment/37795/","msgid":"<3f10c916-3c79-4c0b-b8dc-711a77ff948c@ideasonboard.com>","date":"2026-01-21T13:41:18","subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","submitter":{"id":216,"url":"https://patchwork.libcamera.org/api/people/216/","name":"Barnabás Pőcze","email":"barnabas.pocze@ideasonboard.com"},"content":"2026. 01. 21. 13:52 keltezéssel, Kieran Bingham írta:\n> Quoting Barnabás Pőcze (2026-01-15 16:28:45)\n>> 2026. 01. 14. 18:39 keltezéssel, Kieran Bingham írta:\n>>> Extend the new Quantized type infrastructure by providing a\n>>> FixedPointQTraits template.\n>>>\n>>> This allows construction of fixed point types with a Quantized storage\n>>> that allows easy reading of both the underlying quantized type value and\n>>> a floating point representation of that same value.\n>>>\n>>> Reviewed-by: Isaac Scott <isaac.scott@ideasonboard.com>\n>>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n>>>\n>>> ---\n>>> v4:\n>>> - Assert that the given type has enough bits for the usage\n>>> - Use unsigned types for calculating qmin/qmax\n>>> - Reorder toFloat/fromFloat and min/max for future inlining\n>>> - Make toFloat and fromFloat constexpr\n>>>\n>>> v5:\n>>> - Make UT, Bits and Bitmask private (and remove doxygen)\n>>> - Remove constexpr from fromFloat which uses std::round (only constexpr\n>>>     in C++23)\n>>> - static_assert that min<max when converted\n>>> - Provide new Q and UQ automatic width types (Thanks Barnabás)\n>>> - Convert types to shortened Q/UQ automatic widths\n>>> - Use automatic width Q/UQ for 12,4\n>>> - change qmin->qMin qmax->qMax Bits->bits BitMask->bitMask\n>>> - Remove typedefs for Q1_7 etc\n>>>\n>>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n>>> ---\n>>>    src/ipa/libipa/fixedpoint.cpp | 89 +++++++++++++++++++++++++++++++++++\n>>>    src/ipa/libipa/fixedpoint.h   | 69 +++++++++++++++++++++++++++\n>>>    2 files changed, 158 insertions(+)\n>>>\n>>> diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp\n>>> index 6b698fc5d680..43d76f745d8a 100644\n>>> --- a/src/ipa/libipa/fixedpoint.cpp\n>>> +++ b/src/ipa/libipa/fixedpoint.cpp\n>>> @@ -37,6 +37,95 @@ namespace ipa {\n>>>     * \\return The converted value\n>>>     */\n>>>    \n>>> +/**\n>>> + * \\struct libcamera::ipa::FixedPointQTraits\n>>> + * \\brief Traits type implementing fixed-point quantisation conversions\n>>\n>> In a different commit you used \"quanti[z]...\", I think consistency would be better.\n>>\n>>\n>>> + *\n>>> + * The FixedPointQTraits structure defines a policy for mapping floating-point\n>>> + * values to and from fixed-point integer representations. It is parameterised\n>>> + * by the number of integer bits \\a I, fractional bits \\a F, and the integral\n>>> + * storage type \\a T. The traits are used with Quantized<Traits> to create a\n>>> + * quantised type that stores both the fixed-point representation and the\n>>> + * corresponding floating-point value.\n>>> + *\n>>> + * The trait exposes compile-time constants describing the bit layout, limits,\n>>> + * and scaling factors used in the fixed-point representation.\n>>\n>>     \"The sign of the value is determined by the sign of \\a T.\"\n>>\n>> I would add something like this.\n>>\n> \n> Also confirmed that the I contains the sign bit for signed types:\n> \n> + * The sign of the value is determined by the sign of \\a T. For signed types,\n> + * the number of integer bits in \\a I includes the sign bit.\n> \n> \n> \n>>> + *\n>>> + * \\tparam I Number of integer bits\n>>> + * \\tparam F Number of fractional bits\n>>> + * \\tparam T Integral type used to store the quantised value\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\typedef FixedPointQTraits::QuantizedType\n>>> + * \\brief The integral storage type used for the fixed-point representation\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\var FixedPointQTraits::qMin\n>>> + * \\brief Minimum representable quantised integer value\n>>> + *\n>>> + * This corresponds to the most negative value for signed formats or zero for\n>>> + * unsigned formats.\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\var FixedPointQTraits::qMax\n>>> + * \\brief Maximum representable quantised integer value\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\var FixedPointQTraits::min\n>>> + * \\brief Minimum representable floating-point value corresponding to qMin\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\var FixedPointQTraits::max\n>>> + * \\brief Maximum representable floating-point value corresponding to qMax\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\fn FixedPointQTraits::fromFloat(float v)\n>>> + * \\brief Convert a floating-point value to a fixed-point integer\n>>> + * \\param[in] v The floating-point value to be converted\n>>> + * \\return The quantised fixed-point integer representation\n>>> + *\n>>> + * The conversion rounds the floating-point input \\a v to the nearest integer\n>>> + * according to the scaling factor defined by the number of fractional bits F.\n>>\n>> I think it's worth mentioning that `v` is clamped first.\n> \n> Added.\n> \n> \n> - * The conversion rounds the floating-point input \\a v to the nearest integer\n> - * according to the scaling factor defined by the number of fractional bits F.\n> + * The conversion first clamps the floating-point input \\a v to the range [min,\n> + * max] and then rounds it to the nearest integer according to the scaling\n> + * factor defined by the number of fractional bits F.\n> \n> \n>>\n>>\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\fn FixedPointQTraits::toFloat(QuantizedType q)\n>>> + * \\brief Convert a fixed-point integer to a floating-point value\n>>> + * \\param[in] q The fixed-point integer value to be converted\n>>> + * \\return The corresponding floating-point value\n>>> + *\n>>> + * The conversion sign-extends the integer value if required and divides by the\n>>> + * scaling factor defined by the number of fractional bits F.\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\typedef Q\n>>> + * \\brief Define a signed fixed-point quantised type with automatic storage width\n>>> + * \\tparam I The number of integer bits\n>>> + * \\tparam F The number of fractional bits\n>>> + *\n>>> + * This alias defines a signed fixed-point quantised type using the\n>>> + * \\ref FixedPointQTraits trait and a suitable signed integer storage type\n>>> + * automatically selected based on the total number of bits \\a (I + F).\n>>> + */\n>>> +\n>>> +/**\n>>> + * \\typedef UQ\n>>> + * \\brief Define an unsigned fixed-point quantised type with automatic storage width\n>>> + * \\tparam I The number of integer bits\n>>> + * \\tparam F The number of fractional bits\n>>> + *\n>>> + * This alias defines an unsigned fixed-point quantised type using the\n>>> + * \\ref FixedPointQTraits trait and a suitable unsigned integer storage type\n>>> + * automatically selected based on the total number of bits \\a (I + F).\n>>> + */\n>>> +\n>>>    } /* namespace ipa */\n>>>    \n>>>    } /* namespace libcamera */\n>>> diff --git a/src/ipa/libipa/fixedpoint.h b/src/ipa/libipa/fixedpoint.h\n>>> index b4a7fa5e0ecd..4f6ee081604b 100644\n>>> --- a/src/ipa/libipa/fixedpoint.h\n>>> +++ b/src/ipa/libipa/fixedpoint.h\n>>> @@ -10,6 +10,8 @@\n>>>    #include <cmath>\n>>>    #include <type_traits>\n>>>    \n>>> +#include \"quantized.h\"\n>>> +\n>>>    namespace libcamera {\n>>>    \n>>>    namespace ipa {\n>>> @@ -63,6 +65,73 @@ constexpr R fixedToFloatingPoint(T number)\n>>>        return static_cast<R>(t) / static_cast<R>(1 << F);\n>>>    }\n>>>    \n>>> +template<unsigned int I, unsigned int F, typename T>\n>>> +struct FixedPointQTraits {\n>>> +private:\n>>> +     static_assert(std::is_integral_v<T>, \"FixedPointQTraits: T must be integral\");\n>>> +     using UT = std::make_unsigned_t<T>;\n>>> +\n>>> +     static constexpr unsigned int bits = I + F;\n>>> +     static_assert(bits <= sizeof(T) * 8, \"FixedPointQTraits: too many bits for type T\");\n>>> +\n>>> +     static constexpr T bitMask = (bits < sizeof(T) * 8)\n>>> +                                ? static_cast<T>((UT{1} << bits) - 1)\n>>> +                                : static_cast<T>(~UT{0});\n>>\n>> I think `static_cast<T>((UT{1} << bits) - 1)` should work in every case.\n>> If `I+F` is the full width, then `(UT{1} << bits) == 0`, subtracting one\n>> yields `~UT{0}`. (Unless the usual integer promotions apply, but then `bits`\n>> is less than the width of promoted-to type, so no overflow/wraparound is possible\n>> as far as I can tell.)\n> \n> It was so long ago - but I went through so much compiler matrix pain\n> here with compiler warnings with different things I tried.\n> \n> But ultimately - this works around/supports using the maximum size\n> without hitting this:\n> \n> \n>   https://godbolt.org/z/W7sE4s8oc\n> \n> #include <cstdint>\n> #include <iostream>\n> \n> int main()\n> {\n>      uint32_t bits = 32;\n> \n>      uint32_t x = uint32_t{1} << bits;\n> \n>      std::cout << x << \"\\n\";\n> }\n> \n> with -std=c++17 -Wall -Werror -fsanitize=undefined\n> \n> Program returned: 0\n> /app/example.cpp:8:30: runtime error: shift exponent 32 is too large for\n> 32-bit type 'uint32_t' (aka 'unsigned int')\n> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior\n> /app/example.cpp:8:30\n> 1\n> \n> So I don't think I can simplify here.\n\nAhh, indeed, sorry. I was too fixated on the left operand that I have\nmissed that the right operand must not be greater or equal to the bit\nwidth of the left operand.\n\n\n> \n> \n>>\n>>\n>>> +\n>>> +public:\n>>> +     using QuantizedType = T;\n>>> +\n>>> +     static constexpr T qMin = std::is_signed_v<T>\n>>> +                             ? static_cast<T>(-(UT{1} << (bits - 1)))\n>>> +                             : static_cast<T>(0);\n>>> +\n>>> +     static constexpr T qMax = std::is_signed_v<T>\n>>> +                             ? static_cast<T>((UT{1} << (bits - 1)) - 1)\n>>> +                             : static_cast<T>((UT{1} << bits) - 1);\n>>> +\n>>> +     static constexpr float toFloat(QuantizedType q)\n>>> +     {\n>>> +             return fixedToFloatingPoint<I, F, float, QuantizedType>(q);\n>>> +     }\n>>> +\n>>> +     static constexpr float min = fixedToFloatingPoint<I, F, float>(qMin);\n>>> +     static constexpr float max = fixedToFloatingPoint<I, F, float>(qMax);\n>>> +\n>>> +     static_assert(min < max, \"FixedPointQTraits: Minimum must be less than maximum\");\n>>> +\n>>> +     /* Conversion functions required by Quantized<Traits> */\n>>> +     static QuantizedType fromFloat(float v)\n>>> +     {\n>>> +             v = std::clamp(v, min, max);\n>>> +             return floatingToFixedPoint<I, F, QuantizedType, float>(v);\n>>> +     }\n>>> +};\n>>> +\n>>> +namespace details {\n>>> +\n>>> +template<unsigned int Bits>\n>>> +constexpr auto qtype()\n>>> +{\n>>> +     static_assert(Bits <= 64);\n>>> +\n>>> +     if constexpr (Bits <= 8)\n>>> +             return int8_t();\n>>> +     else if constexpr (Bits <= 16)\n>>> +             return int16_t();\n>>> +     else if constexpr (Bits <= 32)\n>>> +             return int32_t();\n>>> +     else if constexpr (Bits <= 64)\n>>> +             return int64_t();\n>>> +}\n>>> +\n>>> +} /* namespace details */\n>>> +\n>>> +template<unsigned int I, unsigned int F>\n>>> +using Q = Quantized<FixedPointQTraits<I, F, decltype(details::qtype<I + F>())>>;\n>>> +\n>>> +template<unsigned int I, unsigned int F>\n>>> +using UQ = Quantized<FixedPointQTraits<I, F, std::make_unsigned_t<decltype(details::qtype<I + F>())>>>;\n>>> +\n>>>    } /* namespace ipa */\n>>>    \n>>>    } /* namespace libcamera */\n>>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 7D839C3220\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 21 Jan 2026 13:41:27 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id CC09761F9F;\n\tWed, 21 Jan 2026 14:41:26 +0100 (CET)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 0F96361F9F\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 21 Jan 2026 14:41:25 +0100 (CET)","from [192.168.33.24] (185.221.143.114.nat.pool.zt.hu\n\t[185.221.143.114])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 04D02B3;\n\tWed, 21 Jan 2026 14:40:52 +0100 (CET)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"RSf0Xxpg\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1769002853;\n\tbh=kkDohIeOcN6AE+wq/REwZyQj0w52CEv5gm9LuUjsEZA=;\n\th=Date:Subject:To:Cc:References:From:In-Reply-To:From;\n\tb=RSf0XxpgmudjfeL4/WvmcrDvCYE2aJsjeiaOCyr9PGh4KuUeHepTiSeIlAYXzDGdT\n\t/u8qMSNdCJkvekepryluWpG2yOzLgTOQbjnjyUd/ZZATqz6tPDfhnFnjYAzzzIhL8e\n\tbHmICvo5gDgq2KHa4WcD0EARFMGRF8Ic8oq3rccw=","Message-ID":"<3f10c916-3c79-4c0b-b8dc-711a77ff948c@ideasonboard.com>","Date":"Wed, 21 Jan 2026 14:41:18 +0100","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","To":"Kieran Bingham <kieran.bingham@ideasonboard.com>,\n\tlibcamera devel <libcamera-devel@lists.libcamera.org>","Cc":"Isaac Scott <isaac.scott@ideasonboard.com>","References":"<20260114173918.1744023-1-kieran.bingham@ideasonboard.com>\n\t<20260114173918.1744023-5-kieran.bingham@ideasonboard.com>\n\t<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>\n\t<176899994091.1693075.17653379271066445227@ping.linuxembedded.co.uk>","From":"=?utf-8?q?Barnab=C3=A1s_P=C5=91cze?= <barnabas.pocze@ideasonboard.com>","Content-Language":"en-US, hu-HU","In-Reply-To":"<176899994091.1693075.17653379271066445227@ping.linuxembedded.co.uk>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"8bit","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":37802,"web_url":"https://patchwork.libcamera.org/comment/37802/","msgid":"<20260121154308.GB184493@killaraus>","date":"2026-01-21T15:43:08","subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"On Wed, Jan 21, 2026 at 02:41:18PM +0100, Barnabás Pőcze wrote:\n> 2026. 01. 21. 13:52 keltezéssel, Kieran Bingham írta:\n> > Quoting Barnabás Pőcze (2026-01-15 16:28:45)\n> >> 2026. 01. 14. 18:39 keltezéssel, Kieran Bingham írta:\n> >>> Extend the new Quantized type infrastructure by providing a\n> >>> FixedPointQTraits template.\n> >>>\n> >>> This allows construction of fixed point types with a Quantized storage\n> >>> that allows easy reading of both the underlying quantized type value and\n> >>> a floating point representation of that same value.\n> >>>\n> >>> Reviewed-by: Isaac Scott <isaac.scott@ideasonboard.com>\n> >>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> >>>\n> >>> ---\n> >>> v4:\n> >>> - Assert that the given type has enough bits for the usage\n> >>> - Use unsigned types for calculating qmin/qmax\n> >>> - Reorder toFloat/fromFloat and min/max for future inlining\n> >>> - Make toFloat and fromFloat constexpr\n> >>>\n> >>> v5:\n> >>> - Make UT, Bits and Bitmask private (and remove doxygen)\n> >>> - Remove constexpr from fromFloat which uses std::round (only constexpr\n> >>>     in C++23)\n> >>> - static_assert that min<max when converted\n> >>> - Provide new Q and UQ automatic width types (Thanks Barnabás)\n> >>> - Convert types to shortened Q/UQ automatic widths\n> >>> - Use automatic width Q/UQ for 12,4\n> >>> - change qmin->qMin qmax->qMax Bits->bits BitMask->bitMask\n> >>> - Remove typedefs for Q1_7 etc\n> >>>\n> >>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>\n> >>> ---\n> >>>    src/ipa/libipa/fixedpoint.cpp | 89 +++++++++++++++++++++++++++++++++++\n> >>>    src/ipa/libipa/fixedpoint.h   | 69 +++++++++++++++++++++++++++\n> >>>    2 files changed, 158 insertions(+)\n> >>>\n> >>> diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp\n> >>> index 6b698fc5d680..43d76f745d8a 100644\n> >>> --- a/src/ipa/libipa/fixedpoint.cpp\n> >>> +++ b/src/ipa/libipa/fixedpoint.cpp\n> >>> @@ -37,6 +37,95 @@ namespace ipa {\n> >>>     * \\return The converted value\n> >>>     */\n> >>>    \n> >>> +/**\n> >>> + * \\struct libcamera::ipa::FixedPointQTraits\n> >>> + * \\brief Traits type implementing fixed-point quantisation conversions\n> >>\n> >> In a different commit you used \"quanti[z]...\", I think consistency would be better.\n> >>\n> >>> + *\n> >>> + * The FixedPointQTraits structure defines a policy for mapping floating-point\n> >>> + * values to and from fixed-point integer representations. It is parameterised\n> >>> + * by the number of integer bits \\a I, fractional bits \\a F, and the integral\n> >>> + * storage type \\a T. The traits are used with Quantized<Traits> to create a\n> >>> + * quantised type that stores both the fixed-point representation and the\n> >>> + * corresponding floating-point value.\n> >>> + *\n> >>> + * The trait exposes compile-time constants describing the bit layout, limits,\n> >>> + * and scaling factors used in the fixed-point representation.\n> >>\n> >>     \"The sign of the value is determined by the sign of \\a T.\"\n> >>\n> >> I would add something like this.\n> >>\n> > \n> > Also confirmed that the I contains the sign bit for signed types:\n> > \n> > + * The sign of the value is determined by the sign of \\a T. For signed types,\n> > + * the number of integer bits in \\a I includes the sign bit.\n\nYou're documenting a type, so I would write \"The signedness of the type\nis determined by the signedness of \\a T\".\n\n> >>> + *\n> >>> + * \\tparam I Number of integer bits\n> >>> + * \\tparam F Number of fractional bits\n> >>> + * \\tparam T Integral type used to store the quantised value\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\typedef FixedPointQTraits::QuantizedType\n> >>> + * \\brief The integral storage type used for the fixed-point representation\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\var FixedPointQTraits::qMin\n> >>> + * \\brief Minimum representable quantised integer value\n> >>> + *\n> >>> + * This corresponds to the most negative value for signed formats or zero for\n> >>> + * unsigned formats.\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\var FixedPointQTraits::qMax\n> >>> + * \\brief Maximum representable quantised integer value\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\var FixedPointQTraits::min\n> >>> + * \\brief Minimum representable floating-point value corresponding to qMin\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\var FixedPointQTraits::max\n> >>> + * \\brief Maximum representable floating-point value corresponding to qMax\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\fn FixedPointQTraits::fromFloat(float v)\n> >>> + * \\brief Convert a floating-point value to a fixed-point integer\n> >>> + * \\param[in] v The floating-point value to be converted\n> >>> + * \\return The quantised fixed-point integer representation\n> >>> + *\n> >>> + * The conversion rounds the floating-point input \\a v to the nearest integer\n> >>> + * according to the scaling factor defined by the number of fractional bits F.\n> >>\n> >> I think it's worth mentioning that `v` is clamped first.\n> > \n> > Added.\n> > \n> > - * The conversion rounds the floating-point input \\a v to the nearest integer\n> > - * according to the scaling factor defined by the number of fractional bits F.\n> > + * The conversion first clamps the floating-point input \\a v to the range [min,\n> > + * max] and then rounds it to the nearest integer according to the scaling\n> > + * factor defined by the number of fractional bits F.\n> > \n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\fn FixedPointQTraits::toFloat(QuantizedType q)\n> >>> + * \\brief Convert a fixed-point integer to a floating-point value\n> >>> + * \\param[in] q The fixed-point integer value to be converted\n> >>> + * \\return The corresponding floating-point value\n> >>> + *\n> >>> + * The conversion sign-extends the integer value if required and divides by the\n> >>> + * scaling factor defined by the number of fractional bits F.\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\typedef Q\n> >>> + * \\brief Define a signed fixed-point quantised type with automatic storage width\n> >>> + * \\tparam I The number of integer bits\n> >>> + * \\tparam F The number of fractional bits\n> >>> + *\n> >>> + * This alias defines a signed fixed-point quantised type using the\n> >>> + * \\ref FixedPointQTraits trait and a suitable signed integer storage type\n> >>> + * automatically selected based on the total number of bits \\a (I + F).\n> >>> + */\n> >>> +\n> >>> +/**\n> >>> + * \\typedef UQ\n> >>> + * \\brief Define an unsigned fixed-point quantised type with automatic storage width\n> >>> + * \\tparam I The number of integer bits\n> >>> + * \\tparam F The number of fractional bits\n> >>> + *\n> >>> + * This alias defines an unsigned fixed-point quantised type using the\n> >>> + * \\ref FixedPointQTraits trait and a suitable unsigned integer storage type\n> >>> + * automatically selected based on the total number of bits \\a (I + F).\n> >>> + */\n> >>> +\n> >>>    } /* namespace ipa */\n> >>>    \n> >>>    } /* namespace libcamera */\n> >>> diff --git a/src/ipa/libipa/fixedpoint.h b/src/ipa/libipa/fixedpoint.h\n> >>> index b4a7fa5e0ecd..4f6ee081604b 100644\n> >>> --- a/src/ipa/libipa/fixedpoint.h\n> >>> +++ b/src/ipa/libipa/fixedpoint.h\n> >>> @@ -10,6 +10,8 @@\n> >>>    #include <cmath>\n> >>>    #include <type_traits>\n> >>>    \n> >>> +#include \"quantized.h\"\n> >>> +\n> >>>    namespace libcamera {\n> >>>    \n> >>>    namespace ipa {\n> >>> @@ -63,6 +65,73 @@ constexpr R fixedToFloatingPoint(T number)\n> >>>        return static_cast<R>(t) / static_cast<R>(1 << F);\n> >>>    }\n> >>>    \n> >>> +template<unsigned int I, unsigned int F, typename T>\n> >>> +struct FixedPointQTraits {\n> >>> +private:\n> >>> +     static_assert(std::is_integral_v<T>, \"FixedPointQTraits: T must be integral\");\n> >>> +     using UT = std::make_unsigned_t<T>;\n> >>> +\n> >>> +     static constexpr unsigned int bits = I + F;\n> >>> +     static_assert(bits <= sizeof(T) * 8, \"FixedPointQTraits: too many bits for type T\");\n> >>> +\n> >>> +     static constexpr T bitMask = (bits < sizeof(T) * 8)\n> >>> +                                ? static_cast<T>((UT{1} << bits) - 1)\n> >>> +                                : static_cast<T>(~UT{0});\n> >>\n> >> I think `static_cast<T>((UT{1} << bits) - 1)` should work in every case.\n> >> If `I+F` is the full width, then `(UT{1} << bits) == 0`, subtracting one\n> >> yields `~UT{0}`. (Unless the usual integer promotions apply, but then `bits`\n> >> is less than the width of promoted-to type, so no overflow/wraparound is possible\n> >> as far as I can tell.)\n> > \n> > It was so long ago - but I went through so much compiler matrix pain\n> > here with compiler warnings with different things I tried.\n> > \n> > But ultimately - this works around/supports using the maximum size\n> > without hitting this:\n> > \n> >   https://godbolt.org/z/W7sE4s8oc\n> > \n> > #include <cstdint>\n> > #include <iostream>\n> > \n> > int main()\n> > {\n> >      uint32_t bits = 32;\n> > \n> >      uint32_t x = uint32_t{1} << bits;\n> > \n> >      std::cout << x << \"\\n\";\n> > }\n> > \n> > with -std=c++17 -Wall -Werror -fsanitize=undefined\n> > \n> > Program returned: 0\n> > /app/example.cpp:8:30: runtime error: shift exponent 32 is too large for\n> > 32-bit type 'uint32_t' (aka 'unsigned int')\n> > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior\n> > /app/example.cpp:8:30\n> > 1\n> > \n> > So I don't think I can simplify here.\n> \n> Ahh, indeed, sorry. I was too fixated on the left operand that I have\n> missed that the right operand must not be greater or equal to the bit\n> width of the left operand.\n> \n> >>> +\n> >>> +public:\n> >>> +     using QuantizedType = T;\n> >>> +\n> >>> +     static constexpr T qMin = std::is_signed_v<T>\n> >>> +                             ? static_cast<T>(-(UT{1} << (bits - 1)))\n> >>> +                             : static_cast<T>(0);\n> >>> +\n> >>> +     static constexpr T qMax = std::is_signed_v<T>\n> >>> +                             ? static_cast<T>((UT{1} << (bits - 1)) - 1)\n> >>> +                             : static_cast<T>((UT{1} << bits) - 1);\n> >>> +\n> >>> +     static constexpr float toFloat(QuantizedType q)\n> >>> +     {\n> >>> +             return fixedToFloatingPoint<I, F, float, QuantizedType>(q);\n> >>> +     }\n> >>> +\n> >>> +     static constexpr float min = fixedToFloatingPoint<I, F, float>(qMin);\n> >>> +     static constexpr float max = fixedToFloatingPoint<I, F, float>(qMax);\n> >>> +\n> >>> +     static_assert(min < max, \"FixedPointQTraits: Minimum must be less than maximum\");\n> >>> +\n> >>> +     /* Conversion functions required by Quantized<Traits> */\n> >>> +     static QuantizedType fromFloat(float v)\n> >>> +     {\n> >>> +             v = std::clamp(v, min, max);\n> >>> +             return floatingToFixedPoint<I, F, QuantizedType, float>(v);\n> >>> +     }\n> >>> +};\n> >>> +\n> >>> +namespace details {\n> >>> +\n> >>> +template<unsigned int Bits>\n> >>> +constexpr auto qtype()\n> >>> +{\n> >>> +     static_assert(Bits <= 64);\n> >>> +\n> >>> +     if constexpr (Bits <= 8)\n> >>> +             return int8_t();\n> >>> +     else if constexpr (Bits <= 16)\n> >>> +             return int16_t();\n> >>> +     else if constexpr (Bits <= 32)\n> >>> +             return int32_t();\n> >>> +     else if constexpr (Bits <= 64)\n> >>> +             return int64_t();\n> >>> +}\n> >>> +\n> >>> +} /* namespace details */\n> >>> +\n> >>> +template<unsigned int I, unsigned int F>\n> >>> +using Q = Quantized<FixedPointQTraits<I, F, decltype(details::qtype<I + F>())>>;\n> >>> +\n> >>> +template<unsigned int I, unsigned int F>\n> >>> +using UQ = Quantized<FixedPointQTraits<I, F, std::make_unsigned_t<decltype(details::qtype<I + F>())>>>;\n> >>> +\n> >>>    } /* namespace ipa */\n> >>>    \n> >>>    } /* namespace libcamera */","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 86CAEC3220\n\tfor <parsemail@patchwork.libcamera.org>;\n\tWed, 21 Jan 2026 15:43:13 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 6120C61FC9;\n\tWed, 21 Jan 2026 16:43:12 +0100 (CET)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id D9CBC61F9F\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 21 Jan 2026 16:43:10 +0100 (CET)","from pendragon.ideasonboard.com\n\t(2001-14ba-703d-e500--2a1.rev.dnainternet.fi\n\t[IPv6:2001:14ba:703d:e500::2a1])\n\tby perceval.ideasonboard.com (Postfix) with UTF8SMTPSA id 8CDAD1C6;\n\tWed, 21 Jan 2026 16:42:38 +0100 (CET)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"qfbUkmpT\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1769010158;\n\tbh=3xPpYCKMOwCu9aeHGweQicfjNlf/DO8PumN0yFCqnt4=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=qfbUkmpTEi7py6WwjCsMo/BaVGV9pldCBCY3mgnvE4Qp8Ze9VctGeVkPcFPzjo/Z/\n\tIUTK6lJUYSYR5SvwB60O+3XKnxh7ey/2lJr0mOhCitsJyTk/s7tHP1PX5SRGG7ZI1F\n\tqpa9b7cQw5yK0Wt7CBP/kKdKI7CXeONBWciAqh5M=","Date":"Wed, 21 Jan 2026 17:43:08 +0200","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"=?utf-8?q?Barnab=C3=A1s_P=C5=91cze?= <barnabas.pocze@ideasonboard.com>","Cc":"Kieran Bingham <kieran.bingham@ideasonboard.com>,\n\tlibcamera devel <libcamera-devel@lists.libcamera.org>,\n\tIsaac Scott <isaac.scott@ideasonboard.com>","Subject":"Re: [PATCH v5 04/16] ipa: libipa: Provide fixed point quantized\n\ttraits","Message-ID":"<20260121154308.GB184493@killaraus>","References":"<20260114173918.1744023-1-kieran.bingham@ideasonboard.com>\n\t<20260114173918.1744023-5-kieran.bingham@ideasonboard.com>\n\t<7f271b8a-40bd-4d19-8a58-f9d320e3fa58@ideasonboard.com>\n\t<176899994091.1693075.17653379271066445227@ping.linuxembedded.co.uk>\n\t<3f10c916-3c79-4c0b-b8dc-711a77ff948c@ideasonboard.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<3f10c916-3c79-4c0b-b8dc-711a77ff948c@ideasonboard.com>","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]