ipa: libipa: fixedpoint: Expand documentation on sign bit

Message ID 20260120083952.15338-1-jacopo.mondi@ideasonboard.com
State New

Commit Message

Jacopo Mondi Jan. 20, 2026, 8:39 a.m. UTC
Converting a number from a signed fixed-point representation to
the corresponding floating-point value requires including the sign bit
in the width of the fixed-point integral part.

State this clearly in the documentation.

Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
---
 src/ipa/libipa/fixedpoint.cpp | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

--
2.52.0

Comments

Stefan Klug Jan. 20, 2026, 8:53 a.m. UTC | #1
Hi Jacopo,

Quoting Jacopo Mondi (2026-01-20 09:39:49)
> Converting numbers with a signed fixed-point representation to
> the corresponding float value requires to include the sign bit in the
> width of the fixed-point integral part.
> 
> Clearly specify it in documentation.
> 
> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> ---
>  src/ipa/libipa/fixedpoint.cpp | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp
> index 6b698fc5d680..b37cdc43936f 100644
> --- a/src/ipa/libipa/fixedpoint.cpp
> +++ b/src/ipa/libipa/fixedpoint.cpp
> @@ -29,11 +29,31 @@ namespace ipa {
>  /**
>   * \fn R fixedToFloatingPoint(T number)
>   * \brief Convert a fixed-point number to a floating point representation
> - * \tparam I Bit width of the integer part of the fixed-point
> + * \tparam I Bit width of the integer part of the fixed-point including the
> + * optional sign bit
>   * \tparam F Bit width of the fractional part of the fixed-point
>   * \tparam R Return type of the floating point representation
>   * \tparam T Input type of the fixed-point representation
>   * \param number The fixed point number to convert to floating point
> + *
> + * If the fixed-point representation is signed, the sign bit shall be included
> + * in the \a I template parameter that specifies the number of bits of the
> + * integral part of the fixed-point representation.
> + *
> + * As an example, a value represented as signed fixed-point Q4.8 format can be
> + * converted to its corresponding floating point representation as:

I'm a bit confused here. Doesn't signed Q4.8 mean that the first bit of
the 4 is the sign bit? The same way a signed int32 has the sign bit as
the first of its 32 bits?

Best regards,
Stefan

> + *
> + * \code{.cpp}
> + * double d = fixedToFloatingPoint<5, 8, double, uint16_t>(fixed);
> + * \endcode
> + *
> + * While a value represented as unsigned fixed-point Q4.8 format can be
> + * converted as:
> + *
> + * \code{.cpp}
> + * double d = fixedToFloatingPoint<4, 8, double, uint16_t>(fixed);
> + * \endcode
> + *
>   * \return The converted value
>   */
> 
> --
> 2.52.0
>
Jacopo Mondi Jan. 20, 2026, 9 a.m. UTC | #2
Hi Stefan

On Tue, Jan 20, 2026 at 09:53:06AM +0100, Stefan Klug wrote:
> Hi Jacopo,
>
> Quoting Jacopo Mondi (2026-01-20 09:39:49)
> > Converting numbers with a signed fixed-point representation to
> > the corresponding float value requires to include the sign bit in the
> > width of the fixed-point integral part.
> >
> > Clearly specify it in documentation.
> >
> > Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > ---
> >  src/ipa/libipa/fixedpoint.cpp | 22 +++++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp
> > index 6b698fc5d680..b37cdc43936f 100644
> > --- a/src/ipa/libipa/fixedpoint.cpp
> > +++ b/src/ipa/libipa/fixedpoint.cpp
> > @@ -29,11 +29,31 @@ namespace ipa {
> >  /**
> >   * \fn R fixedToFloatingPoint(T number)
> >   * \brief Convert a fixed-point number to a floating point representation
> > - * \tparam I Bit width of the integer part of the fixed-point
> > + * \tparam I Bit width of the integer part of the fixed-point including the
> > + * optional sign bit
> >   * \tparam F Bit width of the fractional part of the fixed-point
> >   * \tparam R Return type of the floating point representation
> >   * \tparam T Input type of the fixed-point representation
> >   * \param number The fixed point number to convert to floating point
> > + *
> > + * If the fixed-point representation is signed, the sign bit shall be included
> > + * in the \a I template parameter that specifies the number of bits of the
> > + * integral part of the fixed-point representation.
> > + *
> > + * As an example, a value represented as signed fixed-point Q4.8 format can be
> > + * converted to its corresponding floating point representation as:
>
> I'm a bit confused here. Doesn't signed Q4.8 mean that the first bit of
> the 4 is the sign bit? The same way a signed int32 has the signed bit on
> the first of the 32 bits?

I'm currently looking at the datasheet documentation of a value said
to be in "signed Q4.8" format, whose register size is 13 bits:

Coefft R-G [12:0] : sign/magnitude 4.8-bit fixed-point

>
> Best regards,
> Stefan
>
> > + *
> > + * \code{.cpp}
> > + * double d = fixedToFloatingPoint<5, 8, double, uint16_t>(fixed);
> > + * \endcode
> > + *
> > + * While a value represented as unsigned fixed-point Q4.8 format can be
> > + * converted as:
> > + *
> > + * \code{.cpp}
> > + * double d = fixedToFloatingPoint<4, 8, double, uint16_t>(fixed);
> > + * \endcode
> > + *
> >   * \return The converted value
> >   */
> >
> > --
> > 2.52.0
> >
Stefan Klug Jan. 20, 2026, 9:10 a.m. UTC | #3
Hi Jacopo,

Quoting Jacopo Mondi (2026-01-20 10:00:14)
> Hi Stefan
> 
> On Tue, Jan 20, 2026 at 09:53:06AM +0100, Stefan Klug wrote:
> > Hi Jacopo,
> >
> > Quoting Jacopo Mondi (2026-01-20 09:39:49)
> > > Converting numbers with a signed fixed-point representation to
> > > the corresponding float value requires to include the sign bit in the
> > > width of the fixed-point integral part.
> > >
> > > Clearly specify it in documentation.
> > >
> > > Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > > ---
> > >  src/ipa/libipa/fixedpoint.cpp | 22 +++++++++++++++++++++-
> > >  1 file changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp
> > > index 6b698fc5d680..b37cdc43936f 100644
> > > --- a/src/ipa/libipa/fixedpoint.cpp
> > > +++ b/src/ipa/libipa/fixedpoint.cpp
> > > @@ -29,11 +29,31 @@ namespace ipa {
> > >  /**
> > >   * \fn R fixedToFloatingPoint(T number)
> > >   * \brief Convert a fixed-point number to a floating point representation
> > > - * \tparam I Bit width of the integer part of the fixed-point
> > > + * \tparam I Bit width of the integer part of the fixed-point including the
> > > + * optional sign bit
> > >   * \tparam F Bit width of the fractional part of the fixed-point
> > >   * \tparam R Return type of the floating point representation
> > >   * \tparam T Input type of the fixed-point representation
> > >   * \param number The fixed point number to convert to floating point
> > > + *
> > > + * If the fixed-point representation is signed, the sign bit shall be included
> > > + * in the \a I template parameter that specifies the number of bits of the
> > > + * integral part of the fixed-point representation.
> > > + *
> > > + * As an example, a value represented as signed fixed-point Q4.8 format can be
> > > + * converted to its corresponding floating point representation as:
> >
> > I'm a bit confused here. Doesn't signed Q4.8 mean that the first bit of
> > the 4 is the sign bit? The same way a signed int32 has the signed bit on
> > the first of the 32 bits?
> 
> I'm right now looking at the datasheet documentation of a value said
> to be in "signed Q4.8" format whose register size is 13 bits
> 
> Coefft R-G [12:0] : sign/magnitude 4.8-bit fixed-point

I should have consulted Wikipedia first. https://en.wikipedia.org/wiki/Q_(number_format)
clearly states that the sign bit is implicitly added.

Best regards,
Stefan

> 
> >
> > Best regards,
> > Stefan
> >
> > > + *
> > > + * \code{.cpp}
> > > + * double d = fixedToFloatingPoint<5, 8, double, uint16_t>(fixed);
> > > + * \endcode
> > > + *
> > > + * While a value represented as unsigned fixed-point Q4.8 format can be
> > > + * converted as:
> > > + *
> > > + * \code{.cpp}
> > > + * double d = fixedToFloatingPoint<4, 8, double, uint16_t>(fixed);
> > > + * \endcode
> > > + *
> > >   * \return The converted value
> > >   */
> > >
> > > --
> > > 2.52.0
> > >
Barnabás Pőcze Jan. 20, 2026, 9:11 a.m. UTC | #4
On 2026. 01. 20. 10:00, Jacopo Mondi wrote:
> Hi Stefan
> 
> On Tue, Jan 20, 2026 at 09:53:06AM +0100, Stefan Klug wrote:
>> Hi Jacopo,
>>
>> Quoting Jacopo Mondi (2026-01-20 09:39:49)
>>> Converting numbers with a signed fixed-point representation to
>>> the corresponding float value requires to include the sign bit in the
>>> width of the fixed-point integral part.
>>>
>>> Clearly specify it in documentation.
>>>
>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>> ---
>>>   src/ipa/libipa/fixedpoint.cpp | 22 +++++++++++++++++++++-
>>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp
>>> index 6b698fc5d680..b37cdc43936f 100644
>>> --- a/src/ipa/libipa/fixedpoint.cpp
>>> +++ b/src/ipa/libipa/fixedpoint.cpp
>>> @@ -29,11 +29,31 @@ namespace ipa {
>>>   /**
>>>    * \fn R fixedToFloatingPoint(T number)
>>>    * \brief Convert a fixed-point number to a floating point representation
>>> - * \tparam I Bit width of the integer part of the fixed-point
>>> + * \tparam I Bit width of the integer part of the fixed-point including the
>>> + * optional sign bit
>>>    * \tparam F Bit width of the fractional part of the fixed-point
>>>    * \tparam R Return type of the floating point representation
>>>    * \tparam T Input type of the fixed-point representation
>>>    * \param number The fixed point number to convert to floating point
>>> + *
>>> + * If the fixed-point representation is signed, the sign bit shall be included
>>> + * in the \a I template parameter that specifies the number of bits of the
>>> + * integral part of the fixed-point representation.
>>> + *
>>> + * As an example, a value represented as signed fixed-point Q4.8 format can be
>>> + * converted to its corresponding floating point representation as:
>>
>> I'm a bit confused here. Doesn't signed Q4.8 mean that the first bit of
>> the 4 is the sign bit? The same way a signed int32 has the signed bit on
>> the first of the 32 bits?

It would appear there are two interpretations: https://en.wikipedia.org/wiki/Q_(number_format)

"Texas Instruments version": "Thus, the total number w of bits used is 1 + m + n."
"ARM version": "A variant of the Q notation has been in use by ARM in which the m number also counts the sign bit."
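
To make the difference between the two conventions concrete, here is a minimal sketch of how the total bit width follows from each reading. The helper names `tiSignedWidth` and `armSignedWidth` are hypothetical and exist only for illustration; they are not part of libipa.

```cpp
// Hypothetical helpers for illustration only; not part of libipa.

// "Texas Instruments" convention: signed Qm.n uses 1 + m + n bits,
// because the sign bit is not counted in m.
constexpr unsigned tiSignedWidth(unsigned m, unsigned n)
{
	return 1 + m + n;
}

// "ARM" convention: m already counts the sign bit, so Qm.n uses m + n bits.
constexpr unsigned armSignedWidth(unsigned m, unsigned n)
{
	return m + n;
}

// A 13-bit signed value is Q4.8 in the TI reading and Q5.8 in the ARM one.
static_assert(tiSignedWidth(4, 8) == 13, "TI Q4.8 is 13 bits wide");
static_assert(armSignedWidth(5, 8) == 13, "ARM Q5.8 is 13 bits wide");
static_assert(armSignedWidth(4, 8) == 12, "ARM Q4.8 is only 12 bits wide");
```

Read this way, the patch documents `I` as an ARM-style count (sign bit included), while its "signed Q4.8" example appears to use the TI-style name for the same 13-bit value.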


> 
> I'm right now looking at the datasheet documentation of a value said
> to be in "signed Q4.8" format whose register size is 13 bits
> 
> Coefft R-G [12:0] : sign/magnitude 4.8-bit fixed-point

Does that mean "sign/magnitude" as in https://en.wikipedia.org/wiki/Signed_number_representations#Sign–magnitude ?
If so, then I'm not sure these functions will work.
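
As a rough illustration of why this matters, the sketch below decodes the same 13-bit pattern once as two's complement and once as sign/magnitude. Both helpers are hypothetical and are not the libipa implementation. The assumption that the sign occupies the top bit of the field is mine, not something the datasheet line above confirms.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of libipa.

// Two's complement: the I integer bits include the sign bit.
template<unsigned I, unsigned F>
double twosComplementToDouble(uint32_t raw)
{
	static_assert(I + F > 0 && I + F < 32, "unsupported width");
	constexpr uint32_t mask = (1u << (I + F)) - 1;
	constexpr uint32_t signBit = 1u << (I + F - 1);
	const uint32_t v = raw & mask;
	// Sign-extend by subtracting 2^(I+F) when the sign bit is set.
	const int64_t s = (v & signBit)
			? static_cast<int64_t>(v) - (int64_t{ 1 } << (I + F))
			: static_cast<int64_t>(v);
	return static_cast<double>(s) / (1u << F);
}

// Sign/magnitude: top bit is the sign (assumed), the rest is the magnitude.
template<unsigned I, unsigned F>
double signMagnitudeToDouble(uint32_t raw)
{
	static_assert(I + F > 0 && I + F < 32, "unsupported width");
	constexpr uint32_t signBit = 1u << (I + F - 1);
	const uint32_t magnitude = raw & (signBit - 1);
	const double value = static_cast<double>(magnitude) / (1u << F);
	return (raw & signBit) ? -value : value;
}
```

With `I = 5` and `F = 8`, the pattern `0x1100` decodes to -15.0 as two's complement but to -1.0 as sign/magnitude, while a positive pattern such as `0x0180` gives 1.5 in both. So a two's complement based conversion would only agree with a sign/magnitude register for non-negative values.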


> 
>>
>> Best regards,
>> Stefan
>>
>>> + *
>>> + * \code{.cpp}
>>> + * double d = fixedToFloatingPoint<5, 8, double, uint16_t>(fixed);
>>> + * \endcode
>>> + *
>>> + * While a value represented as unsigned fixed-point Q4.8 format can be
>>> + * converted as:
>>> + *
>>> + * \code{.cpp}
>>> + * double d = fixedToFloatingPoint<4, 8, double, uint16_t>(fixed);
>>> + * \endcode
>>> + *
>>>    * \return The converted value
>>>    */
>>>
>>> --
>>> 2.52.0
>>>

Patch

diff --git a/src/ipa/libipa/fixedpoint.cpp b/src/ipa/libipa/fixedpoint.cpp
index 6b698fc5d680..b37cdc43936f 100644
--- a/src/ipa/libipa/fixedpoint.cpp
+++ b/src/ipa/libipa/fixedpoint.cpp
@@ -29,11 +29,31 @@  namespace ipa {
 /**
  * \fn R fixedToFloatingPoint(T number)
  * \brief Convert a fixed-point number to a floating point representation
- * \tparam I Bit width of the integer part of the fixed-point
+ * \tparam I Bit width of the integer part of the fixed-point including the
+ * optional sign bit
  * \tparam F Bit width of the fractional part of the fixed-point
  * \tparam R Return type of the floating point representation
  * \tparam T Input type of the fixed-point representation
  * \param number The fixed point number to convert to floating point
+ *
+ * If the fixed-point representation is signed, the sign bit shall be included
+ * in the \a I template parameter that specifies the number of bits of the
+ * integral part of the fixed-point representation.
+ *
+ * As an example, a value represented as signed fixed-point Q4.8 format can be
+ * converted to its corresponding floating point representation as:
+ *
+ * \code{.cpp}
+ * double d = fixedToFloatingPoint<5, 8, double, uint16_t>(fixed);
+ * \endcode
+ *
+ * While a value represented as unsigned fixed-point Q4.8 format can be
+ * converted as:
+ *
+ * \code{.cpp}
+ * double d = fixedToFloatingPoint<4, 8, double, uint16_t>(fixed);
+ * \endcode
+ *
  * \return The converted value
  */
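
The effect of leaving the sign bit out of the width can be sketched with a small stand-alone decoder. `decodeTwosComplement` is a hypothetical helper written for this note, assuming a two's complement encoding; it is not the libipa function and takes the widths as run-time arguments purely for brevity.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of libipa.
// Interpret the low `bits` bits of `raw` as a two's complement integer
// and scale it by 2^-fracBits. Requires 0 < bits < 32.
inline double decodeTwosComplement(uint32_t raw, unsigned bits, unsigned fracBits)
{
	const uint32_t mask = (1u << bits) - 1;
	const uint32_t signBit = 1u << (bits - 1);
	const uint32_t v = raw & mask;
	// (v ^ signBit) - signBit sign-extends a `bits`-wide value.
	const int64_t s = static_cast<int64_t>(v ^ signBit) -
			  static_cast<int64_t>(signBit);
	return static_cast<double>(s) / static_cast<double>(1u << fracBits);
}
```

For the 13-bit pattern `0x1680`, decoding with 13 bits (5 integer bits including the sign, 8 fractional bits) yields -9.5, whereas decoding with only 12 bits (4 + 8) drops the sign bit and yields 6.5. This is the kind of mismatch the added documentation is meant to prevent.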