[4/6] controls: ipa: rpi: Add CNN controls

Message ID 20241213094602.2083174-5-naush@raspberrypi.com
State New
Series
  • Raspberry Pi: Various changes

Commit Message

Naushir Patuck Dec. 13, 2024, 9:38 a.m. UTC
Add the following RPi vendor controls to handle Convolutional Neural
Network processing:

CnnOutputTensor
CnnOutputTensorInfo
CnnEnableInputTensor
CnnInputTensor
CnnInputTensorInfo
CnnKpiInfo

These controls will be used to support the new Raspberry Pi AI Camera,
using an IMX500 sensor with on-board neural network processing.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
---
 src/ipa/rpi/controller/controller.h |  33 +++++++++
 src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

Comments

David Plowman Dec. 13, 2024, 10:24 a.m. UTC | #1
Hi Naush

Thanks for the patch.

On Fri, 13 Dec 2024 at 09:46, Naushir Patuck <naush@raspberrypi.com> wrote:
>
> Add the following RPi vendor controls to handle Convolutional Neural
> Network processing:
>
> CnnOutputTensor
> CnnOutputTensorInfo
> CnnEnableInputTensor
> CnnInputTensor
> CnnInputTensorInfo
> CnnKpiInfo
>
> These controls will be used to support the new Raspberry Pi AI Camera,
> using an IMX500 sensor with on-board neural network processing.
>
> Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> ---
>  src/ipa/rpi/controller/controller.h |  33 +++++++++
>  src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>
> diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> index 64f93f414524..489188b44d9b 100644
> --- a/src/ipa/rpi/controller/controller.h
> +++ b/src/ipa/rpi/controller/controller.h
> @@ -25,6 +25,39 @@
>
>  namespace RPiController {
>
> +/*
> + * The following structures are used to export the CNN input/output tensor information
> + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> + * Applications must cast the span to these structures exactly.
> + */
> +static constexpr unsigned int NetworkNameLen = 64;
> +static constexpr unsigned int MaxNumTensors = 16;
> +static constexpr unsigned int MaxNumDimensions = 16;

Did I see elsewhere that we like to put "k" on the front of constants
these days? Not that I'm bothered either way...

> +
> +struct OutputTensorInfo {
> +       uint32_t tensorDataNum;
> +       uint32_t numDimensions;
> +       uint16_t size[MaxNumDimensions];
> +};
> +
> +struct CnnOutputTensorInfo {
> +       char networkName[NetworkNameLen];
> +       uint32_t numTensors;
> +       OutputTensorInfo info[MaxNumTensors];
> +};
> +
> +struct CnnInputTensorInfo {
> +       char networkName[NetworkNameLen];
> +       uint32_t width;
> +       uint32_t height;
> +       uint32_t numChannels;
> +};
> +
> +struct CnnKpiInfo {
> +       uint32_t dnnRuntime;
> +       uint32_t dspRuntime;
> +};
> +

I wondered momentarily whether these should be in a separate header
file, but honestly, there are so few I'm happy not to bother with it!
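
For what it's worth, the "cast the span exactly" requirement could be
handled on the application side roughly as below. This is only a sketch
under assumptions: the struct definitions are copied from the patch, while
decodeOutputTensorInfo() is a hypothetical helper that is not part of
libcamera. It copies the bytes instead of casting the pointer, so the
alignment of the incoming span does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

/* Definitions copied from the patch (controller.h). */
static constexpr unsigned int NetworkNameLen = 64;
static constexpr unsigned int MaxNumTensors = 16;
static constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

/*
 * Hypothetical helper: turn the raw uint8_t control payload into the
 * structure. Returns std::nullopt if the payload size or the counts it
 * carries do not fit the fixed-size layout.
 */
std::optional<CnnOutputTensorInfo>
decodeOutputTensorInfo(const uint8_t *data, std::size_t len)
{
	if (!data || len != sizeof(CnnOutputTensorInfo))
		return std::nullopt;

	CnnOutputTensorInfo info;
	std::memcpy(&info, data, sizeof(info));

	/* Never trust the name to be NUL-terminated. */
	info.networkName[NetworkNameLen - 1] = '\0';

	if (info.numTensors > MaxNumTensors)
		return std::nullopt;

	for (uint32_t i = 0; i < info.numTensors; i++) {
		if (info.info[i].numDimensions > MaxNumDimensions)
			return std::nullopt;
	}

	return info;
}
```

An application would pass the data pointer and size of the span returned
for the CnnOutputTensorInfo control, and skip the frame's tensors if the
helper returns std::nullopt.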

>  class Algorithm;
>  typedef std::unique_ptr<Algorithm> AlgorithmPtr;
>
> diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> index 34bbdfc863c5..c0b5f63df525 100644
> --- a/src/libcamera/control_ids_rpi.yaml
> +++ b/src/libcamera/control_ids_rpi.yaml
> @@ -55,4 +55,112 @@ controls:
>          official libcamera API support for per-stream controls in the future.
>
>          \sa ScalerCrop
> +
> +  - CnnOutputTensor:
> +      type: float
> +      size: [n]
> +      description: |
> +        This control returns a span of floating point values that represent the
> +        output tensors from a Convolutional Neural Network (CNN). The size and
> +        format of this array of values is entirely dependent on the neural
> +        network used, and further post-processing may need to be performed at
> +        the application level to generate the final desired output. This control
> +        is agnostic of the hardware or software used to generate the output
> +        tensors.
> +
> +        The structure of the span is described by the CnnOutputTensorInfo
> +        control.
> +
> +        \sa CnnOutputTensorInfo
> +
> +  - CnnOutputTensorInfo:
> +      type: uint8_t
> +      size: [n]
> +      description: |
> +        This control returns the structure of the CnnOutputTensor. This structure
> +        takes the following form:
> +
> +        constexpr unsigned int NetworkNameLen = 64;
> +        constexpr unsigned int MaxNumTensors = 16;
> +        constexpr unsigned int MaxNumDimensions = 16;
> +
> +        struct CnnOutputTensorInfo {
> +          char networkName[NetworkNameLen];
> +          uint32_t numTensors;
> +          OutputTensorInfo info[MaxNumTensors];
> +        };
> +
> +        with
> +
> +        struct OutputTensorInfo {
> +          uint32_t tensorDataNum;
> +          uint32_t numDimensions;
> +          uint16_t size[MaxNumDimensions];
> +        };
> +
> +        networkName is the name of the CNN used,
> +        numTensors is the number of output tensors returned,
> +        tensorDataNum gives the number of elements in each output tensor,
> +        numDimensions gives the dimensionality of each output tensor,
> +        size gives the size of each dimension in each output tensor.
> +
> +        \sa CnnOutputTensor
> +
> +  - CnnEnableInputTensor:
> +      type: bool
> +      description: |
> +        Boolean to control if the IPA returns the input tensor used by the CNN
> +        to generate the output tensors via the CnnInputTensor control. Because
> +        the input tensor may be relatively large, for efficiency reason avoid

s/reason/reasons/

> +        enabling input tensor output unless required for debugging purposes.

Actually I found some of this just a bit tricky to parse and
understand. Is "via the CnnInputTensor control" maybe superfluous?
Also "input tensor output" took me a moment. Maybe "output of the
input tensor" is easier?

> +
> +        \sa CnnInputTensor
> +
> +  - CnnInputTensor:
> +       type: uint8_t
> +       size: [n]
> +       description: |
> +        This control returns a span of uint8_t pixel values that represent the
> +        input tensor for a Convolutional Neural Network (CNN). The size and
> +        format of this array of values is entirely dependent on the neural
> +        network used, and further post-processing (e.g. pixel normalisations) may
> +        need to be performed at the application level to generate the final input
> +        image.
> +
> +        The structure of the span is described by the CnnInputTensorInfo
> +        control.
> +
> +        \sa CnnInputTensorInfo
> +
> +  - CnnInputTensorInfo:
> +      type: uint8_t
> +      size: [n]
> +      description: |
> +        This control returns the structure of the CnnInputTensor. This structure
> +        takes the following form:
> +
> +        constexpr unsigned int NetworkNameLen = 64;
> +
> +        struct CnnInputTensorInfo {
> +          char networkName[NetworkNameLen];
> +          uint32_t width;
> +          uint32_t height;
> +          uint32_t numChannels;
> +        };
> +
> +        where
> +
> +        networkName is the name of the CNN used,
> +        width and height are the input tensor image width and height in pixels,
> +        numChannels is the number of channels in the input tensor image.
> +
> +        \sa CnnInputTensor
> +
> +  - CnnKpiInfo:
> +      type: int32_t
> +      size: [2]
> +      description: |
> +        This control returns performance metrics for the CNN processing stage.
> +        Two values are returned in this span, the runtime of the CNN/DNN stage

No particular issue, just wondering why we sometimes have CNN and
sometimes DNN. Are they the same, really (we define CNN, but did we
ever define DNN)? Should we standardise?

Minor edits aside:

Reviewed-by: David Plowman <david.plowman@raspberrypi.com>

Thanks!
David

> +        and the DSP stage in milliseconds.
>  ...
> --
> 2.43.0
>
Naushir Patuck Dec. 13, 2024, 1:34 p.m. UTC | #2
Hi David,

Thank you for the feedback on this and all the other patches.  I've
fixed all the minor issues for v2, and have also commented inline below.

On Fri, 13 Dec 2024 at 10:24, David Plowman
<david.plowman@raspberrypi.com> wrote:
>
> Hi Naush
>
> Thanks for the patch.
>
> On Fri, 13 Dec 2024 at 09:46, Naushir Patuck <naush@raspberrypi.com> wrote:
> >
> > Add the following RPi vendor controls to handle Convolutional Neural
> > Network processing:
> >
> > CnnOutputTensor
> > CnnOutputTensorInfo
> > CnnEnableInputTensor
> > CnnInputTensor
> > CnnInputTensorInfo
> > CnnKpiInfo
> >
> > These controls will be used to support the new Raspberry Pi AI Camera,
> > using an IMX500 sensor with on-board neural network processing.
> >
> > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > ---
> >  src/ipa/rpi/controller/controller.h |  33 +++++++++
> >  src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
> >  2 files changed, 141 insertions(+)
> >
> > diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> > index 64f93f414524..489188b44d9b 100644
> > --- a/src/ipa/rpi/controller/controller.h
> > +++ b/src/ipa/rpi/controller/controller.h
> > @@ -25,6 +25,39 @@
> >
> >  namespace RPiController {
> >
> > +/*
> > + * The following structures are used to export the CNN input/output tensor information
> > + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> > + * Applications must cast the span to these structures exactly.
> > + */
> > +static constexpr unsigned int NetworkNameLen = 64;
> > +static constexpr unsigned int MaxNumTensors = 16;
> > +static constexpr unsigned int MaxNumDimensions = 16;
>
> Did I see elsewhere that we like to put "k" on the front of constants
> these days? Not that I'm bothered either way...
>
> > +
> > +struct OutputTensorInfo {
> > +       uint32_t tensorDataNum;
> > +       uint32_t numDimensions;
> > +       uint16_t size[MaxNumDimensions];
> > +};
> > +
> > +struct CnnOutputTensorInfo {
> > +       char networkName[NetworkNameLen];
> > +       uint32_t numTensors;
> > +       OutputTensorInfo info[MaxNumTensors];
> > +};
> > +
> > +struct CnnInputTensorInfo {
> > +       char networkName[NetworkNameLen];
> > +       uint32_t width;
> > +       uint32_t height;
> > +       uint32_t numChannels;
> > +};
> > +
> > +struct CnnKpiInfo {
> > +       uint32_t dnnRuntime;
> > +       uint32_t dspRuntime;
> > +};
> > +
>
> I wondered momentarily whether these should be in a separate header
> file, but honestly, there are so few I'm happy not to bother with it!

I thought about this, but as the definitions are small I left them in here.
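
As an illustration of how these definitions might be consumed, the sketch
below splits the flat CnnOutputTensor float span into per-tensor views
using tensorDataNum. It assumes the tensors are packed back-to-back in the
order given by the info array, which the patch does not state explicitly.
TensorView and splitOutputTensors() are hypothetical names, not libcamera
API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Definitions copied from the patch (controller.h). */
static constexpr unsigned int NetworkNameLen = 64;
static constexpr unsigned int MaxNumTensors = 16;
static constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

/* Hypothetical non-owning view of one tensor inside the flat span. */
struct TensorView {
	const float *data;
	std::size_t count;
};

/*
 * Assumption: tensor i starts immediately after tensor i - 1.
 * Returns an empty vector if the span is too short for the counts
 * advertised in the info structure.
 */
std::vector<TensorView> splitOutputTensors(const CnnOutputTensorInfo &info,
					   const float *data,
					   std::size_t count)
{
	std::vector<TensorView> views;
	if (!data || info.numTensors > MaxNumTensors)
		return views;

	std::size_t offset = 0;
	for (uint32_t i = 0; i < info.numTensors; i++) {
		const std::size_t n = info.info[i].tensorDataNum;
		/* offset never exceeds count, so this cannot underflow. */
		if (n > count - offset)
			return {};
		views.push_back({ data + offset, n });
		offset += n;
	}

	return views;
}
```

The views only borrow the memory of the original span, so they must not
outlive the request metadata that the span came from.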

>
> >  class Algorithm;
> >  typedef std::unique_ptr<Algorithm> AlgorithmPtr;
> >
> > diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> > index 34bbdfc863c5..c0b5f63df525 100644
> > --- a/src/libcamera/control_ids_rpi.yaml
> > +++ b/src/libcamera/control_ids_rpi.yaml
> > @@ -55,4 +55,112 @@ controls:
> >          official libcamera API support for per-stream controls in the future.
> >
> >          \sa ScalerCrop
> > +
> > +  - CnnOutputTensor:
> > +      type: float
> > +      size: [n]
> > +      description: |
> > +        This control returns a span of floating point values that represent the
> > +        output tensors from a Convolutional Neural Network (CNN). The size and
> > +        format of this array of values is entirely dependent on the neural
> > +        network used, and further post-processing may need to be performed at
> > +        the application level to generate the final desired output. This control
> > +        is agnostic of the hardware or software used to generate the output
> > +        tensors.
> > +
> > +        The structure of the span is described by the CnnOutputTensorInfo
> > +        control.
> > +
> > +        \sa CnnOutputTensorInfo
> > +
> > +  - CnnOutputTensorInfo:
> > +      type: uint8_t
> > +      size: [n]
> > +      description: |
> > +        This control returns the structure of the CnnOutputTensor. This structure
> > +        takes the following form:
> > +
> > +        constexpr unsigned int NetworkNameLen = 64;
> > +        constexpr unsigned int MaxNumTensors = 16;
> > +        constexpr unsigned int MaxNumDimensions = 16;
> > +
> > +        struct CnnOutputTensorInfo {
> > +          char networkName[NetworkNameLen];
> > +          uint32_t numTensors;
> > +          OutputTensorInfo info[MaxNumTensors];
> > +        };
> > +
> > +        with
> > +
> > +        struct OutputTensorInfo {
> > +          uint32_t tensorDataNum;
> > +          uint32_t numDimensions;
> > +          uint16_t size[MaxNumDimensions];
> > +        };
> > +
> > +        networkName is the name of the CNN used,
> > +        numTensors is the number of output tensors returned,
> > +        tensorDataNum gives the number of elements in each output tensor,
> > +        numDimensions gives the dimensionality of each output tensor,
> > +        size gives the size of each dimension in each output tensor.
> > +
> > +        \sa CnnOutputTensor
> > +
> > +  - CnnEnableInputTensor:
> > +      type: bool
> > +      description: |
> > +        Boolean to control if the IPA returns the input tensor used by the CNN
> > +        to generate the output tensors via the CnnInputTensor control. Because
> > +        the input tensor may be relatively large, for efficiency reason avoid
>
> s/reason/reasons/
>
> > +        enabling input tensor output unless required for debugging purposes.
>
> Actually I found some of this just a bit tricky to parse and
> understand. Is "via the CnnInputTensor control" maybe superfluous?
> Also "input tensor output" took me a moment. Maybe "output of the
> input tensor" is easier?

Agree, that did not read well.  I've reworded it to:

        Boolean to control if the IPA returns (through metadata) the input
        tensor used by the CNN to generate the output tensors. Because the input
        tensor may be relatively large, for efficiency reasons avoid returning
        the input tensor unless required for debugging purposes.
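
When the input tensor is returned for debugging, an application may want
to sanity-check it against CnnInputTensorInfo before dumping it. The
sketch below assumes one uint8_t per channel per pixel, so that the span
holds width * height * numChannels bytes. The patch does not guarantee
this layout, and inputTensorSizeMatches() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Definitions copied from the patch (controller.h). */
static constexpr unsigned int NetworkNameLen = 64;

struct CnnInputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t width;
	uint32_t height;
	uint32_t numChannels;
};

/*
 * Assumption: the CnnInputTensor span holds exactly one byte per channel
 * per pixel. Returns true if spanSize matches that expectation.
 */
bool inputTensorSizeMatches(const CnnInputTensorInfo &info,
			    std::size_t spanSize)
{
	/* Two 32-bit values always multiply without overflow in 64 bits. */
	const uint64_t pixels = static_cast<uint64_t>(info.width) * info.height;

	/* Guard the second multiplication against 64-bit overflow. */
	if (info.numChannels != 0 && pixels > UINT64_MAX / info.numChannels)
		return false;

	return pixels * info.numChannels == spanSize;
}
```

A mismatch would suggest either a different packing for the network in
use or a stale info structure, and the tensor is best ignored in that
case.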

>
> > +
> > +        \sa CnnInputTensor
> > +
> > +  - CnnInputTensor:
> > +       type: uint8_t
> > +       size: [n]
> > +       description: |
> > +        This control returns a span of uint8_t pixel values that represent the
> > +        input tensor for a Convolutional Neural Network (CNN). The size and
> > +        format of this array of values is entirely dependent on the neural
> > +        network used, and further post-processing (e.g. pixel normalisations) may
> > +        need to be performed at the application level to generate the final input
> > +        image.
> > +
> > +        The structure of the span is described by the CnnInputTensorInfo
> > +        control.
> > +
> > +        \sa CnnInputTensorInfo
> > +
> > +  - CnnInputTensorInfo:
> > +      type: uint8_t
> > +      size: [n]
> > +      description: |
> > +        This control returns the structure of the CnnInputTensor. This structure
> > +        takes the following form:
> > +
> > +        constexpr unsigned int NetworkNameLen = 64;
> > +
> > +        struct CnnInputTensorInfo {
> > +          char networkName[NetworkNameLen];
> > +          uint32_t width;
> > +          uint32_t height;
> > +          uint32_t numChannels;
> > +        };
> > +
> > +        where
> > +
> > +        networkName is the name of the CNN used,
> > +        width and height are the input tensor image width and height in pixels,
> > +        numChannels is the number of channels in the input tensor image.
> > +
> > +        \sa CnnInputTensor
> > +
> > +  - CnnKpiInfo:
> > +      type: int32_t
> > +      size: [2]
> > +      description: |
> > +        This control returns performance metrics for the CNN processing stage.
> > +        Two values are returned in this span, the runtime of the CNN/DNN stage
>
> No particular issue, just wondering why we sometimes have CNN and
> sometimes DNN. Are they the same, really (we define CNN, but did we
> ever define DNN)? Should we standardise?

I was following one particular vendor's terminology here - but for no
real reason.  I've replaced it with CNN.
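
For reference, an application could unpack the two CnnKpiInfo values as
sketched below. The ordering (CNN stage first, DSP stage second) is an
assumption based on the field order of struct CnnKpiInfo. Note that the
struct uses uint32_t whereas the control is declared as int32_t; the
sketch follows the control type. CnnKpi and decodeKpi() are hypothetical
names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

/* Hypothetical application-side view of the CnnKpiInfo control. */
struct CnnKpi {
	int32_t cnnRuntimeMs;
	int32_t dspRuntimeMs;
};

/*
 * Assumption: values[0] is the CNN stage runtime and values[1] is the
 * DSP stage runtime, both in milliseconds.
 */
std::optional<CnnKpi> decodeKpi(const int32_t *values, std::size_t count)
{
	if (!values || count != 2)
		return std::nullopt;

	return CnnKpi{ values[0], values[1] };
}
```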

Regards,
Naush


> Minor edits aside:
>
> Reviewed-by: David Plowman <david.plowman@raspberrypi.com>
>
> Thanks!
> David
>
> > +        and the DSP stage in milliseconds.
> >  ...
> > --
> > 2.43.0
> >
Laurent Pinchart Dec. 15, 2024, 4:37 p.m. UTC | #3
Hi Naush,

Thank you for the patch.

On Fri, Dec 13, 2024 at 09:38:27AM +0000, Naushir Patuck wrote:
> Add the following RPi vendor controls to handle Convolutional Neural
> Network processing:
> 
> CnnOutputTensor
> CnnOutputTensorInfo
> CnnEnableInputTensor
> CnnInputTensor
> CnnInputTensorInfo
> CnnKpiInfo
> 
> These controls will be used to support the new Raspberry Pi AI Camera,
> using an IMX500 sensor with on-board neural network processing.

I think those controls should be reviewed in the context of the IMX500
kernel driver. That would also help with the libcamera policy that
drivers need to be on their way to mainline. When do you plan to post it
for review on the linux-media mailing list?

Naushir Patuck Dec. 16, 2024, 10:11 a.m. UTC | #4
Hi Laurent,

On Sun, 15 Dec 2024 at 16:38, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Naush,
>
> Thank you for the patch.
>
> On Fri, Dec 13, 2024 at 09:38:27AM +0000, Naushir Patuck wrote:
> > Add the following RPi vendor controls to handle Convolutional Neural
> > Network processing:
> >
> > CnnOutputTensor
> > CnnOutputTensorInfo
> > CnnEnableInputTensor
> > CnnInputTensor
> > CnnInputTensorInfo
> > CnnKpiInfo
> >
> > These controls will be used to support the new Raspberry Pi AI Camera,
> > using an IMX500 sensor with on-board neural network processing.
>
> I think those controls should be reviewed in the context of the IMX500
> kernel driver. That would also help with the libcamera policy that
> drivers need to be on their way to mainline. When do you plan to post it
> for review on the linux-media mailing list?

The intention was for these controls to be generic rather than tied to
the IMX500 specifically.  Of course the only user of these currently
would be the IMX500, but there is no reliance on e.g. the IMX500 camera
helper.

With regard to upstreaming, as soon as we have completed the streams
API, I'll be posting the imx500, imx477 and imx708 drivers to
linux-media.  However, I have to be realistic with everyone: the IMX500
driver with neural network functionality has close to zero chance of
being accepted upstream.  We rely on closed firmware blobs to drive
the DSP, the models are also closed blobs that are made with
closed-source (but freely available) software, and the output stream
structure cannot be documented as it is network dependent.

So as not to waste everyone's time, I'll only be posting the imaging
part of the imx500 driver for upstream. I can understand if this means
you don't want to merge this patch upstream.  Let me know if you want
this patch removed, and we can get the other patches in this series
merged.

Regards,
Naush


Laurent Pinchart Dec. 17, 2024, 11 p.m. UTC | #5
Hi Naush,

On Mon, Dec 16, 2024 at 10:11:28AM +0000, Naushir Patuck wrote:
> On Sun, 15 Dec 2024 at 16:38, Laurent Pinchart wrote:
> > On Fri, Dec 13, 2024 at 09:38:27AM +0000, Naushir Patuck wrote:
> > > Add the following RPi vendor controls to handle Convolutional Neural
> > > Network processing:
> > >
> > > CnnOutputTensor
> > > CnnOutputTensorInfo
> > > CnnEnableInputTensor
> > > CnnInputTensor
> > > CnnInputTensorInfo
> > > CnnKpiInfo
> > >
> > > These controls will be used to support the new Raspberry Pi AI Camera,
> > > using an IMX500 sensor with on-board neural network processing.
> >
> > I think those controls should be reviewed in the context of the IMX500
> > kernel driver. That would also help with the libcamera policy that
> > drivers need to be on their way to mainline. When do you plan to post it
> > for review on the linux-media mailing list ?
> 
> The intention of these controls was to avoid tying them to the IMX500
> specifically and be generic.  Of course the only user of these
> currently would be the imx500, but there is no reliance on e.g. the
> IMX500 camera helper.
> 
> With regards to upstreaming, as soon as we have completed the streams
> API, I'll be posting the imx500, imx477 and imx708 drivers to
> linux-media.  However I have to be realistic with everyone, the IMX500
> driver with neural network functionality has close to zero chance of
> being accepted upstream.  We rely on closed firmware blobs to drive
> the DSP, the models are also closed blobs that are made with closed
> source (but freely available) software, and the output stream
> structure cannot be documented as it is network dependent.

Close to zero is small, but I wouldn't entirely rule it out. Maybe not
right now, but let's see how the situation will evolve.

> So as not to waste everyone's time, I'll only be posting the imaging
> part of the imx500 driver for upstream. I can understand if this means
> you don't want to merge this patch upstream.  Let me know if you want
> this patch removed, and we can get the other patches in this series
> merged.

For the time being that would be preferable. I'm sorry about that.

> > > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > > ---
> > >  src/ipa/rpi/controller/controller.h |  33 +++++++++
> > >  src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
> > >  2 files changed, 141 insertions(+)
> > >
> > > diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> > > index 64f93f414524..489188b44d9b 100644
> > > --- a/src/ipa/rpi/controller/controller.h
> > > +++ b/src/ipa/rpi/controller/controller.h
> > > @@ -25,6 +25,39 @@
> > >
> > >  namespace RPiController {
> > >
> > > +/*
> > > + * The following structures are used to export the CNN input/output tensor information
> > > + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> > > + * Applications must cast the span to these structures exactly.
> > > + */
> > > +static constexpr unsigned int NetworkNameLen = 64;
> > > +static constexpr unsigned int MaxNumTensors = 16;
> > > +static constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > +struct OutputTensorInfo {
> > > +     uint32_t tensorDataNum;
> > > +     uint32_t numDimensions;
> > > +     uint16_t size[MaxNumDimensions];
> > > +};
> > > +
> > > +struct CnnOutputTensorInfo {
> > > +     char networkName[NetworkNameLen];
> > > +     uint32_t numTensors;
> > > +     OutputTensorInfo info[MaxNumTensors];
> > > +};
> > > +
> > > +struct CnnInputTensorInfo {
> > > +     char networkName[NetworkNameLen];
> > > +     uint32_t width;
> > > +     uint32_t height;
> > > +     uint32_t numChannels;
> > > +};
> > > +
> > > +struct CnnKpiInfo {
> > > +     uint32_t dnnRuntime;
> > > +     uint32_t dspRuntime;
> > > +};
> > > +
> > >  class Algorithm;
> > >  typedef std::unique_ptr<Algorithm> AlgorithmPtr;
> > >
> > > diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> > > index 34bbdfc863c5..c0b5f63df525 100644
> > > --- a/src/libcamera/control_ids_rpi.yaml
> > > +++ b/src/libcamera/control_ids_rpi.yaml
> > > @@ -55,4 +55,112 @@ controls:
> > >          official libcamera API support for per-stream controls in the future.
> > >
> > >          \sa ScalerCrop
> > > +
> > > +  - CnnOutputTensor:
> > > +      type: float
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns a span of floating point values that represent the
> > > +        output tensors from a Convolutional Neural Network (CNN). The size and
> > > +        format of this array of values is entirely dependent on the neural
> > > +        network used, and further post-processing may need to be performed at
> > > +        the application level to generate the final desired output. This control
> > > +        is agnostic of the hardware or software used to generate the output
> > > +        tensors.
> > > +
> > > +        The structure of the span is described by the CnnOutputTensorInfo
> > > +        control.
> > > +
> > > +        \sa CnnOutputTensorInfo
> > > +
> > > +  - CnnOutputTensorInfo:
> > > +      type: uint8_t
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns the structure of the CnnOutputTensor. This structure
> > > +        takes the following form:
> > > +
> > > +        constexpr unsigned int NetworkNameLen = 64;
> > > +        constexpr unsigned int MaxNumTensors = 16;
> > > +        constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > +        struct CnnOutputTensorInfo {
> > > +          char networkName[NetworkNameLen];
> > > +          uint32_t numTensors;
> > > +          OutputTensorInfo info[MaxNumTensors];
> > > +        };
> > > +
> > > +        with
> > > +
> > > +        struct OutputTensorInfo {
> > > +          uint32_t tensorDataNum;
> > > +          uint32_t numDimensions;
> > > +          uint16_t size[MaxNumDimensions];
> > > +        };
> > > +
> > > +        networkName is the name of the CNN used,
> > > +        numTensors is the number of output tensors returned,
> > > +        tensorDataNum gives the number of elements in each output tensor,
> > > +        numDimensions gives the dimensionality of each output tensor,
> > > +        size gives the size of each dimension in each output tensor.
> > > +
> > > +        \sa CnnOutputTensor
> > > +
> > > +  - CnnEnableInputTensor:
> > > +      type: bool
> > > +      description: |
> > > +        Boolean to control if the IPA returns the input tensor used by the CNN
> > > +        to generate the output tensors via the CnnInputTensor control. Because
> > > +        the input tensor may be relatively large, for efficiency reason avoid
> > > +        enabling input tensor output unless required for debugging purposes.
> > > +
> > > +        \sa CnnInputTensor
> > > +
> > > +  - CnnInputTensor:
> > > +       type: uint8_t
> > > +       size: [n]
> > > +       description: |
> > > +        This control returns a span of uint8_t pixel values that represent the
> > > +        input tensor for a Convolutional Neural Network (CNN). The size and
> > > +        format of this array of values is entirely dependent on the neural
> > > +        network used, and further post-processing (e.g. pixel normalisations) may
> > > +        need to be performed at the application level to generate the final input
> > > +        image.
> > > +
> > > +        The structure of the span is described by the CnnInputTensorInfo
> > > +        control.
> > > +
> > > +        \sa CnnInputTensorInfo
> > > +
> > > +  - CnnInputTensorInfo:
> > > +      type: uint8_t
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns the structure of the CnnInputTensor. This structure
> > > +        takes the following form:
> > > +
> > > +        constexpr unsigned int NetworkNameLen = 64;
> > > +
> > > +        struct CnnInputTensorInfo {
> > > +          char networkName[NetworkNameLen];
> > > +          uint32_t width;
> > > +          uint32_t height;
> > > +          uint32_t numChannels;
> > > +        };
> > > +
> > > +        where
> > > +
> > > +        networkName is the name of the CNN used,
> > > +        width and height are the input tensor image width and height in pixels,
> > > +        numChannels is the number of channels in the input tensor image.
> > > +
> > > +        \sa CnnInputTensor
> > > +
> > > +  - CnnKpiInfo:
> > > +      type: int32_t
> > > +      size: [2]
> > > +      description: |
> > > +        This control returns performance metrics for the CNN processing stage.
> > > +        Two values are returned in this span, the runtime of the CNN/DNN stage
> > > +        and the DSP stage in milliseconds.
> > >  ...

Patch

diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
index 64f93f414524..489188b44d9b 100644
--- a/src/ipa/rpi/controller/controller.h
+++ b/src/ipa/rpi/controller/controller.h
@@ -25,6 +25,39 @@ 
 
 namespace RPiController {
 
+/*
+ * The following structures are used to export the CNN input/output tensor information
+ * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
+ * Applications must cast the span to these structures exactly.
+ */
+static constexpr unsigned int NetworkNameLen = 64;
+static constexpr unsigned int MaxNumTensors = 16;
+static constexpr unsigned int MaxNumDimensions = 16;
+
+struct OutputTensorInfo {
+	uint32_t tensorDataNum;
+	uint32_t numDimensions;
+	uint16_t size[MaxNumDimensions];
+};
+
+struct CnnOutputTensorInfo {
+	char networkName[NetworkNameLen];
+	uint32_t numTensors;
+	OutputTensorInfo info[MaxNumTensors];
+};
+
+struct CnnInputTensorInfo {
+	char networkName[NetworkNameLen];
+	uint32_t width;
+	uint32_t height;
+	uint32_t numChannels;
+};
+
+struct CnnKpiInfo {
+	uint32_t dnnRuntime;
+	uint32_t dspRuntime;
+};
+
 class Algorithm;
 typedef std::unique_ptr<Algorithm> AlgorithmPtr;
 
diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
index 34bbdfc863c5..c0b5f63df525 100644
--- a/src/libcamera/control_ids_rpi.yaml
+++ b/src/libcamera/control_ids_rpi.yaml
@@ -55,4 +55,112 @@  controls:
         official libcamera API support for per-stream controls in the future.
 
         \sa ScalerCrop
+
+  - CnnOutputTensor:
+      type: float
+      size: [n]
+      description: |
+        This control returns a span of floating point values that represent the
+        output tensors from a Convolutional Neural Network (CNN). The size and
+        format of this array of values is entirely dependent on the neural
+        network used, and further post-processing may need to be performed at
+        the application level to generate the final desired output. This control
+        is agnostic of the hardware or software used to generate the output
+        tensors.
+
+        The structure of the span is described by the CnnOutputTensorInfo
+        control.
+
+        \sa CnnOutputTensorInfo
+
+  - CnnOutputTensorInfo:
+      type: uint8_t
+      size: [n]
+      description: |
+        This control returns the structure of the CnnOutputTensor. This structure
+        takes the following form:
+
+        constexpr unsigned int NetworkNameLen = 64;
+        constexpr unsigned int MaxNumTensors = 16;
+        constexpr unsigned int MaxNumDimensions = 16;
+
+        struct CnnOutputTensorInfo {
+          char networkName[NetworkNameLen];
+          uint32_t numTensors;
+          OutputTensorInfo info[MaxNumTensors];
+        };
+
+        with
+
+        struct OutputTensorInfo {
+          uint32_t tensorDataNum;
+          uint32_t numDimensions;
+          uint16_t size[MaxNumDimensions];
+        };
+
+        networkName is the name of the CNN used,
+        numTensors is the number of output tensors returned,
+        tensorDataNum gives the number of elements in each output tensor,
+        numDimensions gives the dimensionality of each output tensor,
+        size gives the size of each dimension in each output tensor.
+
+        \sa CnnOutputTensor
+
+  - CnnEnableInputTensor:
+      type: bool
+      description: |
+        Boolean to control if the IPA returns the input tensor used by the CNN
+        to generate the output tensors via the CnnInputTensor control. Because
+        the input tensor may be relatively large, for efficiency reason avoid
+        enabling input tensor output unless required for debugging purposes.
+
+        \sa CnnInputTensor
+
+  - CnnInputTensor:
+       type: uint8_t
+       size: [n]
+       description: |
+        This control returns a span of uint8_t pixel values that represent the
+        input tensor for a Convolutional Neural Network (CNN). The size and
+        format of this array of values is entirely dependent on the neural
+        network used, and further post-processing (e.g. pixel normalisations) may
+        need to be performed at the application level to generate the final input
+        image.
+
+        The structure of the span is described by the CnnInputTensorInfo
+        control.
+
+        \sa CnnInputTensorInfo
+
+  - CnnInputTensorInfo:
+      type: uint8_t
+      size: [n]
+      description: |
+        This control returns the structure of the CnnInputTensor. This structure
+        takes the following form:
+
+        constexpr unsigned int NetworkNameLen = 64;
+
+        struct CnnInputTensorInfo {
+          char networkName[NetworkNameLen];
+          uint32_t width;
+          uint32_t height;
+          uint32_t numChannels;
+        };
+
+        where
+
+        networkName is the name of the CNN used,
+        width and height are the input tensor image width and height in pixels,
+        numChannels is the number of channels in the input tensor image.
+
+        \sa CnnInputTensor
+
+  - CnnKpiInfo:
+      type: int32_t
+      size: [2]
+      description: |
+        This control returns performance metrics for the CNN processing stage.
+        Two values are returned in this span, the runtime of the CNN/DNN stage
+        and the DSP stage in milliseconds.
 ...
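
The controller.h comment says applications must cast the CnnOutputTensorInfo span to the structure exactly. A minimal sketch of how an application might consume that payload follows. It is not part of the patch: parseOutputTensorInfo() and tensorOffsets() are hypothetical helper names, the payload is copied with memcpy rather than cast to sidestep alignment concerns, and the offsets are computed on the assumption that the output tensors are packed back to back in the CnnOutputTensor float span, which the control description does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Structures copied from the controller.h hunk in this patch.
constexpr unsigned int NetworkNameLen = 64;
constexpr unsigned int MaxNumTensors = 16;
constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

// Hypothetical helper: copy the raw uint8_t control payload into the
// structure, rejecting payloads whose size does not match exactly.
std::optional<CnnOutputTensorInfo>
parseOutputTensorInfo(const uint8_t *data, std::size_t size)
{
	if (!data || size != sizeof(CnnOutputTensorInfo))
		return std::nullopt;

	CnnOutputTensorInfo info;
	std::memcpy(&info, data, sizeof(info));

	// Guard against a tensor count that would overrun the info[] array.
	if (info.numTensors > MaxNumTensors)
		return std::nullopt;

	return info;
}

// Hypothetical helper: start offset (in floats) of each output tensor.
// ASSUMPTION: tensors are stored contiguously, in order, in the float span.
std::optional<std::vector<std::size_t>>
tensorOffsets(const CnnOutputTensorInfo &info, std::size_t numFloats)
{
	std::vector<std::size_t> offsets;
	std::size_t offset = 0;

	for (uint32_t i = 0; i < info.numTensors; i++) {
		offsets.push_back(offset);
		offset += info.info[i].tensorDataNum;
	}

	// The described tensors must fit inside the float span.
	if (offset > numFloats)
		return std::nullopt;

	return offsets;
}
```

In an application, data and size would come from the uint8_t span returned for CnnOutputTensorInfo, and numFloats from the size of the CnnOutputTensor span. Both helpers return std::nullopt rather than trusting a malformed payload.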