[libcamera-devel,2/3] pipeline: raspberrypi: Rework the internal buffer allocation scheme
diff mbox series

Message ID 20211110100802.349623-2-naush@raspberrypi.com
State Superseded
Headers show
Series
  • [libcamera-devel,1/3] pipeline: raspberrypi: Add const qualifer in isRaw()
Related show

Commit Message

Naushir Patuck Nov. 10, 2021, 10:08 a.m. UTC
For simplicity, the pipeline handler would look at the maximum number of
buffers set in the StreamConfiguration and allocate the same number of internal
buffers for all device nodes. This would likely overallocate buffers for some
nodes. Rework this logic to try and minimise overallcations without compromising
performance.

For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
synchronous with the requests and IPA.

For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
also require at least 1 internal buffer.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
---
 .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
 1 file changed, 33 insertions(+), 11 deletions(-)

Comments

Kieran Bingham Nov. 10, 2021, 10:31 a.m. UTC | #1
Quoting Naushir Patuck (2021-11-10 10:08:01)
> For simplicity, the pipeline handler would look at the maximum number of
> buffers set in the StreamConfiguration and allocate the same number of internal
> buffers for all device nodes. This would likely overallocate buffers for some
> nodes. Rework this logic to try and minimise overallcations without compromising
> performance.
> 
> For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
> synchronous with the requests and IPA.
> 
> For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
> also require at least 1 internal buffer.
> 
> Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> ---
>  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
>  1 file changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> index 11d3c2b120dd..5f0f00aacd59 100644
> --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
>  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
>  {
>         RPiCameraData *data = cameraData(camera);
> +       unsigned int numBuffers;
> +       bool rawStream = false;
>         int ret;
>  
> -       /*
> -        * Decide how many internal buffers to allocate. For now, simply look
> -        * at how many external buffers will be provided. We'll need to improve
> -        * this logic. However, we really must have all streams allocate the same
> -        * number of buffers to simplify error handling in queueRequestDevice().
> -        */
> -       unsigned int maxBuffers = 0;
> -       for (const Stream *s : camera->streams())
> -               if (static_cast<const RPi::Stream *>(s)->isExternal())
> -                       maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> +       for (Stream *s : camera->streams()) {
> +               if (isRaw(s->configuration().pixelFormat)) {
> +                       numBuffers = s->configuration().bufferCount;
> +                       rawStream = true;
> +                       break;
> +               }
> +       }
>  
> +       /* Decide how many internal buffers to allocate. */
>         for (auto const stream : data->streams_) {
> -               ret = stream->prepareBuffers(maxBuffers);
> +               if (stream == &data->unicam_[Unicam::Image] ||
> +                   stream == &data->unicam_[Unicam::Embedded]) {
> +                       /*
> +                        * For Unicam, allocate a minimum of 4 buffers as we want
> +                        * to avoid any frame drops. If an application has configured
> +                        * a RAW stream, allocate additional buffers to make up the
> +                        * minimum, but ensure we have at least 1 set of internal
> +                        * buffers to use to minimise frame drops.
> +                        */
> +                       constexpr unsigned int minBuffers = 4;
> +                       numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> +                                              : minBuffers;
> +               } else {
> +                       /*
> +                        * Since the ISP runs synchronous with the IPA and requests,
> +                        * we only ever need one set of internal buffers. Any buffers
> +                        * the application wants to hold onto will already be exported
> +                        * through PipelineHandlerRPi::exportFrameBuffers().
> +                        */

Ok, I see this is for the 4 device streams of the ISP. These don't
represent the buffers that go to the user application though do they?

Oh - I see - it's /just/ any extra internal allocation. If a user buffer
is sent in, it would be used on Output0, Output1 ... so we're not going
to starve here.

Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

> +                       numBuffers = 1;
> +               }
> +
> +               ret = stream->prepareBuffers(numBuffers);
>                 if (ret < 0)
>                         return ret;
>         }
> -- 
> 2.25.1
>
Umang Jain Nov. 10, 2021, 5:27 p.m. UTC | #2
Hi Naush

On 11/10/21 3:38 PM, Naushir Patuck wrote:
> For simplicity, the pipeline handler would look at the maximum number of
> buffers set in the StreamConfiguration and allocate the same number of internal
> buffers for all device nodes. This would likely overallocate buffers for some
> nodes. Rework this logic to try and minimise overallcations without compromising
> performance.
>
> For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
> synchronous with the requests and IPA.
>
> For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
> also require at least 1 internal buffer.
>
> Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> ---
>   .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
>   1 file changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> index 11d3c2b120dd..5f0f00aacd59 100644
> --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
>   int PipelineHandlerRPi::prepareBuffers(Camera *camera)
>   {
>   	RPiCameraData *data = cameraData(camera);
> +	unsigned int numBuffers;

I would initialise this value to 0, it might stand a chance in future 
(though unlikely) that it is used uninitialised below

         std::max<int>(1, minBuffers - numBuffers)

So just future-proof the usage

Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>

> +	bool rawStream = false;
>   	int ret;
>   
> -	/*
> -	 * Decide how many internal buffers to allocate. For now, simply look
> -	 * at how many external buffers will be provided. We'll need to improve
> -	 * this logic. However, we really must have all streams allocate the same
> -	 * number of buffers to simplify error handling in queueRequestDevice().
> -	 */
> -	unsigned int maxBuffers = 0;
> -	for (const Stream *s : camera->streams())
> -		if (static_cast<const RPi::Stream *>(s)->isExternal())
> -			maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> +	for (Stream *s : camera->streams()) {
> +		if (isRaw(s->configuration().pixelFormat)) {
> +			numBuffers = s->configuration().bufferCount;
> +			rawStream = true;
> +			break;
> +		}
> +	}
>   
> +	/* Decide how many internal buffers to allocate. */
>   	for (auto const stream : data->streams_) {
> -		ret = stream->prepareBuffers(maxBuffers);
> +		if (stream == &data->unicam_[Unicam::Image] ||
> +		    stream == &data->unicam_[Unicam::Embedded]) {
> +			/*
> +			 * For Unicam, allocate a minimum of 4 buffers as we want
> +			 * to avoid any frame drops. If an application has configured
> +			 * a RAW stream, allocate additional buffers to make up the
> +			 * minimum, but ensure we have at least 1 set of internal
> +			 * buffers to use to minimise frame drops.
> +			 */
> +			constexpr unsigned int minBuffers = 4;
> +			numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> +					       : minBuffers;
> +		} else {
> +			/*
> +			 * Since the ISP runs synchronous with the IPA and requests,
> +			 * we only ever need one set of internal buffers. Any buffers
> +			 * the application wants to hold onto will already be exported
> +			 * through PipelineHandlerRPi::exportFrameBuffers().
> +			 */
> +			numBuffers = 1;
> +		}
> +
> +		ret = stream->prepareBuffers(numBuffers);
>   		if (ret < 0)
>   			return ret;
>   	}
Laurent Pinchart Nov. 11, 2021, 2:35 p.m. UTC | #3
Hi Naush,

Thank you for the patch.

On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> For simplicity, the pipeline handler would look at the maximum number of
> buffers set in the StreamConfiguration and allocate the same number of internal
> buffers for all device nodes. This would likely overallocate buffers for some
> nodes. Rework this logic to try and minimise overallcations without compromising
> performance.
> 
> For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
> synchronous with the requests and IPA.
> 
> For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
> also require at least 1 internal buffer.
> 
> Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> ---
>  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
>  1 file changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> index 11d3c2b120dd..5f0f00aacd59 100644
> --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
>  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
>  {
>  	RPiCameraData *data = cameraData(camera);
> +	unsigned int numBuffers;

I'd name the variable numRawBuffers.

> +	bool rawStream = false;
>  	int ret;
>  
> -	/*
> -	 * Decide how many internal buffers to allocate. For now, simply look
> -	 * at how many external buffers will be provided. We'll need to improve
> -	 * this logic. However, we really must have all streams allocate the same
> -	 * number of buffers to simplify error handling in queueRequestDevice().

Does the error logic still work after this change ?

> -	 */
> -	unsigned int maxBuffers = 0;
> -	for (const Stream *s : camera->streams())
> -		if (static_cast<const RPi::Stream *>(s)->isExternal())
> -			maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> +	for (Stream *s : camera->streams()) {
> +		if (isRaw(s->configuration().pixelFormat)) {
> +			numBuffers = s->configuration().bufferCount;
> +			rawStream = true;
> +			break;
> +		}
> +	}
>  
> +	/* Decide how many internal buffers to allocate. */
>  	for (auto const stream : data->streams_) {
> -		ret = stream->prepareBuffers(maxBuffers);

And here add a numBuffers local variable.

> +		if (stream == &data->unicam_[Unicam::Image] ||
> +		    stream == &data->unicam_[Unicam::Embedded]) {
> +			/*
> +			 * For Unicam, allocate a minimum of 4 buffers as we want
> +			 * to avoid any frame drops. If an application has configured
> +			 * a RAW stream, allocate additional buffers to make up the
> +			 * minimum, but ensure we have at least 1 set of internal
> +			 * buffers to use to minimise frame drops.
> +			 */
> +			constexpr unsigned int minBuffers = 4;
> +			numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> +					       : minBuffers;

You could initialize numRawBuffers to 0 and drop rawStream, as if
!rawStream, then numRawBuffers will be equal to 0, so

			numBuffers = std::max<int>(1, minBuffers - numRawBuffers);

would do the right thing.

> +		} else {
> +			/*
> +			 * Since the ISP runs synchronous with the IPA and requests,
> +			 * we only ever need one set of internal buffers. Any buffers
> +			 * the application wants to hold onto will already be exported
> +			 * through PipelineHandlerRPi::exportFrameBuffers().
> +			 */
> +			numBuffers = 1;
> +		}
> +
> +		ret = stream->prepareBuffers(numBuffers);

I have a hard time follow this. Before the patch, the pipeline handler
calls prepareBuffers() with the number of buffers requested by the
application. Now it calls it with a number of internal buffers only,
without any change to the prepareBuffers() function itself. This would
benefit from an explanation in the commit message, as it's not clear why
it's correct.

Furthermore, what will happen if an application requests 4 buffers for
the raw stream and any number of buffers for a processed streams, and
repeatedly queues requests with no buffer for the raw stream ? It seems
to me that you'll now allocate a single extra buffer for the raw stream,
which won't be enough to keep the unicam queue running.

>  		if (ret < 0)
>  			return ret;
>  	}
Naushir Patuck Nov. 11, 2021, 2:58 p.m. UTC | #4
Hi Laurent,

Thank you for your feedback!

On Thu, 11 Nov 2021 at 14:35, Laurent Pinchart <
laurent.pinchart@ideasonboard.com> wrote:

> Hi Naush,
>
> Thank you for the patch.
>
> On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> > For simplicity, the pipeline handler would look at the maximum number of
> > buffers set in the StreamConfiguration and allocate the same number of
> internal
> > buffers for all device nodes. This would likely overallocate buffers for
> some
> > nodes. Rework this logic to try and minimise overallcations without
> compromising
> > performance.
> >
> > For ISP nodes, we only ever need 1 set of internal buffers, as the
> hardware runs
> > synchronous with the requests and IPA.
> >
> > For Unicam nodes, allocate a minimum for 4 buffers (exported +
> internal), but
> > also require at least 1 internal buffer.
> >
> > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > ---
> >  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
> >  1 file changed, 33 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > index 11d3c2b120dd..5f0f00aacd59 100644
> > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera
> *camera)
> >  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
> >  {
> >       RPiCameraData *data = cameraData(camera);
> > +     unsigned int numBuffers;
>
> I'd name the variable numRawBuffers.
>
> > +     bool rawStream = false;
> >       int ret;
> >
> > -     /*
> > -      * Decide how many internal buffers to allocate. For now, simply
> look
> > -      * at how many external buffers will be provided. We'll need to
> improve
> > -      * this logic. However, we really must have all streams allocate
> the same
> > -      * number of buffers to simplify error handling in
> queueRequestDevice().
>
> Does the error logic still work after this change ?
>

Yes, the comment above has been bit-rotted, and this is handled correctly
now.


>
> > -      */
> > -     unsigned int maxBuffers = 0;
> > -     for (const Stream *s : camera->streams())
> > -             if (static_cast<const RPi::Stream *>(s)->isExternal())
> > -                     maxBuffers = std::max(maxBuffers,
> s->configuration().bufferCount);
> > +     for (Stream *s : camera->streams()) {
> > +             if (isRaw(s->configuration().pixelFormat)) {
> > +                     numBuffers = s->configuration().bufferCount;
> > +                     rawStream = true;
> > +                     break;
> > +             }
> > +     }
> >
> > +     /* Decide how many internal buffers to allocate. */
> >       for (auto const stream : data->streams_) {
> > -             ret = stream->prepareBuffers(maxBuffers);
>
> And here add a numBuffers local variable.
>
> > +             if (stream == &data->unicam_[Unicam::Image] ||
> > +                 stream == &data->unicam_[Unicam::Embedded]) {
> > +                     /*
> > +                      * For Unicam, allocate a minimum of 4 buffers as
> we want
> > +                      * to avoid any frame drops. If an application has
> configured
> > +                      * a RAW stream, allocate additional buffers to
> make up the
> > +                      * minimum, but ensure we have at least 1 set of
> internal
> > +                      * buffers to use to minimise frame drops.
> > +                      */
> > +                     constexpr unsigned int minBuffers = 4;
> > +                     numBuffers = rawStream ? std::max<int>(1,
> minBuffers - numBuffers)
> > +                                            : minBuffers;
>
> You could initialize numRawBuffers to 0 and drop rawStream, as if
> !rawStream, then numRawBuffers will be equal to 0, so
>
>                         numBuffers = std::max<int>(1, minBuffers -
> numRawBuffers);
>
> would do the right thing.
>
> > +             } else {
> > +                     /*
> > +                      * Since the ISP runs synchronous with the IPA and
> requests,
> > +                      * we only ever need one set of internal buffers.
> Any buffers
> > +                      * the application wants to hold onto will already
> be exported
> > +                      * through
> PipelineHandlerRPi::exportFrameBuffers().
> > +                      */
> > +                     numBuffers = 1;
> > +             }
> > +
> > +             ret = stream->prepareBuffers(numBuffers);
>
> I have a hard time follow this. Before the patch, the pipeline handler
> calls prepareBuffers() with the number of buffers requested by the
> application. Now it calls it with a number of internal buffers only,
> without any change to the prepareBuffers() function itself. This would
> benefit from an explanation in the commit message, as it's not clear why
> it's correct.
>

For clarification, prepareBuffers() would (and still will) allocate
internal use
buffers, and call importBuffers() to allocate the buffer cache with the
internal
+ user requested buffer count. Buffer allocations for the user requested
buffer
count goes through PipelineHandler::exportFrameBuffers().

So this change (somewhat) decouples the user count from the internal count.
I will clarify this in the commit message.


>
> Furthermore, what will happen if an application requests 4 buffers for
> the raw stream and any number of buffers for a processed streams, and
> repeatedly queues requests with no buffer for the raw stream ? It seems
> to me that you'll now allocate a single extra buffer for the raw stream,
> which won't be enough to keep the unicam queue running.
>

This is a tricky one.  Without any hints from the application as to what it
intends
to do, I have to balance memory usage with performance.  You are right that
with only a single internal buffer allocated, the performance may be
degraded
if the application does not provide buffers through the Request.  However,
If the application has allocated 4x 18Mbyte buffers (12mpix 12-bit), I
really don't
want to be allocating 2x more of them when they possibly may never be used
(assuming the application is circulating all these buffers constantly). I
would hope
application writers would be using those buffers sensibly :-)

Some of our platforms have very small CMA heaps to work with, memory will
get very tight for these use-cases, so I do want to limit memory usage at
the
expense of performance.  Some hints from the application on how it intends
to
use buffers might help here with this balance.

David, do you think we can bump the internal allocation up to 2 buffers?

Regards,
Naush


>
> >               if (ret < 0)
> >                       return ret;
> >       }
>
> --
> Regards,
>
> Laurent Pinchart
>
Laurent Pinchart Nov. 11, 2021, 3:48 p.m. UTC | #5
Hi Naush,

On Thu, Nov 11, 2021 at 02:58:21PM +0000, Naushir Patuck wrote:
> On Thu, 11 Nov 2021 at 14:35, Laurent Pinchart wrote:
> > On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> > > For simplicity, the pipeline handler would look at the maximum number of
> > > buffers set in the StreamConfiguration and allocate the same number of internal
> > > buffers for all device nodes. This would likely overallocate buffers for some
> > > nodes. Rework this logic to try and minimise overallcations without compromising
> > > performance.
> > >
> > > For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
> > > synchronous with the requests and IPA.
> > >
> > > For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
> > > also require at least 1 internal buffer.
> > >
> > > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > > ---
> > >  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
> > >  1 file changed, 33 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > index 11d3c2b120dd..5f0f00aacd59 100644
> > > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
> > >  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
> > >  {
> > >       RPiCameraData *data = cameraData(camera);
> > > +     unsigned int numBuffers;
> > I'd name the variable numRawBuffers.
> >
> > > +     bool rawStream = false;
> > >       int ret;
> > >
> > > -     /*
> > > -      * Decide how many internal buffers to allocate. For now, simply look
> > > -      * at how many external buffers will be provided. We'll need to improve
> > > -      * this logic. However, we really must have all streams allocate the same
> > > -      * number of buffers to simplify error handling in queueRequestDevice().
> >
> > Does the error logic still work after this change ?
> 
> Yes, the comment above has been bit-rotted, and this is handled correctly
> now.
> 
> > > -      */
> > > -     unsigned int maxBuffers = 0;
> > > -     for (const Stream *s : camera->streams())
> > > -             if (static_cast<const RPi::Stream *>(s)->isExternal())
> > > -                     maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> > > +     for (Stream *s : camera->streams()) {
> > > +             if (isRaw(s->configuration().pixelFormat)) {
> > > +                     numBuffers = s->configuration().bufferCount;
> > > +                     rawStream = true;
> > > +                     break;
> > > +             }
> > > +     }
> > >
> > > +     /* Decide how many internal buffers to allocate. */
> > >       for (auto const stream : data->streams_) {
> > > -             ret = stream->prepareBuffers(maxBuffers);
> >
> > And here add a numBuffers local variable.
> >
> > > +             if (stream == &data->unicam_[Unicam::Image] ||
> > > +                 stream == &data->unicam_[Unicam::Embedded]) {
> > > +                     /*
> > > +                      * For Unicam, allocate a minimum of 4 buffers as we want
> > > +                      * to avoid any frame drops. If an application has configured
> > > +                      * a RAW stream, allocate additional buffers to make up the
> > > +                      * minimum, but ensure we have at least 1 set of internal
> > > +                      * buffers to use to minimise frame drops.
> > > +                      */
> > > +                     constexpr unsigned int minBuffers = 4;
> > > +                     numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> > > +                                            : minBuffers;
> >
> > You could initialize numRawBuffers to 0 and drop rawStream, as if
> > !rawStream, then numRawBuffers will be equal to 0, so
> >
> >                         numBuffers = std::max<int>(1, minBuffers - numRawBuffers);
> >
> > would do the right thing.
> >
> > > +             } else {
> > > +                     /*
> > > +                      * Since the ISP runs synchronous with the IPA and requests,
> > > +                      * we only ever need one set of internal buffers. Any buffers
> > > +                      * the application wants to hold onto will already be exported
> > > +                      * through PipelineHandlerRPi::exportFrameBuffers().
> > > +                      */
> > > +                     numBuffers = 1;
> > > +             }
> > > +
> > > +             ret = stream->prepareBuffers(numBuffers);
> >
> > I have a hard time follow this. Before the patch, the pipeline handler
> > calls prepareBuffers() with the number of buffers requested by the
> > application. Now it calls it with a number of internal buffers only,
> > without any change to the prepareBuffers() function itself. This would
> > benefit from an explanation in the commit message, as it's not clear why
> > it's correct.
> 
> For clarification, prepareBuffers() would (and still will) allocate internal use
> buffers, and call importBuffers() to allocate the buffer cache with the internal
> + user requested buffer count. Buffer allocations for the user requested buffer
> count goes through PipelineHandler::exportFrameBuffers().

I'm looking at prepareBuffers():

int Stream::prepareBuffers(unsigned int count)
{
	int ret;

	if (!importOnly_) {
		if (count) {
			/* Export some frame buffers for internal use. */
			ret = dev_->exportBuffers(count, &internalBuffers_);
			if (ret < 0)
				return ret;

			/* Add these exported buffers to the internal/external buffer list. */
			setExportedBuffers(&internalBuffers_);

			/* Add these buffers to the queue of internal usable buffers. */
			for (auto const &buffer : internalBuffers_)
				availableBuffers_.push(buffer.get());
		}

		/* We must import all internal/external exported buffers. */
		count = bufferMap_.size();
	}

	return dev_->importBuffers(count);
}

In the !importOnly_ case, we add count buffers to bufferMap_ by a call
to setExportedBuffers(). setExportedBuffers() is also called by
PipelineHandlerRPi::exportFrameBuffers(), so we currently add
max(s->configuration().bufferCount) buffers to any buffers already
exported by the application. This certainly overallocates, as we
essentially allocations bufferCount * 2 buffers.

With this patch, overallocation is reduced. However, there's no
requirement for applications to use the FrameBufferAllocator (which ends
up calling PipelineHandlerRPi::exportFrameBuffers()) to allocate
buffers. If an application allocates buffers through different means
(for instance exporting them from a display device), then we will call
dev_->importBuffers() with a very low count.

> So this change (somewhat) decouples the user count from the internal count.
> I will clarify this in the commit message.
> 
> > Furthermore, what will happen if an application requests 4 buffers for
> > the raw stream and any number of buffers for a processed streams, and
> > repeatedly queues requests with no buffer for the raw stream ? It seems
> > to me that you'll now allocate a single extra buffer for the raw stream,
> > which won't be enough to keep the unicam queue running.
> 
> This is a tricky one.  Without any hints from the application as to what it intends
> to do, I have to balance memory usage with performance.  You are right that
> with only a single internal buffer allocated, the performance may be degraded
> if the application does not provide buffers through the Request.  However,
> If the application has allocated 4x 18Mbyte buffers (12mpix 12-bit), I really don't
> want to be allocating 2x more of them when they possibly may never be used
> (assuming the application is circulating all these buffers constantly). I would hope
> application writers would be using those buffers sensibly :-)
> 
> Some of our platforms have very small CMA heaps to work with, memory will
> get very tight for these use-cases, so I do want to limit memory usage at the
> expense of performance.  Some hints from the application on how it intends to
> use buffers might help here with this balance.

I agree with you on the problem statement, but I'm wondering if we
shouldn't start by implementing the safe option to avoid frame drops,
and then improve performance on top (possibly with a hint API).

> David, do you think we can bump the internal allocation up to 2 buffers?
> 
> > >               if (ret < 0)
> > >                       return ret;
> > >       }
Naushir Patuck Nov. 11, 2021, 4:42 p.m. UTC | #6
Hi Laurent,

On Thu, 11 Nov 2021 at 15:49, Laurent Pinchart <
laurent.pinchart@ideasonboard.com> wrote:

> Hi Naush,
>
> On Thu, Nov 11, 2021 at 02:58:21PM +0000, Naushir Patuck wrote:
> > On Thu, 11 Nov 2021 at 14:35, Laurent Pinchart wrote:
> > > On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> > > > For simplicity, the pipeline handler would look at the maximum
> number of
> > > > buffers set in the StreamConfiguration and allocate the same number
> of internal
> > > > buffers for all device nodes. This would likely overallocate buffers
> for some
> > > > nodes. Rework this logic to try and minimise overallcations without
> compromising
> > > > performance.
> > > >
> > > > For ISP nodes, we only ever need 1 set of internal buffers, as the
> hardware runs
> > > > synchronous with the requests and IPA.
> > > >
> > > > For Unicam nodes, allocate a minimum for 4 buffers (exported +
> internal), but
> > > > also require at least 1 internal buffer.
> > > >
> > > > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > > > ---
> > > >  .../pipeline/raspberrypi/raspberrypi.cpp      | 44
> ++++++++++++++-----
> > > >  1 file changed, 33 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > index 11d3c2b120dd..5f0f00aacd59 100644
> > > > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > @@ -1211,21 +1211,43 @@ int
> PipelineHandlerRPi::queueAllBuffers(Camera *camera)
> > > >  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
> > > >  {
> > > >       RPiCameraData *data = cameraData(camera);
> > > > +     unsigned int numBuffers;
> > > I'd name the variable numRawBuffers.
> > >
> > > > +     bool rawStream = false;
> > > >       int ret;
> > > >
> > > > -     /*
> > > > -      * Decide how many internal buffers to allocate. For now,
> simply look
> > > > -      * at how many external buffers will be provided. We'll need
> to improve
> > > > -      * this logic. However, we really must have all streams
> allocate the same
> > > > -      * number of buffers to simplify error handling in
> queueRequestDevice().
> > >
> > > Does the error logic still work after this change ?
> >
> > Yes, the comment above has been bit-rotted, and this is handled correctly
> > now.
> >
> > > > -      */
> > > > -     unsigned int maxBuffers = 0;
> > > > -     for (const Stream *s : camera->streams())
> > > > -             if (static_cast<const RPi::Stream *>(s)->isExternal())
> > > > -                     maxBuffers = std::max(maxBuffers,
> s->configuration().bufferCount);
> > > > +     for (Stream *s : camera->streams()) {
> > > > +             if (isRaw(s->configuration().pixelFormat)) {
> > > > +                     numBuffers = s->configuration().bufferCount;
> > > > +                     rawStream = true;
> > > > +                     break;
> > > > +             }
> > > > +     }
> > > >
> > > > +     /* Decide how many internal buffers to allocate. */
> > > >       for (auto const stream : data->streams_) {
> > > > -             ret = stream->prepareBuffers(maxBuffers);
> > >
> > > And here add a numBuffers local variable.
> > >
> > > > +             if (stream == &data->unicam_[Unicam::Image] ||
> > > > +                 stream == &data->unicam_[Unicam::Embedded]) {
> > > > +                     /*
> > > > +                      * For Unicam, allocate a minimum of 4 buffers
> as we want
> > > > +                      * to avoid any frame drops. If an application
> has configured
> > > > +                      * a RAW stream, allocate additional buffers
> to make up the
> > > > +                      * minimum, but ensure we have at least 1 set
> of internal
> > > > +                      * buffers to use to minimise frame drops.
> > > > +                      */
> > > > +                     constexpr unsigned int minBuffers = 4;
> > > > +                     numBuffers = rawStream ? std::max<int>(1,
> minBuffers - numBuffers)
> > > > +                                            : minBuffers;
> > >
> > > You could initialize numRawBuffers to 0 and drop rawStream, as if
> > > !rawStream, then numRawBuffers will be equal to 0, so
> > >
> > >                         numBuffers = std::max<int>(1, minBuffers -
> numRawBuffers);
> > >
> > > would do the right thing.
> > >
> > > > +             } else {
> > > > +                     /*
> > > > +                      * Since the ISP runs synchronous with the IPA
> and requests,
> > > > +                      * we only ever need one set of internal
> buffers. Any buffers
> > > > +                      * the application wants to hold onto will
> already be exported
> > > > +                      * through
> PipelineHandlerRPi::exportFrameBuffers().
> > > > +                      */
> > > > +                     numBuffers = 1;
> > > > +             }
> > > > +
> > > > +             ret = stream->prepareBuffers(numBuffers);
> > >
> > > I have a hard time follow this. Before the patch, the pipeline handler
> > > calls prepareBuffers() with the number of buffers requested by the
> > > application. Now it calls it with a number of internal buffers only,
> > > without any change to the prepareBuffers() function itself. This would
> > > benefit from an explanation in the commit message, as it's not clear
> why
> > > it's correct.
> >
> > For clarification, prepareBuffers() would (and still will) allocate
> internal use
> > buffers, and call importBuffers() to allocate the buffer cache with the
> internal
> > + user requested buffer count. Buffer allocations for the user requested
> buffer
> > count goes through PipelineHandler::exportFrameBuffers().
>
> I'm looking at prepareBuffers():
>
> int Stream::prepareBuffers(unsigned int count)
> {
>         int ret;
>
>         if (!importOnly_) {
>                 if (count) {
>                         /* Export some frame buffers for internal use. */
>                         ret = dev_->exportBuffers(count,
> &internalBuffers_);
>                         if (ret < 0)
>                                 return ret;
>
>                         /* Add these exported buffers to the
> internal/external buffer list. */
>                         setExportedBuffers(&internalBuffers_);
>
>                         /* Add these buffers to the queue of internal
> usable buffers. */
>                         for (auto const &buffer : internalBuffers_)
>                                 availableBuffers_.push(buffer.get());
>                 }
>
>                 /* We must import all internal/external exported buffers.
> */
>                 count = bufferMap_.size();
>         }
>
>         return dev_->importBuffers(count);
> }
>
> In the !importOnly_ case, we add count buffers to bufferMap_ by a call
> to setExportedBuffers(). setExportedBuffers() is also called by
> PipelineHandlerRPi::exportFrameBuffers(), so we currently add
> max(s->configuration().bufferCount) buffers to any buffers already
> exported by the application. This certainly overallocates, as we
> essentially allocations bufferCount * 2 buffers.
>
> With this patch, overallocation is reduced. However, there's no
> requirement for applications to use the FrameBufferAllocator (which ends
> up calling PipelineHandlerRPi::exportFrameBuffers()) to allocate
> buffers. If an application allocates buffers through different means
> (for instance exporting them from a display device), then we will call
> dev_->importBuffers() with a very low count.
>

Yes, I see how this could go wrong.  Should I just assume the worst, and
pass VIDEO_MAX_FRAME into importBuffers.  Would that be advisable?


>
> > So this change (somewhat) decouples the user count from the internal
> count.
> > I will clarify this in the commit message.
> >
> > > Furthermore, what will happen if an application requests 4 buffers for
> > > the raw stream and any number of buffers for a processed streams, and
> > > repeatedly queues requests with no buffer for the raw stream ? It seems
> > > to me that you'll now allocate a single extra buffer for the raw
> stream,
> > > which won't be enough to keep the unicam queue running.
> >
> > This is a tricky one.  Without any hints from the application as to what
> it intends
> > to do, I have to balance memory usage with performance.  You are right
> that
> > with only a single internal buffer allocated, the performance may be
> degraded
> > if the application does not provide buffers through the Request.
> However,
> > If the application has allocated 4x 18Mbyte buffers (12mpix 12-bit), I
> really don't
> > want to be allocating 2x more of them when they possibly may never be
> used
> > (assuming the application is circulating all these buffers constantly).
> I would hope
> > application writers would be using those buffers sensibly :-)
> >
> > Some of our platforms have very small CMA heaps to work with, memory will
> > get very tight for these use-cases, so I do want to limit memory usage
> at the
> > expense of performance.  Some hints from the application on how it
> intends to
> > use buffers might help here with this balance.
>
> I agree with you on the problem statement, but I'm wondering if we
> shouldn't start by implementing the safe option to avoid frame drops,
> and then improve performance on top (possibly with a hint API).
>

I think maybe we can start with 2 internal buffer allocations, then see if
that may ever need reducing.

Regards,
Naush


>
> > David, do you think we can bump the internal allocation up to 2 buffers?
> >
> > > >               if (ret < 0)
> > > >                       return ret;
> > > >       }
>
> --
> Regards,
>
> Laurent Pinchart
>

Patch
diff mbox series

diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
index 11d3c2b120dd..5f0f00aacd59 100644
--- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
+++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
@@ -1211,21 +1211,43 @@  int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
 int PipelineHandlerRPi::prepareBuffers(Camera *camera)
 {
 	RPiCameraData *data = cameraData(camera);
+	unsigned int numBuffers;
+	bool rawStream = false;
 	int ret;
 
-	/*
-	 * Decide how many internal buffers to allocate. For now, simply look
-	 * at how many external buffers will be provided. We'll need to improve
-	 * this logic. However, we really must have all streams allocate the same
-	 * number of buffers to simplify error handling in queueRequestDevice().
-	 */
-	unsigned int maxBuffers = 0;
-	for (const Stream *s : camera->streams())
-		if (static_cast<const RPi::Stream *>(s)->isExternal())
-			maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
+	for (Stream *s : camera->streams()) {
+		if (isRaw(s->configuration().pixelFormat)) {
+			numBuffers = s->configuration().bufferCount;
+			rawStream = true;
+			break;
+		}
+	}
 
+	/* Decide how many internal buffers to allocate. */
 	for (auto const stream : data->streams_) {
-		ret = stream->prepareBuffers(maxBuffers);
+		if (stream == &data->unicam_[Unicam::Image] ||
+		    stream == &data->unicam_[Unicam::Embedded]) {
+			/*
+			 * For Unicam, allocate a minimum of 4 buffers as we want
+			 * to avoid any frame drops. If an application has configured
+			 * a RAW stream, allocate additional buffers to make up the
+			 * minimum, but ensure we have at least 1 set of internal
+			 * buffers to use to minimise frame drops.
+			 */
+			constexpr unsigned int minBuffers = 4;
+			numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
+					       : minBuffers;
+		} else {
+			/*
+			 * Since the ISP runs synchronous with the IPA and requests,
+			 * we only ever need one set of internal buffers. Any buffers
+			 * the application wants to hold onto will already be exported
+			 * through PipelineHandlerRPi::exportFrameBuffers().
+			 */
+			numBuffers = 1;
+		}
+
+		ret = stream->prepareBuffers(numBuffers);
 		if (ret < 0)
 			return ret;
 	}