[libcamera-devel,2/2] cam: Fix capturing an precis number of requests
diff mbox series

Message ID 20210119123110.2976971-3-niklas.soderlund@ragnatech.se
State Superseded
Delegated to: Niklas Söderlund
Headers show
Series
  • cam: Fix races in event loop and long request processing times
Related show

Commit Message

Niklas Söderlund Jan. 19, 2021, 12:31 p.m. UTC
When moving request processing from the camera manager thread to the
main thread a subtle race condition where added when running with the
--capture=N option.

Before the change the check of how many request had been queued was ran
in the camera manager thread and thus could not race with the producer
thread. After the change if requests are completed faster then they are
consumed (the consumer writes them to disk for example) the check may be
delayed and more then the expected number of request may be asked to
processed.

Sebastian Fricke <sebastian.fricke@posteo.net>
Fixes: 02eae70e15bdbb24 ("cam: Move request processing to main thread")
Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
---
 src/cam/capture.cpp | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Niklas Söderlund Jan. 19, 2021, 12:33 p.m. UTC | #1
On 2021-01-19 13:31:10 +0100, Niklas Söderlund wrote:
> When moving request processing from the camera manager thread to the
> main thread a subtle race condition where added when running with the
> --capture=N option.
> 
> Before the change the check of how many request had been queued was ran
> in the camera manager thread and thus could not race with the producer
> thread. After the change if requests are completed faster then they are
> consumed (the consumer writes them to disk for example) the check may be
> delayed and more then the expected number of request may be asked to
> processed.
> 
> Sebastian Fricke <sebastian.fricke@posteo.net>

Hum this should ofc be 'Reported-by: Sebastian ...'.

> Fixes: 02eae70e15bdbb24 ("cam: Move request processing to main thread")
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> ---
>  src/cam/capture.cpp | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/src/cam/capture.cpp b/src/cam/capture.cpp
> index 113ea49d50046e5b..ef1601c716bfa137 100644
> --- a/src/cam/capture.cpp
> +++ b/src/cam/capture.cpp
> @@ -157,6 +157,12 @@ void Capture::requestComplete(Request *request)
>  	if (request->status() == Request::RequestCancelled)
>  		return;
>  
> +	captureCount_++;
> +	if (captureLimit_ && captureCount_ > captureLimit_) {
> +		loop_->exit(0);
> +		return;
> +	}
> +
>  	/*
>  	 * Defer processing of the completed request to the event loop, to avoid
>  	 * blocking the camera manager thread.
> @@ -206,12 +212,6 @@ void Capture::processRequest(Request *request)
>  
>  	std::cout << info.str() << std::endl;
>  
> -	captureCount_++;
> -	if (captureLimit_ && captureCount_ >= captureLimit_) {
> -		loop_->exit(0);
> -		return;
> -	}
> -
>  	request->reuse(Request::ReuseBuffers);
>  	camera_->queueRequest(request);
>  }
> -- 
> 2.30.0
>
Laurent Pinchart Jan. 19, 2021, 12:50 p.m. UTC | #2
Hi Niklas,

Thank you for the patch.

On Tue, Jan 19, 2021 at 01:31:10PM +0100, Niklas Söderlund wrote:
> When moving request processing from the camera manager thread to the
> main thread a subtle race condition where added when running with the
> --capture=N option.
> 
> Before the change the check of how many request had been queued was ran
> in the camera manager thread and thus could not race with the producer
> thread. After the change if requests are completed faster then they are
> consumed (the consumer writes them to disk for example) the check may be
> delayed and more then the expected number of request may be asked to
> processed.

I'm not sure to see the problem. Capture::processRequest() is called
asynchronously on request completion, but the calls are serialized. As
we don't requeue requests in Capture::requestComplete(), but in
Capture::processRequest(), and only after the captureLimit_ check, we
should never queue more than captureLimit_ requests. With this change
we'll terminate the event loop early, which means that some completed
requests may fail to reach Capture::processRequest(), and won't be
written to a file.

> Sebastian Fricke <sebastian.fricke@posteo.net>
> Fixes: 02eae70e15bdbb24 ("cam: Move request processing to main thread")
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> ---
>  src/cam/capture.cpp | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/src/cam/capture.cpp b/src/cam/capture.cpp
> index 113ea49d50046e5b..ef1601c716bfa137 100644
> --- a/src/cam/capture.cpp
> +++ b/src/cam/capture.cpp
> @@ -157,6 +157,12 @@ void Capture::requestComplete(Request *request)
>  	if (request->status() == Request::RequestCancelled)
>  		return;
>  
> +	captureCount_++;
> +	if (captureLimit_ && captureCount_ > captureLimit_) {
> +		loop_->exit(0);
> +		return;
> +	}
> +
>  	/*
>  	 * Defer processing of the completed request to the event loop, to avoid
>  	 * blocking the camera manager thread.
> @@ -206,12 +212,6 @@ void Capture::processRequest(Request *request)
>  
>  	std::cout << info.str() << std::endl;
>  
> -	captureCount_++;
> -	if (captureLimit_ && captureCount_ >= captureLimit_) {
> -		loop_->exit(0);
> -		return;
> -	}
> -
>  	request->reuse(Request::ReuseBuffers);
>  	camera_->queueRequest(request);
>  }
Niklas Söderlund Jan. 19, 2021, 1:14 p.m. UTC | #3
Hi Laurent,

Thanks for your feedback.

On 2021-01-19 14:50:04 +0200, Laurent Pinchart wrote:
> Hi Niklas,
> 
> Thank you for the patch.
> 
> On Tue, Jan 19, 2021 at 01:31:10PM +0100, Niklas Söderlund wrote:
> > When moving request processing from the camera manager thread to the
> > main thread a subtle race condition where added when running with the
> > --capture=N option.
> > 
> > Before the change the check of how many request had been queued was ran
> > in the camera manager thread and thus could not race with the producer
> > thread. After the change if requests are completed faster then they are
> > consumed (the consumer writes them to disk for example) the check may be
> > delayed and more then the expected number of request may be asked to
> > processed.
> 
> I'm not sure to see the problem. Capture::processRequest() is called
> asynchronously on request completion, but the calls are serialized. As
> we don't requeue requests in Capture::requestComplete(), but in
> Capture::processRequest(), and only after the captureLimit_ check, we
> should never queue more than captureLimit_ requests. With this change
> we'll terminate the event loop early, which means that some completed
> requests may fail to reach Capture::processRequest(), and won't be
> written to a file.

The problem is that we have more then one request queued to libcamera at 
a time. If the check is at the end of Capture::processRequest() and the 
processing time for each request is large more then N requests are 
completed and placed on the callback queue and process. Worst case is 
that we write N + pipeline depth of frames to disk.

One could argue that the design in cam is bad and we should not count 
number of completed requests but rather how many we queue to the camera.  
I think this is something we could switch. But I feel a lot more 
comfortable if we first had a compliance tool to verify that all 
pipelines actually return N requests if N requests where queue ;-P

> 
> > Sebastian Fricke <sebastian.fricke@posteo.net>
> > Fixes: 02eae70e15bdbb24 ("cam: Move request processing to main thread")
> > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> > ---
> >  src/cam/capture.cpp | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/cam/capture.cpp b/src/cam/capture.cpp
> > index 113ea49d50046e5b..ef1601c716bfa137 100644
> > --- a/src/cam/capture.cpp
> > +++ b/src/cam/capture.cpp
> > @@ -157,6 +157,12 @@ void Capture::requestComplete(Request *request)
> >  	if (request->status() == Request::RequestCancelled)
> >  		return;
> >  
> > +	captureCount_++;
> > +	if (captureLimit_ && captureCount_ > captureLimit_) {
> > +		loop_->exit(0);
> > +		return;
> > +	}
> > +
> >  	/*
> >  	 * Defer processing of the completed request to the event loop, to avoid
> >  	 * blocking the camera manager thread.
> > @@ -206,12 +212,6 @@ void Capture::processRequest(Request *request)
> >  
> >  	std::cout << info.str() << std::endl;
> >  
> > -	captureCount_++;
> > -	if (captureLimit_ && captureCount_ >= captureLimit_) {
> > -		loop_->exit(0);
> > -		return;
> > -	}
> > -
> >  	request->reuse(Request::ReuseBuffers);
> >  	camera_->queueRequest(request);
> >  }
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Laurent Pinchart Jan. 19, 2021, 4:08 p.m. UTC | #4
Hi Niklas,

On Tue, Jan 19, 2021 at 02:14:53PM +0100, Niklas Söderlund wrote:
> On 2021-01-19 14:50:04 +0200, Laurent Pinchart wrote:
> > On Tue, Jan 19, 2021 at 01:31:10PM +0100, Niklas Söderlund wrote:
> > > When moving request processing from the camera manager thread to the
> > > main thread a subtle race condition where added when running with the
> > > --capture=N option.
> > > 
> > > Before the change the check of how many request had been queued was ran
> > > in the camera manager thread and thus could not race with the producer
> > > thread. After the change if requests are completed faster then they are
> > > consumed (the consumer writes them to disk for example) the check may be
> > > delayed and more then the expected number of request may be asked to
> > > processed.
> > 
> > I'm not sure to see the problem. Capture::processRequest() is called
> > asynchronously on request completion, but the calls are serialized. As
> > we don't requeue requests in Capture::requestComplete(), but in
> > Capture::processRequest(), and only after the captureLimit_ check, we
> > should never queue more than captureLimit_ requests. With this change
> > we'll terminate the event loop early, which means that some completed
> > requests may fail to reach Capture::processRequest(), and won't be
> > written to a file.
> 
> The problem is that we have more then one request queued to libcamera at 
> a time. If the check is at the end of Capture::processRequest() and the 
> processing time for each request is large more then N requests are 
> completed and placed on the callback queue and process. Worst case is 
> that we write N + pipeline depth of frames to disk.
> 
> One could argue that the design in cam is bad and we should not count 
> number of completed requests but rather how many we queue to the camera.  
> I think this is something we could switch. But I feel a lot more 
> comfortable if we first had a compliance tool to verify that all 
> pipelines actually return N requests if N requests where queue ;-P

I was also going to mention counting the number of buffers queued :-) We
should stop requeuing requests once we reach captureLimit_ queued
requests, and we should stop the event loop once we reach captureLimit_
completed requests. As we prequeue a fixed number of requests at the
beginning we may not need to have two counters, but two counters may not
hurt either.

> > > Sebastian Fricke <sebastian.fricke@posteo.net>
> > > Fixes: 02eae70e15bdbb24 ("cam: Move request processing to main thread")
> > > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> > > ---
> > >  src/cam/capture.cpp | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/src/cam/capture.cpp b/src/cam/capture.cpp
> > > index 113ea49d50046e5b..ef1601c716bfa137 100644
> > > --- a/src/cam/capture.cpp
> > > +++ b/src/cam/capture.cpp
> > > @@ -157,6 +157,12 @@ void Capture::requestComplete(Request *request)
> > >  	if (request->status() == Request::RequestCancelled)
> > >  		return;
> > >  
> > > +	captureCount_++;
> > > +	if (captureLimit_ && captureCount_ > captureLimit_) {
> > > +		loop_->exit(0);
> > > +		return;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Defer processing of the completed request to the event loop, to avoid
> > >  	 * blocking the camera manager thread.
> > > @@ -206,12 +212,6 @@ void Capture::processRequest(Request *request)
> > >  
> > >  	std::cout << info.str() << std::endl;
> > >  
> > > -	captureCount_++;
> > > -	if (captureLimit_ && captureCount_ >= captureLimit_) {
> > > -		loop_->exit(0);
> > > -		return;
> > > -	}
> > > -
> > >  	request->reuse(Request::ReuseBuffers);
> > >  	camera_->queueRequest(request);
> > >  }

Patch
diff mbox series

diff --git a/src/cam/capture.cpp b/src/cam/capture.cpp
index 113ea49d50046e5b..ef1601c716bfa137 100644
--- a/src/cam/capture.cpp
+++ b/src/cam/capture.cpp
@@ -157,6 +157,12 @@  void Capture::requestComplete(Request *request)
 	if (request->status() == Request::RequestCancelled)
 		return;
 
+	captureCount_++;
+	if (captureLimit_ && captureCount_ > captureLimit_) {
+		loop_->exit(0);
+		return;
+	}
+
 	/*
 	 * Defer processing of the completed request to the event loop, to avoid
 	 * blocking the camera manager thread.
@@ -206,12 +212,6 @@  void Capture::processRequest(Request *request)
 
 	std::cout << info.str() << std::endl;
 
-	captureCount_++;
-	if (captureLimit_ && captureCount_ >= captureLimit_) {
-		loop_->exit(0);
-		return;
-	}
-
 	request->reuse(Request::ReuseBuffers);
 	camera_->queueRequest(request);
 }