[libcamera-devel,v2] cam: event_loop: Stop processing calls when event loop is exiting

Message ID 20210119233315.3305649-1-niklas.soderlund@ragnatech.se
State Superseded
Delegated to: Niklas Söderlund
Series
  • [libcamera-devel,v2] cam: event_loop: Stop processing calls when event loop is exiting

Commit Message

Niklas Söderlund Jan. 19, 2021, 11:33 p.m. UTC
Terminating the event loop with EventLoop::exit() does not guarantee that
it will terminate. If the event loop's 'call later' queue is kept
non-empty by continuously queuing new calls using EventLoop::callLater(),
either from a different thread or from callbacks in the loop itself,
EventLoop::dispatchCalls() will never complete and the loop will run
forever.

Solve this by evaluating the exit condition between callbacks and by
entering the idle loop only when there are no callbacks left to process.

Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
---
* Changes since v1
- Was [PATCH 0/2] cam: Fix races in event loop and long request
  processing times
- Rework dispatchCalls() instead of callLater() for a simpler fix.
---
 src/cam/event_loop.cpp | 22 +++++++++++++---------
 src/cam/event_loop.h   |  2 +-
 2 files changed, 14 insertions(+), 10 deletions(-)

Comments

Laurent Pinchart Jan. 20, 2021, 3:56 a.m. UTC | #1
Hi Niklas,

Thank you for the patch.

On Wed, Jan 20, 2021 at 12:33:15AM +0100, Niklas Söderlund wrote:
> Terminating the event loop with EventLoop::exit() does not guarantee that
> it will terminate. If the event loop's 'call later' queue is kept
> non-empty by continuously queuing new calls using EventLoop::callLater(),
> either from a different thread or from callbacks in the loop itself,
> EventLoop::dispatchCalls() will never complete and the loop will run
> forever.
> 
> Solve this by evaluating the exit condition between callbacks and by
> entering the idle loop only when there are no callbacks left to process.
> 
> Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
> Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>

This seems to be even more of a hack than the previous version :-( Can
we fix this properly please ?

> ---
> * Changes since v1
> - Was [PATCH 0/2] cam: Fix races in event loop and long request
>   processing times
> - Rework dispatchCalls() instead of callLater() for a simpler fix.
> ---
>  src/cam/event_loop.cpp | 22 +++++++++++++---------
>  src/cam/event_loop.h   |  2 +-
>  2 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/src/cam/event_loop.cpp b/src/cam/event_loop.cpp
> index 94c5d1d362455f33..18786b4500be9b6f 100644
> --- a/src/cam/event_loop.cpp
> +++ b/src/cam/event_loop.cpp
> @@ -41,7 +41,9 @@ int EventLoop::exec()
>  	exit_.store(false, std::memory_order_release);
>  
>  	while (!exit_.load(std::memory_order_acquire)) {
> -		dispatchCalls();
> +		if (dispatchCall())
> +			continue;
> +
>  		event_base_loop(event_, EVLOOP_NO_EXIT_ON_EMPTY);
>  	}
>  
> @@ -70,17 +72,19 @@ void EventLoop::callLater(const std::function<void()> &func)
>  	interrupt();
>  }
>  
> -void EventLoop::dispatchCalls()
> +bool EventLoop::dispatchCall()
>  {
>  	std::unique_lock<std::mutex> locker(lock_);
>  
> -	for (auto iter = calls_.begin(); iter != calls_.end(); ) {
> -		std::function<void()> call = std::move(*iter);
> +	if (calls_.empty())
> +		return false;
>  
> -		iter = calls_.erase(iter);
> +	auto iter = calls_.begin();
> +	std::function<void()> call = std::move(*iter);
> +	calls_.erase(iter);
>  
> -		locker.unlock();
> -		call();
> -		locker.lock();
> -	}
> +	locker.unlock();
> +	call();
> +
> +	return true;
>  }
> diff --git a/src/cam/event_loop.h b/src/cam/event_loop.h
> index 408073c50594d09d..b84e1c9e432086e3 100644
> --- a/src/cam/event_loop.h
> +++ b/src/cam/event_loop.h
> @@ -38,7 +38,7 @@ private:
>  	std::mutex lock_;
>  
>  	void interrupt();
> -	void dispatchCalls();
> +	bool dispatchCall();
>  };
>  
>  #endif /* __CAM_EVENT_LOOP_H__ */
Niklas Söderlund Jan. 20, 2021, 9:12 a.m. UTC | #2
Hi Laurent,

Thanks for your feedback.

On 2021-01-20 05:56:46 +0200, Laurent Pinchart wrote:
> Hi Niklas,
> 
> Thank you for the patch.
> 
> On Wed, Jan 20, 2021 at 12:33:15AM +0100, Niklas Söderlund wrote:
> > Terminating the event loop with EventLoop::exit() does not guarantee that
> > it will terminate. If the event loop's 'call later' queue is kept
> > non-empty by continuously queuing new calls using EventLoop::callLater(),
> > either from a different thread or from callbacks in the loop itself,
> > EventLoop::dispatchCalls() will never complete and the loop will run
> > forever.
> > 
> > Solve this by evaluating the exit condition between callbacks and by
> > entering the idle loop only when there are no callbacks left to process.
> > 
> > Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
> > Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
> > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> 
> This seems to be even more of a hack than the previous version :-( Can
> we fix this properly please ?

I would like nothing more than to fix this properly.

The heart of the problem is that some of our pipelines require more than
N requests to be queued in order to complete N. I agree it's bad that our
pipelines behave differently, and something should be done to align
them.

It feels natural that such work would be paired with a compliance tool
so situations like this can be detected early and avoided. I have been
tempted many times to start a compliance tool but have postponed it. My
primary reason for doing so is that we never really defined the number
of requests queued vs. pipeline depth and pipeline stalls, nor what the
correct behavior in such situations is. I was pleased with the camera
model work and was hoping it would lead to a stricter definition which
such tools could be built upon.

However, solving the core problem is a rather large task to sort out.
How can we handle the issue in the short term? No matter how I dice it,
I see that a hack in cam is needed :-(

- The best option I see is to use the 'process calls in batches' solution
  you suggested in v1, and then either remove support for
  --capture=N or accept that on "broken" pipelines N+pipeline depth
  frames will be processed, but at least cam won't deadlock.

- Another option (that I really don't like) is to revert the 
  deferred calls support.

- We can also fix this properly in cam by only queuing N requests and 
  expecting N requests to complete. This will break --capture=N on some 
  pipelines, but perhaps that is the pressure we need to push the fix 
  upwards.

> 
> > ---
> > * Changes since v1
> > - Was [PATCH 0/2] cam: Fix races in event loop and long request
> >   processing times
> > - Rework dispatchCalls() instead of callLater() for a simpler fix.
> > ---
> >  src/cam/event_loop.cpp | 22 +++++++++++++---------
> >  src/cam/event_loop.h   |  2 +-
> >  2 files changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/src/cam/event_loop.cpp b/src/cam/event_loop.cpp
> > index 94c5d1d362455f33..18786b4500be9b6f 100644
> > --- a/src/cam/event_loop.cpp
> > +++ b/src/cam/event_loop.cpp
> > @@ -41,7 +41,9 @@ int EventLoop::exec()
> >  	exit_.store(false, std::memory_order_release);
> >  
> >  	while (!exit_.load(std::memory_order_acquire)) {
> > -		dispatchCalls();
> > +		if (dispatchCall())
> > +			continue;
> > +
> >  		event_base_loop(event_, EVLOOP_NO_EXIT_ON_EMPTY);
> >  	}
> >  
> > @@ -70,17 +72,19 @@ void EventLoop::callLater(const std::function<void()> &func)
> >  	interrupt();
> >  }
> >  
> > -void EventLoop::dispatchCalls()
> > +bool EventLoop::dispatchCall()
> >  {
> >  	std::unique_lock<std::mutex> locker(lock_);
> >  
> > -	for (auto iter = calls_.begin(); iter != calls_.end(); ) {
> > -		std::function<void()> call = std::move(*iter);
> > +	if (calls_.empty())
> > +		return false;
> >  
> > -		iter = calls_.erase(iter);
> > +	auto iter = calls_.begin();
> > +	std::function<void()> call = std::move(*iter);
> > +	calls_.erase(iter);
> >  
> > -		locker.unlock();
> > -		call();
> > -		locker.lock();
> > -	}
> > +	locker.unlock();
> > +	call();
> > +
> > +	return true;
> >  }
> > diff --git a/src/cam/event_loop.h b/src/cam/event_loop.h
> > index 408073c50594d09d..b84e1c9e432086e3 100644
> > --- a/src/cam/event_loop.h
> > +++ b/src/cam/event_loop.h
> > @@ -38,7 +38,7 @@ private:
> >  	std::mutex lock_;
> >  
> >  	void interrupt();
> > -	void dispatchCalls();
> > +	bool dispatchCall();
> >  };
> >  
> >  #endif /* __CAM_EVENT_LOOP_H__ */
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Sebastian Fricke Jan. 20, 2021, 10:29 a.m. UTC | #3
Hey Niklas & Laurent,

On 20.01.2021 10:12, Niklas Söderlund wrote:
>Hi Laurent,
>
>Thanks for your feedback.
>
>On 2021-01-20 05:56:46 +0200, Laurent Pinchart wrote:
>> Hi Niklas,
>>
>> Thank you for the patch.
>>
>> On Wed, Jan 20, 2021 at 12:33:15AM +0100, Niklas Söderlund wrote:
>> > Terminating the event loop with EventLoop::exit() does not guarantee that
>> > it will terminate. If the event loop's 'call later' queue is kept
>> > non-empty by continuously queuing new calls using EventLoop::callLater(),
>> > either from a different thread or from callbacks in the loop itself,
>> > EventLoop::dispatchCalls() will never complete and the loop will run
>> > forever.
>> >
>> > Solve this by evaluating the exit condition between callbacks and by
>> > entering the idle loop only when there are no callbacks left to process.
>> >
>> > Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
>> > Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
>> > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
>>
>> This seems to be even more of a hack than the previous version :-( Can
>> we fix this properly please ?
>
>I would like nothing more than to fix this properly.
>
>The heart of the problem is that some of our pipelines require more than
>N requests to be queued in order to complete N. I agree it's bad that our
>pipelines behave differently, and something should be done to align
>them.
>
>It feels natural that such work would be paired with a compliance tool
>so situations like this can be detected early and avoided. I have been
>tempted many times to start a compliance tool but have postponed it. My

I am currently looking for a follow-up project to start within the next 2-3
weeks. If you would like, we could talk about your idea and design, so
that I am able to start working on it as soon as my current projects
are finished.

>primary reason for doing so is that we never really defined the number
>of requests queued vs. pipeline depth and pipeline stalls, nor what the
>correct behavior in such situations is. I was pleased with the camera
>model work and was hoping it would lead to a stricter definition which
>such tools could be built upon.
>
>However, solving the core problem is a rather large task to sort out.

If I understood your discussion correctly, then the sub-tasks of this
one 'bigger' task are:
- Make sure that each pipeline requires exactly the expected number of
   requests in order to process the given number of frames, by implementing
   a compliance tool to check the pipelines.
- Switch the behavior of the `cam` command to count the number of queued
   requests to the camera instead of the number of completed requests.

Which other tasks would you add to that list?

>How can we handle the issue in the short term? No matter how I dice it,
>I see that a hack in cam is needed :-(
>
>- The best option I see is to use the 'process calls in batches' solution
>  you suggested in v1, and then either remove support for
>  --capture=N or accept that on "broken" pipelines N+pipeline depth
>  frames will be processed, but at least cam won't deadlock.
>
>- Another option (that I really don't like) is to revert the
>  deferred calls support.
>
>- We can also fix this properly in cam by only queuing N requests and
>  expecting N requests to complete. This will break --capture=N on some
>  pipelines, but perhaps that is the pressure we need to push the fix
>  upwards.

I think I like the idea of starting out with the right behavior in the
first place. Do you know of any cases where the `cam` application is
used for anything other than testing the libcamera project? Is there
anything that could actually "break"?

>
>>
>> > ---
>> > * Changes since v1
>> > - Was [PATCH 0/2] cam: Fix races in event loop and long request
>> >   processing times
>> > - Rework dispatchCalls() instead of callLater() for a simpler fix.
>> > ---
>> >  src/cam/event_loop.cpp | 22 +++++++++++++---------
>> >  src/cam/event_loop.h   |  2 +-
>> >  2 files changed, 14 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/src/cam/event_loop.cpp b/src/cam/event_loop.cpp
>> > index 94c5d1d362455f33..18786b4500be9b6f 100644
>> > --- a/src/cam/event_loop.cpp
>> > +++ b/src/cam/event_loop.cpp
>> > @@ -41,7 +41,9 @@ int EventLoop::exec()
>> >  	exit_.store(false, std::memory_order_release);
>> >
>> >  	while (!exit_.load(std::memory_order_acquire)) {
>> > -		dispatchCalls();
>> > +		if (dispatchCall())
>> > +			continue;
>> > +
>> >  		event_base_loop(event_, EVLOOP_NO_EXIT_ON_EMPTY);
>> >  	}
>> >
>> > @@ -70,17 +72,19 @@ void EventLoop::callLater(const std::function<void()> &func)
>> >  	interrupt();
>> >  }
>> >
>> > -void EventLoop::dispatchCalls()
>> > +bool EventLoop::dispatchCall()
>> >  {
>> >  	std::unique_lock<std::mutex> locker(lock_);
>> >
>> > -	for (auto iter = calls_.begin(); iter != calls_.end(); ) {
>> > -		std::function<void()> call = std::move(*iter);
>> > +	if (calls_.empty())
>> > +		return false;
>> >
>> > -		iter = calls_.erase(iter);
>> > +	auto iter = calls_.begin();
>> > +	std::function<void()> call = std::move(*iter);
>> > +	calls_.erase(iter);
>> >
>> > -		locker.unlock();
>> > -		call();
>> > -		locker.lock();
>> > -	}
>> > +	locker.unlock();
>> > +	call();
>> > +
>> > +	return true;
>> >  }
>> > diff --git a/src/cam/event_loop.h b/src/cam/event_loop.h
>> > index 408073c50594d09d..b84e1c9e432086e3 100644
>> > --- a/src/cam/event_loop.h
>> > +++ b/src/cam/event_loop.h
>> > @@ -38,7 +38,7 @@ private:
>> >  	std::mutex lock_;
>> >
>> >  	void interrupt();
>> > -	void dispatchCalls();
>> > +	bool dispatchCall();
>> >  };
>> >
>> >  #endif /* __CAM_EVENT_LOOP_H__ */
>>
>> --
>> Regards,
>>
>> Laurent Pinchart
>
>-- 
>Regards,
>Niklas Söderlund

Greetings,
Sebastian Fricke
>_______________________________________________
>libcamera-devel mailing list
>libcamera-devel@lists.libcamera.org
>https://lists.libcamera.org/listinfo/libcamera-devel

Patch

diff --git a/src/cam/event_loop.cpp b/src/cam/event_loop.cpp
index 94c5d1d362455f33..18786b4500be9b6f 100644
--- a/src/cam/event_loop.cpp
+++ b/src/cam/event_loop.cpp
@@ -41,7 +41,9 @@  int EventLoop::exec()
 	exit_.store(false, std::memory_order_release);
 
 	while (!exit_.load(std::memory_order_acquire)) {
-		dispatchCalls();
+		if (dispatchCall())
+			continue;
+
 		event_base_loop(event_, EVLOOP_NO_EXIT_ON_EMPTY);
 	}
 
@@ -70,17 +72,19 @@  void EventLoop::callLater(const std::function<void()> &func)
 	interrupt();
 }
 
-void EventLoop::dispatchCalls()
+bool EventLoop::dispatchCall()
 {
 	std::unique_lock<std::mutex> locker(lock_);
 
-	for (auto iter = calls_.begin(); iter != calls_.end(); ) {
-		std::function<void()> call = std::move(*iter);
+	if (calls_.empty())
+		return false;
 
-		iter = calls_.erase(iter);
+	auto iter = calls_.begin();
+	std::function<void()> call = std::move(*iter);
+	calls_.erase(iter);
 
-		locker.unlock();
-		call();
-		locker.lock();
-	}
+	locker.unlock();
+	call();
+
+	return true;
 }
diff --git a/src/cam/event_loop.h b/src/cam/event_loop.h
index 408073c50594d09d..b84e1c9e432086e3 100644
--- a/src/cam/event_loop.h
+++ b/src/cam/event_loop.h
@@ -38,7 +38,7 @@  private:
 	std::mutex lock_;
 
 	void interrupt();
-	void dispatchCalls();
+	bool dispatchCall();
 };
 
 #endif /* __CAM_EVENT_LOOP_H__ */