Message ID: 20210119233315.3305649-1-niklas.soderlund@ragnatech.se
State: Superseded
Delegated to: Niklas Söderlund
Hi Niklas,

Thank you for the patch.

On Wed, Jan 20, 2021 at 12:33:15AM +0100, Niklas Söderlund wrote:
> Terminating the event loop with EventLoop::exit() does not guarantee
> that it will terminate. If the event loop's 'call later' queue can be
> kept non-empty by continuously queuing new calls using
> EventLoop::callLater(), either from a different thread or from
> callbacks in the loop itself, EventLoop::dispatchCalls() will never
> complete and the loop will run forever.
>
> Solve this by evaluating the exit condition between each callback and
> only entering the idle loop if there are no callbacks to process.
>
> Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
> Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>

This seems to be even more of a hack than the previous version :-( Can
we fix this properly please ?

> ---
> * Changes since v1
> - Was [PATCH 0/2] cam: Fix races in event loop and long request
>   processing times
> - Rework dispatchCalls() instead of callLater() for a simpler fix.
> ---
>  src/cam/event_loop.cpp | 22 +++++++++++++---------
>  src/cam/event_loop.h   |  2 +-
>  2 files changed, 14 insertions(+), 10 deletions(-)
>
> [quoted diff trimmed; the full patch appears below]
Hi Laurent,

Thanks for your feedback.

On 2021-01-20 05:56:46 +0200, Laurent Pinchart wrote:
> Hi Niklas,
>
> Thank you for the patch.
>
> On Wed, Jan 20, 2021 at 12:33:15AM +0100, Niklas Söderlund wrote:
> > [commit message trimmed]
>
> This seems to be even more of a hack than the previous version :-( Can
> we fix this properly please ?

I would like nothing more than to fix this properly.

The heart of the problem is that some of our pipelines require more than
N requests to be queued in order to complete N. I agree it's bad that
our pipelines behave differently, and something should be done to align
them.

It feels natural that such work would be paired with a compliance tool,
so situations like this can be detected early and avoided. I have been
tempted many times to start a compliance tool but have postponed it. My
primary reason for doing so is that we never really defined the number
of requests queued vs. pipeline depth and pipeline stalls, and what
correct behavior in such situations is. I was pleased with the camera
model work and was hoping it would lead to a stricter definition which
such tools could be built upon.

However, solving the core problem is a rather large task to sort out.
How can we handle the issue in the short time frame? No matter how I
dice it, I see a hack in cam is needed ;-(

- The best option I see is to use the 'process calls in batches'
  solution as in your suggestion for v1, and then either remove support
  for --capture=N or accept that on "broken" pipelines N + pipeline
  depth frames will be processed, but at least cam won't deadlock.

- Another option (that I really don't like) is to revert the deferred
  calls support.

- We can also fix this properly in cam by only queuing N requests and
  expecting N requests to complete. This will break --capture=N on some
  pipelines, but perhaps that is the pressure we need to push the fix
  upwards.

> > [quoted patch trimmed]
>
> --
> Regards,
>
> Laurent Pinchart
Hey Niklas & Laurent,

On 20.01.2021 10:12, Niklas Söderlund wrote:
> Hi Laurent,
>
> Thanks for your feedback.
>
> [earlier quoting trimmed]
>
> It feels natural that such work would be paired with a compliance
> tool, so situations like this can be detected early and avoided. I
> have been tempted many times to start a compliance tool but have
> postponed it.

I am currently looking for a follow-up project to start within the next
2-3 weeks. If you would like, we could talk about your idea and design,
so that I am able to start working on it as soon as my current projects
are finished.

> My primary reason for doing so is that we never really defined the
> number of requests queued vs. pipeline depth and pipeline stalls, and
> what correct behavior in such situations is. I was pleased with the
> camera model work and was hoping it would lead to a stricter
> definition which such tools could be built upon.
>
> However, solving the core problem is a rather large task to sort out.

If I understood your discussion correctly, the sub-tasks of this one
'bigger' task are:

- Make sure that each pipeline requires exactly the expected amount of
  requests in order to process the given amount of frames, by
  implementing a compliance tool to check the pipelines.

- Switch the behavior of the `cam` command to count the number of
  requests queued to the camera instead of the number of completed
  requests.

Which other tasks would you add to that list?

> How can we handle the issue in the short time frame? No matter how I
> dice it, I see a hack in cam is needed ;-(
>
> - The best option I see is to use the 'process calls in batches'
>   solution as in your suggestion for v1, and then either remove
>   support for --capture=N or accept that on "broken" pipelines
>   N + pipeline depth frames will be processed, but at least cam won't
>   deadlock.
>
> - Another option (that I really don't like) is to revert the deferred
>   calls support.
>
> - We can also fix this properly in cam by only queuing N requests and
>   expecting N requests to complete. This will break --capture=N on
>   some pipelines, but perhaps that is the pressure we need to push the
>   fix upwards.

I think I like the idea of starting out with the right behavior in the
first place. Are there any cases known to you where the `cam`
application is used for anything else than testing the libcamera
project? Is there anything that could actually "break"?

> [quoted patch trimmed]
>
> --
> Regards,
> Niklas Söderlund

Greetings,
Sebastian Fricke
diff --git a/src/cam/event_loop.cpp b/src/cam/event_loop.cpp
index 94c5d1d362455f33..18786b4500be9b6f 100644
--- a/src/cam/event_loop.cpp
+++ b/src/cam/event_loop.cpp
@@ -41,7 +41,9 @@ int EventLoop::exec()
 	exit_.store(false, std::memory_order_release);
 
 	while (!exit_.load(std::memory_order_acquire)) {
-		dispatchCalls();
+		if (dispatchCall())
+			continue;
+
 		event_base_loop(event_, EVLOOP_NO_EXIT_ON_EMPTY);
 	}
 
@@ -70,17 +72,19 @@ void EventLoop::callLater(const std::function<void()> &func)
 	interrupt();
 }
 
-void EventLoop::dispatchCalls()
+bool EventLoop::dispatchCall()
 {
 	std::unique_lock<std::mutex> locker(lock_);
 
-	for (auto iter = calls_.begin(); iter != calls_.end(); ) {
-		std::function<void()> call = std::move(*iter);
+	if (calls_.empty())
+		return false;
 
-		iter = calls_.erase(iter);
+	auto iter = calls_.begin();
+	std::function<void()> call = std::move(*iter);
+	calls_.erase(iter);
 
-		locker.unlock();
-		call();
-		locker.lock();
-	}
+	locker.unlock();
+	call();
+
+	return true;
 }
diff --git a/src/cam/event_loop.h b/src/cam/event_loop.h
index 408073c50594d09d..b84e1c9e432086e3 100644
--- a/src/cam/event_loop.h
+++ b/src/cam/event_loop.h
@@ -38,7 +38,7 @@ private:
 	std::mutex lock_;
 
 	void interrupt();
-	void dispatchCalls();
+	bool dispatchCall();
 };
 
 #endif /* __CAM_EVENT_LOOP_H__ */
Terminating the event loop with EventLoop::exit() does not guarantee
that it will terminate. If the event loop's 'call later' queue can be
kept non-empty by continuously queuing new calls using
EventLoop::callLater(), either from a different thread or from callbacks
in the loop itself, EventLoop::dispatchCalls() will never complete and
the loop will run forever.

Solve this by evaluating the exit condition between each callback and
only entering the idle loop if there are no callbacks to process.

Reported-by: Sebastian Fricke <sebastian.fricke@posteo.net>
Fixes: f49e93338b6309a6 ("cam: event_loop: Add deferred calls support")
Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
---
* Changes since v1
- Was [PATCH 0/2] cam: Fix races in event loop and long request
  processing times
- Rework dispatchCalls() instead of callLater() for a simpler fix.
---
 src/cam/event_loop.cpp | 22 +++++++++++++---------
 src/cam/event_loop.h   |  2 +-
 2 files changed, 14 insertions(+), 10 deletions(-)