[libcamera-devel,RFC] android: CameraWorker: Cancel pending messages in queue in stop()
diff mbox series

Message ID 20210423040427.1226704-1-hiroh@chromium.org
State Superseded
Headers show
Series
  • [libcamera-devel,RFC] android: CameraWorker: Cancel pending messages in queue in stop()
Related show

Commit Message

Hirokazu Honda April 23, 2021, 4:04 a.m. UTC
CameraWorker::stop() executes Thread::exit() and Thread::wait().
However, intuitively, the two functions doesn't clean up pending
messages in the message queue of the thread. They are rather
cleaned in the destruction of CameraWorker, but CameraWorker's
life time is the same as CameraDevice and thus CameraManager,
which is never destroyed. On the other hand,
Camera3RequestDescriptors are destroyed in CameraDevice::stop().

This causes CameraWorker::Worker::queueRequest() starts to be
executed after CameraWorker::start(), which is after
CameraDevice::stop(). This obviously causes a segmentation fault,
because CameraWorker::Worker::queueRequest() touches
CaptureRequest which is owned by Camera3RequestDescriptor.

This fixes the issue by cancel pending message in thread queue in
CameraWorker::stop() with Thread::dispatchMessages().

Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
---
 src/android/camera_worker.cpp | 1 +
 1 file changed, 1 insertion(+)

Comments

Hirokazu Honda May 10, 2021, 6:03 a.m. UTC | #1
Gentle ping for review.

On Fri, Apr 23, 2021 at 1:04 PM Hirokazu Honda <hiroh@chromium.org> wrote:

> CameraWorker::stop() executes Thread::exit() and Thread::wait().
> However, intuitively, the two functions doesn't clean up pending
> messages in the message queue of the thread. They are rather
> cleaned in the destruction of CameraWorker, but CameraWorker's
> life time is the same as CameraDevice and thus CameraManager,
> which is never destroyed. On the other hand,
> Camera3RequestDescriptors are destroyed in CameraDevice::stop().
>
> This causes CameraWorker::Worker::queueRequest() starts to be
> executed after CameraWorker::start(), which is after
> CameraDevice::stop(). This obviously causes a segmentation fault,
> because CameraWorker::Worker::queueRequest() touches
> CaptureRequest which is owned by Camera3RequestDescriptor.
>
> This fixes the issue by cancel pending message in thread queue in
> CameraWorker::stop() with Thread::dispatchMessages().
>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> ---
>  src/android/camera_worker.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
> index 300ddde0..b68ce9d5 100644
> --- a/src/android/camera_worker.cpp
> +++ b/src/android/camera_worker.cpp
> @@ -62,6 +62,7 @@ void CameraWorker::start()
>
>  void CameraWorker::stop()
>  {
> +       thread_.dispatchMessages(Message::Type::InvokeMessage);
>         thread_.exit();
>         thread_.wait();
>  }
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
>
Hirokazu Honda May 20, 2021, 8:59 a.m. UTC | #2
Gentle ping for review.

On Mon, May 10, 2021 at 3:03 PM Hirokazu Honda <hiroh@chromium.org> wrote:

> Gentle ping for review.
>
> On Fri, Apr 23, 2021 at 1:04 PM Hirokazu Honda <hiroh@chromium.org> wrote:
>
>> CameraWorker::stop() executes Thread::exit() and Thread::wait().
>> However, intuitively, the two functions doesn't clean up pending
>> messages in the message queue of the thread. They are rather
>> cleaned in the destruction of CameraWorker, but CameraWorker's
>> life time is the same as CameraDevice and thus CameraManager,
>> which is never destroyed. On the other hand,
>> Camera3RequestDescriptors are destroyed in CameraDevice::stop().
>>
>> This causes CameraWorker::Worker::queueRequest() starts to be
>> executed after CameraWorker::start(), which is after
>> CameraDevice::stop(). This obviously causes a segmentation fault,
>> because CameraWorker::Worker::queueRequest() touches
>> CaptureRequest which is owned by Camera3RequestDescriptor.
>>
>> This fixes the issue by cancel pending message in thread queue in
>> CameraWorker::stop() with Thread::dispatchMessages().
>>
>> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
>> ---
>>  src/android/camera_worker.cpp | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
>> index 300ddde0..b68ce9d5 100644
>> --- a/src/android/camera_worker.cpp
>> +++ b/src/android/camera_worker.cpp
>> @@ -62,6 +62,7 @@ void CameraWorker::start()
>>
>>  void CameraWorker::stop()
>>  {
>> +       thread_.dispatchMessages(Message::Type::InvokeMessage);
>>         thread_.exit();
>>         thread_.wait();
>>  }
>> --
>> 2.31.1.498.g6c1eba8ee3d-goog
>>
>>
Jacopo Mondi May 20, 2021, 11:12 a.m. UTC | #3
On Fri, Apr 23, 2021 at 01:04:27PM +0900, Hirokazu Honda wrote:
> CameraWorker::stop() executes Thread::exit() and Thread::wait().
> However, intuitively, the two functions doesn't clean up pending
> messages in the message queue of the thread. They are rather
> cleaned in the destruction of CameraWorker, but CameraWorker's
> life time is the same as CameraDevice and thus CameraManager,
> which is never destroyed. On the other hand,
> Camera3RequestDescriptors are destroyed in CameraDevice::stop().
>
> This causes CameraWorker::Worker::queueRequest() starts to be
> executed after CameraWorker::start(), which is after

I'm not sure I'm following right the sequence of Worker::queueRequest(),
CameraWorker::start() and CameraDevice::stop()

Should this be CameraWorker::stop() ?

Did you ever triggered this failure ?

> CameraDevice::stop(). This obviously causes a segmentation fault,
> because CameraWorker::Worker::queueRequest() touches
> CaptureRequest which is owned by Camera3RequestDescriptor.
>
> This fixes the issue by cancel pending message in thread queue in
> CameraWorker::stop() with Thread::dispatchMessages().

For my limited understanding of the Object and Thread classes, I think
it's ok to dispatch all pending method calls.

However what concerns me is also Request that are pending in
void CameraWorker::Worker::processRequest() when the Worker gets
stopped. Should we wait for the queue of pending requests to have
finised waiting their fences (or instrument an interruption mechanism)
before returning from stop() ?

>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> ---
>  src/android/camera_worker.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
> index 300ddde0..b68ce9d5 100644
> --- a/src/android/camera_worker.cpp
> +++ b/src/android/camera_worker.cpp
> @@ -62,6 +62,7 @@ void CameraWorker::start()
>
>  void CameraWorker::stop()
>  {
> +	thread_.dispatchMessages(Message::Type::InvokeMessage);
>  	thread_.exit();
>  	thread_.wait();
>  }
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
Hirokazu Honda May 21, 2021, 4:44 a.m. UTC | #4
Hi Jacopo, thank you for reviewing.

On Thu, May 20, 2021 at 8:12 PM Jacopo Mondi <jacopo@jmondi.org> wrote:

> On Fri, Apr 23, 2021 at 01:04:27PM +0900, Hirokazu Honda wrote:
> > CameraWorker::stop() executes Thread::exit() and Thread::wait().
> > However, intuitively, the two functions doesn't clean up pending
>

s/intuitively/surprisingly/

> messages in the message queue of the thread. They are rather
> > cleaned in the destruction of CameraWorker, but CameraWorker's
> > life time is the same as CameraDevice and thus CameraManager,
> > which is never destroyed. On the other hand,
> > Camera3RequestDescriptors are destroyed in CameraDevice::stop().
> >
> > This causes CameraWorker::Worker::queueRequest() starts to be
> > executed after CameraWorker::start(), which is after
>
> I'm not sure I'm following right the sequence of Worker::queueRequest(),
> CameraWorker::start() and CameraDevice::stop()
>
> Should this be CameraWorker::stop() ?
>
> Did you ever triggered this failure ?
>
>
I have encountered this issue when I tested my flush().
The HAL API flow is CameraDevice::close() ->
(CameraDevice::configureStreams() ->) CameraDevice::processCaptureRequest().
The internal flow is
* in CameraDevice::close()
   - CameraWorker::stop()
     - Thread::exit()
     - Thread::wait()
   - Camera::stop()
* in CameraDevice::processCaptureResult()
  - CameraWorker::start()

As I said, after CameraWorker::stop(), pending requests remain in the queue.
CameraWorker::start() starts with the remained requests and because the
object of the requests are destroyed by clearing descriptors_, the SIGSEGV
happens.


> > CameraDevice::stop(). This obviously causes a segmentation fault,
> > because CameraWorker::Worker::queueRequest() touches
> > CaptureRequest which is owned by Camera3RequestDescriptor.
> >
> > This fixes the issue by cancel pending message in thread queue in
> > CameraWorker::stop() with Thread::dispatchMessages().
>
> For my limited understanding of the Object and Thread classes, I think
> it's ok to dispatch all pending method calls.
>
> However what concerns me is also Request that are pending in
> void CameraWorker::Worker::processRequest() when the Worker gets
> stopped. Should we wait for the queue of pending requests to have
> finised waiting their fences (or instrument an interruption mechanism)
> before returning from stop() ?
>
>
If I understand correctly, Thread::dispatchMessages() executes
CameraWorker::Worker::processRequest().
Thread::dispatchMessages() calls Object::message(), and it executes
InvokeMessage::invoke().
I am honestly unsure.
Laurent should know as Laurent is the original author
of Thread::dispatchMessages().

Thanks,
-Hiro

> >
> > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > ---
> >  src/android/camera_worker.cpp | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/android/camera_worker.cpp
> b/src/android/camera_worker.cpp
> > index 300ddde0..b68ce9d5 100644
> > --- a/src/android/camera_worker.cpp
> > +++ b/src/android/camera_worker.cpp
> > @@ -62,6 +62,7 @@ void CameraWorker::start()
> >
> >  void CameraWorker::stop()
> >  {
> > +     thread_.dispatchMessages(Message::Type::InvokeMessage);
> >       thread_.exit();
> >       thread_.wait();
> >  }
> > --
> > 2.31.1.498.g6c1eba8ee3d-goog
> >
>
Jacopo Mondi May 21, 2021, 7:25 a.m. UTC | #5
Hi Hiro

On Fri, May 21, 2021 at 01:44:44PM +0900, Hirokazu Honda wrote:
> Hi Jacopo, thank you for reviewing.

CC Laurent for his opinion

Thanks
  j

>
> On Thu, May 20, 2021 at 8:12 PM Jacopo Mondi <jacopo@jmondi.org> wrote:
>
> > On Fri, Apr 23, 2021 at 01:04:27PM +0900, Hirokazu Honda wrote:
> > > CameraWorker::stop() executes Thread::exit() and Thread::wait().
> > > However, intuitively, the two functions doesn't clean up pending
> >
>
> s/intuitively/surprisingly/
>
> > messages in the message queue of the thread. They are rather
> > > cleaned in the destruction of CameraWorker, but CameraWorker's
> > > life time is the same as CameraDevice and thus CameraManager,
> > > which is never destroyed. On the other hand,
> > > Camera3RequestDescriptors are destroyed in CameraDevice::stop().
> > >
> > > This causes CameraWorker::Worker::queueRequest() starts to be
> > > executed after CameraWorker::start(), which is after
> >
> > I'm not sure I'm following right the sequence of Worker::queueRequest(),
> > CameraWorker::start() and CameraDevice::stop()
> >
> > Should this be CameraWorker::stop() ?
> >
> > Did you ever triggered this failure ?
> >
> >
> I have encountered this issue when I tested my flush().
> The HAL API flow is CameraDevice::close() ->
> (CameraDevice::configureStreams() ->) CameraDevice::processCaptureRequest().
> The internal flow is
> * in CameraDevice::close()
>    - CameraWorker::stop()
>      - Thread::exit()
>      - Thread::wait()
>    - Camera::stop()
> * in CameraDevice::processCaptureResult()
>   - CameraWorker::start()
>
> As I said, after CameraWorker::stop(), pending requests remain in the queue.
> CameraWorker::start() starts with the remained requests and because the
> object of the requests are destroyed by clearing descriptors_, the SIGSEGV
> happens.
>
>
> > > CameraDevice::stop(). This obviously causes a segmentation fault,
> > > because CameraWorker::Worker::queueRequest() touches
> > > CaptureRequest which is owned by Camera3RequestDescriptor.
> > >
> > > This fixes the issue by cancel pending message in thread queue in
> > > CameraWorker::stop() with Thread::dispatchMessages().
> >
> > For my limited understanding of the Object and Thread classes, I think
> > it's ok to dispatch all pending method calls.
> >
> > However what concerns me is also Request that are pending in
> > void CameraWorker::Worker::processRequest() when the Worker gets
> > stopped. Should we wait for the queue of pending requests to have
> > finised waiting their fences (or instrument an interruption mechanism)
> > before returning from stop() ?
> >
> >
> If I understand correctly, Thread::dispatchMessages() executes
> CameraWorker::Worker::processRequest().
> Thread::dispatchMessages() calls Object::message(), and it executes
> InvokeMessage::invoke().
> I am honestly unsure.
> Laurent should know as Laurent is the original author
> of Thread::dispatchMessages().
>
> Thanks,
> -Hiro
>
> > >
> > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > ---
> > >  src/android/camera_worker.cpp | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/src/android/camera_worker.cpp
> > b/src/android/camera_worker.cpp
> > > index 300ddde0..b68ce9d5 100644
> > > --- a/src/android/camera_worker.cpp
> > > +++ b/src/android/camera_worker.cpp
> > > @@ -62,6 +62,7 @@ void CameraWorker::start()
> > >
> > >  void CameraWorker::stop()
> > >  {
> > > +     thread_.dispatchMessages(Message::Type::InvokeMessage);
> > >       thread_.exit();
> > >       thread_.wait();
> > >  }
> > > --
> > > 2.31.1.498.g6c1eba8ee3d-goog
> > >
> >
Laurent Pinchart May 21, 2021, 1:30 p.m. UTC | #6
Hello,

On Fri, May 21, 2021 at 01:44:44PM +0900, Hirokazu Honda wrote:
> On Thu, May 20, 2021 at 8:12 PM Jacopo Mondi wrote:
> > On Fri, Apr 23, 2021 at 01:04:27PM +0900, Hirokazu Honda wrote:
> > > CameraWorker::stop() executes Thread::exit() and Thread::wait().
> > > However, intuitively, the two functions doesn't clean up pending
> 
> s/intuitively/surprisingly/
> 
> > messages in the message queue of the thread. They are rather
> > > cleaned in the destruction of CameraWorker, but CameraWorker's
> > > life time is the same as CameraDevice and thus CameraManager,
> > > which is never destroyed. On the other hand,
> > > Camera3RequestDescriptors are destroyed in CameraDevice::stop().
> > >
> > > This causes CameraWorker::Worker::queueRequest() starts to be
> > > executed after CameraWorker::start(), which is after
> >
> > I'm not sure I'm following right the sequence of Worker::queueRequest(),
> > CameraWorker::start() and CameraDevice::stop()
> >
> > Should this be CameraWorker::stop() ?
> >
> > Did you ever triggered this failure ?
>
> I have encountered this issue when I tested my flush().
> The HAL API flow is CameraDevice::close() ->
> (CameraDevice::configureStreams() ->) CameraDevice::processCaptureRequest().
> The internal flow is
> * in CameraDevice::close()
>    - CameraWorker::stop()
>      - Thread::exit()
>      - Thread::wait()
>    - Camera::stop()
> * in CameraDevice::processCaptureResult()
>   - CameraWorker::start()
> 
> As I said, after CameraWorker::stop(), pending requests remain in the queue.
> CameraWorker::start() starts with the remained requests and because the
> object of the requests are destroyed by clearing descriptors_, the SIGSEGV
> happens.

You're right. We can have messages in the queue when the thread is
stopped, and they're not removed. I won't say it's by design :-), but
thinking about it, forcing automatic dispatching of all messages when
the thread is stopped would prevent fast stop of threads, so I think it
shouldn't be done by default (or at least not without a way to prevent
it).

> > > CameraDevice::stop(). This obviously causes a segmentation fault,
> > > because CameraWorker::Worker::queueRequest() touches
> > > CaptureRequest which is owned by Camera3RequestDescriptor.
> > >
> > > This fixes the issue by cancel pending message in thread queue in
> > > CameraWorker::stop() with Thread::dispatchMessages().
> >
> > For my limited understanding of the Object and Thread classes, I think
> > it's ok to dispatch all pending method calls.
> >
> > However what concerns me is also Request that are pending in
> > void CameraWorker::Worker::processRequest() when the Worker gets
> > stopped. Should we wait for the queue of pending requests to have
> > finised waiting their fences (or instrument an interruption mechanism)
> > before returning from stop() ?
>
> If I understand correctly, Thread::dispatchMessages() executes
> CameraWorker::Worker::processRequest().
> Thread::dispatchMessages() calls Object::message(), and it executes
> InvokeMessage::invoke().
> I am honestly unsure.
> Laurent should know as Laurent is the original author
> of Thread::dispatchMessages().
>
> > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > ---
> > >  src/android/camera_worker.cpp | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
> > > index 300ddde0..b68ce9d5 100644
> > > --- a/src/android/camera_worker.cpp
> > > +++ b/src/android/camera_worker.cpp
> > > @@ -62,6 +62,7 @@ void CameraWorker::start()
> > >
> > >  void CameraWorker::stop()
> > >  {
> > > +     thread_.dispatchMessages(Message::Type::InvokeMessage);

Thread::dispatchMessages() is meant to be called from within the thread
only. This isn't documented, I'll send a patch for that.

One option here to call dispatchMessage() from the thread would be the
following (untested):

diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
index 300ddde03645..9f727826e23f 100644
--- a/src/android/camera_worker.cpp
+++ b/src/android/camera_worker.cpp
@@ -52,18 +52,24 @@ void CaptureRequest::queue()
  */
 CameraWorker::CameraWorker()
 {
-	worker_.moveToThread(&thread_);
+	worker_.moveToThread(this);
 }

 void CameraWorker::start()
 {
-	thread_.start();
+	Thread::start();
 }

 void CameraWorker::stop()
 {
-	thread_.exit();
-	thread_.wait();
+	exit();
+	wait();
+}
+
+void CameraWorker::run()
+{
+	exec();
+	dispatchMessages(Message::Type::InvokeMessage);
 }

 void CameraWorker::queueRequest(CaptureRequest *request)
diff --git a/src/android/camera_worker.h b/src/android/camera_worker.h
index 64b1658b61b4..e289ef9b8655 100644
--- a/src/android/camera_worker.h
+++ b/src/android/camera_worker.h
@@ -42,7 +42,7 @@ private:
 	std::unique_ptr<libcamera::Request> request_;
 };

-class CameraWorker
+class CameraWorker : private libcamera::Thread
 {
 public:
 	CameraWorker();
@@ -52,6 +52,9 @@ public:

 	void queueRequest(CaptureRequest *request);

+protected:
+	void run() override;
+
 private:
 	class Worker : public libcamera::Object
 	{
@@ -63,7 +66,6 @@ private:
 	};

 	Worker worker_;
-	libcamera::Thread thread_;
 };

 #endif /* __ANDROID_CAMERA_WORKER_H__ */

Could you test this to see if it fixes your issue ?

I'm also considering deleting all posted messages when the thread is
stopped, as I really can't see a use case for keeping messages to be
processed when the thread is restarted. Does anyone have an opinion ?
I'll likely post a patch as an RFC.

> > >       thread_.exit();
> > >       thread_.wait();
> > >  }

Patch
diff mbox series

diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
index 300ddde0..b68ce9d5 100644
--- a/src/android/camera_worker.cpp
+++ b/src/android/camera_worker.cpp
@@ -62,6 +62,7 @@  void CameraWorker::start()
 
 void CameraWorker::stop()
 {
+	thread_.dispatchMessages(Message::Type::InvokeMessage);
 	thread_.exit();
 	thread_.wait();
 }