[libcamera-devel,v3] android: camera_device: Fix crash in calling CameraDevice::close()
diff mbox series

Message ID 20210831183739.901729-1-hiroh@chromium.org
State Superseded
Headers show
Series
  • [libcamera-devel,v3] android: camera_device: Fix crash in calling CameraDevice::close()
Related show

Commit Message

Hirokazu Honda Aug. 31, 2021, 6:37 p.m. UTC
The problem is happening because we seem to add a CameraStream
associated buffer(depending on the CameraStream::Type) to the Request,
in CameraDevice::processCaptureRequest().

However, when the camera stops, all the current buffers are marked with
FrameMetadata::FrameCancelled and proceed to completion. But the buffer
associated with the CameraStream (that was previously added to the
request) has now been cleared out with a part of streams_.clear(), even
before the camera stop() has been invoked. Any access to those request
buffers after they have been cleared, shall result in a crash.

Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
---
 src/android/camera_device.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Laurent Pinchart Aug. 31, 2021, 9:04 p.m. UTC | #1
Hi Hiro,

Thank you for the patch.

On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> The problem is happening because we seem to add a CameraStream
> associated buffer(depending on the CameraStream::Type) to the Request,

s/(/ (/

> in CameraDevice::processCaptureRequest().
> 
> However, when the camera stops, all the current buffers are marked with
> FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> associated with the CameraStream (that was previously added to the
> request) has now been cleared out with a part of streams_.clear(), even
> before the camera stop() has been invoked. Any access to those request
> buffers after they have been cleared, shall result in a crash.

s/shall/will/

> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>

I can fix the above when pushing.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> ---
>  src/android/camera_device.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 8ca76719..fda77db4 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>  
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> -
>  	stop();
>  
>  	camera_->release();
> @@ -457,6 +455,8 @@ void CameraDevice::stop()
>  	camera_->stop();
>  
>  	descriptors_.clear();
> +	streams_.clear();
> +
>  	state_ = State::Stopped;
>  }
>
Umang Jain Sept. 1, 2021, 5:49 a.m. UTC | #2
Hello,

On 9/1/21 2:34 AM, Laurent Pinchart wrote:
> Hi Hiro,
>
> Thank you for the patch.
>
> On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
>> The problem is happening because we seem to add a CameraStream
>> associated buffer(depending on the CameraStream::Type) to the Request,
> s/(/ (/
>
>> in CameraDevice::processCaptureRequest().
>>
>> However, when the camera stops, all the current buffers are marked with
>> FrameMetadata::FrameCancelled and proceed to completion. But the buffer
>> associated with the CameraStream (that was previously added to the
>> request) has now been cleared out with a part of streams_.clear(), even
>> before the camera stop() has been invoked. Any access to those request
>> buffers after they have been cleared, shall result in a crash.
> s/shall/will/
>
>> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> I can fix the above when pushing.
>
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>


Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>

>
>> ---
>>   src/android/camera_device.cpp | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>> index 8ca76719..fda77db4 100644
>> --- a/src/android/camera_device.cpp
>> +++ b/src/android/camera_device.cpp
>> @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>>   
>>   void CameraDevice::close()
>>   {
>> -	streams_.clear();
>> -
>>   	stop();
>>   
>>   	camera_->release();
>> @@ -457,6 +455,8 @@ void CameraDevice::stop()
>>   	camera_->stop();
>>   
>>   	descriptors_.clear();
>> +	streams_.clear();
>> +
>>   	state_ = State::Stopped;
>>   }
>>
Jacopo Mondi Sept. 1, 2021, 3:47 p.m. UTC | #3
Hi Hiro,

On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> The problem is happening because we seem to add a CameraStream
> associated buffer(depending on the CameraStream::Type) to the Request,
> in CameraDevice::processCaptureRequest().
>
> However, when the camera stops, all the current buffers are marked with
> FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> associated with the CameraStream (that was previously added to the
> request) has now been cleared out with a part of streams_.clear(), even
> before the camera stop() has been invoked. Any access to those request
> buffers after they have been cleared, shall result in a crash.

I recall it was painful at the time when we had to deal with flush()
to get the right ordering, and I had done the same. However, since
stop() is only called by configureStreams() I removed the
streams_.clear() there and had issues, so I gave up.

But I have this patch gone through cros_camera_test and also CTS seems
fine. Also looking at the code it seems an harmless change so I wonder
what was wrong at the time.

As far as I can tell:

Tested-by: Jacopo Mondi <jacopo@jmondi.org>
Acked-by: Jacopo Mondi <jacopo@jmondi.org>

>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> ---
>  src/android/camera_device.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 8ca76719..fda77db4 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> -
>  	stop();
>
>  	camera_->release();
> @@ -457,6 +455,8 @@ void CameraDevice::stop()
>  	camera_->stop();
>
>  	descriptors_.clear();
> +	streams_.clear();
> +
>  	state_ = State::Stopped;
>  }
>
> --
> 2.33.0.259.gc128427fd7-goog
>
Laurent Pinchart Sept. 2, 2021, 3:11 a.m. UTC | #4
Hi Jacopo,

On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > The problem is happening because we seem to add a CameraStream
> > associated buffer(depending on the CameraStream::Type) to the Request,
> > in CameraDevice::processCaptureRequest().
> >
> > However, when the camera stops, all the current buffers are marked with
> > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > associated with the CameraStream (that was previously added to the
> > request) has now been cleared out with a part of streams_.clear(), even
> > before the camera stop() has been invoked. Any access to those request
> > buffers after they have been cleared, shall result in a crash.
> 
> I recall it was painful at the time when we had to deal with flush()
> to get the right ordering, and I had done the same. However, since
> stop() is only called by configureStreams() I removed the
> streams_.clear() there and had issues, so I gave up.

Do you mean you have tried removing streams_.clear() from
configureStreams() while testing this patch and ran into issues, or that
you tried that when implementing support for flush() ?

> But I have this patch gone through cros_camera_test and also CTS seems
> fine. Also looking at the code it seems an harmless change so I wonder
> what was wrong at the time.
> 
> As far as I can tell:
> 
> Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> 
> > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > ---
> >  src/android/camera_device.cpp | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index 8ca76719..fda77db4 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> >  void CameraDevice::close()
> >  {
> > -	streams_.clear();
> > -
> >  	stop();
> >
> >  	camera_->release();
> > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> >  	camera_->stop();
> >
> >  	descriptors_.clear();
> > +	streams_.clear();
> > +
> >  	state_ = State::Stopped;
> >  }
> >
Jacopo Mondi Sept. 2, 2021, 6:59 a.m. UTC | #5
Hi Laurent,

On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > The problem is happening because we seem to add a CameraStream
> > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > in CameraDevice::processCaptureRequest().
> > >
> > > However, when the camera stops, all the current buffers are marked with
> > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > associated with the CameraStream (that was previously added to the
> > > request) has now been cleared out with a part of streams_.clear(), even
> > > before the camera stop() has been invoked. Any access to those request
> > > buffers after they have been cleared, shall result in a crash.
> >
> > I recall it was painful at the time when we had to deal with flush()
> > to get the right ordering, and I had done the same. However, since
> > stop() is only called by configureStreams() I removed the
> > streams_.clear() there and had issues, so I gave up.
>
> Do you mean you have tried removing streams_.clear() from
> configureStreams() while testing this patch and ran into issues, or that
> you tried that when implementing support for flush() ?
>

I meant when developing flush().
I haven't tried to do the same on top of this patch, but it might be
good to do so ?

Thanks
   j

> > But I have this patch gone through cros_camera_test and also CTS seems
> > fine. Also looking at the code it seems an harmless change so I wonder
> > what was wrong at the time.
> >
> > As far as I can tell:
> >
> > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> >
> > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > ---
> > >  src/android/camera_device.cpp | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index 8ca76719..fda77db4 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > >  void CameraDevice::close()
> > >  {
> > > -	streams_.clear();
> > > -
> > >  	stop();
> > >
> > >  	camera_->release();
> > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > >  	camera_->stop();
> > >
> > >  	descriptors_.clear();
> > > +	streams_.clear();
> > > +
> > >  	state_ = State::Stopped;
> > >  }
> > >
>
> --
> Regards,
>
> Laurent Pinchart
Laurent Pinchart Sept. 2, 2021, 7:40 a.m. UTC | #6
Hi Jacopo,

On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > The problem is happening because we seem to add a CameraStream
> > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > in CameraDevice::processCaptureRequest().
> > > >
> > > > However, when the camera stops, all the current buffers are marked with
> > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > associated with the CameraStream (that was previously added to the
> > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > before the camera stop() has been invoked. Any access to those request
> > > > buffers after they have been cleared, shall result in a crash.
> > >
> > > I recall it was painful at the time when we had to deal with flush()
> > > to get the right ordering, and I had done the same. However, since
> > > stop() is only called by configureStreams() I removed the
> > > streams_.clear() there and had issues, so I gave up.
> >
> > Do you mean you have tried removing streams_.clear() from
> > configureStreams() while testing this patch and ran into issues, or that
> > you tried that when implementing support for flush() ?
> 
> I meant when developing flush().
> I haven't tried to do the same on top of this patch, but it might be
> good to do so ?

I think so, because if it works, it's a good cleanup (possibly as part
of this patch actually, not on top), and if it doesn't, there's
something fishy :-) Would you be able to test it ?

> > > But I have this patch gone through cros_camera_test and also CTS seems
> > > fine. Also looking at the code it seems an harmless change so I wonder
> > > what was wrong at the time.
> > >
> > > As far as I can tell:
> > >
> > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > >
> > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > ---
> > > >  src/android/camera_device.cpp | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > index 8ca76719..fda77db4 100644
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > >
> > > >  void CameraDevice::close()
> > > >  {
> > > > -	streams_.clear();
> > > > -
> > > >  	stop();
> > > >
> > > >  	camera_->release();
> > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > >  	camera_->stop();
> > > >
> > > >  	descriptors_.clear();
> > > > +	streams_.clear();
> > > > +
> > > >  	state_ = State::Stopped;
> > > >  }
> > > >
Jacopo Mondi Sept. 2, 2021, 8:25 a.m. UTC | #7
On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > The problem is happening because we seem to add a CameraStream
> > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > in CameraDevice::processCaptureRequest().
> > > > >
> > > > > However, when the camera stops, all the current buffers are marked with
> > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > associated with the CameraStream (that was previously added to the
> > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > before the camera stop() has been invoked. Any access to those request
> > > > > buffers after they have been cleared, shall result in a crash.
> > > >
> > > > I recall it was painful at the time when we had to deal with flush()
> > > > to get the right ordering, and I had done the same. However, since
> > > > stop() is only called by configureStreams() I removed the
> > > > streams_.clear() there and had issues, so I gave up.
> > >
> > > Do you mean you have tried removing streams_.clear() from
> > > configureStreams() while testing this patch and ran into issues, or that
> > > you tried that when implementing support for flush() ?
> >
> > I meant when developing flush().
> > I haven't tried to do the same on top of this patch, but it might be
> > good to do so ?
>
> I think so, because if it works, it's a good cleanup (possibly as part
> of this patch actually, not on top), and if it doesn't, there's
> something fishy :-) Would you be able to test it ?
>

With this applied

--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
                LOG(HAL, Error) << "No streams in configuration";
                return -EINVAL;
        }
+       streams_.reserve(stream_list->num_streams);

 #if defined(OS_CHROMEOS)
        if (!validateCropRotate(*stream_list))
@@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
                return -EINVAL;
        }

-       /*
-        * Clear and remove any existing configuration from previous calls, and
-        * ensure the required entries are available without further
-        * reallocation.
-        */
-       streams_.clear();
-       streams_.reserve(stream_list->num_streams);
-
        std::vector<Camera3StreamConfig> streamConfigs;
        streamConfigs.reserve(stream_list->num_streams);

 I can easily get a segfault running CTS.
 I analyzed the coredump and the stack trace doesn't help much

(soraka-libcamera-gdb) bt
#0  0x000000043983e000 in ?? ()
#1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
#2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
#3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this@entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now@entry=0x7eee34bd0910)
    at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
#4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
    at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
#5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
#6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
    at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
#7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
#8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
#9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
#10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
#11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(soraka-libcamera-gdb) quit

The last ?? in #0 makes me think I might not have symbols for the
camera service, which I cannot easily recompile due to the cros-debug
USE flag issue. I need to rebuild all the packages and I've now kicked
it.

One day I'll learn to shut up and ack patches without asking for
reasons to fall into rabbit holes like this one.


> > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > what was wrong at the time.
> > > >
> > > > As far as I can tell:
> > > >
> > > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > > >
> > > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > > ---
> > > > >  src/android/camera_device.cpp | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > index 8ca76719..fda77db4 100644
> > > > > --- a/src/android/camera_device.cpp
> > > > > +++ b/src/android/camera_device.cpp
> > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > >
> > > > >  void CameraDevice::close()
> > > > >  {
> > > > > -	streams_.clear();
> > > > > -
> > > > >  	stop();
> > > > >
> > > > >  	camera_->release();
> > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > >  	camera_->stop();
> > > > >
> > > > >  	descriptors_.clear();
> > > > > +	streams_.clear();
> > > > > +
> > > > >  	state_ = State::Stopped;
> > > > >  }
> > > > >
>
> --
> Regards,
>
> Laurent Pinchart
Laurent Pinchart Sept. 2, 2021, 8:51 a.m. UTC | #8
Hi Jacopo,

On Thu, Sep 02, 2021 at 10:25:59AM +0200, Jacopo Mondi wrote:
> On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> > On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > > The problem is happening because we seem to add a CameraStream
> > > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > > in CameraDevice::processCaptureRequest().
> > > > > >
> > > > > > However, when the camera stops, all the current buffers are marked with
> > > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > > associated with the CameraStream (that was previously added to the
> > > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > > before the camera stop() has been invoked. Any access to those request
> > > > > > buffers after they have been cleared, shall result in a crash.
> > > > >
> > > > > I recall it was painful at the time when we had to deal with flush()
> > > > > to get the right ordering, and I had done the same. However, since
> > > > > stop() is only called by configureStreams() I removed the
> > > > > streams_.clear() there and had issues, so I gave up.
> > > >
> > > > Do you mean you have tried removing streams_.clear() from
> > > > configureStreams() while testing this patch and ran into issues, or that
> > > > you tried that when implementing support for flush() ?
> > >
> > > I meant when developing flush().
> > > I haven't tried to do the same on top of this patch, but it might be
> > > good to do so ?
> >
> > I think so, because if it works, it's a good cleanup (possibly as part
> > of this patch actually, not on top), and if it doesn't, there's
> > something fishy :-) Would you be able to test it ?
> 
> With this applied
> 
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>                 LOG(HAL, Error) << "No streams in configuration";
>                 return -EINVAL;
>         }
> +       streams_.reserve(stream_list->num_streams);
> 
>  #if defined(OS_CHROMEOS)
>         if (!validateCropRotate(*stream_list))
> @@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>                 return -EINVAL;
>         }
> 
> -       /*
> -        * Clear and remove any existing configuration from previous calls, and
> -        * ensure the required entries are available without further
> -        * reallocation.
> -        */
> -       streams_.clear();
> -       streams_.reserve(stream_list->num_streams);
> -
>         std::vector<Camera3StreamConfig> streamConfigs;
>         streamConfigs.reserve(stream_list->num_streams);
> 
>  I can easily get a segfault running CTS.
>  I analyzed the coredump and the stack trace doesn't help much
> 
> (soraka-libcamera-gdb) bt
> #0  0x000000043983e000 in ?? ()
> #1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
> #2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
> #3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this@entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now@entry=0x7eee34bd0910)
>     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
> #4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
>     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
> #5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
> #6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
>     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
> #7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
> #8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
> #9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> #10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
> #11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (soraka-libcamera-gdb) quit
> 
> The last ?? in #0 makes me think I might not have symbols for the
> camera service, which I cannot easily recompile due to the cros-debug
> USE flag issue. I need to rebuild all the packages and I've now kicked
> it.
> 
> One day I'll learn to shut up and ack patches without asking for
> reasons to fall into rabbit holes like this one.

Thanks for the quick test.

Hiro, is this something you can look at ?

> > > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > > what was wrong at the time.
> > > > >
> > > > > As far as I can tell:
> > > > >
> > > > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > >
> > > > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > > > ---
> > > > > >  src/android/camera_device.cpp | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > index 8ca76719..fda77db4 100644
> > > > > > --- a/src/android/camera_device.cpp
> > > > > > +++ b/src/android/camera_device.cpp
> > > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > >
> > > > > >  void CameraDevice::close()
> > > > > >  {
> > > > > > -	streams_.clear();
> > > > > > -
> > > > > >  	stop();
> > > > > >
> > > > > >  	camera_->release();
> > > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > > >  	camera_->stop();
> > > > > >
> > > > > >  	descriptors_.clear();
> > > > > > +	streams_.clear();
> > > > > > +
> > > > > >  	state_ = State::Stopped;
> > > > > >  }
> > > > > >
Jacopo Mondi Sept. 2, 2021, 9:56 a.m. UTC | #9
Hello,

On Thu, Sep 02, 2021 at 11:51:02AM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> On Thu, Sep 02, 2021 at 10:25:59AM +0200, Jacopo Mondi wrote:
> > On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> > > On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > > > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > > > The problem is happening because we seem to add a CameraStream
> > > > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > > > in CameraDevice::processCaptureRequest().
> > > > > > >
> > > > > > > However, when the camera stops, all the current buffers are marked with
> > > > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > > > associated with the CameraStream (that was previously added to the
> > > > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > > > before the camera stop() has been invoked. Any access to those request
> > > > > > > buffers after they have been cleared, shall result in a crash.
> > > > > >
> > > > > > I recall it was painful at the time when we had to deal with flush()
> > > > > > to get the right ordering, and I had done the same. However, since
> > > > > > stop() is only called by configureStreams() I removed the
> > > > > > streams_.clear() there and had issues, so I gave up.
> > > > >
> > > > > Do you mean you have tried removing streams_.clear() from
> > > > > configureStreams() while testing this patch and ran into issues, or that
> > > > > you tried that when implementing support for flush() ?
> > > >
> > > > I meant when developing flush().
> > > > I haven't tried to do the same on top of this patch, but it might be
> > > > good to do so ?
> > >
> > > I think so, because if it works, it's a good cleanup (possibly as part
> > > of this patch actually, not on top), and if it doesn't, there's
> > > something fishy :-) Would you be able to test it ?
> >
> > With this applied
> >
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >                 LOG(HAL, Error) << "No streams in configuration";
> >                 return -EINVAL;
> >         }
> > +       streams_.reserve(stream_list->num_streams);
> >
> >  #if defined(OS_CHROMEOS)
> >         if (!validateCropRotate(*stream_list))
> > @@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >                 return -EINVAL;
> >         }
> >
> > -       /*
> > -        * Clear and remove any existing configuration from previous calls, and
> > -        * ensure the required entries are available without further
> > -        * reallocation.
> > -        */
> > -       streams_.clear();
> > -       streams_.reserve(stream_list->num_streams);
> > -
> >         std::vector<Camera3StreamConfig> streamConfigs;
> >         streamConfigs.reserve(stream_list->num_streams);
> >
> >  I can easily get a segfault running CTS.
> >  I analyzed the coredump and the stack trace doesn't help much
> >
> > (soraka-libcamera-gdb) bt
> > #0  0x000000043983e000 in ?? ()
> > #1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
> > #2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
> > #3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this@entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now@entry=0x7eee34bd0910)
> >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
> > #4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
> >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
> > #5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
> > #6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
> >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
> > #7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
> > #8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
> > #9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> > #10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
> > #11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > (soraka-libcamera-gdb) quit
> >
> > The last ?? in #0 makes me think I might not have symbols for the
> > camera service, which I cannot easily recompile due to the cros-debug
> > USE flag issue. I need to rebuild all the packages and I've now kicked
> > it.
> >
> > One day I'll learn to shut up and ack patches without asking for
> > reasons to fall into rabbit holes like this one.
>
> Thanks for the quick test.

I can confirm removing streams_.clear() from configureStreams() easily
triggers a segfault when running CTS. That's indeed the same issue I
was facing when developing flush().

I really cannot figure out why it happens looking at the code. I got
a few more stack traces with debuga symbols in the camera service from the
coredumps but they do not seem that helpful at a first look.

[Current thread is 1 (Thread 0x7c372ffff640 (LWP 8265))]
(soraka-libcamera-gdb) bt
#0  std::__1::__cxx_atomic_fetch_sub<int> (__a=0x400000000, __delta=1, __order=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1072
#1  std::__1::__atomic_base<int, true>::fetch_sub (this=0x400000000, __op=1, __m=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1719
#2  base::AtomicRefCount::Decrement (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/atomic_ref_count.h:39
#3  base::subtle::RefCountedThreadSafeBase::ReleaseImpl (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:218
#4  base::subtle::RefCountedThreadSafeBase::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:170
#5  base::RefCountedThreadSafe<base::internal::BindStateBase, base::internal::BindStateBaseRefCountTraits>::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:398
#6  scoped_refptr<base::internal::BindStateBase>::Release (ptr=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:322
#7  scoped_refptr<base::internal::BindStateBase>::~scoped_refptr (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:224
#8  base::internal::CallbackBase::~CallbackBase (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback_internal.cc:85
#9  0x00007c37561eb82b in base::sequence_manager::internal::AtomicFlagSet::Group::~Group (this=0x7c373000ef80) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:132
#10 std::__1::default_delete<base::sequence_manager::internal::AtomicFlagSet::Group>::operator() (this=<optimized out>, __ptr=0x7c373000ef80) at /usr/bin/../include/c++/v1/memory:1335
#11 0x00007c37561eaede in base::sequence_manager::internal::AtomicFlagSet::RemoveFromAllocList (group=0x7c373000ef80, this=<optimized out>) at /usr/bin/../include/c++/v1/memory:1595
#12 base::sequence_manager::internal::AtomicFlagSet::AtomicFlag::ReleaseAtomicFlag (this=0x7c3730013b90) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:77
#13 0x00007c37561f3cec in base::sequence_manager::internal::TaskQueueImpl::UnregisterTaskQueue (this=0x7c37300139e0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/task_queue_impl.cc:201
#14 0x00007c37561ec1df in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:234
#15 0x00007c37561ec52e in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:212
#16 0x00007c3756218317 in std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl>::operator() (this=0x7c3730014848, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
#17 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::reset (this=0x7c3730014848, __p=0x0)
    at /usr/bin/../include/c++/v1/memory:1596
#18 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::~unique_ptr (this=0x7c3730014848)
    at /usr/bin/../include/c++/v1/memory:1550
#19 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
#20 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
#21 0x00007c3756218098 in std::__1::default_delete<base::Thread::Delegate>::operator() (this=0x7c3744039bc0, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
#22 std::__1::unique_ptr<base::Thread::Delegate, std::__1::default_delete<base::Thread::Delegate> >::reset (this=0x7c3744039bc0, __p=0x0) at /usr/bin/../include/c++/v1/memory:1596
#23 base::Thread::ThreadMain (this=0x7c3744039b40) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:400
#24 0x00007c3756213f3d in base::(anonymous namespace)::ThreadFunc (params=0x7c3730011840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
#25 0x00007c37560a8fbf in start_thread (arg=0x7c372ffff640) at pthread_create.c:463
#26 0x00007c3755ef235f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

>
> Hiro, is this something you can look at ?
>
> > > > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > > > what was wrong at the time.
> > > > > >
> > > > > > As far as I can tell:
> > > > > >
> > > > > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > >
> > > > > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > > > > ---
> > > > > > >  src/android/camera_device.cpp | 4 ++--
> > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > > index 8ca76719..fda77db4 100644
> > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > > >
> > > > > > >  void CameraDevice::close()
> > > > > > >  {
> > > > > > > -	streams_.clear();
> > > > > > > -
> > > > > > >  	stop();
> > > > > > >
> > > > > > >  	camera_->release();
> > > > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > > > >  	camera_->stop();
> > > > > > >
> > > > > > >  	descriptors_.clear();
> > > > > > > +	streams_.clear();
> > > > > > > +
> > > > > > >  	state_ = State::Stopped;
> > > > > > >  }
> > > > > > >
>
> --
> Regards,
>
> Laurent Pinchart
Hirokazu Honda Sept. 3, 2021, 8:26 a.m. UTC | #10
Hi Jacopo,

On Thu, Sep 2, 2021 at 6:55 PM Jacopo Mondi <jacopo@jmondi.org> wrote:
>
> Hello,
>
> On Thu, Sep 02, 2021 at 11:51:02AM +0300, Laurent Pinchart wrote:
> > Hi Jacopo,
> >
> > On Thu, Sep 02, 2021 at 10:25:59AM +0200, Jacopo Mondi wrote:
> > > On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> > > > On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > > > > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > > > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > > > > The problem is happening because we seem to add a CameraStream
> > > > > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > > > > in CameraDevice::processCaptureRequest().
> > > > > > > >
> > > > > > > > However, when the camera stops, all the current buffers are marked with
> > > > > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > > > > associated with the CameraStream (that was previously added to the
> > > > > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > > > > before the camera stop() has been invoked. Any access to those request
> > > > > > > > buffers after they have been cleared, shall result in a crash.
> > > > > > >
> > > > > > > I recall it was painful at the time when we had to deal with flush()
> > > > > > > to get the right ordering, and I had done the same. However, since
> > > > > > > stop() is only called by configureStreams() I removed the
> > > > > > > streams_.clear() there and had issues, so I gave up.
> > > > > >
> > > > > > Do you mean you have tried removing streams_.clear() from
> > > > > > configureStreams() while testing this patch and ran into issues, or that
> > > > > > you tried that when implementing support for flush() ?
> > > > >
> > > > > I meant when developing flush().
> > > > > I haven't tried to do the same on top of this patch, but it might be
> > > > > good to do so ?
> > > >
> > > > I think so, because if it works, it's a good cleanup (possibly as part
> > > > of this patch actually, not on top), and if it doesn't, there's
> > > > something fishy :-) Would you be able to test it ?
> > >
> > > With this applied
> > >
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >                 LOG(HAL, Error) << "No streams in configuration";
> > >                 return -EINVAL;
> > >         }
> > > +       streams_.reserve(stream_list->num_streams);
> > >
> > >  #if defined(OS_CHROMEOS)
> > >         if (!validateCropRotate(*stream_list))
> > > @@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >                 return -EINVAL;
> > >         }
> > >
> > > -       /*
> > > -        * Clear and remove any existing configuration from previous calls, and
> > > -        * ensure the required entries are available without further
> > > -        * reallocation.
> > > -        */
> > > -       streams_.clear();
> > > -       streams_.reserve(stream_list->num_streams);
> > > -
> > >         std::vector<Camera3StreamConfig> streamConfigs;
> > >         streamConfigs.reserve(stream_list->num_streams);
> > >
> > >  I can easily get a segfault running CTS.
> > >  I analyzed the coredump and the stack trace doesn't help much
> > >
> > > (soraka-libcamera-gdb) bt
> > > #0  0x000000043983e000 in ?? ()
> > > #1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
> > > #2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
> > > #3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this@entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now@entry=0x7eee34bd0910)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
> > > #4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
> > > #5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
> > > #6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
> > > #7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
> > > #8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
> > > #9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> > > #10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
> > > #11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > (soraka-libcamera-gdb) quit
> > >
> > > The last ?? in #0 makes me think I might not have symbols for the
> > > camera service, which I cannot easily recompile due to the cros-debug
> > > USE flag issue. I need to rebuild all the packages and I've now kicked
> > > it.
> > >
> > > One day I'll learn to shut up and ack patches without asking for
> > > reasons to fall into rabbit holes like this one.
> >
> > Thanks for the quick test.
>
> I can confirm removing streams_.clear() from configureStreams() easily
> triggers a segfault when running CTS. That's indeed the same issue I
> was facing when developing flush().
>
> I really cannot figure out why it happens looking at the code. I got
> a few more stack traces with debuga symbols in the camera service from the
> coredumps but they do not seem that helpful at a first look.
>
> [Current thread is 1 (Thread 0x7c372ffff640 (LWP 8265))]
> (soraka-libcamera-gdb) bt
> #0  std::__1::__cxx_atomic_fetch_sub<int> (__a=0x400000000, __delta=1, __order=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1072
> #1  std::__1::__atomic_base<int, true>::fetch_sub (this=0x400000000, __op=1, __m=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1719
> #2  base::AtomicRefCount::Decrement (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/atomic_ref_count.h:39
> #3  base::subtle::RefCountedThreadSafeBase::ReleaseImpl (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:218
> #4  base::subtle::RefCountedThreadSafeBase::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:170
> #5  base::RefCountedThreadSafe<base::internal::BindStateBase, base::internal::BindStateBaseRefCountTraits>::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:398
> #6  scoped_refptr<base::internal::BindStateBase>::Release (ptr=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:322
> #7  scoped_refptr<base::internal::BindStateBase>::~scoped_refptr (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:224
> #8  base::internal::CallbackBase::~CallbackBase (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback_internal.cc:85
> #9  0x00007c37561eb82b in base::sequence_manager::internal::AtomicFlagSet::Group::~Group (this=0x7c373000ef80) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:132
> #10 std::__1::default_delete<base::sequence_manager::internal::AtomicFlagSet::Group>::operator() (this=<optimized out>, __ptr=0x7c373000ef80) at /usr/bin/../include/c++/v1/memory:1335
> #11 0x00007c37561eaede in base::sequence_manager::internal::AtomicFlagSet::RemoveFromAllocList (group=0x7c373000ef80, this=<optimized out>) at /usr/bin/../include/c++/v1/memory:1595
> #12 base::sequence_manager::internal::AtomicFlagSet::AtomicFlag::ReleaseAtomicFlag (this=0x7c3730013b90) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:77
> #13 0x00007c37561f3cec in base::sequence_manager::internal::TaskQueueImpl::UnregisterTaskQueue (this=0x7c37300139e0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/task_queue_impl.cc:201
> #14 0x00007c37561ec1df in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:234
> #15 0x00007c37561ec52e in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:212
> #16 0x00007c3756218317 in std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl>::operator() (this=0x7c3730014848, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> #17 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::reset (this=0x7c3730014848, __p=0x0)
>     at /usr/bin/../include/c++/v1/memory:1596
> #18 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::~unique_ptr (this=0x7c3730014848)
>     at /usr/bin/../include/c++/v1/memory:1550
> #19 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> #20 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> #21 0x00007c3756218098 in std::__1::default_delete<base::Thread::Delegate>::operator() (this=0x7c3744039bc0, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> #22 std::__1::unique_ptr<base::Thread::Delegate, std::__1::default_delete<base::Thread::Delegate> >::reset (this=0x7c3744039bc0, __p=0x0) at /usr/bin/../include/c++/v1/memory:1596
> #23 base::Thread::ThreadMain (this=0x7c3744039b40) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:400
> #24 0x00007c3756213f3d in base::(anonymous namespace)::ThreadFunc (params=0x7c3730011840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> #25 0x00007c37560a8fbf in start_thread (arg=0x7c372ffff640) at pthread_create.c:463
> #26 0x00007c3755ef235f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> >
> > Hiro, is this something you can look at ?

I guess this is because streams_ is not cleared when state_ is Stopped.
Could you try again by modifying as follows?

--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -448,8 +448,10 @@ void CameraDevice::flush()
 void CameraDevice::stop()
 {
        MutexLocker stateLock(stateMutex_);
-       if (state_ == State::Stopped)
+       if (state_ == State::Stopped) {
+               streams_.clear();
                return;
+       }

Thanks,
-Hiro
> >
> > > > > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > > > > what was wrong at the time.
> > > > > > >
> > > > > > > As far as I can tell:
> > > > > > >
> > > > > > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > >
> > > > > > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > > > > > ---
> > > > > > > >  src/android/camera_device.cpp | 4 ++--
> > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > > > index 8ca76719..fda77db4 100644
> > > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > > > >
> > > > > > > >  void CameraDevice::close()
> > > > > > > >  {
> > > > > > > > - streams_.clear();
> > > > > > > > -
> > > > > > > >   stop();
> > > > > > > >
> > > > > > > >   camera_->release();
> > > > > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > > > > >   camera_->stop();
> > > > > > > >
> > > > > > > >   descriptors_.clear();
> > > > > > > > + streams_.clear();
> > > > > > > > +
> > > > > > > >   state_ = State::Stopped;
> > > > > > > >  }
> > > > > > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
Jacopo Mondi Sept. 3, 2021, 5:06 p.m. UTC | #11
On Fri, Sep 03, 2021 at 05:26:41PM +0900, Hirokazu Honda wrote:
> Hi Jacopo,
>
> On Thu, Sep 2, 2021 at 6:55 PM Jacopo Mondi <jacopo@jmondi.org> wrote:
> >
> > Hello,
> >
> > On Thu, Sep 02, 2021 at 11:51:02AM +0300, Laurent Pinchart wrote:
> > > Hi Jacopo,
> > >
> > > On Thu, Sep 02, 2021 at 10:25:59AM +0200, Jacopo Mondi wrote:
> > > > On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > > > > > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > > > > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > > > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > > > > > The problem is happening because we seem to add a CameraStream
> > > > > > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > > > > > in CameraDevice::processCaptureRequest().
> > > > > > > > >
> > > > > > > > > However, when the camera stops, all the current buffers are marked with
> > > > > > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > > > > > associated with the CameraStream (that was previously added to the
> > > > > > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > > > > > before the camera stop() has been invoked. Any access to those request
> > > > > > > > > buffers after they have been cleared, shall result in a crash.
> > > > > > > >
> > > > > > > > I recall it was painful at the time when we had to deal with flush()
> > > > > > > > to get the right ordering, and I had done the same. However, since
> > > > > > > > stop() is only called by configureStreams() I removed the
> > > > > > > > streams_.clear() there and had issues, so I gave up.
> > > > > > >
> > > > > > > Do you mean you have tried removing streams_.clear() from
> > > > > > > configureStreams() while testing this patch and ran into issues, or that
> > > > > > > you tried that when implementing support for flush() ?
> > > > > >
> > > > > > I meant when developing flush().
> > > > > > I haven't tried to do the same on top of this patch, but it might be
> > > > > > good to do so ?
> > > > >
> > > > > I think so, because if it works, it's a good cleanup (possibly as part
> > > > > of this patch actually, not on top), and if it doesn't, there's
> > > > > something fishy :-) Would you be able to test it ?
> > > >
> > > > With this applied
> > > >
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >                 LOG(HAL, Error) << "No streams in configuration";
> > > >                 return -EINVAL;
> > > >         }
> > > > +       streams_.reserve(stream_list->num_streams);
> > > >
> > > >  #if defined(OS_CHROMEOS)
> > > >         if (!validateCropRotate(*stream_list))
> > > > @@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >                 return -EINVAL;
> > > >         }
> > > >
> > > > -       /*
> > > > -        * Clear and remove any existing configuration from previous calls, and
> > > > -        * ensure the required entries are available without further
> > > > -        * reallocation.
> > > > -        */
> > > > -       streams_.clear();
> > > > -       streams_.reserve(stream_list->num_streams);
> > > > -
> > > >         std::vector<Camera3StreamConfig> streamConfigs;
> > > >         streamConfigs.reserve(stream_list->num_streams);
> > > >
> > > >  I can easily get a segfault running CTS.
> > > >  I analyzed the coredump and the stack trace doesn't help much
> > > >
> > > > (soraka-libcamera-gdb) bt
> > > > #0  0x000000043983e000 in ?? ()
> > > > #1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
> > > > #2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
> > > > #3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this@entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now@entry=0x7eee34bd0910)
> > > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
> > > > #4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
> > > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
> > > > #5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
> > > > #6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
> > > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
> > > > #7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
> > > > #8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
> > > > #9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> > > > #10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
> > > > #11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > > (soraka-libcamera-gdb) quit
> > > >
> > > > The last ?? in #0 makes me think I might not have symbols for the
> > > > camera service, which I cannot easily recompile due to the cros-debug
> > > > USE flag issue. I need to rebuild all the packages and I've now kicked
> > > > it.
> > > >
> > > > One day I'll learn to shut up and ack patches without asking for
> > > > reasons to fall into rabbit holes like this one.
> > >
> > > Thanks for the quick test.
> >
> > I can confirm removing streams_.clear() from configureStreams() easily
> > triggers a segfault when running CTS. That's indeed the same issue I
> > was facing when developing flush().
> >
> > I really cannot figure out why it happens looking at the code. I got
> > a few more stack traces with debuga symbols in the camera service from the
> > coredumps but they do not seem that helpful at a first look.
> >
> > [Current thread is 1 (Thread 0x7c372ffff640 (LWP 8265))]
> > (soraka-libcamera-gdb) bt
> > #0  std::__1::__cxx_atomic_fetch_sub<int> (__a=0x400000000, __delta=1, __order=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1072
> > #1  std::__1::__atomic_base<int, true>::fetch_sub (this=0x400000000, __op=1, __m=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1719
> > #2  base::AtomicRefCount::Decrement (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/atomic_ref_count.h:39
> > #3  base::subtle::RefCountedThreadSafeBase::ReleaseImpl (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:218
> > #4  base::subtle::RefCountedThreadSafeBase::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:170
> > #5  base::RefCountedThreadSafe<base::internal::BindStateBase, base::internal::BindStateBaseRefCountTraits>::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:398
> > #6  scoped_refptr<base::internal::BindStateBase>::Release (ptr=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:322
> > #7  scoped_refptr<base::internal::BindStateBase>::~scoped_refptr (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:224
> > #8  base::internal::CallbackBase::~CallbackBase (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback_internal.cc:85
> > #9  0x00007c37561eb82b in base::sequence_manager::internal::AtomicFlagSet::Group::~Group (this=0x7c373000ef80) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:132
> > #10 std::__1::default_delete<base::sequence_manager::internal::AtomicFlagSet::Group>::operator() (this=<optimized out>, __ptr=0x7c373000ef80) at /usr/bin/../include/c++/v1/memory:1335
> > #11 0x00007c37561eaede in base::sequence_manager::internal::AtomicFlagSet::RemoveFromAllocList (group=0x7c373000ef80, this=<optimized out>) at /usr/bin/../include/c++/v1/memory:1595
> > #12 base::sequence_manager::internal::AtomicFlagSet::AtomicFlag::ReleaseAtomicFlag (this=0x7c3730013b90) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:77
> > #13 0x00007c37561f3cec in base::sequence_manager::internal::TaskQueueImpl::UnregisterTaskQueue (this=0x7c37300139e0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/task_queue_impl.cc:201
> > #14 0x00007c37561ec1df in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:234
> > #15 0x00007c37561ec52e in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:212
> > #16 0x00007c3756218317 in std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl>::operator() (this=0x7c3730014848, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> > #17 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::reset (this=0x7c3730014848, __p=0x0)
> >     at /usr/bin/../include/c++/v1/memory:1596
> > #18 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::~unique_ptr (this=0x7c3730014848)
> >     at /usr/bin/../include/c++/v1/memory:1550
> > #19 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> > #20 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> > #21 0x00007c3756218098 in std::__1::default_delete<base::Thread::Delegate>::operator() (this=0x7c3744039bc0, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> > #22 std::__1::unique_ptr<base::Thread::Delegate, std::__1::default_delete<base::Thread::Delegate> >::reset (this=0x7c3744039bc0, __p=0x0) at /usr/bin/../include/c++/v1/memory:1596
> > #23 base::Thread::ThreadMain (this=0x7c3744039b40) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:400
> > #24 0x00007c3756213f3d in base::(anonymous namespace)::ThreadFunc (params=0x7c3730011840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> > #25 0x00007c37560a8fbf in start_thread (arg=0x7c372ffff640) at pthread_create.c:463
> > #26 0x00007c3755ef235f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >
> > >
> > > Hiro, is this something you can look at ?
>
> I guess this is because streams_ is not cleared when state_ is Stopped.
> Could you try again by modifying as follows?
>
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -448,8 +448,10 @@ void CameraDevice::flush()
>  void CameraDevice::stop()
>  {
>         MutexLocker stateLock(stateMutex_);
> -       if (state_ == State::Stopped)
> +       if (state_ == State::Stopped) {
> +               streams_.clear();
>                 return;
> +       }

Oh I see your reasoning!
It seems right, if we have a flush, it will stop the camera, and the
next call to stop() won't clear streams_.

However, I've not been able to run a single meaningful test the whole
day due to spurious crashes in the FrameBuffer class.

I'll wait for the dust to settle and re-test this change.
>
> Thanks,
> -Hiro
> > >
> > > > > > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > > > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > > > > > what was wrong at the time.
> > > > > > > >
> > > > > > > > As far as I can tell:
> > > > > > > >
> > > > > > > > Tested-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > > > Acked-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > > >
> > > > > > > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > > > > > > ---
> > > > > > > > >  src/android/camera_device.cpp | 4 ++--
> > > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > > > > index 8ca76719..fda77db4 100644
> > > > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > > > > >
> > > > > > > > >  void CameraDevice::close()
> > > > > > > > >  {
> > > > > > > > > - streams_.clear();
> > > > > > > > > -
> > > > > > > > >   stop();
> > > > > > > > >
> > > > > > > > >   camera_->release();
> > > > > > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > > > > > >   camera_->stop();
> > > > > > > > >
> > > > > > > > >   descriptors_.clear();
> > > > > > > > > + streams_.clear();
> > > > > > > > > +
> > > > > > > > >   state_ = State::Stopped;
> > > > > > > > >  }
> > > > > > > > >
> > >
> > > --
> > > Regards,
> > >
> > > Laurent Pinchart
Jacopo Mondi Sept. 6, 2021, 1:27 p.m. UTC | #12
Hi Hiro,

On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> The problem is happening because we seem to add a CameraStream
> associated buffer(depending on the CameraStream::Type) to the Request,
> in CameraDevice::processCaptureRequest().
>
> However, when the camera stops, all the current buffers are marked with
> FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> associated with the CameraStream (that was previously added to the
> request) has now been cleared out with a part of streams_.clear(), even
> before the camera stop() has been invoked. Any access to those request
> buffers after they have been cleared, shall result in a crash.
>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>

I have re-tested today this patch, and piled some other on top.

I have fixed the commit message and collected tags, can I resend this
as part of a larger series ?

Thanks
   j

> ---
>  src/android/camera_device.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 8ca76719..fda77db4 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> -
>  	stop();
>
>  	camera_->release();
> @@ -457,6 +455,8 @@ void CameraDevice::stop()
>  	camera_->stop();
>
>  	descriptors_.clear();
> +	streams_.clear();
> +
>  	state_ = State::Stopped;
>  }
>
> --
> 2.33.0.259.gc128427fd7-goog
>
Hirokazu Honda Sept. 6, 2021, 1:32 p.m. UTC | #13
Hi Jacopo,

On Mon, Sep 6, 2021 at 10:26 PM Jacopo Mondi <jacopo@jmondi.org> wrote:
>
> Hi Hiro,
>
> On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > The problem is happening because we seem to add a CameraStream
> > associated buffer(depending on the CameraStream::Type) to the Request,
> > in CameraDevice::processCaptureRequest().
> >
> > However, when the camera stops, all the current buffers are marked with
> > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > associated with the CameraStream (that was previously added to the
> > request) has now been cleared out with a part of streams_.clear(), even
> > before the camera stop() has been invoked. Any access to those request
> > buffers after they have been cleared, shall result in a crash.
> >
> > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
>
> I have re-tested today this patch, and piled some other on top.
>
> I have fixed the commit message and collected tags, can I resend this
> as part of a larger series ?
>

Sure. If you think it is necessary, please do so.

-Hiro
> Thanks
>    j
>
> > ---
> >  src/android/camera_device.cpp | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index 8ca76719..fda77db4 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> >  void CameraDevice::close()
> >  {
> > -     streams_.clear();
> > -
> >       stop();
> >
> >       camera_->release();
> > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> >       camera_->stop();
> >
> >       descriptors_.clear();
> > +     streams_.clear();
> > +
> >       state_ = State::Stopped;
> >  }
> >
> > --
> > 2.33.0.259.gc128427fd7-goog
> >

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 8ca76719..fda77db4 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -423,8 +423,6 @@  int CameraDevice::open(const hw_module_t *hardwareModule)
 
 void CameraDevice::close()
 {
-	streams_.clear();
-
 	stop();
 
 	camera_->release();
@@ -457,6 +455,8 @@  void CameraDevice::stop()
 	camera_->stop();
 
 	descriptors_.clear();
+	streams_.clear();
+
 	state_ = State::Stopped;
 }