android: camera_device: Fix camera blocked issue when stopping capture
diff mbox series

Message ID 20240304093539.546973-1-anle.pan@nxp.com
State Superseded
Headers show
Series
  • android: camera_device: Fix camera blocked issue when stopping capture
Related show

Commit Message

Anle Pan March 4, 2024, 9:35 a.m. UTC
The issue occurs when stopping capture soon after starting capture.

In this case, no frame get from the device, the
related capture request has been pushed to the
queue descriptors_, but the queuedRequests_ was
still empty due to no requests will be queue to
the device since the stream will be stopped soon,
so there will be no camera->requestComplete called
later, then the descriptors_ can not pop normally,
this will cause the pending if we want to start capture next time.

To fix the issue, ensure the descriptors_ is
empty after the camera device is stopped.

Signed-off-by: Anle Pan <anle.pan@nxp.com>
---
 src/android/camera_device.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Umang Jain March 5, 2024, 8:26 a.m. UTC | #1
Hi Anle Pan, Hui Fang

On 04/03/24 3:05 pm, Anle Pan wrote:
> The issue occurs when stopping capture soon after starting capture.

Can you describe the issue/use case in more detail? It's really hard to 
infer from one line.
>
> In this case, no frame get from the device, the
> related capture request has been pushed to the
> queue descriptors_, but the queuedRequests_ was
> still empty due to no requests will be queue to
> the device since the stream will be stopped soon,

I don't get it how is this possible? Since processCaptureRequest() will 
push the request in the descriptors_ queue along with queuing the 
request to the camera.

How will the call path be interleaved /before/ processCaptureRequest() 
returns? Possibly we need more details on what you are trying to do, 
before reviewing the code here. Is this related to CTS ?
> so there will be no camera->requestComplete called
> later, then the descriptors_ can not pop normally,
> this will cause the pending if we want to start capture next time.
>
> To fix the issue, ensure the descriptors_ is
> empty after the camera device is stopped.
>
> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> ---
>   src/android/camera_device.cpp | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 25cedd44..d452992d 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>   void CameraDevice::stop()
>   {
>   	MutexLocker stateLock(stateMutex_);
> -	if (state_ == State::Stopped)
> +	if (state_ == State::Stopped) {
> +		MutexLocker descriptorsLock(descriptorsMutex_);
> +		descriptors_ = {};
>   		return;
> +	}
>   
>   	camera_->stop();
>
Anle Pan March 5, 2024, 9:28 a.m. UTC | #2
Hi Umang Jain,

this issue was random(around 1/10), and can be reproduced by single run below cts test:
	./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90

•	3 times configuring streams in this Test:
1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1

•	when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.

Best Regards
Anle Pan

-----Original Message-----
From: Umang Jain <umang.jain@ideasonboard.com> 
Sent: 2024年3月5日 16:27
To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
Cc: Hui Fang <hui.fang@nxp.com>
Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture

Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button


Hi Anle Pan, Hui Fang

On 04/03/24 3:05 pm, Anle Pan wrote:
> The issue occurs when stopping capture soon after starting capture.

Can you describe the issue/use case in more detail? It's really hard to infer from one line.
>
> In this case, no frame get from the device, the related capture 
> request has been pushed to the queue descriptors_, but the 
> queuedRequests_ was still empty due to no requests will be queue to 
> the device since the stream will be stopped soon,

I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.

How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
> so there will be no camera->requestComplete called later, then the 
> descriptors_ can not pop normally, this will cause the pending if we 
> want to start capture next time.
>
> To fix the issue, ensure the descriptors_ is empty after the camera 
> device is stopped.
>
> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> ---
>   src/android/camera_device.cpp | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/android/camera_device.cpp 
> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>   void CameraDevice::stop()
>   {
>       MutexLocker stateLock(stateMutex_);
> -     if (state_ == State::Stopped)
> +     if (state_ == State::Stopped) {
> +             MutexLocker descriptorsLock(descriptorsMutex_);
> +             descriptors_ = {};
>               return;
> +     }
>
>       camera_->stop();
>
Jacopo Mondi March 22, 2024, 10:36 a.m. UTC | #3
Hello

On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
> Hi Umang Jain,
>
> this issue was random(around 1/10), and can be reproduced by single run below cts test:
> 	./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
>
> •	3 times configuring streams in this Test:
> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>
> •	when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
>

The only code path I see that could lead to descriptor_ not being
cleared is flush() being called and then stop() called just after
possibily as the result of a new configuration

void CameraDevice::flush()
{
	{
		MutexLocker stateLock(stateMutex_);
		if (state_ != State::Running)
			return;

		state_ = State::Flushing;
	}

	camera_->stop();

	MutexLocker stateLock(stateMutex_);
	state_ = State::Stopped;
}

void CameraDevice::stop()
{
	MutexLocker stateLock(stateMutex_);
	if (state_ == State::Stopped)
		return;

	camera_->stop();

	{
		MutexLocker descriptorsLock(descriptorsMutex_);
		descriptors_ = {};
	}

	streams_.clear();

	state_ = State::Stopped;
}

int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
{
	/* Before any configuration attempt, stop the camera. */
	stop();

        ...

}

Looking at the flush() implementation right now I wonder why we stop
the camera without the stateMutex_ held (the only reason I see is
because it can be concurrent with processCaptureRequest() which keeps
the mutex held), but I certainly don't want to touch that code right
now without a good reason.

All in all, I think there's indeed a path that could lead to the
descriptor not being cleaned up, and we can't do it in flush() because
a concurrent call to processCaptureRequest() might be using the
descriptors to complete the in-flight requests with error state.

Anle, does this match your understanding ? If so, this is what should
be recorded in the commit message (the cause of an issue, not the
symptoms)


> Best Regards
> Anle Pan
>
> -----Original Message-----
> From: Umang Jain <umang.jain@ideasonboard.com>
> Sent: 2024年3月5日 16:27
> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
> Cc: Hui Fang <hui.fang@nxp.com>
> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>
> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
>
>
> Hi Anle Pan, Hui Fang
>
> On 04/03/24 3:05 pm, Anle Pan wrote:
> > The issue occurs when stopping capture soon after starting capture.
>
> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
> >
> > In this case, no frame get from the device, the related capture
> > request has been pushed to the queue descriptors_, but the
> > queuedRequests_ was still empty due to no requests will be queue to
> > the device since the stream will be stopped soon,
>
> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
>
> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
> > so there will be no camera->requestComplete called later, then the
> > descriptors_ can not pop normally, this will cause the pending if we
> > want to start capture next time.
> >
> > To fix the issue, ensure the descriptors_ is empty after the camera
> > device is stopped.

Do you know precisely why a non-empty descriptors_ stalls the camera ?
Does it get stuck on

void CameraDevice::sendCaptureResults()
{
	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {

?

> >
> > Signed-off-by: Anle Pan <anle.pan@nxp.com>

Tested with CTS, no regressions detected

Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>

> > ---
> >   src/android/camera_device.cpp | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/android/camera_device.cpp
> > b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -433,8 +433,11 @@ void CameraDevice::flush()
> >   void CameraDevice::stop()
> >   {
> >       MutexLocker stateLock(stateMutex_);
> > -     if (state_ == State::Stopped)
> > +     if (state_ == State::Stopped) {
> > +             MutexLocker descriptorsLock(descriptorsMutex_);
> > +             descriptors_ = {};
> >               return;
> > +     }
> >
> >       camera_->stop();
> >
>
Umang Jain March 25, 2024, 7:43 a.m. UTC | #4
Hi Jacopo,

Thanks for testing,

On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> Hello
>
> On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
>> Hi Umang Jain,
>>
>> this issue was random(around 1/10), and can be reproduced by single run below cts test:
>> 	./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
>>
>> •	3 times configuring streams in this Test:
>> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
>> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>>
>> •	when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
>>
> The only code path I see that could lead to descriptor_ not being
> cleared is flush() being called and then stop() called just after
> possibily as the result of a new configuration
>
> void CameraDevice::flush()
> {
> 	{
> 		MutexLocker stateLock(stateMutex_);
> 		if (state_ != State::Running)
> 			return;
>
> 		state_ = State::Flushing;
> 	}
>
> 	camera_->stop();
>
> 	MutexLocker stateLock(stateMutex_);
> 	state_ = State::Stopped;
> }
>
> void CameraDevice::stop()
> {
> 	MutexLocker stateLock(stateMutex_);
> 	if (state_ == State::Stopped)
> 		return;
>
> 	camera_->stop();
>
> 	{
> 		MutexLocker descriptorsLock(descriptorsMutex_);
> 		descriptors_ = {};
> 	}
>
> 	streams_.clear();
>
> 	state_ = State::Stopped;
> }
>
> int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> {
> 	/* Before any configuration attempt, stop the camera. */
> 	stop();
>
>          ...
>
> }
>
> Looking at the flush() implementation right now I wonder why we stop
> the camera without the stateMutex_ held (the only reason I see is
> because it can be concurrent with processCaptureRequest() which keeps
> the mutex held), but I certainly don't want to touch that code right
> now without a good reason.
>
> All in all, I think there's indeed a path that could lead to the
> descriptor not being cleaned up, and we can't do it in flush() because
> a concurrent call to processCaptureRequest() might be using the
> descriptors to complete the in-flight requests with error state.
>
> Anle, does this match your understanding ? If so, this is what should
> be recorded in the commit message (the cause of an issue, not the
> symptoms)
>
>
>> Best Regards
>> Anle Pan
>>
>> -----Original Message-----
>> From: Umang Jain <umang.jain@ideasonboard.com>
>> Sent: 2024年3月5日 16:27
>> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
>> Cc: Hui Fang <hui.fang@nxp.com>
>> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>>
>> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
>>
>>
>> Hi Anle Pan, Hui Fang
>>
>> On 04/03/24 3:05 pm, Anle Pan wrote:
>>> The issue occurs when stopping capture soon after starting capture.
>> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
>>> In this case, no frame get from the device, the related capture
>>> request has been pushed to the queue descriptors_, but the
>>> queuedRequests_ was still empty due to no requests will be queue to
>>> the device since the stream will be stopped soon,
>> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
>>
>> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
>>> so there will be no camera->requestComplete called later, then the
>>> descriptors_ can not pop normally, this will cause the pending if we
>>> want to start capture next time.
>>>
>>> To fix the issue, ensure the descriptors_ is empty after the camera
>>> device is stopped.
> Do you know precisely why a non-empty descriptors_ stalls the camera ?
> Does it get stuck on
>
> void CameraDevice::sendCaptureResults()
> {
> 	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {
>
> ?

We should to formalise this as an proper issue, after getting more 
information.
>>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> Tested with CTS, no regressions detected
>
> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>
>>> ---
>>>    src/android/camera_device.cpp | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/android/camera_device.cpp
>>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
>>> --- a/src/android/camera_device.cpp
>>> +++ b/src/android/camera_device.cpp
>>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>>>    void CameraDevice::stop()
>>>    {
>>>        MutexLocker stateLock(stateMutex_);
>>> -     if (state_ == State::Stopped)
>>> +     if (state_ == State::Stopped) {
>>> +             MutexLocker descriptorsLock(descriptorsMutex_);
>>> +             descriptors_ = {};
>>>                return;
>>> +     }

I am not comfortable with this patch. Consider my points below

- First we don't have a exact description of the issue it is fixing

- Second, if the camera state is ::Stopped already. Clears descriptors_ 
doesn't look a good idea to me. The function which has set the camera 
state::Stopped already, should be ideally be emptying the descriptors_...

- If we are accepting this, why not just let the Camera::stop() fall 
through and remove

             if (state_ == State::Stopped)
                 return;

as well. Then, CameraDevice::stop() will behave same as Camera::stop() - 
a no-op when called even when State::Stopped and responsible to clean up 
queues(and stuff) for the HAL layer.

I should ideally back up my thoughts with some testing, but I don't have 
any kind of CTS testing available to me right now :(
>>>        camera_->stop();
>>>
Anle Pan March 25, 2024, 9:20 a.m. UTC | #5
Hi Umang, Jacopo.

Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10)

		void CameraDevice::sendCaptureResults()
		{
        	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {

And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() .
will ask hui to help update the commit message since the "git send" configuration has issue on my side.

Best Regards
Anle Pan

-----Original Message-----
From: Umang Jain <umang.jain@ideasonboard.com> 
Sent: 2024年3月25日 15:44
To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com>
Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com>
Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture

Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button


Hi Jacopo,

Thanks for testing,

On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> Hello
>
> On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
>> Hi Umang Jain,
>>
>> this issue was random(around 1/10), and can be reproduced by single run below cts test:
>>      ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m 
>> CtsMediaPlayerTestCases -t 
>> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
>>
>> •    3 times configuring streams in this Test:
>> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
>> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>>
>> •    when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
>>
> The only code path I see that could lead to descriptor_ not being 
> cleared is flush() being called and then stop() called just after 
> possibily as the result of a new configuration
>
> void CameraDevice::flush()
> {
>       {
>               MutexLocker stateLock(stateMutex_);
>               if (state_ != State::Running)
>                       return;
>
>               state_ = State::Flushing;
>       }
>
>       camera_->stop();
>
>       MutexLocker stateLock(stateMutex_);
>       state_ = State::Stopped;
> }
>
> void CameraDevice::stop()
> {
>       MutexLocker stateLock(stateMutex_);
>       if (state_ == State::Stopped)
>               return;
>
>       camera_->stop();
>
>       {
>               MutexLocker descriptorsLock(descriptorsMutex_);
>               descriptors_ = {};
>       }
>
>       streams_.clear();
>
>       state_ = State::Stopped;
> }
>
> int CameraDevice::configureStreams(camera3_stream_configuration_t 
> *stream_list) {
>       /* Before any configuration attempt, stop the camera. */
>       stop();
>
>          ...
>
> }
>
> Looking at the flush() implementation right now I wonder why we stop 
> the camera without the stateMutex_ held (the only reason I see is 
> because it can be concurrent with processCaptureRequest() which keeps 
> the mutex held), but I certainly don't want to touch that code right 
> now without a good reason.
>
> All in all, I think there's indeed a path that could lead to the 
> descriptor not being cleaned up, and we can't do it in flush() because 
> a concurrent call to processCaptureRequest() might be using the 
> descriptors to complete the in-flight requests with error state.
>
> Anle, does this match your understanding ? If so, this is what should 
> be recorded in the commit message (the cause of an issue, not the
> symptoms)
>
>
>> Best Regards
>> Anle Pan
>>
>> -----Original Message-----
>> From: Umang Jain <umang.jain@ideasonboard.com>
>> Sent: 2024年3月5日 16:27
>> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
>> Cc: Hui Fang <hui.fang@nxp.com>
>> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked 
>> issue when stopping capture
>>
>> Caution: This is an external email. Please take care when clicking 
>> links or opening attachments. When in doubt, report the message using 
>> the 'Report this email' button
>>
>>
>> Hi Anle Pan, Hui Fang
>>
>> On 04/03/24 3:05 pm, Anle Pan wrote:
>>> The issue occurs when stopping capture soon after starting capture.
>> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
>>> In this case, no frame get from the device, the related capture 
>>> request has been pushed to the queue descriptors_, but the 
>>> queuedRequests_ was still empty due to no requests will be queue to 
>>> the device since the stream will be stopped soon,
>> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
>>
>> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
>>> so there will be no camera->requestComplete called later, then the 
>>> descriptors_ can not pop normally, this will cause the pending if we 
>>> want to start capture next time.
>>>
>>> To fix the issue, ensure the descriptors_ is empty after the camera 
>>> device is stopped.
> Do you know precisely why a non-empty descriptors_ stalls the camera ?
> Does it get stuck on
>
> void CameraDevice::sendCaptureResults()
> {
>       while (!descriptors_.empty() && 
> !descriptors_.front()->isPending()) {
>
> ?

We should to formalise this as an proper issue, after getting more information.
>>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> Tested with CTS, no regressions detected
>
> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>
>>> ---
>>>    src/android/camera_device.cpp | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/android/camera_device.cpp 
>>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
>>> --- a/src/android/camera_device.cpp
>>> +++ b/src/android/camera_device.cpp
>>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>>>    void CameraDevice::stop()
>>>    {
>>>        MutexLocker stateLock(stateMutex_);
>>> -     if (state_ == State::Stopped)
>>> +     if (state_ == State::Stopped) {
>>> +             MutexLocker descriptorsLock(descriptorsMutex_);
>>> +             descriptors_ = {};
>>>                return;
>>> +     }

I am not comfortable with this patch. Consider my points below

- First we don't have a exact description of the issue it is fixing

- Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_...

- If we are accepting this, why not just let the Camera::stop() fall through and remove

             if (state_ == State::Stopped)
                 return;

as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer.

I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :(
>>>        camera_->stop();
>>>
Fang Hui March 26, 2024, 3:13 a.m. UTC | #6
@Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic.

Just reply the refined patch.

Author: Anle Pan <anle.pan@nxp.com>
Date:   Fri Mar 1 17:32:54 2024 +0000

    android: camera_device: Fix the stuck issue in sendCaptureResults.

    When flush() is being called and then a new stream configuration is set,
    descriptor_  may has a chance to be not cleared, which was skipped due to
    the Stopped State. This will cause a stuck in sendCaptureResults.

    To fix the issue, ensure the descriptors_ is always empty when the camera
    State is Stopped.

    Signed-off-by: Anle Pan <anle.pan@nxp.com>

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index d2679a97..7fda6ce6 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -433,8 +433,11 @@ void CameraDevice::flush()
 void CameraDevice::stop()
 {
        MutexLocker stateLock(stateMutex_);
-       if (state_ == State::Stopped)
+       if (state_ == State::Stopped) {
+               MutexLocker descriptorsLock(descriptorsMutex_);
+               descriptors_ = {};
                return;
+       }

        camera_->stop();
Jacopo Mondi March 26, 2024, 8:08 a.m. UTC | #7
Hi,

On Tue, Mar 26, 2024 at 03:13:05AM +0000, Hui Fang wrote:
> @Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic.
>
> Just reply the refined patch.

not a problem for this patch, but as a general rule, do not reply to a
patch with a new version of the same patch, just send a new one. It's
easier to track and I think it is also required for patchwork to
identify it correctly.

>
> Author: Anle Pan <anle.pan@nxp.com>
> Date:   Fri Mar 1 17:32:54 2024 +0000
>
>     android: camera_device: Fix the stuck issue in sendCaptureResults.

No '.' at the end of the commit message.

The actual subject should be something like

android: camera_device: Always clear descriptors_ in stop()

>
>     When flush() is being called and then a new stream configuration is set,
>     descriptor_  may has a chance to be not cleared, which was skipped due to

descriptors_

>     the Stopped State. This will cause a stuck in sendCaptureResults.

due to the Camera being in Stopped state.
>
>     To fix the issue, ensure the descriptors_ is always empty when the camera
>     State is Stopped.
>
>     Signed-off-by: Anle Pan <anle.pan@nxp.com>

Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>

With the few fixes above, I'll merge.

Also, as you have sent this new version, I'll add

Signed-off-by: Fang Hui <hui.fang@nxp.com>

Thanks
   j
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index d2679a97..7fda6ce6 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>  void CameraDevice::stop()
>  {
>         MutexLocker stateLock(stateMutex_);
> -       if (state_ == State::Stopped)
> +       if (state_ == State::Stopped) {
> +               MutexLocker descriptorsLock(descriptorsMutex_);
> +               descriptors_ = {};
>                 return;
> +       }
>
>         camera_->stop();
> ________________________________
> From: Anle Pan <anle.pan@nxp.com>
> Sent: Monday, March 25, 2024 5:20 PM
> To: Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> Cc: libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org>; Hui Fang <hui.fang@nxp.com>
> Subject: RE: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>
> Hi Umang, Jacopo.
>
> Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10)
>
>                 void CameraDevice::sendCaptureResults()
>                 {
>          while (!descriptors_.empty() && !descriptors_.front()->isPending()) {
>
> And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() .
> will ask hui to help update the commit message since the "git send" configuration has issue on my side.
>
> Best Regards
> Anle Pan
>
> -----Original Message-----
> From: Umang Jain <umang.jain@ideasonboard.com>
> Sent: 2024年3月25日 15:44
> To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com>
> Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com>
> Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>
> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
>
>
> Hi Jacopo,
>
> Thanks for testing,
>
> On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> > Hello
> >
> > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
> >> Hi Umang Jain,
> >>
> >> this issue was random(around 1/10), and can be reproduced by single run below cts test:
> >>      ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m
> >> CtsMediaPlayerTestCases -t
> >> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
> >>
> >> •    3 times configuring streams in this Test:
> >> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> >> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
> >> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> >>
> >> •    when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
> >>
> > The only code path I see that could lead to descriptor_ not being
> > cleared is flush() being called and then stop() called just after
> > possibily as the result of a new configuration
> >
> > void CameraDevice::flush()
> > {
> >       {
> >               MutexLocker stateLock(stateMutex_);
> >               if (state_ != State::Running)
> >                       return;
> >
> >               state_ = State::Flushing;
> >       }
> >
> >       camera_->stop();
> >
> >       MutexLocker stateLock(stateMutex_);
> >       state_ = State::Stopped;
> > }
> >
> > void CameraDevice::stop()
> > {
> >       MutexLocker stateLock(stateMutex_);
> >       if (state_ == State::Stopped)
> >               return;
> >
> >       camera_->stop();
> >
> >       {
> >               MutexLocker descriptorsLock(descriptorsMutex_);
> >               descriptors_ = {};
> >       }
> >
> >       streams_.clear();
> >
> >       state_ = State::Stopped;
> > }
> >
> > int CameraDevice::configureStreams(camera3_stream_configuration_t
> > *stream_list) {
> >       /* Before any configuration attempt, stop the camera. */
> >       stop();
> >
> >          ...
> >
> > }
> >
> > Looking at the flush() implementation right now I wonder why we stop
> > the camera without the stateMutex_ held (the only reason I see is
> > because it can be concurrent with processCaptureRequest() which keeps
> > the mutex held), but I certainly don't want to touch that code right
> > now without a good reason.
> >
> > All in all, I think there's indeed a path that could lead to the
> > descriptor not being cleaned up, and we can't do it in flush() because
> > a concurrent call to processCaptureRequest() might be using the
> > descriptors to complete the in-flight requests with error state.
> >
> > Anle, does this match your understanding ? If so, this is what should
> > be recorded in the commit message (the cause of an issue, not the
> > symptoms)
> >
> >
> >> Best Regards
> >> Anle Pan
> >>
> >> -----Original Message-----
> >> From: Umang Jain <umang.jain@ideasonboard.com>
> >> Sent: 2024年3月5日 16:27
> >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
> >> Cc: Hui Fang <hui.fang@nxp.com>
> >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked
> >> issue when stopping capture
> >>
> >> Caution: This is an external email. Please take care when clicking
> >> links or opening attachments. When in doubt, report the message using
> >> the 'Report this email' button
> >>
> >>
> >> Hi Anle Pan, Hui Fang
> >>
> >> On 04/03/24 3:05 pm, Anle Pan wrote:
> >>> The issue occurs when stopping capture soon after starting capture.
> >> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
> >>> In this case, no frame get from the device, the related capture
> >>> request has been pushed to the queue descriptors_, but the
> >>> queuedRequests_ was still empty due to no requests will be queue to
> >>> the device since the stream will be stopped soon,
> >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
> >>
> >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
> >>> so there will be no camera->requestComplete called later, then the
> >>> descriptors_ can not pop normally, this will cause the pending if we
> >>> want to start capture next time.
> >>>
> >>> To fix the issue, ensure the descriptors_ is empty after the camera
> >>> device is stopped.
> > Do you know precisely why a non-empty descriptors_ stalls the camera ?
> > Does it get stuck on
> >
> > void CameraDevice::sendCaptureResults()
> > {
> >       while (!descriptors_.empty() &&
> > !descriptors_.front()->isPending()) {
> >
> > ?
>
> We should to formalise this as an proper issue, after getting more information.
> >>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> > Tested with CTS, no regressions detected
> >
> > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> >
> >>> ---
> >>>    src/android/camera_device.cpp | 5 ++++-
> >>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/src/android/camera_device.cpp
> >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
> >>> --- a/src/android/camera_device.cpp
> >>> +++ b/src/android/camera_device.cpp
> >>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
> >>>    void CameraDevice::stop()
> >>>    {
> >>>        MutexLocker stateLock(stateMutex_);
> >>> -     if (state_ == State::Stopped)
> >>> +     if (state_ == State::Stopped) {
> >>> +             MutexLocker descriptorsLock(descriptorsMutex_);
> >>> +             descriptors_ = {};
> >>>                return;
> >>> +     }
>
> I am not comfortable with this patch. Consider my points below
>
> - First we don't have a exact description of the issue it is fixing
>
> - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_...
>
> - If we are accepting this, why not just let the Camera::stop() fall through and remove
>
>              if (state_ == State::Stopped)
>                  return;
>
> as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer.
>
> I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :(
> >>>        camera_->stop();
> >>>
>
Fang Hui March 26, 2024, 8:20 a.m. UTC | #8
Hi, Jacopo
  For refined patch, always use "git send-email",  Even the subject changed?

BRs,
Fang Hui
Jacopo Mondi March 26, 2024, 8:22 a.m. UTC | #9
No worries, no need to re-send. Small things can be fixed (if you're
fine with what I proposed) when I'll apply the patch.

On Tue, Mar 26, 2024 at 08:20:31AM +0000, Hui Fang wrote:
> Hi, Jacopo
>   For refined patch, always use "git send-email",  Even the subject changed?
>
> BRs,
> Fang Hui
> ________________________________
> From: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> Sent: Tuesday, March 26, 2024 4:08 PM
> To: Hui Fang <hui.fang@nxp.com>
> Cc: Anle Pan <anle.pan@nxp.com>; Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com>; libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org>
> Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>
> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
>
>
> Hi,
>
> On Tue, Mar 26, 2024 at 03:13:05AM +0000, Hui Fang wrote:
> > @Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic.
> >
> > Just reply the refined patch.
>
> not a problem for this patch, but as a general rule, do not reply to a
> patch with a new version of the same patch, just send a new one. It's
> easier to track and I think it is also required for patchwork to
> identify it correctly.
>
> >
> > Author: Anle Pan <anle.pan@nxp.com>
> > Date:   Fri Mar 1 17:32:54 2024 +0000
> >
> >     android: camera_device: Fix the stuck issue in sendCaptureResults.
>
> No '.' at the end of the commit message.
>
> The actual subject should be something like
>
> android: camera_device: Always clear descriptors_ in stop()
>
> >
> >     When flush() is being called and then a new stream configuration is set,
> >     descriptor_  may has a chance to be not cleared, which was skipped due to
>
> descriptors_
>
> >     the Stopped State. This will cause a stuck in sendCaptureResults.
>
> due to the Camera being in Stopped state.
> >
> >     To fix the issue, ensure the descriptors_ is always empty when the camera
> >     State is Stopped.
> >
> >     Signed-off-by: Anle Pan <anle.pan@nxp.com>
>
> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>
> With the few fixes above, I'll merge.
>
> Also, as you have sent this new version, I'll add
>
> Signed-off-by: Fang Hui <hui.fang@nxp.com>
>
> Thanks
>    j
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index d2679a97..7fda6ce6 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -433,8 +433,11 @@ void CameraDevice::flush()
> >  void CameraDevice::stop()
> >  {
> >         MutexLocker stateLock(stateMutex_);
> > -       if (state_ == State::Stopped)
> > +       if (state_ == State::Stopped) {
> > +               MutexLocker descriptorsLock(descriptorsMutex_);
> > +               descriptors_ = {};
> >                 return;
> > +       }
> >
> >         camera_->stop();
> > ________________________________
> > From: Anle Pan <anle.pan@nxp.com>
> > Sent: Monday, March 25, 2024 5:20 PM
> > To: Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > Cc: libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org>; Hui Fang <hui.fang@nxp.com>
> > Subject: RE: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
> >
> > Hi Umang, Jacopo.
> >
> > Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10)
> >
> >                 void CameraDevice::sendCaptureResults()
> >                 {
> >          while (!descriptors_.empty() && !descriptors_.front()->isPending()) {
> >
> > And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() .
> > will ask hui to help update the commit message since the "git send" configuration has issue on my side.
> >
> > Best Regards
> > Anle Pan
> >
> > -----Original Message-----
> > From: Umang Jain <umang.jain@ideasonboard.com>
> > Sent: 2024年3月25日 15:44
> > To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com>
> > Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com>
> > Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
> >
> > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
> >
> >
> > Hi Jacopo,
> >
> > Thanks for testing,
> >
> > On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> > > Hello
> > >
> > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
> > >> Hi Umang Jain,
> > >>
> > >> this issue was random(around 1/10), and can be reproduced by single run below cts test:
> > >>      ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m
> > >> CtsMediaPlayerTestCases -t
> > >> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
> > >>
> > >> •    3 times configuring streams in this Test:
> > >> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> > >> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
> > >> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> > >>
> > >> •    when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
> > >>
> > > The only code path I see that could lead to descriptor_ not being
> > > cleared is flush() being called and then stop() called just after
> > > possibily as the result of a new configuration
> > >
> > > void CameraDevice::flush()
> > > {
> > >       {
> > >               MutexLocker stateLock(stateMutex_);
> > >               if (state_ != State::Running)
> > >                       return;
> > >
> > >               state_ = State::Flushing;
> > >       }
> > >
> > >       camera_->stop();
> > >
> > >       MutexLocker stateLock(stateMutex_);
> > >       state_ = State::Stopped;
> > > }
> > >
> > > void CameraDevice::stop()
> > > {
> > >       MutexLocker stateLock(stateMutex_);
> > >       if (state_ == State::Stopped)
> > >               return;
> > >
> > >       camera_->stop();
> > >
> > >       {
> > >               MutexLocker descriptorsLock(descriptorsMutex_);
> > >               descriptors_ = {};
> > >       }
> > >
> > >       streams_.clear();
> > >
> > >       state_ = State::Stopped;
> > > }
> > >
> > > int CameraDevice::configureStreams(camera3_stream_configuration_t
> > > *stream_list) {
> > >       /* Before any configuration attempt, stop the camera. */
> > >       stop();
> > >
> > >          ...
> > >
> > > }
> > >
> > > Looking at the flush() implementation right now I wonder why we stop
> > > the camera without the stateMutex_ held (the only reason I see is
> > > because it can be concurrent with processCaptureRequest() which keeps
> > > the mutex held), but I certainly don't want to touch that code right
> > > now without a good reason.
> > >
> > > All in all, I think there's indeed a path that could lead to the
> > > descriptor not being cleaned up, and we can't do it in flush() because
> > > a concurrent call to processCaptureRequest() might be using the
> > > descriptors to complete the in-flight requests with error state.
> > >
> > > Anle, does this match your understanding ? If so, this is what should
> > > be recorded in the commit message (the cause of an issue, not the
> > > symptoms)
> > >
> > >
> > >> Best Regards
> > >> Anle Pan
> > >>
> > >> -----Original Message-----
> > >> From: Umang Jain <umang.jain@ideasonboard.com>
> > >> Sent: 2024年3月5日 16:27
> > >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
> > >> Cc: Hui Fang <hui.fang@nxp.com>
> > >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked
> > >> issue when stopping capture
> > >>
> > >> Caution: This is an external email. Please take care when clicking
> > >> links or opening attachments. When in doubt, report the message using
> > >> the 'Report this email' button
> > >>
> > >>
> > >> Hi Anle Pan, Hui Fang
> > >>
> > >> On 04/03/24 3:05 pm, Anle Pan wrote:
> > >>> The issue occurs when stopping capture soon after starting capture.
> > >> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
> > >>> In this case, no frame get from the device, the related capture
> > >>> request has been pushed to the queue descriptors_, but the
> > >>> queuedRequests_ was still empty due to no requests will be queue to
> > >>> the device since the stream will be stopped soon,
> > >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
> > >>
> > >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
> > >>> so there will be no camera->requestComplete called later, then the
> > >>> descriptors_ can not pop normally, this will cause the pending if we
> > >>> want to start capture next time.
> > >>>
> > >>> To fix the issue, ensure the descriptors_ is empty after the camera
> > >>> device is stopped.
> > > Do you know precisely why a non-empty descriptors_ stalls the camera ?
> > > Does it get stuck on
> > >
> > > void CameraDevice::sendCaptureResults()
> > > {
> > >       while (!descriptors_.empty() &&
> > > !descriptors_.front()->isPending()) {
> > >
> > > ?
> >
> > We should to formalise this as an proper issue, after getting more information.
> > >>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> > > Tested with CTS, no regressions detected
> > >
> > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > >
> > >>> ---
> > >>>    src/android/camera_device.cpp | 5 ++++-
> > >>>    1 file changed, 4 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/src/android/camera_device.cpp
> > >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
> > >>> --- a/src/android/camera_device.cpp
> > >>> +++ b/src/android/camera_device.cpp
> > >>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
> > >>>    void CameraDevice::stop()
> > >>>    {
> > >>>        MutexLocker stateLock(stateMutex_);
> > >>> -     if (state_ == State::Stopped)
> > >>> +     if (state_ == State::Stopped) {
> > >>> +             MutexLocker descriptorsLock(descriptorsMutex_);
> > >>> +             descriptors_ = {};
> > >>>                return;
> > >>> +     }
> >
> > I am not comfortable with this patch. Consider my points below
> >
> > - First we don't have a exact description of the issue it is fixing
> >
> > - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_...
> >
> > - If we are accepting this, why not just let the Camera::stop() fall through and remove
> >
> >              if (state_ == State::Stopped)
> >                  return;
> >
> > as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer.
> >
> > I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :(
> > >>>        camera_->stop();
> > >>>
> >
Fang Hui March 26, 2024, 8:51 a.m. UTC | #10
Hi, Jacopo
  We (Anle and me) are ok with your proposition, please help refine and apply, thanks!

BRs,
Fang Hui
Jacopo Mondi March 26, 2024, 8:56 a.m. UTC | #11
Hi Umang,
  sorry, I missed your reply

On Mon, Mar 25, 2024 at 01:13:37PM +0530, Umang Jain wrote:
> Hi Jacopo,
>
> Thanks for testing,
>
> On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> > Hello
> >
> > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
> > > Hi Umang Jain,
> > >
> > > this issue was random(around 1/10), and can be reproduced by single run below cts test:
> > > 	./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
> > >
> > > •	3 times configuring streams in this Test:
> > > 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> > > 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
> > > 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
> > >
> > > •	when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
> > >
> > The only code path I see that could lead to descriptor_ not being
> > cleared is flush() being called and then stop() called just after
> > possibily as the result of a new configuration
> >
> > void CameraDevice::flush()
> > {
> > 	{
> > 		MutexLocker stateLock(stateMutex_);
> > 		if (state_ != State::Running)
> > 			return;
> >
> > 		state_ = State::Flushing;
> > 	}
> >
> > 	camera_->stop();
> >
> > 	MutexLocker stateLock(stateMutex_);
> > 	state_ = State::Stopped;
> > }
> >
> > void CameraDevice::stop()
> > {
> > 	MutexLocker stateLock(stateMutex_);
> > 	if (state_ == State::Stopped)
> > 		return;
> >
> > 	camera_->stop();
> >
> > 	{
> > 		MutexLocker descriptorsLock(descriptorsMutex_);
> > 		descriptors_ = {};
> > 	}
> >
> > 	streams_.clear();
> >
> > 	state_ = State::Stopped;
> > }
> >
> > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > {
> > 	/* Before any configuration attempt, stop the camera. */
> > 	stop();
> >
> >          ...
> >
> > }
> >
> > Looking at the flush() implementation right now I wonder why we stop
> > the camera without the stateMutex_ held (the only reason I see is
> > because it can be concurrent with processCaptureRequest() which keeps
> > the mutex held), but I certainly don't want to touch that code right
> > now without a good reason.
> >
> > All in all, I think there's indeed a path that could lead to the
> > descriptor not being cleaned up, and we can't do it in flush() because
> > a concurrent call to processCaptureRequest() might be using the
> > descriptors to complete the in-flight requests with error state.
> >
> > Anle, does this match your understanding ? If so, this is what should
> > be recorded in the commit message (the cause of an issue, not the
> > symptoms)
> >
> >
> > > Best Regards
> > > Anle Pan
> > >
> > > -----Original Message-----
> > > From: Umang Jain <umang.jain@ideasonboard.com>
> > > Sent: 2024年3月5日 16:27
> > > To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
> > > Cc: Hui Fang <hui.fang@nxp.com>
> > > Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
> > >
> > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
> > >
> > >
> > > Hi Anle Pan, Hui Fang
> > >
> > > On 04/03/24 3:05 pm, Anle Pan wrote:
> > > > The issue occurs when stopping capture soon after starting capture.
> > > Can you describe the issue/use case in more detail? It's really hard to infer from one line.
> > > > In this case, no frame get from the device, the related capture
> > > > request has been pushed to the queue descriptors_, but the
> > > > queuedRequests_ was still empty due to no requests will be queue to
> > > > the device since the stream will be stopped soon,
> > > I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
> > >
> > > How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
> > > > so there will be no camera->requestComplete called later, then the
> > > > descriptors_ can not pop normally, this will cause the pending if we
> > > > want to start capture next time.
> > > >
> > > > To fix the issue, ensure the descriptors_ is empty after the camera
> > > > device is stopped.
> > Do you know precisely why a non-empty descriptors_ stalls the camera ?
> > Does it get stuck on
> >
> > void CameraDevice::sendCaptureResults()
> > {
> > 	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {
> >
> > ?
>
> We should to formalise this as an proper issue, after getting more
> information.
> > > > Signed-off-by: Anle Pan <anle.pan@nxp.com>
> > Tested with CTS, no regressions detected
> >
> > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> >
> > > > ---
> > > >    src/android/camera_device.cpp | 5 ++++-
> > > >    1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/android/camera_device.cpp
> > > > b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -433,8 +433,11 @@ void CameraDevice::flush()
> > > >    void CameraDevice::stop()
> > > >    {
> > > >        MutexLocker stateLock(stateMutex_);
> > > > -     if (state_ == State::Stopped)
> > > > +     if (state_ == State::Stopped) {
> > > > +             MutexLocker descriptorsLock(descriptorsMutex_);
> > > > +             descriptors_ = {};
> > > >                return;
> > > > +     }
>
> I am not comfortable with this patch. Consider my points below
>
> - First we don't have a exact description of the issue it is fixing

No we do, don't we ?

>
> - Second, if the camera state is ::Stopped already. Clears descriptors_
> doesn't look a good idea to me. The function which has set the camera
> state::Stopped already, should be ideally be emptying the descriptors_...
>

We can't clear descriptors_ in flush() because a concurrent
processCaptureRequests() might be dealing with descriptors_ (and
complete it if state == Stopped, so no need to forcefully clear it)

> - If we are accepting this, why not just let the Camera::stop() fall through
> and remove
>
>             if (state_ == State::Stopped)
>                 return;
>
> as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a
> no-op when called even when State::Stopped and responsible to clean up
> queues(and stuff) for the HAL layer.

This might be worth experimenting with indeed!

CameraDevice::stop() should then look like:

        void CameraDevice::stop()
        {
                MutexLocker stateLock(stateMutex_);

                camera_->stop();

                {
                        MutexLocker descriptorsLock(descriptorsMutex_);
                        descriptors_ = {};
                }

                streams_.clear();

                state_ = State::Stopped;
        }

But before testing it further, do we agree the issue to fix is real ?

>
> I should ideally back up my thoughts with some testing, but I don't have any
> kind of CTS testing available to me right now :(
> > > >        camera_->stop();
> > > >
>
Umang Jain March 26, 2024, 12:53 p.m. UTC | #12
Hi Jacopo

On 26/03/24 2:26 pm, Jacopo Mondi wrote:
> Hi Umang,
>    sorry, I missed your reply
>
> On Mon, Mar 25, 2024 at 01:13:37PM +0530, Umang Jain wrote:
>> Hi Jacopo,
>>
>> Thanks for testing,
>>
>> On 22/03/24 4:06 pm, Jacopo Mondi wrote:
>>> Hello
>>>
>>> On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
>>>> Hi Umang Jain,
>>>>
>>>> this issue was random(around 1/10), and can be reproduced by single run below cts test:
>>>> 	./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
>>>>
>>>> •	3 times configuring streams in this Test:
>>>> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>>>> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
>>>> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>>>>
>>>> •	when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
>>>>
>>> The only code path I see that could lead to descriptor_ not being
>>> cleared is flush() being called and then stop() called just after
>>> possibily as the result of a new configuration
>>>
>>> void CameraDevice::flush()
>>> {
>>> 	{
>>> 		MutexLocker stateLock(stateMutex_);
>>> 		if (state_ != State::Running)
>>> 			return;
>>>
>>> 		state_ = State::Flushing;
>>> 	}
>>>
>>> 	camera_->stop();
>>>
>>> 	MutexLocker stateLock(stateMutex_);
>>> 	state_ = State::Stopped;
>>> }
>>>
>>> void CameraDevice::stop()
>>> {
>>> 	MutexLocker stateLock(stateMutex_);
>>> 	if (state_ == State::Stopped)
>>> 		return;
>>>
>>> 	camera_->stop();
>>>
>>> 	{
>>> 		MutexLocker descriptorsLock(descriptorsMutex_);
>>> 		descriptors_ = {};
>>> 	}
>>>
>>> 	streams_.clear();
>>>
>>> 	state_ = State::Stopped;
>>> }
>>>
>>> int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>>> {
>>> 	/* Before any configuration attempt, stop the camera. */
>>> 	stop();
>>>
>>>           ...
>>>
>>> }
>>>
>>> Looking at the flush() implementation right now I wonder why we stop
>>> the camera without the stateMutex_ held (the only reason I see is
>>> because it can be concurrent with processCaptureRequest() which keeps
>>> the mutex held), but I certainly don't want to touch that code right
>>> now without a good reason.
>>>
>>> All in all, I think there's indeed a path that could lead to the
>>> descriptor not being cleaned up, and we can't do it in flush() because
>>> a concurrent call to processCaptureRequest() might be using the
>>> descriptors to complete the in-flight requests with error state.
>>>
>>> Anle, does this match your understanding ? If so, this is what should
>>> be recorded in the commit message (the cause of an issue, not the
>>> symptoms)
>>>
>>>
>>>> Best Regards
>>>> Anle Pan
>>>>
>>>> -----Original Message-----
>>>> From: Umang Jain <umang.jain@ideasonboard.com>
>>>> Sent: 2024年3月5日 16:27
>>>> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org
>>>> Cc: Hui Fang <hui.fang@nxp.com>
>>>> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture
>>>>
>>>> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
>>>>
>>>>
>>>> Hi Anle Pan, Hui Fang
>>>>
>>>> On 04/03/24 3:05 pm, Anle Pan wrote:
>>>>> The issue occurs when stopping capture soon after starting capture.
>>>> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
>>>>> In this case, no frame get from the device, the related capture
>>>>> request has been pushed to the queue descriptors_, but the
>>>>> queuedRequests_ was still empty due to no requests will be queue to
>>>>> the device since the stream will be stopped soon,
>>>> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
>>>>
>>>> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
>>>>> so there will be no camera->requestComplete called later, then the
>>>>> descriptors_ can not pop normally, this will cause the pending if we
>>>>> want to start capture next time.
>>>>>
>>>>> To fix the issue, ensure the descriptors_ is empty after the camera
>>>>> device is stopped.
>>> Do you know precisely why a non-empty descriptors_ stalls the camera ?
>>> Does it get stuck on
>>>
>>> void CameraDevice::sendCaptureResults()
>>> {
>>> 	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {
>>>
>>> ?
>> We should to formalise this as an proper issue, after getting more
>> information.
>>>>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
>>> Tested with CTS, no regressions detected
>>>
>>> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>>
>>>>> ---
>>>>>     src/android/camera_device.cpp | 5 ++++-
>>>>>     1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/src/android/camera_device.cpp
>>>>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
>>>>> --- a/src/android/camera_device.cpp
>>>>> +++ b/src/android/camera_device.cpp
>>>>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>>>>>     void CameraDevice::stop()
>>>>>     {
>>>>>         MutexLocker stateLock(stateMutex_);
>>>>> -     if (state_ == State::Stopped)
>>>>> +     if (state_ == State::Stopped) {
>>>>> +             MutexLocker descriptorsLock(descriptorsMutex_);
>>>>> +             descriptors_ = {};
>>>>>                 return;
>>>>> +     }
>> I am not comfortable with this patch. Consider my points below
>>
>> - First we don't have a exact description of the issue it is fixing
> No we do, don't we ?

ack, i re-read ...
>
>> - Second, if the camera state is ::Stopped already. Clears descriptors_
>> doesn't look a good idea to me. The function which has set the camera
>> state::Stopped already, should be ideally be emptying the descriptors_...
>>
> We can't clear descriptors_ in flush() because a concurrent
> processCaptureRequests() might be dealing with descriptors_ (and
> complete it if state == Stopped, so no need to forcefully clear it)

Yea, I get the idea that we can't clear the descriptors_ there.
>
>> - If we are accepting this, why not just let the Camera::stop() fall through
>> and remove
>>
>>              if (state_ == State::Stopped)
>>                  return;
>>
>> as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a
>> no-op when called even when State::Stopped and responsible to clean up
>> queues(and stuff) for the HAL layer.
> This might be worth experimenting with indeed!

I would like myself to test the implementation of flush() and stop() at 
the same time probably.. because currently flush is calling the 
camera_->stop() directly.
>
> CameraDevice::stop() should then look like:
>
>          void CameraDevice::stop()
>          {
>                  MutexLocker stateLock(stateMutex_);
>
>                  camera_->stop();
>
>                  {
>                          MutexLocker descriptorsLock(descriptorsMutex_);
>                          descriptors_ = {};
>                  }
>
>                  streams_.clear();
>
>                  state_ = State::Stopped;
>          }
>
> But before testing it further, do we agree the issue to fix is real ?

Agreed.
Thanks for handling this!

>
>> I should ideally back up my thoughts with some testing, but I don't have any
>> kind of CTS testing available to me right now :(
>>>>>         camera_->stop();
>>>>>

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 25cedd44..d452992d 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -433,8 +433,11 @@  void CameraDevice::flush()
 void CameraDevice::stop()
 {
 	MutexLocker stateLock(stateMutex_);
-	if (state_ == State::Stopped)
+	if (state_ == State::Stopped) {
+		MutexLocker descriptorsLock(descriptorsMutex_);
+		descriptors_ = {};
 		return;
+	}
 
 	camera_->stop();