Message ID | 20240304093539.546973-1-anle.pan@nxp.com |
---|---|
State | Superseded |
Headers | show |
Series |
|
Related | show |
Hi Anle Pan, Hui Fang On 04/03/24 3:05 pm, Anle Pan wrote: > The issue occurs when stopping capture soon after starting capture. Can you describe the issue/use case in more detail? It's really hard to infer from one line. > > In this case, no frame get from the device, the > related capture request has been pushed to the > queue descriptors_, but the queuedRequests_ was > still empty due to no requests will be queue to > the device since the stream will be stopped soon, I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > so there will be no camera->requestComplete called > later, then the descriptors_ can not pop normally, > this will cause the pending if we want to start capture next time. > > To fix the issue, ensure the descriptors_ is > empty after the camera device is stopped. > > Signed-off-by: Anle Pan <anle.pan@nxp.com> > --- > src/android/camera_device.cpp | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index 25cedd44..d452992d 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -433,8 +433,11 @@ void CameraDevice::flush() > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > - if (state_ == State::Stopped) > + if (state_ == State::Stopped) { > + MutexLocker descriptorsLock(descriptorsMutex_); > + descriptors_ = {}; > return; > + } > > camera_->stop(); >
Hi Umang Jain, this issue was random(around 1/10), and can be reproduced by single run below cts test: ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 • 3 times configuring streams in this Test: 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. Best Regards Anle Pan -----Original Message----- From: Umang Jain <umang.jain@ideasonboard.com> Sent: 2024年3月5日 16:27 To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org Cc: Hui Fang <hui.fang@nxp.com> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button Hi Anle Pan, Hui Fang On 04/03/24 3:05 pm, Anle Pan wrote: > The issue occurs when stopping capture soon after starting capture. Can you describe the issue/use case in more detail? It's really hard to infer from one line. > > In this case, no frame get from the device, the related capture > request has been pushed to the queue descriptors_, but the > queuedRequests_ was still empty due to no requests will be queue to > the device since the stream will be stopped soon, I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > so there will be no camera->requestComplete called later, then the > descriptors_ can not pop normally, this will cause the pending if we > want to start capture next time. > > To fix the issue, ensure the descriptors_ is empty after the camera > device is stopped. > > Signed-off-by: Anle Pan <anle.pan@nxp.com> > --- > src/android/camera_device.cpp | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/src/android/camera_device.cpp > b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -433,8 +433,11 @@ void CameraDevice::flush() > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > - if (state_ == State::Stopped) > + if (state_ == State::Stopped) { > + MutexLocker descriptorsLock(descriptorsMutex_); > + descriptors_ = {}; > return; > + } > > camera_->stop(); >
Hello On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: > Hi Umang Jain, > > this issue was random(around 1/10), and can be reproduced by single run below cts test: > ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 > > • 3 times configuring streams in this Test: > 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time > 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > > • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. > The only code path I see that could lead to descriptor_ not being cleared is flush() being called and then stop() called just after possibily as the result of a new configuration void CameraDevice::flush() { { MutexLocker stateLock(stateMutex_); if (state_ != State::Running) return; state_ = State::Flushing; } camera_->stop(); MutexLocker stateLock(stateMutex_); state_ = State::Stopped; } void CameraDevice::stop() { MutexLocker stateLock(stateMutex_); if (state_ == State::Stopped) return; camera_->stop(); { MutexLocker descriptorsLock(descriptorsMutex_); descriptors_ = {}; } streams_.clear(); state_ = State::Stopped; } int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) { /* Before any configuration attempt, stop the camera. */ stop(); ... } Looking at the flush() implementation right now I wonder why we stop the camera without the stateMutex_ held (the only reason I see is because it can be concurrent with processCaptureRequest() which keeps the mutex held), but I certainly don't want to touch that code right now without a good reason. All in all, I think there's indeed a path that could lead to the descriptor not being cleaned up, and we can't do it in flush() because a concurrent call to processCaptureRequest() might be using the descriptors to complete the in-flight requests with error state. Anle, does this match your understanding ? If so, this is what should be recorded in the commit message (the cause of an issue, not the symptoms) > Best Regards > Anle Pan > > -----Original Message----- > From: Umang Jain <umang.jain@ideasonboard.com> > Sent: 2024年3月5日 16:27 > To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org > Cc: Hui Fang <hui.fang@nxp.com> > Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > Hi Anle Pan, Hui Fang > > On 04/03/24 3:05 pm, Anle Pan wrote: > > The issue occurs when stopping capture soon after starting capture. > > Can you describe the issue/use case in more detail? It's really hard to infer from one line. > > > > In this case, no frame get from the device, the related capture > > request has been pushed to the queue descriptors_, but the > > queuedRequests_ was still empty due to no requests will be queue to > > the device since the stream will be stopped soon, > > I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. > > How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > > so there will be no camera->requestComplete called later, then the > > descriptors_ can not pop normally, this will cause the pending if we > > want to start capture next time. > > > > To fix the issue, ensure the descriptors_ is empty after the camera > > device is stopped. Do you know precisely why a non-empty descriptors_ stalls the camera ? Does it get stuck on void CameraDevice::sendCaptureResults() { while (!descriptors_.empty() && !descriptors_.front()->isPending()) { ? > > > > Signed-off-by: Anle Pan <anle.pan@nxp.com> Tested with CTS, no regressions detected Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > --- > > src/android/camera_device.cpp | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/src/android/camera_device.cpp > > b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 > > --- a/src/android/camera_device.cpp > > +++ b/src/android/camera_device.cpp > > @@ -433,8 +433,11 @@ void CameraDevice::flush() > > void CameraDevice::stop() > > { > > MutexLocker stateLock(stateMutex_); > > - if (state_ == State::Stopped) > > + if (state_ == State::Stopped) { > > + MutexLocker descriptorsLock(descriptorsMutex_); > > + descriptors_ = {}; > > return; > > + } > > > > camera_->stop(); > > >
Hi Jacopo, Thanks for testing, On 22/03/24 4:06 pm, Jacopo Mondi wrote: > Hello > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: >> Hi Umang Jain, >> >> this issue was random(around 1/10), and can be reproduced by single run below cts test: >> ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 >> >> • 3 times configuring streams in this Test: >> 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >> 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time >> 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >> >> • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. >> > The only code path I see that could lead to descriptor_ not being > cleared is flush() being called and then stop() called just after > possibily as the result of a new configuration > > void CameraDevice::flush() > { > { > MutexLocker stateLock(stateMutex_); > if (state_ != State::Running) > return; > > state_ = State::Flushing; > } > > camera_->stop(); > > MutexLocker stateLock(stateMutex_); > state_ = State::Stopped; > } > > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > if (state_ == State::Stopped) > return; > > camera_->stop(); > > { > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_ = {}; > } > > streams_.clear(); > > state_ = State::Stopped; > } > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > { > /* Before any configuration attempt, stop the camera. */ > stop(); > > ... > > } > > Looking at the flush() implementation right now I wonder why we stop > the camera without the stateMutex_ held (the only reason I see is > because it can be concurrent with processCaptureRequest() which keeps > the mutex held), but I certainly don't want to touch that code right > now without a good reason. > > All in all, I think there's indeed a path that could lead to the > descriptor not being cleaned up, and we can't do it in flush() because > a concurrent call to processCaptureRequest() might be using the > descriptors to complete the in-flight requests with error state. > > Anle, does this match your understanding ? If so, this is what should > be recorded in the commit message (the cause of an issue, not the > symptoms) > > >> Best Regards >> Anle Pan >> >> -----Original Message----- >> From: Umang Jain <umang.jain@ideasonboard.com> >> Sent: 2024年3月5日 16:27 >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org >> Cc: Hui Fang <hui.fang@nxp.com> >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture >> >> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button >> >> >> Hi Anle Pan, Hui Fang >> >> On 04/03/24 3:05 pm, Anle Pan wrote: >>> The issue occurs when stopping capture soon after starting capture. >> Can you describe the issue/use case in more detail? It's really hard to infer from one line. >>> In this case, no frame get from the device, the related capture >>> request has been pushed to the queue descriptors_, but the >>> queuedRequests_ was still empty due to no requests will be queue to >>> the device since the stream will be stopped soon, >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. >> >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? >>> so there will be no camera->requestComplete called later, then the >>> descriptors_ can not pop normally, this will cause the pending if we >>> want to start capture next time. >>> >>> To fix the issue, ensure the descriptors_ is empty after the camera >>> device is stopped. > Do you know precisely why a non-empty descriptors_ stalls the camera ? > Does it get stuck on > > void CameraDevice::sendCaptureResults() > { > while (!descriptors_.empty() && !descriptors_.front()->isPending()) { > > ? We should to formalise this as an proper issue, after getting more information. >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> > Tested with CTS, no regressions detected > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > >>> --- >>> src/android/camera_device.cpp | 5 ++++- >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/android/camera_device.cpp >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 >>> --- a/src/android/camera_device.cpp >>> +++ b/src/android/camera_device.cpp >>> @@ -433,8 +433,11 @@ void CameraDevice::flush() >>> void CameraDevice::stop() >>> { >>> MutexLocker stateLock(stateMutex_); >>> - if (state_ == State::Stopped) >>> + if (state_ == State::Stopped) { >>> + MutexLocker descriptorsLock(descriptorsMutex_); >>> + descriptors_ = {}; >>> return; >>> + } I am not comfortable with this patch. Consider my points below - First we don't have a exact description of the issue it is fixing - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_... - If we are accepting this, why not just let the Camera::stop() fall through and remove if (state_ == State::Stopped) return; as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer. I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :( >>> camera_->stop(); >>>
Hi Umang, Jacopo. Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10) void CameraDevice::sendCaptureResults() { while (!descriptors_.empty() && !descriptors_.front()->isPending()) { And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() . will ask hui to help update the commit message since the "git send" configuration has issue on my side. Best Regards Anle Pan -----Original Message----- From: Umang Jain <umang.jain@ideasonboard.com> Sent: 2024年3月25日 15:44 To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com> Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com> Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button Hi Jacopo, Thanks for testing, On 22/03/24 4:06 pm, Jacopo Mondi wrote: > Hello > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: >> Hi Umang Jain, >> >> this issue was random(around 1/10), and can be reproduced by single run below cts test: >> ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m >> CtsMediaPlayerTestCases -t >> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 >> >> • 3 times configuring streams in this Test: >> 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >> 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time >> 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >> >> • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. >> > The only code path I see that could lead to descriptor_ not being > cleared is flush() being called and then stop() called just after > possibily as the result of a new configuration > > void CameraDevice::flush() > { > { > MutexLocker stateLock(stateMutex_); > if (state_ != State::Running) > return; > > state_ = State::Flushing; > } > > camera_->stop(); > > MutexLocker stateLock(stateMutex_); > state_ = State::Stopped; > } > > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > if (state_ == State::Stopped) > return; > > camera_->stop(); > > { > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_ = {}; > } > > streams_.clear(); > > state_ = State::Stopped; > } > > int CameraDevice::configureStreams(camera3_stream_configuration_t > *stream_list) { > /* Before any configuration attempt, stop the camera. */ > stop(); > > ... > > } > > Looking at the flush() implementation right now I wonder why we stop > the camera without the stateMutex_ held (the only reason I see is > because it can be concurrent with processCaptureRequest() which keeps > the mutex held), but I certainly don't want to touch that code right > now without a good reason. > > All in all, I think there's indeed a path that could lead to the > descriptor not being cleaned up, and we can't do it in flush() because > a concurrent call to processCaptureRequest() might be using the > descriptors to complete the in-flight requests with error state. > > Anle, does this match your understanding ? If so, this is what should > be recorded in the commit message (the cause of an issue, not the > symptoms) > > >> Best Regards >> Anle Pan >> >> -----Original Message----- >> From: Umang Jain <umang.jain@ideasonboard.com> >> Sent: 2024年3月5日 16:27 >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org >> Cc: Hui Fang <hui.fang@nxp.com> >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked >> issue when stopping capture >> >> Caution: This is an external email. Please take care when clicking >> links or opening attachments. When in doubt, report the message using >> the 'Report this email' button >> >> >> Hi Anle Pan, Hui Fang >> >> On 04/03/24 3:05 pm, Anle Pan wrote: >>> The issue occurs when stopping capture soon after starting capture. >> Can you describe the issue/use case in more detail? It's really hard to infer from one line. >>> In this case, no frame get from the device, the related capture >>> request has been pushed to the queue descriptors_, but the >>> queuedRequests_ was still empty due to no requests will be queue to >>> the device since the stream will be stopped soon, >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. >> >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? >>> so there will be no camera->requestComplete called later, then the >>> descriptors_ can not pop normally, this will cause the pending if we >>> want to start capture next time. >>> >>> To fix the issue, ensure the descriptors_ is empty after the camera >>> device is stopped. > Do you know precisely why a non-empty descriptors_ stalls the camera ? > Does it get stuck on > > void CameraDevice::sendCaptureResults() > { > while (!descriptors_.empty() && > !descriptors_.front()->isPending()) { > > ? We should to formalise this as an proper issue, after getting more information. >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> > Tested with CTS, no regressions detected > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > >>> --- >>> src/android/camera_device.cpp | 5 ++++- >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/android/camera_device.cpp >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 >>> --- a/src/android/camera_device.cpp >>> +++ b/src/android/camera_device.cpp >>> @@ -433,8 +433,11 @@ void CameraDevice::flush() >>> void CameraDevice::stop() >>> { >>> MutexLocker stateLock(stateMutex_); >>> - if (state_ == State::Stopped) >>> + if (state_ == State::Stopped) { >>> + MutexLocker descriptorsLock(descriptorsMutex_); >>> + descriptors_ = {}; >>> return; >>> + } I am not comfortable with this patch. Consider my points below - First we don't have a exact description of the issue it is fixing - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_... - If we are accepting this, why not just let the Camera::stop() fall through and remove if (state_ == State::Stopped) return; as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer. I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :( >>> camera_->stop(); >>>
@Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic.
Just reply the refined patch.
Author: Anle Pan <anle.pan@nxp.com>
Date: Fri Mar 1 17:32:54 2024 +0000
android: camera_device: Fix the stuck issue in sendCaptureResults.
When flush() is being called and then a new stream configuration is set,
descriptor_ may has a chance to be not cleared, which was skipped due to
the Stopped State. This will cause a stuck in sendCaptureResults.
To fix the issue, ensure the descriptors_ is always empty when the camera
State is Stopped.
Signed-off-by: Anle Pan <anle.pan@nxp.com>
diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index d2679a97..7fda6ce6 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -433,8 +433,11 @@ void CameraDevice::flush()
void CameraDevice::stop()
{
MutexLocker stateLock(stateMutex_);
- if (state_ == State::Stopped)
+ if (state_ == State::Stopped) {
+ MutexLocker descriptorsLock(descriptorsMutex_);
+ descriptors_ = {};
return;
+ }
camera_->stop();
Hi, On Tue, Mar 26, 2024 at 03:13:05AM +0000, Hui Fang wrote: > @Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic. > > Just reply the refined patch. not a problem for this patch, but as a general rule, do not reply to a patch with a new version of the same patch, just send a new one. It's easier to track and I think it is also required for patchwork to identify it correctly. > > Author: Anle Pan <anle.pan@nxp.com> > Date: Fri Mar 1 17:32:54 2024 +0000 > > android: camera_device: Fix the stuck issue in sendCaptureResults. No '.' at the end of the commit message. The actual subject should be something like android: camera_device: Always clear descriptors_ in stop() > > When flush() is being called and then a new stream configuration is set, > descriptor_ may has a chance to be not cleared, which was skipped due to descriptors_ > the Stopped State. This will cause a stuck in sendCaptureResults. due to the Camera being in Stopped state. > > To fix the issue, ensure the descriptors_ is always empty when the camera > State is Stopped. > > Signed-off-by: Anle Pan <anle.pan@nxp.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> With the few fixes above, I'll merge. Also, as you have sent this new version, I'll add Signed-off-by: Fang Hui <hui.fang@nxp.com> Thanks j > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index d2679a97..7fda6ce6 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -433,8 +433,11 @@ void CameraDevice::flush() > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > - if (state_ == State::Stopped) > + if (state_ == State::Stopped) { > + MutexLocker descriptorsLock(descriptorsMutex_); > + descriptors_ = {}; > return; > + } > > camera_->stop(); > ________________________________ > From: Anle Pan <anle.pan@nxp.com> > Sent: Monday, March 25, 2024 5:20 PM > To: Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com> > Cc: libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org>; Hui Fang <hui.fang@nxp.com> > Subject: RE: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > Hi Umang, Jacopo. > > Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10) > > void CameraDevice::sendCaptureResults() > { > while (!descriptors_.empty() && !descriptors_.front()->isPending()) { > > And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() . > will ask hui to help update the commit message since the "git send" configuration has issue on my side. > > Best Regards > Anle Pan > > -----Original Message----- > From: Umang Jain <umang.jain@ideasonboard.com> > Sent: 2024年3月25日 15:44 > To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com> > Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com> > Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > Hi Jacopo, > > Thanks for testing, > > On 22/03/24 4:06 pm, Jacopo Mondi wrote: > > Hello > > > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: > >> Hi Umang Jain, > >> > >> this issue was random(around 1/10), and can be reproduced by single run below cts test: > >> ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m > >> CtsMediaPlayerTestCases -t > >> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 > >> > >> • 3 times configuring streams in this Test: > >> 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > >> 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time > >> 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > >> > >> • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. > >> > > The only code path I see that could lead to descriptor_ not being > > cleared is flush() being called and then stop() called just after > > possibily as the result of a new configuration > > > > void CameraDevice::flush() > > { > > { > > MutexLocker stateLock(stateMutex_); > > if (state_ != State::Running) > > return; > > > > state_ = State::Flushing; > > } > > > > camera_->stop(); > > > > MutexLocker stateLock(stateMutex_); > > state_ = State::Stopped; > > } > > > > void CameraDevice::stop() > > { > > MutexLocker stateLock(stateMutex_); > > if (state_ == State::Stopped) > > return; > > > > camera_->stop(); > > > > { > > MutexLocker descriptorsLock(descriptorsMutex_); > > descriptors_ = {}; > > } > > > > streams_.clear(); > > > > state_ = State::Stopped; > > } > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t > > *stream_list) { > > /* Before any configuration attempt, stop the camera. */ > > stop(); > > > > ... > > > > } > > > > Looking at the flush() implementation right now I wonder why we stop > > the camera without the stateMutex_ held (the only reason I see is > > because it can be concurrent with processCaptureRequest() which keeps > > the mutex held), but I certainly don't want to touch that code right > > now without a good reason. > > > > All in all, I think there's indeed a path that could lead to the > > descriptor not being cleaned up, and we can't do it in flush() because > > a concurrent call to processCaptureRequest() might be using the > > descriptors to complete the in-flight requests with error state. > > > > Anle, does this match your understanding ? If so, this is what should > > be recorded in the commit message (the cause of an issue, not the > > symptoms) > > > > > >> Best Regards > >> Anle Pan > >> > >> -----Original Message----- > >> From: Umang Jain <umang.jain@ideasonboard.com> > >> Sent: 2024年3月5日 16:27 > >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org > >> Cc: Hui Fang <hui.fang@nxp.com> > >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked > >> issue when stopping capture > >> > >> Caution: This is an external email. Please take care when clicking > >> links or opening attachments. When in doubt, report the message using > >> the 'Report this email' button > >> > >> > >> Hi Anle Pan, Hui Fang > >> > >> On 04/03/24 3:05 pm, Anle Pan wrote: > >>> The issue occurs when stopping capture soon after starting capture. > >> Can you describe the issue/use case in more detail? It's really hard to infer from one line. > >>> In this case, no frame get from the device, the related capture > >>> request has been pushed to the queue descriptors_, but the > >>> queuedRequests_ was still empty due to no requests will be queue to > >>> the device since the stream will be stopped soon, > >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. > >> > >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > >>> so there will be no camera->requestComplete called later, then the > >>> descriptors_ can not pop normally, this will cause the pending if we > >>> want to start capture next time. > >>> > >>> To fix the issue, ensure the descriptors_ is empty after the camera > >>> device is stopped. > > Do you know precisely why a non-empty descriptors_ stalls the camera ? > > Does it get stuck on > > > > void CameraDevice::sendCaptureResults() > > { > > while (!descriptors_.empty() && > > !descriptors_.front()->isPending()) { > > > > ? > > We should to formalise this as an proper issue, after getting more information. > >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> > > Tested with CTS, no regressions detected > > > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > > >>> --- > >>> src/android/camera_device.cpp | 5 ++++- > >>> 1 file changed, 4 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/src/android/camera_device.cpp > >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 > >>> --- a/src/android/camera_device.cpp > >>> +++ b/src/android/camera_device.cpp > >>> @@ -433,8 +433,11 @@ void CameraDevice::flush() > >>> void CameraDevice::stop() > >>> { > >>> MutexLocker stateLock(stateMutex_); > >>> - if (state_ == State::Stopped) > >>> + if (state_ == State::Stopped) { > >>> + MutexLocker descriptorsLock(descriptorsMutex_); > >>> + descriptors_ = {}; > >>> return; > >>> + } > > I am not comfortable with this patch. Consider my points below > > - First we don't have a exact description of the issue it is fixing > > - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_... > > - If we are accepting this, why not just let the Camera::stop() fall through and remove > > if (state_ == State::Stopped) > return; > > as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer. > > I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :( > >>> camera_->stop(); > >>> >
Hi, Jacopo For refined patch, always use "git send-email", Even the subject changed? BRs, Fang Hui
No worries, no need to re-send. Small things can be fixed (if you're fine with what I proposed) when I'll apply the patch. On Tue, Mar 26, 2024 at 08:20:31AM +0000, Hui Fang wrote: > Hi, Jacopo > For refined patch, always use "git send-email", Even the subject changed? > > BRs, > Fang Hui > ________________________________ > From: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > Sent: Tuesday, March 26, 2024 4:08 PM > To: Hui Fang <hui.fang@nxp.com> > Cc: Anle Pan <anle.pan@nxp.com>; Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com>; libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org> > Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > Hi, > > On Tue, Mar 26, 2024 at 03:13:05AM +0000, Hui Fang wrote: > > @Anle Pan<mailto:anle.pan@nxp.com> No need "git send-email" again, will new another topic. > > > > Just reply the refined patch. > > not a problem for this patch, but as a general rule, do not reply to a > patch with a new version of the same patch, just send a new one. It's > easier to track and I think it is also required for patchwork to > identify it correctly. > > > > > Author: Anle Pan <anle.pan@nxp.com> > > Date: Fri Mar 1 17:32:54 2024 +0000 > > > > android: camera_device: Fix the stuck issue in sendCaptureResults. > > No '.' at the end of the commit message. > > The actual subject should be something like > > android: camera_device: Always clear descriptors_ in stop() > > > > > When flush() is being called and then a new stream configuration is set, > > descriptor_ may has a chance to be not cleared, which was skipped due to > > descriptors_ > > > the Stopped State. This will cause a stuck in sendCaptureResults. > > due to the Camera being in Stopped state. > > > > To fix the issue, ensure the descriptors_ is always empty when the camera > > State is Stopped. > > > > Signed-off-by: Anle Pan <anle.pan@nxp.com> > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > With the few fixes above, I'll merge. > > Also, as you have sent this new version, I'll add > > Signed-off-by: Fang Hui <hui.fang@nxp.com> > > Thanks > j > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > index d2679a97..7fda6ce6 100644 > > --- a/src/android/camera_device.cpp > > +++ b/src/android/camera_device.cpp > > @@ -433,8 +433,11 @@ void CameraDevice::flush() > > void CameraDevice::stop() > > { > > MutexLocker stateLock(stateMutex_); > > - if (state_ == State::Stopped) > > + if (state_ == State::Stopped) { > > + MutexLocker descriptorsLock(descriptorsMutex_); > > + descriptors_ = {}; > > return; > > + } > > > > camera_->stop(); > > ________________________________ > > From: Anle Pan <anle.pan@nxp.com> > > Sent: Monday, March 25, 2024 5:20 PM > > To: Umang Jain <umang.jain@ideasonboard.com>; Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > Cc: libcamera-devel@lists.libcamera.org <libcamera-devel@lists.libcamera.org>; Hui Fang <hui.fang@nxp.com> > > Subject: RE: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > > > Hi Umang, Jacopo. > > > > Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10) > > > > void CameraDevice::sendCaptureResults() > > { > > while (!descriptors_.empty() && !descriptors_.front()->isPending()) { > > > > And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() . > > will ask hui to help update the commit message since the "git send" configuration has issue on my side. > > > > Best Regards > > Anle Pan > > > > -----Original Message----- > > From: Umang Jain <umang.jain@ideasonboard.com> > > Sent: 2024年3月25日 15:44 > > To: Jacopo Mondi <jacopo.mondi@ideasonboard.com>; Anle Pan <anle.pan@nxp.com> > > Cc: libcamera-devel@lists.libcamera.org; Hui Fang <hui.fang@nxp.com> > > Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > > > > Hi Jacopo, > > > > Thanks for testing, > > > > On 22/03/24 4:06 pm, Jacopo Mondi wrote: > > > Hello > > > > > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: > > >> Hi Umang Jain, > > >> > > >> this issue was random(around 1/10), and can be reproduced by single run below cts test: > > >> ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m > > >> CtsMediaPlayerTestCases -t > > >> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 > > >> > > >> • 3 times configuring streams in this Test: > > >> 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > > >> 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time > > >> 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > > >> > > >> • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. > > >> > > > The only code path I see that could lead to descriptor_ not being > > > cleared is flush() being called and then stop() called just after > > > possibily as the result of a new configuration > > > > > > void CameraDevice::flush() > > > { > > > { > > > MutexLocker stateLock(stateMutex_); > > > if (state_ != State::Running) > > > return; > > > > > > state_ = State::Flushing; > > > } > > > > > > camera_->stop(); > > > > > > MutexLocker stateLock(stateMutex_); > > > state_ = State::Stopped; > > > } > > > > > > void CameraDevice::stop() > > > { > > > MutexLocker stateLock(stateMutex_); > > > if (state_ == State::Stopped) > > > return; > > > > > > camera_->stop(); > > > > > > { > > > MutexLocker descriptorsLock(descriptorsMutex_); > > > descriptors_ = {}; > > > } > > > > > > streams_.clear(); > > > > > > state_ = State::Stopped; > > > } > > > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t > > > *stream_list) { > > > /* Before any configuration attempt, stop the camera. */ > > > stop(); > > > > > > ... > > > > > > } > > > > > > Looking at the flush() implementation right now I wonder why we stop > > > the camera without the stateMutex_ held (the only reason I see is > > > because it can be concurrent with processCaptureRequest() which keeps > > > the mutex held), but I certainly don't want to touch that code right > > > now without a good reason. > > > > > > All in all, I think there's indeed a path that could lead to the > > > descriptor not being cleaned up, and we can't do it in flush() because > > > a concurrent call to processCaptureRequest() might be using the > > > descriptors to complete the in-flight requests with error state. > > > > > > Anle, does this match your understanding ? If so, this is what should > > > be recorded in the commit message (the cause of an issue, not the > > > symptoms) > > > > > > > > >> Best Regards > > >> Anle Pan > > >> > > >> -----Original Message----- > > >> From: Umang Jain <umang.jain@ideasonboard.com> > > >> Sent: 2024年3月5日 16:27 > > >> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org > > >> Cc: Hui Fang <hui.fang@nxp.com> > > >> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked > > >> issue when stopping capture > > >> > > >> Caution: This is an external email. Please take care when clicking > > >> links or opening attachments. When in doubt, report the message using > > >> the 'Report this email' button > > >> > > >> > > >> Hi Anle Pan, Hui Fang > > >> > > >> On 04/03/24 3:05 pm, Anle Pan wrote: > > >>> The issue occurs when stopping capture soon after starting capture. > > >> Can you describe the issue/use case in more detail? It's really hard to infer from one line. > > >>> In this case, no frame get from the device, the related capture > > >>> request has been pushed to the queue descriptors_, but the > > >>> queuedRequests_ was still empty due to no requests will be queue to > > >>> the device since the stream will be stopped soon, > > >> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. > > >> > > >> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > > >>> so there will be no camera->requestComplete called later, then the > > >>> descriptors_ can not pop normally, this will cause the pending if we > > >>> want to start capture next time. > > >>> > > >>> To fix the issue, ensure the descriptors_ is empty after the camera > > >>> device is stopped. > > > Do you know precisely why a non-empty descriptors_ stalls the camera ? > > > Does it get stuck on > > > > > > void CameraDevice::sendCaptureResults() > > > { > > > while (!descriptors_.empty() && > > > !descriptors_.front()->isPending()) { > > > > > > ? > > > > We should to formalise this as an proper issue, after getting more information. > > >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> > > > Tested with CTS, no regressions detected > > > > > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > > > > >>> --- > > >>> src/android/camera_device.cpp | 5 ++++- > > >>> 1 file changed, 4 insertions(+), 1 deletion(-) > > >>> > > >>> diff --git a/src/android/camera_device.cpp > > >>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 > > >>> --- a/src/android/camera_device.cpp > > >>> +++ b/src/android/camera_device.cpp > > >>> @@ -433,8 +433,11 @@ void CameraDevice::flush() > > >>> void CameraDevice::stop() > > >>> { > > >>> MutexLocker stateLock(stateMutex_); > > >>> - if (state_ == State::Stopped) > > >>> + if (state_ == State::Stopped) { > > >>> + MutexLocker descriptorsLock(descriptorsMutex_); > > >>> + descriptors_ = {}; > > >>> return; > > >>> + } > > > > I am not comfortable with this patch. Consider my points below > > > > - First we don't have a exact description of the issue it is fixing > > > > - Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_... > > > > - If we are accepting this, why not just let the Camera::stop() fall through and remove > > > > if (state_ == State::Stopped) > > return; > > > > as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer. > > > > I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :( > > >>> camera_->stop(); > > >>> > >
Hi, Jacopo We (Anle and me) are ok with your proposition, please help refine and apply, thanks! BRs, Fang Hui
Hi Umang, sorry, I missed your reply On Mon, Mar 25, 2024 at 01:13:37PM +0530, Umang Jain wrote: > Hi Jacopo, > > Thanks for testing, > > On 22/03/24 4:06 pm, Jacopo Mondi wrote: > > Hello > > > > On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: > > > Hi Umang Jain, > > > > > > this issue was random(around 1/10), and can be reproduced by single run below cts test: > > > ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 > > > > > > • 3 times configuring streams in this Test: > > > 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > > > 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time > > > 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 > > > > > > • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. > > > > > The only code path I see that could lead to descriptor_ not being > > cleared is flush() being called and then stop() called just after > > possibily as the result of a new configuration > > > > void CameraDevice::flush() > > { > > { > > MutexLocker stateLock(stateMutex_); > > if (state_ != State::Running) > > return; > > > > state_ = State::Flushing; > > } > > > > camera_->stop(); > > > > MutexLocker stateLock(stateMutex_); > > state_ = State::Stopped; > > } > > > > void CameraDevice::stop() > > { > > MutexLocker stateLock(stateMutex_); > > if (state_ == State::Stopped) > > return; > > > > camera_->stop(); > > > > { > > MutexLocker descriptorsLock(descriptorsMutex_); > > descriptors_ = {}; > > } > > > > streams_.clear(); > > > > state_ = State::Stopped; > > } > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > { > > /* Before any configuration attempt, stop the camera. */ > > stop(); > > > > ... > > > > } > > > > Looking at the flush() implementation right now I wonder why we stop > > the camera without the stateMutex_ held (the only reason I see is > > because it can be concurrent with processCaptureRequest() which keeps > > the mutex held), but I certainly don't want to touch that code right > > now without a good reason. > > > > All in all, I think there's indeed a path that could lead to the > > descriptor not being cleaned up, and we can't do it in flush() because > > a concurrent call to processCaptureRequest() might be using the > > descriptors to complete the in-flight requests with error state. > > > > Anle, does this match your understanding ? If so, this is what should > > be recorded in the commit message (the cause of an issue, not the > > symptoms) > > > > > > > Best Regards > > > Anle Pan > > > > > > -----Original Message----- > > > From: Umang Jain <umang.jain@ideasonboard.com> > > > Sent: 2024年3月5日 16:27 > > > To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org > > > Cc: Hui Fang <hui.fang@nxp.com> > > > Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture > > > > > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > > > > > > > Hi Anle Pan, Hui Fang > > > > > > On 04/03/24 3:05 pm, Anle Pan wrote: > > > > The issue occurs when stopping capture soon after starting capture. > > > Can you describe the issue/use case in more detail? It's really hard to infer from one line. > > > > In this case, no frame get from the device, the related capture > > > > request has been pushed to the queue descriptors_, but the > > > > queuedRequests_ was still empty due to no requests will be queue to > > > > the device since the stream will be stopped soon, > > > I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. > > > > > > How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? > > > > so there will be no camera->requestComplete called later, then the > > > > descriptors_ can not pop normally, this will cause the pending if we > > > > want to start capture next time. > > > > > > > > To fix the issue, ensure the descriptors_ is empty after the camera > > > > device is stopped. > > Do you know precisely why a non-empty descriptors_ stalls the camera ? > > Does it get stuck on > > > > void CameraDevice::sendCaptureResults() > > { > > while (!descriptors_.empty() && !descriptors_.front()->isPending()) { > > > > ? > > We should to formalise this as an proper issue, after getting more > information. > > > > Signed-off-by: Anle Pan <anle.pan@nxp.com> > > Tested with CTS, no regressions detected > > > > Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> > > > > > > --- > > > > src/android/camera_device.cpp | 5 ++++- > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/src/android/camera_device.cpp > > > > b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 > > > > --- a/src/android/camera_device.cpp > > > > +++ b/src/android/camera_device.cpp > > > > @@ -433,8 +433,11 @@ void CameraDevice::flush() > > > > void CameraDevice::stop() > > > > { > > > > MutexLocker stateLock(stateMutex_); > > > > - if (state_ == State::Stopped) > > > > + if (state_ == State::Stopped) { > > > > + MutexLocker descriptorsLock(descriptorsMutex_); > > > > + descriptors_ = {}; > > > > return; > > > > + } > > I am not comfortable with this patch. Consider my points below > > - First we don't have a exact description of the issue it is fixing No we do, don't we ? > > - Second, if the camera state is ::Stopped already. Clears descriptors_ > doesn't look a good idea to me. The function which has set the camera > state::Stopped already, should be ideally be emptying the descriptors_... > We can't clear descriptors_ in flush() because a concurrent processCaptureRequests() might be dealing with descriptors_ (and complete it if state == Stopped, so no need to forcefully clear it) > - If we are accepting this, why not just let the Camera::stop() fall through > and remove > > if (state_ == State::Stopped) > return; > > as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a > no-op when called even when State::Stopped and responsible to clean up > queues(and stuff) for the HAL layer. This might be worth experimenting with indeed! CameraDevice::stop() should then look like: void CameraDevice::stop() { MutexLocker stateLock(stateMutex_); camera_->stop(); { MutexLocker descriptorsLock(descriptorsMutex_); descriptors_ = {}; } streams_.clear(); state_ = State::Stopped; } But before testing it further, do we agree the issue to fix is real ? > > I should ideally back up my thoughts with some testing, but I don't have any > kind of CTS testing available to me right now :( > > > > camera_->stop(); > > > > >
Hi Jacopo On 26/03/24 2:26 pm, Jacopo Mondi wrote: > Hi Umang, > sorry, I missed your reply > > On Mon, Mar 25, 2024 at 01:13:37PM +0530, Umang Jain wrote: >> Hi Jacopo, >> >> Thanks for testing, >> >> On 22/03/24 4:06 pm, Jacopo Mondi wrote: >>> Hello >>> >>> On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote: >>>> Hi Umang Jain, >>>> >>>> this issue was random(around 1/10), and can be reproduced by single run below cts test: >>>> ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m CtsMediaPlayerTestCases -t android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90 >>>> >>>> • 3 times configuring streams in this Test: >>>> 1) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >>>> 2) 3 streams: (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time >>>> 3) 2 streams: (0) 320x240-NV12 (1) 1280x720-YUYV: capture several frames on stream1 >>>> >>>> • when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch. >>>> >>> The only code path I see that could lead to descriptor_ not being >>> cleared is flush() being called and then stop() called just after >>> possibily as the result of a new configuration >>> >>> void CameraDevice::flush() >>> { >>> { >>> MutexLocker stateLock(stateMutex_); >>> if (state_ != State::Running) >>> return; >>> >>> state_ = State::Flushing; >>> } >>> >>> camera_->stop(); >>> >>> MutexLocker stateLock(stateMutex_); >>> state_ = State::Stopped; >>> } >>> >>> void CameraDevice::stop() >>> { >>> MutexLocker stateLock(stateMutex_); >>> if (state_ == State::Stopped) >>> return; >>> >>> camera_->stop(); >>> >>> { >>> MutexLocker descriptorsLock(descriptorsMutex_); >>> descriptors_ = {}; >>> } >>> >>> streams_.clear(); >>> >>> state_ = State::Stopped; >>> } >>> >>> int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) >>> { >>> /* Before any configuration attempt, stop the camera. */ >>> stop(); >>> >>> ... >>> >>> } >>> >>> Looking at the flush() implementation right now I wonder why we stop >>> the camera without the stateMutex_ held (the only reason I see is >>> because it can be concurrent with processCaptureRequest() which keeps >>> the mutex held), but I certainly don't want to touch that code right >>> now without a good reason. >>> >>> All in all, I think there's indeed a path that could lead to the >>> descriptor not being cleaned up, and we can't do it in flush() because >>> a concurrent call to processCaptureRequest() might be using the >>> descriptors to complete the in-flight requests with error state. >>> >>> Anle, does this match your understanding ? If so, this is what should >>> be recorded in the commit message (the cause of an issue, not the >>> symptoms) >>> >>> >>>> Best Regards >>>> Anle Pan >>>> >>>> -----Original Message----- >>>> From: Umang Jain <umang.jain@ideasonboard.com> >>>> Sent: 2024年3月5日 16:27 >>>> To: Anle Pan <anle.pan@nxp.com>; libcamera-devel@lists.libcamera.org >>>> Cc: Hui Fang <hui.fang@nxp.com> >>>> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture >>>> >>>> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button >>>> >>>> >>>> Hi Anle Pan, Hui Fang >>>> >>>> On 04/03/24 3:05 pm, Anle Pan wrote: >>>>> The issue occurs when stopping capture soon after starting capture. >>>> Can you describe the issue/use case in more detail? It's really hard to infer from one line. >>>>> In this case, no frame get from the device, the related capture >>>>> request has been pushed to the queue descriptors_, but the >>>>> queuedRequests_ was still empty due to no requests will be queue to >>>>> the device since the stream will be stopped soon, >>>> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera. >>>> >>>> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ? >>>>> so there will be no camera->requestComplete called later, then the >>>>> descriptors_ can not pop normally, this will cause the pending if we >>>>> want to start capture next time. >>>>> >>>>> To fix the issue, ensure the descriptors_ is empty after the camera >>>>> device is stopped. >>> Do you know precisely why a non-empty descriptors_ stalls the camera ? >>> Does it get stuck on >>> >>> void CameraDevice::sendCaptureResults() >>> { >>> while (!descriptors_.empty() && !descriptors_.front()->isPending()) { >>> >>> ? >> We should to formalise this as an proper issue, after getting more >> information. >>>>> Signed-off-by: Anle Pan <anle.pan@nxp.com> >>> Tested with CTS, no regressions detected >>> >>> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> >>> Tested-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> >>> >>>>> --- >>>>> src/android/camera_device.cpp | 5 ++++- >>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/src/android/camera_device.cpp >>>>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 >>>>> --- a/src/android/camera_device.cpp >>>>> +++ b/src/android/camera_device.cpp >>>>> @@ -433,8 +433,11 @@ void CameraDevice::flush() >>>>> void CameraDevice::stop() >>>>> { >>>>> MutexLocker stateLock(stateMutex_); >>>>> - if (state_ == State::Stopped) >>>>> + if (state_ == State::Stopped) { >>>>> + MutexLocker descriptorsLock(descriptorsMutex_); >>>>> + descriptors_ = {}; >>>>> return; >>>>> + } >> I am not comfortable with this patch. Consider my points below >> >> - First we don't have a exact description of the issue it is fixing > No we do, don't we ? ack, i re-read ... > >> - Second, if the camera state is ::Stopped already. Clears descriptors_ >> doesn't look a good idea to me. The function which has set the camera >> state::Stopped already, should be ideally be emptying the descriptors_... >> > We can't clear descriptors_ in flush() because a concurrent > processCaptureRequests() might be dealing with descriptors_ (and > complete it if state == Stopped, so no need to forcefully clear it) Yea, I get the idea that we can't clear the descriptors_ there. > >> - If we are accepting this, why not just let the Camera::stop() fall through >> and remove >> >> if (state_ == State::Stopped) >> return; >> >> as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a >> no-op when called even when State::Stopped and responsible to clean up >> queues(and stuff) for the HAL layer. > This might be worth experimenting with indeed! I would like myself to test the implementation of flush() and stop() at the same time probably.. because currently flush is calling the camera_->stop() directly. > > CameraDevice::stop() should then look like: > > void CameraDevice::stop() > { > MutexLocker stateLock(stateMutex_); > > camera_->stop(); > > { > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_ = {}; > } > > streams_.clear(); > > state_ = State::Stopped; > } > > But before testing it further, do we agree the issue to fix is real ? Agreed. Thanks for handling this! > >> I should ideally back up my thoughts with some testing, but I don't have any >> kind of CTS testing available to me right now :( >>>>> camera_->stop(); >>>>>
diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp index 25cedd44..d452992d 100644 --- a/src/android/camera_device.cpp +++ b/src/android/camera_device.cpp @@ -433,8 +433,11 @@ void CameraDevice::flush() void CameraDevice::stop() { MutexLocker stateLock(stateMutex_); - if (state_ == State::Stopped) + if (state_ == State::Stopped) { + MutexLocker descriptorsLock(descriptorsMutex_); + descriptors_ = {}; return; + } camera_->stop();
The issue occurs when stopping capture soon after starting capture. In this case, no frame get from the device, the related capture request has been pushed to the queue descriptors_, but the queuedRequests_ was still empty due to no requests will be queue to the device since the stream will be stopped soon, so there will be no camera->requestComplete called later, then the descriptors_ can not pop normally, this will cause the pending if we want to start capture next time. To fix the issue, ensure the descriptors_ is empty after the camera device is stopped. Signed-off-by: Anle Pan <anle.pan@nxp.com> --- src/android/camera_device.cpp | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)