[libcamera-devel] android: Re-order out-of-order completion path
diff mbox series

Message ID 20211014103757.21161-1-jacopo@jmondi.org
State Accepted
Headers show
Series
  • [libcamera-devel] android: Re-order out-of-order completion path
Related show

Commit Message

Jacopo Mondi Oct. 14, 2021, 10:37 a.m. UTC
When the camera HAL detects an out-of-order completion of a request, it
sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error.

Such error not only forces the service to close the camera as prescribed
by the camera3 specification, but in some implementation (specifically
the ChromeOS one) it causes the camera service to abort and exit.

This prevents any error messages to be printed by libcamera, as the
library gets terminated before getting to that point, and also hides the
printout of error messages that lead to out-of-order completion, making
it impossible to get from the output log what happened.

Move the call to notifyError() at the end of the error path and demote
the error message to LogLevels::Error from Fatal to let the service
implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors.

Before this patch, when waiting on a fence fails and the capture
request is not queued to the Camera, we get an out-of-order completion
but no backtrace. With this patch applied the error path is visible:

ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired
ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70

Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
---
 src/android/camera_device.cpp | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Laurent Pinchart Oct. 15, 2021, 2:02 a.m. UTC | #1
Hi Jacopo,

Thank you for the patch.

On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote:
> When the camera HAL detects an out-of-order completion of a request, it
> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error.
> 
> Such error not only forces the service to close the camera as prescribed
> by the camera3 specification, but in some implementation (specifically
> the ChromeOS one) it causes the camera service to abort and exit.
> 
> This prevents any error messages to be printed by libcamera, as the

s/to be printed/from being printed/

> library gets terminated before getting to that point, and also hides the
> printout of error messages that lead to out-of-order completion, making
> it impossible to get from the output log what happened.
> 
> Move the call to notifyError() at the end of the error path and demote
> the error message to LogLevels::Error from Fatal to let the service
> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors.
> 
> Before this patch, when waiting on a fence fails and the capture
> request is not queued to the Camera, we get an out-of-order completion
> but no backtrace. With this patch applied the error path is visible:
> 
> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired
> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70
> 
> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> ---
>  src/android/camera_device.cpp | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 901867105085..f60297e13c8b 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request)
>  	if (descriptor->request_->cookie() != request->cookie()) {
>  		/*
>  		 * \todo Clarify if the Camera has to be closed on
> -		 * ERROR_DEVICE and possibly demote the Fatal to simple
> -		 * Error.
> +		 * ERROR_DEVICE.
>  		 */
> -		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> -		LOG(HAL, Fatal)
> +		LOG(HAL, Error)
>  			<< "Out-of-order completion for request "
>  			<< utils::hex(request->cookie());
>  
>  		MutexLocker descriptorsLock(descriptorsMutex_);
>  		descriptors_.pop();
> +
> +		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);

Given that we've removed the descriptor at the head of the queue, which
doesn't correspond to the request, from this point onwards all hell will
likely break loose, possibly even in the close() path. It's probably not
worth than a Fatal though, so

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> +
>  		return;
>  	}
>
Umang Jain Oct. 15, 2021, 3:17 a.m. UTC | #2
Hello,

On 10/15/21 7:32 AM, Laurent Pinchart wrote:
> Hi Jacopo,
>
> Thank you for the patch.
>
> On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote:
>> When the camera HAL detects an out-of-order completion of a request, it
>> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error.
>>
>> Such error not only forces the service to close the camera as prescribed
>> by the camera3 specification, but in some implementation (specifically
>> the ChromeOS one) it causes the camera service to abort and exit.
>>
>> This prevents any error messages to be printed by libcamera, as the
> s/to be printed/from being printed/
>
>> library gets terminated before getting to that point, and also hides the
>> printout of error messages that lead to out-of-order completion, making
>> it impossible to get from the output log what happened.
>>
>> Move the call to notifyError() at the end of the error path and demote
>> the error message to LogLevels::Error from Fatal to let the service
>> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors.
>>
>> Before this patch, when waiting on a fence fails and the capture
>> request is not queued to the Camera, we get an out-of-order completion
>> but no backtrace. With this patch applied the error path is visible:
>>
>> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired
>> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70
>>
>> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
>> ---
>>   src/android/camera_device.cpp | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>> index 901867105085..f60297e13c8b 100644
>> --- a/src/android/camera_device.cpp
>> +++ b/src/android/camera_device.cpp
>> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request)
>>   	if (descriptor->request_->cookie() != request->cookie()) {
>>   		/*
>>   		 * \todo Clarify if the Camera has to be closed on
>> -		 * ERROR_DEVICE and possibly demote the Fatal to simple
>> -		 * Error.
>> +		 * ERROR_DEVICE.
>>   		 */
>> -		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
>> -		LOG(HAL, Fatal)
>> +		LOG(HAL, Error)
>>   			<< "Out-of-order completion for request "
>>   			<< utils::hex(request->cookie());
>>   
>>   		MutexLocker descriptorsLock(descriptorsMutex_);
>>   		descriptors_.pop();
>> +
>> +		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> Given that we've removed the descriptor at the head of the queue, which
> doesn't correspond to the request, from this point onwards all hell will
> likely break loose, possibly even in the close() path. It's probably not


Can we use this as an error message, " All hell broke loose, Run!!!" ;-)

> worth than a Fatal though, so
>
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>


Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>

>
>> +
>>   		return;
>>   	}
>>
Hirokazu Honda Oct. 19, 2021, 4 a.m. UTC | #3
Hi Jacopo, thank you for the patch.

On Fri, Oct 15, 2021 at 12:17 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>
> Hello,
>
> On 10/15/21 7:32 AM, Laurent Pinchart wrote:
> > Hi Jacopo,
> >
> > Thank you for the patch.
> >
> > On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote:
> >> When the camera HAL detects an out-of-order completion of a request, it
> >> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error.
> >>
> >> Such error not only forces the service to close the camera as prescribed
> >> by the camera3 specification, but in some implementation (specifically
> >> the ChromeOS one) it causes the camera service to abort and exit.
> >>
> >> This prevents any error messages to be printed by libcamera, as the
> > s/to be printed/from being printed/
> >
> >> library gets terminated before getting to that point, and also hides the
> >> printout of error messages that lead to out-of-order completion, making
> >> it impossible to get from the output log what happened.
> >>
> >> Move the call to notifyError() at the end of the error path and demote
> >> the error message to LogLevels::Error from Fatal to let the service
> >> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors.
> >>
> >> Before this patch, when waiting on a fence fails and the capture
> >> request is not queued to the Camera, we get an out-of-order completion
> >> but no backtrace. With this patch applied the error path is visible:

Is this a bug in libcamera? Will we need to fix this, or intended behavior?

> >>
> >> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired
> >> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70
> >>
> >> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> >> ---
> >>   src/android/camera_device.cpp | 9 +++++----
> >>   1 file changed, 5 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> >> index 901867105085..f60297e13c8b 100644
> >> --- a/src/android/camera_device.cpp
> >> +++ b/src/android/camera_device.cpp
> >> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request)
> >>      if (descriptor->request_->cookie() != request->cookie()) {
> >>              /*
> >>               * \todo Clarify if the Camera has to be closed on
> >> -             * ERROR_DEVICE and possibly demote the Fatal to simple
> >> -             * Error.
> >> +             * ERROR_DEVICE.
> >>               */
> >> -            notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> >> -            LOG(HAL, Fatal)
> >> +            LOG(HAL, Error)
> >>                      << "Out-of-order completion for request "
> >>                      << utils::hex(request->cookie());
> >>
> >>              MutexLocker descriptorsLock(descriptorsMutex_);
> >>              descriptors_.pop();
> >> +
> >> +            notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> > Given that we've removed the descriptor at the head of the queue, which
> > doesn't correspond to the request, from this point onwards all hell will
> > likely break loose, possibly even in the close() path. It's probably not
>
>
> Can we use this as an error message, " All hell broke loose, Run!!!" ;-)
>
> > worth than a Fatal though, so
> >
> > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
>
>
> Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>
>

Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
> >
> >> +
> >>              return;
> >>      }
> >>
Laurent Pinchart Oct. 19, 2021, 6:42 a.m. UTC | #4
Hi Hiro,

On Tue, Oct 19, 2021 at 01:00:50PM +0900, Hirokazu Honda wrote:
> On Fri, Oct 15, 2021 at 12:17 PM Umang Jain wrote:
> > On 10/15/21 7:32 AM, Laurent Pinchart wrote:
> > > On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote:
> > >> When the camera HAL detects an out-of-order completion of a request, it
> > >> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error.
> > >>
> > >> Such error not only forces the service to close the camera as prescribed
> > >> by the camera3 specification, but in some implementation (specifically
> > >> the ChromeOS one) it causes the camera service to abort and exit.
> > >>
> > >> This prevents any error messages to be printed by libcamera, as the
> > > s/to be printed/from being printed/
> > >
> > >> library gets terminated before getting to that point, and also hides the
> > >> printout of error messages that lead to out-of-order completion, making
> > >> it impossible to get from the output log what happened.
> > >>
> > >> Move the call to notifyError() at the end of the error path and demote
> > >> the error message to LogLevels::Error from Fatal to let the service
> > >> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors.
> > >>
> > >> Before this patch, when waiting on a fence fails and the capture
> > >> request is not queued to the Camera, we get an out-of-order completion
> > >> but no backtrace. With this patch applied the error path is visible:
> 
> Is this a bug in libcamera? Will we need to fix this, or intended behavior?

The timeout is caused by the fence not being signalled for 300ms, which
seems to be a bug in Chrome OS. The HAL should recover gracefully from
this condition, Jacopo is working on it.

> > >> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired
> > >> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70
> > >>
> > >> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > >> ---
> > >>   src/android/camera_device.cpp | 9 +++++----
> > >>   1 file changed, 5 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > >> index 901867105085..f60297e13c8b 100644
> > >> --- a/src/android/camera_device.cpp
> > >> +++ b/src/android/camera_device.cpp
> > >> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request)
> > >>      if (descriptor->request_->cookie() != request->cookie()) {
> > >>              /*
> > >>               * \todo Clarify if the Camera has to be closed on
> > >> -             * ERROR_DEVICE and possibly demote the Fatal to simple
> > >> -             * Error.
> > >> +             * ERROR_DEVICE.
> > >>               */
> > >> -            notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> > >> -            LOG(HAL, Fatal)
> > >> +            LOG(HAL, Error)
> > >>                      << "Out-of-order completion for request "
> > >>                      << utils::hex(request->cookie());
> > >>
> > >>              MutexLocker descriptorsLock(descriptorsMutex_);
> > >>              descriptors_.pop();
> > >> +
> > >> +            notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
> > >
> > > Given that we've removed the descriptor at the head of the queue, which
> > > doesn't correspond to the request, from this point onwards all hell will
> > > likely break loose, possibly even in the close() path. It's probably not
> >
> > Can we use this as an error message, " All hell broke loose, Run!!!" ;-)
> >
> > > worth than a Fatal though, so
> > >
> > > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> >
> > Reviewed-by: Umang Jain <umang.jain@ideasonboard.com>
> 
> Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
>
> > >> +
> > >>              return;
> > >>      }
> > >>

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 901867105085..f60297e13c8b 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -1104,16 +1104,17 @@  void CameraDevice::requestComplete(Request *request)
 	if (descriptor->request_->cookie() != request->cookie()) {
 		/*
 		 * \todo Clarify if the Camera has to be closed on
-		 * ERROR_DEVICE and possibly demote the Fatal to simple
-		 * Error.
+		 * ERROR_DEVICE.
 		 */
-		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
-		LOG(HAL, Fatal)
+		LOG(HAL, Error)
 			<< "Out-of-order completion for request "
 			<< utils::hex(request->cookie());
 
 		MutexLocker descriptorsLock(descriptorsMutex_);
 		descriptors_.pop();
+
+		notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE);
+
 		return;
 	}