Message ID | 20211014103757.21161-1-jacopo@jmondi.org |
---|---|
State | Accepted |
Headers | show |
Series |
|
Related | show |
Hi Jacopo, Thank you for the patch. On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote: > When the camera HAL detects an out-of-order completion of a request, it > sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error. > > Such error not only forces the service to close the camera as prescribed > by the camera3 specification, but in some implementation (specifically > the ChromeOS one) it causes the camera service to abort and exit. > > This prevents any error messages to be printed by libcamera, as the s/to be printed/from being printed/ > library gets terminated before getting to that point, and also hides the > printout of error messages that lead to out-of-order completion, making > it impossible to get from the output log what happened. > > Move the call to notifyError() at the end of the error path and demote > the error message to LogLevels::Error from Fatal to let the service > implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors. > > Before this patch, when waiting on a fence fails and the capture > request is not queued to the Camera, we get an out-of-order completion > but no backtrace. With this patch applied the error path is visible: > > ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired > ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70 > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > --- > src/android/camera_device.cpp | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index 901867105085..f60297e13c8b 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request) > if (descriptor->request_->cookie() != request->cookie()) { > /* > * \todo Clarify if the Camera has to be closed on > - * ERROR_DEVICE and possibly demote the Fatal to simple > - * Error. > + * ERROR_DEVICE. > */ > - notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > - LOG(HAL, Fatal) > + LOG(HAL, Error) > << "Out-of-order completion for request " > << utils::hex(request->cookie()); > > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_.pop(); > + > + notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); Given that we've removed the descriptor at the head of the queue, which doesn't correspond to the request, from this point onwards all hell will likely break loose, possibly even in the close() path. It's probably not worth than a Fatal though, so Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > + > return; > } >
Hello, On 10/15/21 7:32 AM, Laurent Pinchart wrote: > Hi Jacopo, > > Thank you for the patch. > > On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote: >> When the camera HAL detects an out-of-order completion of a request, it >> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error. >> >> Such error not only forces the service to close the camera as prescribed >> by the camera3 specification, but in some implementation (specifically >> the ChromeOS one) it causes the camera service to abort and exit. >> >> This prevents any error messages to be printed by libcamera, as the > s/to be printed/from being printed/ > >> library gets terminated before getting to that point, and also hides the >> printout of error messages that lead to out-of-order completion, making >> it impossible to get from the output log what happened. >> >> Move the call to notifyError() at the end of the error path and demote >> the error message to LogLevels::Error from Fatal to let the service >> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors. >> >> Before this patch, when waiting on a fence fails and the capture >> request is not queued to the Camera, we get an out-of-order completion >> but no backtrace. With this patch applied the error path is visible: >> >> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired >> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70 >> >> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> >> --- >> src/android/camera_device.cpp | 9 +++++---- >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp >> index 901867105085..f60297e13c8b 100644 >> --- a/src/android/camera_device.cpp >> +++ b/src/android/camera_device.cpp >> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request) >> if (descriptor->request_->cookie() != request->cookie()) { >> /* >> * \todo Clarify if the Camera has to be closed on >> - * ERROR_DEVICE and possibly demote the Fatal to simple >> - * Error. >> + * ERROR_DEVICE. >> */ >> - notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); >> - LOG(HAL, Fatal) >> + LOG(HAL, Error) >> << "Out-of-order completion for request " >> << utils::hex(request->cookie()); >> >> MutexLocker descriptorsLock(descriptorsMutex_); >> descriptors_.pop(); >> + >> + notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > Given that we've removed the descriptor at the head of the queue, which > doesn't correspond to the request, from this point onwards all hell will > likely break loose, possibly even in the close() path. It's probably not Can we use this as an error message, " All hell broke loose, Run!!!" ;-) > worth than a Fatal though, so > > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Umang Jain <umang.jain@ideasonboard.com> > >> + >> return; >> } >>
Hi Jacopo, thank you for the patch. On Fri, Oct 15, 2021 at 12:17 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > Hello, > > On 10/15/21 7:32 AM, Laurent Pinchart wrote: > > Hi Jacopo, > > > > Thank you for the patch. > > > > On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote: > >> When the camera HAL detects an out-of-order completion of a request, it > >> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error. > >> > >> Such error not only forces the service to close the camera as prescribed > >> by the camera3 specification, but in some implementation (specifically > >> the ChromeOS one) it causes the camera service to abort and exit. > >> > >> This prevents any error messages to be printed by libcamera, as the > > s/to be printed/from being printed/ > > > >> library gets terminated before getting to that point, and also hides the > >> printout of error messages that lead to out-of-order completion, making > >> it impossible to get from the output log what happened. > >> > >> Move the call to notifyError() at the end of the error path and demote > >> the error message to LogLevels::Error from Fatal to let the service > >> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors. > >> > >> Before this patch, when waiting on a fence fails and the capture > >> request is not queued to the Camera, we get an out-of-order completion > >> but no backtrace. With this patch applied the error path is visible: Is this a bug in libcamera? Will we need to fix this, or intended behavior? > >> > >> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired > >> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70 > >> > >> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > >> --- > >> src/android/camera_device.cpp | 9 +++++---- > >> 1 file changed, 5 insertions(+), 4 deletions(-) > >> > >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > >> index 901867105085..f60297e13c8b 100644 > >> --- a/src/android/camera_device.cpp > >> +++ b/src/android/camera_device.cpp > >> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request) > >> if (descriptor->request_->cookie() != request->cookie()) { > >> /* > >> * \todo Clarify if the Camera has to be closed on > >> - * ERROR_DEVICE and possibly demote the Fatal to simple > >> - * Error. > >> + * ERROR_DEVICE. > >> */ > >> - notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > >> - LOG(HAL, Fatal) > >> + LOG(HAL, Error) > >> << "Out-of-order completion for request " > >> << utils::hex(request->cookie()); > >> > >> MutexLocker descriptorsLock(descriptorsMutex_); > >> descriptors_.pop(); > >> + > >> + notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > > Given that we've removed the descriptor at the head of the queue, which > > doesn't correspond to the request, from this point onwards all hell will > > likely break loose, possibly even in the close() path. It's probably not > > > Can we use this as an error message, " All hell broke loose, Run!!!" ;-) > > > worth than a Fatal though, so > > > > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > > Reviewed-by: Umang Jain <umang.jain@ideasonboard.com> > Reviewed-by: Hirokazu Honda <hiroh@chromium.org> > > > >> + > >> return; > >> } > >>
Hi Hiro, On Tue, Oct 19, 2021 at 01:00:50PM +0900, Hirokazu Honda wrote: > On Fri, Oct 15, 2021 at 12:17 PM Umang Jain wrote: > > On 10/15/21 7:32 AM, Laurent Pinchart wrote: > > > On Thu, Oct 14, 2021 at 12:37:57PM +0200, Jacopo Mondi wrote: > > >> When the camera HAL detects an out-of-order completion of a request, it > > >> sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error. > > >> > > >> Such error not only forces the service to close the camera as prescribed > > >> by the camera3 specification, but in some implementation (specifically > > >> the ChromeOS one) it causes the camera service to abort and exit. > > >> > > >> This prevents any error messages to be printed by libcamera, as the > > > s/to be printed/from being printed/ > > > > > >> library gets terminated before getting to that point, and also hides the > > >> printout of error messages that lead to out-of-order completion, making > > >> it impossible to get from the output log what happened. > > >> > > >> Move the call to notifyError() at the end of the error path and demote > > >> the error message to LogLevels::Error from Fatal to let the service > > >> implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors. > > >> > > >> Before this patch, when waiting on a fence fails and the capture > > >> request is not queued to the Camera, we get an out-of-order completion > > >> but no backtrace. With this patch applied the error path is visible: > > Is this a bug in libcamera? Will we need to fix this, or intended behavior? The timeout is caused by the fence not being signalled for 300ms, which seems to be a bug in Chrome OS. The HAL should recover gracefully from this condition, Jacopo is working on it. > > >> ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired > > >> ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70 > > >> > > >> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > >> --- > > >> src/android/camera_device.cpp | 9 +++++---- > > >> 1 file changed, 5 insertions(+), 4 deletions(-) > > >> > > >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > >> index 901867105085..f60297e13c8b 100644 > > >> --- a/src/android/camera_device.cpp > > >> +++ b/src/android/camera_device.cpp > > >> @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request) > > >> if (descriptor->request_->cookie() != request->cookie()) { > > >> /* > > >> * \todo Clarify if the Camera has to be closed on > > >> - * ERROR_DEVICE and possibly demote the Fatal to simple > > >> - * Error. > > >> + * ERROR_DEVICE. > > >> */ > > >> - notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > > >> - LOG(HAL, Fatal) > > >> + LOG(HAL, Error) > > >> << "Out-of-order completion for request " > > >> << utils::hex(request->cookie()); > > >> > > >> MutexLocker descriptorsLock(descriptorsMutex_); > > >> descriptors_.pop(); > > >> + > > >> + notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); > > > > > > Given that we've removed the descriptor at the head of the queue, which > > > doesn't correspond to the request, from this point onwards all hell will > > > likely break loose, possibly even in the close() path. It's probably not > > > > Can we use this as an error message, " All hell broke loose, Run!!!" ;-) > > > > > worth than a Fatal though, so > > > > > > Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > > > > Reviewed-by: Umang Jain <umang.jain@ideasonboard.com> > > Reviewed-by: Hirokazu Honda <hiroh@chromium.org> > > > >> + > > >> return; > > >> } > > >>
diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp index 901867105085..f60297e13c8b 100644 --- a/src/android/camera_device.cpp +++ b/src/android/camera_device.cpp @@ -1104,16 +1104,17 @@ void CameraDevice::requestComplete(Request *request) if (descriptor->request_->cookie() != request->cookie()) { /* * \todo Clarify if the Camera has to be closed on - * ERROR_DEVICE and possibly demote the Fatal to simple - * Error. + * ERROR_DEVICE. */ - notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); - LOG(HAL, Fatal) + LOG(HAL, Error) << "Out-of-order completion for request " << utils::hex(request->cookie()); MutexLocker descriptorsLock(descriptorsMutex_); descriptors_.pop(); + + notifyError(0, nullptr, CAMERA3_MSG_ERROR_DEVICE); + return; }
When the camera HAL detects an out-of-order completion of a request, it sends to the camera framework a CAMERA3_MSG_ERROR_DEVICE error. Such error not only forces the service to close the camera as prescribed by the camera3 specification, but in some implementation (specifically the ChromeOS one) it causes the camera service to abort and exit. This prevents any error messages to be printed by libcamera, as the library gets terminated before getting to that point, and also hides the printout of error messages that lead to out-of-order completion, making it impossible to get from the output log what happened. Move the call to notifyError() at the end of the error path and demote the error message to LogLevels::Error from Fatal to let the service implementation decide how to handle CAMERA3_MSG_ERROR_DEVICE errors. Before this patch, when waiting on a fence fails and the capture request is not queued to the Camera, we get an out-of-order completion but no backtrace. With this patch applied the error path is visible: ERROR HAL camera_worker.cpp:122 Failed waiting for fence: 82: Timer expired ERROR HAL camera_device.cpp:1110 '\_SB_.PCI0.I2C2.CAM0': Out-of-order completion for request 0x00007e6de4004c70 Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> --- src/android/camera_device.cpp | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)