[libcamera-devel,v3,2/3] android: CameraDevice: Prevent race due to concurrent HAL calls
diff mbox series

Message ID 20210426083830.2965095-3-hiroh@chromium.org
State Superseded
Headers show
Series
  • Support HAL3 API flush
Related show

Commit Message

Hirokazu Honda April 26, 2021, 8:38 a.m. UTC
HAL API functions might be called by different threads in Android
Camera HAL3 API, while process_capture_request() must be called
by a single thread. Furthermore, close() and flush() can be
called any time. Therefore, races to variables can occur among
the HAL API functions.
This prevents the races by enclosing HAL calls by mutex. It
might not be the best in terms of the performance, but the
simplicity is good as a first step and also further development.

Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
---
 src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
 src/android/camera_device.h   |  7 +++--
 2 files changed, 39 insertions(+), 27 deletions(-)

Comments

Jacopo Mondi May 3, 2021, 2:41 p.m. UTC | #1
Hi Hiro,

On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> HAL API functions might be called by different threads in Android
> Camera HAL3 API, while process_capture_request() must be called
> by a single thread. Furthermore, close() and flush() can be
> called any time. Therefore, races to variables can occur among
> the HAL API functions.
> This prevents the races by enclosing HAL calls by mutex. It
> might not be the best in terms of the performance, but the
> simplicity is good as a first step and also further development.
>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> ---
>  src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
>  src/android/camera_device.h   |  7 +++--
>  2 files changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index b3def04b..96c8d4a9 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> +	std::scoped_lock<std::mutex> halLock(halMutex_);

Have you considered locking the wrappers functions in camera_ops.cpp ?
Wouldn't it be simpler ? However locking with such a coarse-grained
granularity I feel contradicts a bit the Camera3 API flush()
documentation, which implies that flush() and
process_capture_request() can race

     * Due to
     * concurrency issues, it is possible that from the HAL's point of view, a
     * process_capture_request() call may be started after flush has been invoked but has not
     * returned yet. If such a call happens before flush() returns, the HAL should treat the new
     * capture request like other in-flight pending requests (see #4 below).

Locking at function level rules out this use case: if we lock each
function it's not possible for process_capture_request() to be started
during flush() execution as the lock hold by flush will be released
only at exit time.

If serializing operations with this granularity is fine (and I really
with it is, as it would really simplify life for use) the only racing
condition we have to protect from is between the request completion
signal emitted by libcamera, and the camera framework calling into the
Camera HAL using the camera3 API as flush(), close() and
process_capture_request() will always be serialized.

Assuming the above is ok with the Camera3 API, isn't it enough to

- flush() to call stop() and stop the Camera:
        - All queued requests at the time Camera::stop() is called
          will be returned one by one with an error state. There's no
          reason to cancel the descriptors at stop() time, those
          requests will be completed

        - A process_capture_request() called after flush() has stopped
          the camera will return immeditely as the camera is stopped
          and the capture request is returned with an error
          (we might want a flag to keep track of this, otherwise we'll
          hit a Camera state machine error, much like we do with
          running_)

        - A close() called after flush() will be a no op

        - Only open() will allow the CameraDevice to move back from
          stopped_ state

- There's a tiny tiny window where a request can complete after flush
  has been called but the camera has not been stopped yet. In
  requestComplete() we'll have to inspect the stopped_ flag state and
  force the request to return with error even if it did not.

- Once we move JPEG encoding to a worker thread, stop() will have to
  forcefully stop it. Requests interrupted during encoding will be
  returned with error.

A simple lock around stopped_ should protect it from concurrent
accesses between stop() and requestComplete().

Have you considered this option and ruled it out for reasons I'm
failing to consider ? It feels simpler than decoupling requests and
descriptors, maybe you tried and found out it's not :)

Thanks
  j


>
>  	stop();
>
> @@ -758,12 +758,17 @@ void CameraDevice::stop()
>  	worker_.stop();
>  	camera_->stop();
>
> -	descriptors_.clear();
> +	{
> +		std::scoped_lock<std::mutex> lock(mutex_);
> +		descriptors_.clear();
> +	}
> +
>  	running_ = false;
>  }
>
>  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
>  {
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
>  	callbacks_ = callbacks;
>  }
>
> @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
>   */
>  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  {
> -	/* Before any configuration attempt, stop the camera. */
> -	stop();
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
>
>  	if (stream_list->num_streams == 0) {
>  		LOG(HAL, Error) << "No streams in configuration";
> @@ -1656,6 +1660,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  		return -EINVAL;
>  #endif
>
> +	/* Before any configuration attempt, stop the camera. */
> +	stop();
> +
>  	/*
>  	 * Generate an empty configuration, and construct a StreamConfiguration
>  	 * for each camera3_stream to add to it.
> @@ -1671,6 +1678,8 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  	 * ensure the required entries are available without further
>  	 * reallocation.
>  	 */
> +
> +	std::scoped_lock<std::mutex> lock(mutex_);
>  	streams_.clear();
>  	streams_.reserve(stream_list->num_streams);
>
> @@ -1907,6 +1916,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  	if (!isValidRequest(camera3Request))
>  		return -EINVAL;
>
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
> +
>  	/* Start the camera if that's the first request we handle. */
>  	if (!running_) {
>  		worker_.start();
> @@ -2022,29 +2033,26 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>
>  void CameraDevice::requestComplete(Request *request)
>  {
> +	std::scoped_lock<std::mutex> lock(mutex_);
> +
>  	const Request::BufferMap &buffers = request->buffers();
>  	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
>  	std::unique_ptr<CameraMetadata> resultMetadata;
>
> -	decltype(descriptors_)::node_type node;
> -	std::unique_ptr<CaptureRequest> captureRequest;
> -	{
> -		std::scoped_lock<std::mutex> lock(mutex_);
> -		auto requestIt = requests_.find(request->cookie());
> -		if (requestIt == requests_.end()) {
> -			LOG(HAL, Fatal)
> -				<< "Unknown request: " << request->cookie();
> -			return;
> -		}
> -		captureRequest = std::move(requestIt->second);
> -		requests_.erase(requestIt);
> +	auto requestIt = requests_.find(request->cookie());
> +	if (requestIt == requests_.end()) {
> +		LOG(HAL, Fatal)
> +			<< "Unknown request: " << request->cookie();
> +		return;
> +	}
> +	auto captureRequest = std::move(requestIt->second);
> +	requests_.erase(requestIt);
>
> -		auto it = descriptors_.find(request->cookie());
> -		if (it == descriptors_.end())
> -			return;
> +	auto it = descriptors_.find(request->cookie());
> +	if (it == descriptors_.end())
> +		return;
>
> -		node = descriptors_.extract(it);
> -	}
> +	auto node = descriptors_.extract(it);
>  	Camera3RequestDescriptor &descriptor = node.mapped();
>
>  	if (request->status() != Request::RequestComplete) {
> @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
>  	camera3_notify_msg_t notify = {};
>
>  	/*
> -	 * \todo Report and identify the stream number or configuration to
> -	 * clarify the stream that failed.
> +	 * \todo Report and identify the stream number or configuration
> +	 * to clarify the stream that failed.
>  	 */
> -	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> -			<< toPixelFormat(stream->format).toString() << ")";
> +	LOG(HAL, Error)
> +		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
> +		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
>
>  	notify.type = CAMERA3_MSG_ERROR;
>  	notify.message.error.error_stream = stream;
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index 95d77890..325fb967 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -125,9 +125,12 @@ private:
>
>  	std::vector<Camera3StreamConfiguration> streamConfigurations_;
>  	std::map<int, libcamera::PixelFormat> formatsMap_;
> -	std::vector<CameraStream> streams_;
>
> -	std::mutex mutex_; /* Protect descriptors_ and requests_. */
> +	/* Avoid races by concurrent Camera HAL API calls. */
> +	std::mutex halMutex_;
> +
> +	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> +	std::vector<CameraStream> streams_;
>  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>  	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
>
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
> _______________________________________________
> libcamera-devel mailing list
> libcamera-devel@lists.libcamera.org
> https://lists.libcamera.org/listinfo/libcamera-devel
Laurent Pinchart May 3, 2021, 2:55 p.m. UTC | #2
Hello,

On Mon, May 03, 2021 at 04:41:04PM +0200, Jacopo Mondi wrote:
> On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> > HAL API functions might be called by different threads in Android
> > Camera HAL3 API, while process_capture_request() must be called
> > by a single thread. Furthermore, close() and flush() can be
> > called any time. Therefore, races to variables can occur among
> > the HAL API functions.
> > This prevents the races by enclosing HAL calls by mutex. It
> > might not be the best in terms of the performance, but the
> > simplicity is good as a first step and also further development.
> >
> > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > ---
> >  src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
> >  src/android/camera_device.h   |  7 +++--
> >  2 files changed, 39 insertions(+), 27 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index b3def04b..96c8d4a9 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> >  void CameraDevice::close()
> >  {
> > -	streams_.clear();
> > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> 
> Have you considered locking the wrappers functions in camera_ops.cpp ?
> Wouldn't it be simpler ? However locking with such a coarse-grained
> granularity I feel contradicts a bit the Camera3 API flush()
> documentation, which implies that flush() and
> process_capture_request() can race
> 
>      * Due to
>      * concurrency issues, it is possible that from the HAL's point of view, a
>      * process_capture_request() call may be started after flush has been invoked but has not
>      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
>      * capture request like other in-flight pending requests (see #4 below).
> 
> Locking at function level rules out this use case: if we lock each
> function it's not possible for process_capture_request() to be started
> during flush() execution as the lock hold by flush will be released
> only at exit time.
> 
> If serializing operations with this granularity is fine (and I really
> with it is, as it would really simplify life for use)

I think it is as an initial step, but likely not in the longer term. It
defeats the point of having the camera service issue
process_capture_request() and stop()/flush() calls concurrently. If
serializing operations was good enough in every case, it would be done
in the camera service itself, and HAL implementations wouldn't need to
deal with this.

> the only racing
> condition we have to protect from is between the request completion
> signal emitted by libcamera, and the camera framework calling into the
> Camera HAL using the camera3 API as flush(), close() and
> process_capture_request() will always be serialized.
> 
> Assuming the above is ok with the Camera3 API, isn't it enough to
> 
> - flush() to call stop() and stop the Camera:
>         - All queued requests at the time Camera::stop() is called
>           will be returned one by one with an error state. There's no
>           reason to cancel the descriptors at stop() time, those
>           requests will be completed
> 
>         - A process_capture_request() called after flush() has stopped
>           the camera will return immeditely as the camera is stopped
>           and the capture request is returned with an error
>           (we might want a flag to keep track of this, otherwise we'll
>           hit a Camera state machine error, much like we do with
>           running_)
> 
>         - A close() called after flush() will be a no op

So far it seems fine to me.

>         - Only open() will allow the CameraDevice to move back from
>           stopped_ state

That's correct for close(). For flush(), we can to make sure that a
configureStreams() call will reopen the camera.

> - There's a tiny tiny window where a request can complete after flush
>   has been called but the camera has not been stopped yet. In

It's not that tiny, it can certainly be hit in practice.

>   requestComplete() we'll have to inspect the stopped_ flag state and
>   force the request to return with error even if it did not.

Why can't we complete the request without any error in that case ?

> - Once we move JPEG encoding to a worker thread, stop() will have to
>   forcefully stop it. Requests interrupted during encoding will be
>   returned with error.
> 
> A simple lock around stopped_ should protect it from concurrent
> accesses between stop() and requestComplete().

The issue is that the flush() function must wait for all requests to
complete. This could be implemented with a condition variable, allowing
to avoid serializing CameraDevice::requestComplete() with the API calls
from the camera service. I think that would be better indeed.

> Have you considered this option and ruled it out for reasons I'm
> failing to consider ? It feels simpler than decoupling requests and
> descriptors, maybe you tried and found out it's not :)
> 
> >  	stop();
> >
> > @@ -758,12 +758,17 @@ void CameraDevice::stop()
> >  	worker_.stop();
> >  	camera_->stop();
> >
> > -	descriptors_.clear();
> > +	{
> > +		std::scoped_lock<std::mutex> lock(mutex_);
> > +		descriptors_.clear();
> > +	}
> > +
> >  	running_ = false;
> >  }
> >
> >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> >  {
> > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> >  	callbacks_ = callbacks;
> >  }
> >
> > @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> >   */
> >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >  {
> > -	/* Before any configuration attempt, stop the camera. */
> > -	stop();
> > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> >
> >  	if (stream_list->num_streams == 0) {
> >  		LOG(HAL, Error) << "No streams in configuration";
> > @@ -1656,6 +1660,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >  		return -EINVAL;
> >  #endif
> >
> > +	/* Before any configuration attempt, stop the camera. */
> > +	stop();
> > +
> >  	/*
> >  	 * Generate an empty configuration, and construct a StreamConfiguration
> >  	 * for each camera3_stream to add to it.
> > @@ -1671,6 +1678,8 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >  	 * ensure the required entries are available without further
> >  	 * reallocation.
> >  	 */
> > +
> > +	std::scoped_lock<std::mutex> lock(mutex_);
> >  	streams_.clear();
> >  	streams_.reserve(stream_list->num_streams);
> >
> > @@ -1907,6 +1916,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >  	if (!isValidRequest(camera3Request))
> >  		return -EINVAL;
> >
> > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> > +
> >  	/* Start the camera if that's the first request we handle. */
> >  	if (!running_) {
> >  		worker_.start();
> > @@ -2022,29 +2033,26 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >
> >  void CameraDevice::requestComplete(Request *request)
> >  {
> > +	std::scoped_lock<std::mutex> lock(mutex_);
> > +
> >  	const Request::BufferMap &buffers = request->buffers();
> >  	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> >  	std::unique_ptr<CameraMetadata> resultMetadata;
> >
> > -	decltype(descriptors_)::node_type node;
> > -	std::unique_ptr<CaptureRequest> captureRequest;
> > -	{
> > -		std::scoped_lock<std::mutex> lock(mutex_);
> > -		auto requestIt = requests_.find(request->cookie());
> > -		if (requestIt == requests_.end()) {
> > -			LOG(HAL, Fatal)
> > -				<< "Unknown request: " << request->cookie();
> > -			return;
> > -		}
> > -		captureRequest = std::move(requestIt->second);
> > -		requests_.erase(requestIt);
> > +	auto requestIt = requests_.find(request->cookie());
> > +	if (requestIt == requests_.end()) {
> > +		LOG(HAL, Fatal)
> > +			<< "Unknown request: " << request->cookie();
> > +		return;
> > +	}
> > +	auto captureRequest = std::move(requestIt->second);
> > +	requests_.erase(requestIt);
> >
> > -		auto it = descriptors_.find(request->cookie());
> > -		if (it == descriptors_.end())
> > -			return;
> > +	auto it = descriptors_.find(request->cookie());
> > +	if (it == descriptors_.end())
> > +		return;
> >
> > -		node = descriptors_.extract(it);
> > -	}
> > +	auto node = descriptors_.extract(it);
> >  	Camera3RequestDescriptor &descriptor = node.mapped();
> >
> >  	if (request->status() != Request::RequestComplete) {
> > @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
> >  	camera3_notify_msg_t notify = {};
> >
> >  	/*
> > -	 * \todo Report and identify the stream number or configuration to
> > -	 * clarify the stream that failed.
> > +	 * \todo Report and identify the stream number or configuration
> > +	 * to clarify the stream that failed.
> >  	 */
> > -	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> > -			<< toPixelFormat(stream->format).toString() << ")";
> > +	LOG(HAL, Error)
> > +		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
> > +		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
> >
> >  	notify.type = CAMERA3_MSG_ERROR;
> >  	notify.message.error.error_stream = stream;
> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > index 95d77890..325fb967 100644
> > --- a/src/android/camera_device.h
> > +++ b/src/android/camera_device.h
> > @@ -125,9 +125,12 @@ private:
> >
> >  	std::vector<Camera3StreamConfiguration> streamConfigurations_;
> >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > -	std::vector<CameraStream> streams_;
> >
> > -	std::mutex mutex_; /* Protect descriptors_ and requests_. */
> > +	/* Avoid races by concurrent Camera HAL API calls. */
> > +	std::mutex halMutex_;
> > +
> > +	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> > +	std::vector<CameraStream> streams_;
> >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> >  	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
> >
Jacopo Mondi May 3, 2021, 4:02 p.m. UTC | #3
Hi Laurent,

On Mon, May 03, 2021 at 05:55:38PM +0300, Laurent Pinchart wrote:
> Hello,
>
> On Mon, May 03, 2021 at 04:41:04PM +0200, Jacopo Mondi wrote:
> > On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> > > HAL API functions might be called by different threads in Android
> > > Camera HAL3 API, while process_capture_request() must be called
> > > by a single thread. Furthermore, close() and flush() can be
> > > called any time. Therefore, races to variables can occur among
> > > the HAL API functions.
> > > This prevents the races by enclosing HAL calls by mutex. It
> > > might not be the best in terms of the performance, but the
> > > simplicity is good as a first step and also further development.
> > >
> > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > ---
> > >  src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
> > >  src/android/camera_device.h   |  7 +++--
> > >  2 files changed, 39 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index b3def04b..96c8d4a9 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > >  void CameraDevice::close()
> > >  {
> > > -	streams_.clear();
> > > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> >
> > Have you considered locking the wrappers functions in camera_ops.cpp ?
> > Wouldn't it be simpler ? However locking with such a coarse-grained
> > granularity I feel contradicts a bit the Camera3 API flush()
> > documentation, which implies that flush() and
> > process_capture_request() can race
> >
> >      * Due to
> >      * concurrency issues, it is possible that from the HAL's point of view, a
> >      * process_capture_request() call may be started after flush has been invoked but has not
> >      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> >      * capture request like other in-flight pending requests (see #4 below).
> >
> > Locking at function level rules out this use case: if we lock each
> > function it's not possible for process_capture_request() to be started
> > during flush() execution as the lock hold by flush will be released
> > only at exit time.
> >
> > If serializing operations with this granularity is fine (and I really
> > with it is, as it would really simplify life for use)
>
> I think it is as an initial step, but likely not in the longer term. It
> defeats the point of having the camera service issue
> process_capture_request() and stop()/flush() calls concurrently. If

Yes, that was my concern, although the documentation says it's enough
to implement the full success/full fail behavior which I think it's
what's happening here.

> serializing operations was good enough in every case, it would be done
> in the camera service itself, and HAL implementations wouldn't need to
> deal with this.

Also true, indeed the framework allows to implement a finer granularity
in locking the operations

>
> > the only racing
> > condition we have to protect from is between the request completion
> > signal emitted by libcamera, and the camera framework calling into the
> > Camera HAL using the camera3 API as flush(), close() and
> > process_capture_request() will always be serialized.
> >
> > Assuming the above is ok with the Camera3 API, isn't it enough to
> >
> > - flush() to call stop() and stop the Camera:
> >         - All queued requests at the time Camera::stop() is called
> >           will be returned one by one with an error state. There's no
> >           reason to cancel the descriptors at stop() time, those
> >           requests will be completed
> >
> >         - A process_capture_request() called after flush() has stopped
> >           the camera will return immeditely as the camera is stopped
> >           and the capture request is returned with an error
> >           (we might want a flag to keep track of this, otherwise we'll
> >           hit a Camera state machine error, much like we do with
> >           running_)
> >
> >         - A close() called after flush() will be a no op
>
> So far it seems fine to me.
>
> >         - Only open() will allow the CameraDevice to move back from
> >           stopped_ state
>
> That's correct for close(). For flush(), we can to make sure that a
> configureStreams() call will reopen the camera.

Right, it's probably enough to just reset the state of the new flag in
configureStreams() and return immediately from
processCaptureRequest().

My main point should have been that processCaptureRequest() cannot be
called after a flush() without going through configureStreams() ?

Maybe not true however, considering:

     * flush() should only return when there are no more outstanding buffers or
     * requests left in the HAL. The framework may call configure_streams (as
     * the HAL state is now quiesced) or may issue new requests.

Which seems to suggest after a flush, processCaptureRequest() can be
called again.

>
> > - There's a tiny tiny window where a request can complete after flush
> >   has been called but the camera has not been stopped yet. In
>
> It's not that tiny, it can certainly be hit in practice.
>
> >   requestComplete() we'll have to inspect the stopped_ flag state and
> >   force the request to return with error even if it did not.
>
> Why can't we complete the request without any error in that case ?

Because flush() has been called already (but the camera has not
stopped yet?)

Although the documentation reports

     * 1. For captures that are too late for the HAL to cancel/stop, and will be
     *    completed normally by the HAL; i.e. the HAL can send shutter/notify and
     *    process_capture_result and buffers as normal.

It's then probably fine to let those requests just complete
successfully (even better, it's easier)

>
> > - Once we move JPEG encoding to a worker thread, stop() will have to
> >   forcefully stop it. Requests interrupted during encoding will be
> >   returned with error.
> >
> > A simple lock around stopped_ should protect it from concurrent
> > accesses between stop() and requestComplete().
>
> The issue is that the flush() function must wait for all requests to
> complete. This could be implemented with a condition variable, allowing
> to avoid serializing CameraDevice::requestComplete() with the API calls
> from the camera service. I think that would be better indeed.

Ah yes, I overlooked this part

     * flush() should only return when there are no more outstanding buffers or
     * requests left in the HAL.

Actually with a condition variable it might also get easier, as

        flush() {
                {
                        /* mutex lock *
                        flushing_ = true;
                }

                stop()

                flushed.wait()
                {
                        /* mutex lock *
                        flushing_ = false;
                }
        }

Would allow to held the flushing_ state until all the pending requests
have been returned as a consequence of stop(). Once that's done we can
reset the flushing_ flag, and the next processCaptureRequest() can
continue as normal ?

Do we have a list of CTS or cros_camera_test that exercizes flush() to
be used as validation ?

I've tried with this series applied:

# cros_camera_test --gtest_filter="Camera3FrameTest/Camera3FlushRequestsTest*"
[  PASSED  ] 4 tests.
[  FAILED  ] 8 tests, listed below

Which is the same result I get on the most recent master..

Thanks
   j

> > Have you considered this option and ruled it out for reasons I'm
> > failing to consider ? It feels simpler than decoupling requests and
> > descriptors, maybe you tried and found out it's not :)
> >
> > >  	stop();
> > >
> > > @@ -758,12 +758,17 @@ void CameraDevice::stop()
> > >  	worker_.stop();
> > >  	camera_->stop();
> > >
> > > -	descriptors_.clear();
> > > +	{
> > > +		std::scoped_lock<std::mutex> lock(mutex_);
> > > +		descriptors_.clear();
> > > +	}
> > > +
> > >  	running_ = false;
> > >  }
> > >
> > >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > >  {
> > > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> > >  	callbacks_ = callbacks;
> > >  }
> > >
> > > @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > >   */
> > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  {
> > > -	/* Before any configuration attempt, stop the camera. */
> > > -	stop();
> > > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> > >
> > >  	if (stream_list->num_streams == 0) {
> > >  		LOG(HAL, Error) << "No streams in configuration";
> > > @@ -1656,6 +1660,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  		return -EINVAL;
> > >  #endif
> > >
> > > +	/* Before any configuration attempt, stop the camera. */
> > > +	stop();
> > > +
> > >  	/*
> > >  	 * Generate an empty configuration, and construct a StreamConfiguration
> > >  	 * for each camera3_stream to add to it.
> > > @@ -1671,6 +1678,8 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  	 * ensure the required entries are available without further
> > >  	 * reallocation.
> > >  	 */
> > > +
> > > +	std::scoped_lock<std::mutex> lock(mutex_);
> > >  	streams_.clear();
> > >  	streams_.reserve(stream_list->num_streams);
> > >
> > > @@ -1907,6 +1916,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >  	if (!isValidRequest(camera3Request))
> > >  		return -EINVAL;
> > >
> > > +	std::scoped_lock<std::mutex> halLock(halMutex_);
> > > +
> > >  	/* Start the camera if that's the first request we handle. */
> > >  	if (!running_) {
> > >  		worker_.start();
> > > @@ -2022,29 +2033,26 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >
> > >  void CameraDevice::requestComplete(Request *request)
> > >  {
> > > +	std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> > >  	const Request::BufferMap &buffers = request->buffers();
> > >  	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > >  	std::unique_ptr<CameraMetadata> resultMetadata;
> > >
> > > -	decltype(descriptors_)::node_type node;
> > > -	std::unique_ptr<CaptureRequest> captureRequest;
> > > -	{
> > > -		std::scoped_lock<std::mutex> lock(mutex_);
> > > -		auto requestIt = requests_.find(request->cookie());
> > > -		if (requestIt == requests_.end()) {
> > > -			LOG(HAL, Fatal)
> > > -				<< "Unknown request: " << request->cookie();
> > > -			return;
> > > -		}
> > > -		captureRequest = std::move(requestIt->second);
> > > -		requests_.erase(requestIt);
> > > +	auto requestIt = requests_.find(request->cookie());
> > > +	if (requestIt == requests_.end()) {
> > > +		LOG(HAL, Fatal)
> > > +			<< "Unknown request: " << request->cookie();
> > > +		return;
> > > +	}
> > > +	auto captureRequest = std::move(requestIt->second);
> > > +	requests_.erase(requestIt);
> > >
> > > -		auto it = descriptors_.find(request->cookie());
> > > -		if (it == descriptors_.end())
> > > -			return;
> > > +	auto it = descriptors_.find(request->cookie());
> > > +	if (it == descriptors_.end())
> > > +		return;
> > >
> > > -		node = descriptors_.extract(it);
> > > -	}
> > > +	auto node = descriptors_.extract(it);
> > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > >
> > >  	if (request->status() != Request::RequestComplete) {
> > > @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
> > >  	camera3_notify_msg_t notify = {};
> > >
> > >  	/*
> > > -	 * \todo Report and identify the stream number or configuration to
> > > -	 * clarify the stream that failed.
> > > +	 * \todo Report and identify the stream number or configuration
> > > +	 * to clarify the stream that failed.
> > >  	 */
> > > -	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> > > -			<< toPixelFormat(stream->format).toString() << ")";
> > > +	LOG(HAL, Error)
> > > +		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
> > > +		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
> > >
> > >  	notify.type = CAMERA3_MSG_ERROR;
> > >  	notify.message.error.error_stream = stream;
> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > index 95d77890..325fb967 100644
> > > --- a/src/android/camera_device.h
> > > +++ b/src/android/camera_device.h
> > > @@ -125,9 +125,12 @@ private:
> > >
> > >  	std::vector<Camera3StreamConfiguration> streamConfigurations_;
> > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > > -	std::vector<CameraStream> streams_;
> > >
> > > -	std::mutex mutex_; /* Protect descriptors_ and requests_. */
> > > +	/* Avoid races by concurrent Camera HAL API calls. */
> > > +	std::mutex halMutex_;
> > > +
> > > +	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> > > +	std::vector<CameraStream> streams_;
> > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > >  	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
> > >
>
> --
> Regards,
>
> Laurent Pinchart
Jacopo Mondi May 3, 2021, 4:04 p.m. UTC | #4
Hi Hiro,

On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> HAL API functions might be called by different threads in Android
> Camera HAL3 API, while process_capture_request() must be called
> by a single thread. Furthermore, close() and flush() can be
> called any time. Therefore, races to variables can occur among
> the HAL API functions.
> This prevents the races by enclosing HAL calls by mutex. It
> might not be the best in terms of the performance, but the
> simplicity is good as a first step and also further development.
>
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
>  	/*
> -	 * \todo Report and identify the stream number or configuration to
> -	 * clarify the stream that failed.
> +	 * \todo Report and identify the stream number or configuration
> +	 * to clarify the stream that failed.
>  	 */
> -	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> -			<< toPixelFormat(stream->format).toString() << ")";
> +	LOG(HAL, Error)
> +		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
> +		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";

That's probably a left-over as the patch does not compile with this
hunk applied

Thanks
  j

>
>  	notify.type = CAMERA3_MSG_ERROR;
>  	notify.message.error.error_stream = stream;
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index 95d77890..325fb967 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -125,9 +125,12 @@ private:
>
>  	std::vector<Camera3StreamConfiguration> streamConfigurations_;
>  	std::map<int, libcamera::PixelFormat> formatsMap_;
> -	std::vector<CameraStream> streams_;
>
> -	std::mutex mutex_; /* Protect descriptors_ and requests_. */
> +	/* Avoid races by concurrent Camera HAL API calls. */
> +	std::mutex halMutex_;
> +
> +	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> +	std::vector<CameraStream> streams_;
>  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>  	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
>
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
> _______________________________________________
> libcamera-devel mailing list
> libcamera-devel@lists.libcamera.org
> https://lists.libcamera.org/listinfo/libcamera-devel
Hirokazu Honda May 10, 2021, 6:48 a.m. UTC | #5
Hi Jacopo and Laurent, thank you for reviewing.

On Tue, May 4, 2021 at 1:02 AM Jacopo Mondi <jacopo@jmondi.org> wrote:

> Hi Laurent,
>
> On Mon, May 03, 2021 at 05:55:38PM +0300, Laurent Pinchart wrote:
> > Hello,
> >
> > On Mon, May 03, 2021 at 04:41:04PM +0200, Jacopo Mondi wrote:
> > > On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> > > > HAL API functions might be called by different threads in Android
> > > > Camera HAL3 API, while process_capture_request() must be called
> > > > by a single thread. Furthermore, close() and flush() can be
> > > > called any time. Therefore, races to variables can occur among
> > > > the HAL API functions.
> > > > This prevents the races by enclosing HAL calls by mutex. It
> > > > might not be the best in terms of the performance, but the
> > > > simplicity is good as a first step and also further development.
> > > >
> > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > ---
> > > >  src/android/camera_device.cpp | 59
> ++++++++++++++++++++---------------
> > > >  src/android/camera_device.h   |  7 +++--
> > > >  2 files changed, 39 insertions(+), 27 deletions(-)
> > > >
> > > > diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
> > > > index b3def04b..96c8d4a9 100644
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t
> *hardwareModule)
> > > >
> > > >  void CameraDevice::close()
> > > >  {
> > > > - streams_.clear();
> > > > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > >
> > > Have you considered locking the wrappers functions in camera_ops.cpp ?
> > > Wouldn't it be simpler ? However locking with such a coarse-grained
> > > granularity I feel contradicts a bit the Camera3 API flush()
> > > documentation, which implies that flush() and
> > > process_capture_request() can race
> > >
> > >      * Due to
> > >      * concurrency issues, it is possible that from the HAL's point of
> view, a
> > >      * process_capture_request() call may be started after flush has
> been invoked but has not
> > >      * returned yet. If such a call happens before flush() returns,
> the HAL should treat the new
> > >      * capture request like other in-flight pending requests (see #4
> below).
> > >
> > > Locking at function level rules out this use case: if we lock each
> > > function it's not possible for process_capture_request() to be started
> > > during flush() execution as the lock hold by flush will be released
> > > only at exit time.
> > >
> > > If serializing operations with this granularity is fine (and I really
> > > with it is, as it would really simplify life for use)
> >
> > I think it is as an initial step, but likely not in the longer term. It
> > defeats the point of having the camera service issue
> > process_capture_request() and stop()/flush() calls concurrently. If
>
> Yes, that was my concern, although the documentation says it's enough
> to implement the full success/full fail behavior which I think it's
> what's happening here.
>
> serializing operations was good enough in every case, it would be done
> > in the camera service itself, and HAL implementations wouldn't need to
> > deal with this.
>
> Also true, indeed the framework allows to implement a finer granularity
> in locking the operations
>
>
Since ChromeOS camera services sealizes the calls, ChromeOS doesn't
require this serialization here.
But shall we assume that a HAL client serializes the call? Since this is a
HAL implementation, I think we should align the HAL specification; no
serialization is ensured.


> >
> > > the only racing
> > > condition we have to protect from is between the request completion
> > > signal emitted by libcamera, and the camera framework calling into the
> > > Camera HAL using the camera3 API as flush(), close() and
> > > process_capture_request() will always be serialized.
> > >
> > > Assuming the above is ok with the Camera3 API, isn't it enough to
> > >
> > > - flush() to call stop() and stop the Camera:
> > >         - All queued requests at the time Camera::stop() is called
> > >           will be returned one by one with an error state. There's no
> > >           reason to cancel the descriptors at stop() time, those
> > >           requests will be completed
> > >
>

That's correct. Well, I would do so just in case to ensure all requests are
returned before flush().
But it should be unnecessary.


> > >         - A process_capture_request() called after flush() has stopped
> > >           the camera will return immeditely as the camera is stopped
> > >           and the capture request is returned with an error
> > >           (we might want a flag to keep track of this, otherwise we'll
> > >           hit a Camera state machine error, much like we do with
> > >           running_)
> > >
>

Hmm, in my understanding, the process_capture_request() should re-start the
camera as it is a usual request.
Why do you think so?


> > >         - A close() called after flush() will be a no op
> >
> > So far it seems fine to me.
>

The critical difference between close() and flush() is whether a camera is
held or not, isn't it?
That is, Camera::release() is called or not.
I agree almost all is done by the preceding flush(), so close() called
after flush() will just call Camera::release().
How do you think?


> >
> > >         - Only open() will allow the CameraDevice to move back from
> > >           stopped_ state
> >
> > That's correct for close(). For flush(), we can to make sure that a
> > configureStreams() call will reopen the camera.
>
> Right, it's probably enough to just reset the state of the new flag in
> configureStreams() and return immediately from
> processCaptureRequest().
>
> My main point should have been that processCaptureRequest() cannot be
> called after a flush() without going through configureStreams() ?
>
> Maybe not true however, considering:
>
>      * flush() should only return when there are no more outstanding
> buffers or
>      * requests left in the HAL. The framework may call configure_streams
> (as
>      * the HAL state is now quiesced) or may issue new requests.
>
> Which seems to suggest after a flush, processCaptureRequest() can be
> called again.
>
>
Yeah, from this, I think we should re-open camera in the first
process_capture_request() after flush() has been called.


> >
> > > - There's a tiny tiny window where a request can complete after flush
> > >   has been called but the camera has not been stopped yet. In
> >
> > It's not that tiny, it can certainly be hit in practice.
> >
> > >   requestComplete() we'll have to inspect the stopped_ flag state and
> > >   force the request to return with error even if it did not.
> >
> > Why can't we complete the request without any error in that case ?
>
> Because flush() has been called already (but the camera has not
> stopped yet?)
>
> Although the documentation reports
>
>      * 1. For captures that are too late for the HAL to cancel/stop, and
> will be
>      *    completed normally by the HAL; i.e. the HAL can send
> shutter/notify and
>      *    process_capture_result and buffers as normal.
>
> It's then probably fine to let those requests just complete
> successfully (even better, it's easier)
>
>
Camera::stop() is a blocking function and calls requestComplete() with all
the pending requests?
In that case, a return request is marked with CANCELLED or if the buffer is
already complete, marked with COMPLETE.
Well, ideally, so that we should not execute a post-processing during flush
and a thread should be separated for the post processing, we should
introduce a way of detecting that flush is being called in
requestComplete().


> >
> > > - Once we move JPEG encoding to a worker thread, stop() will have to
> > >   forcefully stop it. Requests interrupted during encoding will be
> > >   returned with error.
> > >
> > > A simple lock around stopped_ should protect it from concurrent
> > > accesses between stop() and requestComplete().
> >
> > The issue is that the flush() function must wait for all requests to
> > complete. This could be implemented with a condition variable, allowing
> > to avoid serializing CameraDevice::requestComplete() with the API calls
> > from the camera service. I think that would be better indeed.
>
> Ah yes, I overlooked this part
>
>      * flush() should only return when there are no more outstanding
> buffers or
>      * requests left in the HAL.
>
> Actually with a condition variable it might also get easier, as
>
>         flush() {
>                 {
>                         /* mutex lock *
>                         flushing_ = true;
>                 }
>
>                 stop()
>
>                 flushed.wait()
>                 {
>                         /* mutex lock *
>                         flushing_ = false;
>                 }
>         }
>
>
The difficult point is with this other HAL calls (e.g.
process_capture_request() and stop()) can be called during stop().
We can protect it with flushing_. But I think it should be simplified by
guarding two mutexes as this patch does.


> Would allow to held the flushing_ state until all the pending requests
> have been returned as a consequence of stop(). Once that's done we can
> reset the flushing_ flag, and the next processCaptureRequest() can
> continue as normal ?
>
>
Yes, I did so in this patch series though without flushing_, but with a new
mutex.


> Do we have a list of CTS or cros_camera_test that exercizes flush() to
> be used as validation ?
>
>
I think CTS doesn't detect the issue.


> I've tried with this series applied:
>
> # cros_camera_test
> --gtest_filter="Camera3FrameTest/Camera3FlushRequestsTest*"
> [  PASSED  ] 4 tests.
> [  FAILED  ] 8 tests, listed below
>
> Which is the same result I get on the most recent master..
>
>
You need 1954 and 1967 to pass the test.

Thanks,
-Hiro


> Thanks
>    j
>
> > > Have you considered this option and ruled it out for reasons I'm
> > > failing to consider ? It feels simpler than decoupling requests and
> > > descriptors, maybe you tried and found out it's not :)
> > >
> > > >   stop();
> > > >
> > > > @@ -758,12 +758,17 @@ void CameraDevice::stop()
> > > >   worker_.stop();
> > > >   camera_->stop();
> > > >
> > > > - descriptors_.clear();
> > > > + {
> > > > +         std::scoped_lock<std::mutex> lock(mutex_);
> > > > +         descriptors_.clear();
> > > > + }
> > > > +
> > > >   running_ = false;
> > > >  }
> > > >
> > > >  void CameraDevice::setCallbacks(const camera3_callback_ops_t
> *callbacks)
> > > >  {
> > > > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > > >   callbacks_ = callbacks;
> > > >  }
> > > >
> > > > @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int
> format) const
> > > >   */
> > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t
> *stream_list)
> > > >  {
> > > > - /* Before any configuration attempt, stop the camera. */
> > > > - stop();
> > > > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > > >
> > > >   if (stream_list->num_streams == 0) {
> > > >           LOG(HAL, Error) << "No streams in configuration";
> > > > @@ -1656,6 +1660,9 @@ int
> CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >           return -EINVAL;
> > > >  #endif
> > > >
> > > > + /* Before any configuration attempt, stop the camera. */
> > > > + stop();
> > > > +
> > > >   /*
> > > >    * Generate an empty configuration, and construct a
> StreamConfiguration
> > > >    * for each camera3_stream to add to it.
> > > > @@ -1671,6 +1678,8 @@ int
> CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >    * ensure the required entries are available without further
> > > >    * reallocation.
> > > >    */
> > > > +
> > > > + std::scoped_lock<std::mutex> lock(mutex_);
> > > >   streams_.clear();
> > > >   streams_.reserve(stream_list->num_streams);
> > > >
> > > > @@ -1907,6 +1916,8 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >   if (!isValidRequest(camera3Request))
> > > >           return -EINVAL;
> > > >
> > > > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > > > +
> > > >   /* Start the camera if that's the first request we handle. */
> > > >   if (!running_) {
> > > >           worker_.start();
> > > > @@ -2022,29 +2033,26 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >
> > > >  void CameraDevice::requestComplete(Request *request)
> > > >  {
> > > > + std::scoped_lock<std::mutex> lock(mutex_);
> > > > +
> > > >   const Request::BufferMap &buffers = request->buffers();
> > > >   camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > > >   std::unique_ptr<CameraMetadata> resultMetadata;
> > > >
> > > > - decltype(descriptors_)::node_type node;
> > > > - std::unique_ptr<CaptureRequest> captureRequest;
> > > > - {
> > > > -         std::scoped_lock<std::mutex> lock(mutex_);
> > > > -         auto requestIt = requests_.find(request->cookie());
> > > > -         if (requestIt == requests_.end()) {
> > > > -                 LOG(HAL, Fatal)
> > > > -                         << "Unknown request: " <<
> request->cookie();
> > > > -                 return;
> > > > -         }
> > > > -         captureRequest = std::move(requestIt->second);
> > > > -         requests_.erase(requestIt);
> > > > + auto requestIt = requests_.find(request->cookie());
> > > > + if (requestIt == requests_.end()) {
> > > > +         LOG(HAL, Fatal)
> > > > +                 << "Unknown request: " << request->cookie();
> > > > +         return;
> > > > + }
> > > > + auto captureRequest = std::move(requestIt->second);
> > > > + requests_.erase(requestIt);
> > > >
> > > > -         auto it = descriptors_.find(request->cookie());
> > > > -         if (it == descriptors_.end())
> > > > -                 return;
> > > > + auto it = descriptors_.find(request->cookie());
> > > > + if (it == descriptors_.end())
> > > > +         return;
> > > >
> > > > -         node = descriptors_.extract(it);
> > > > - }
> > > > + auto node = descriptors_.extract(it);
> > > >   Camera3RequestDescriptor &descriptor = node.mapped();
> > > >
> > > >   if (request->status() != Request::RequestComplete) {
> > > > @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t
> frameNumber, camera3_stream_t *stream)
> > > >   camera3_notify_msg_t notify = {};
> > > >
> > > >   /*
> > > > -  * \todo Report and identify the stream number or configuration to
> > > > -  * clarify the stream that failed.
> > > > +  * \todo Report and identify the stream number or configuration
> > > > +  * to clarify the stream that failed.
> > > >    */
> > > > - LOG(HAL, Error) << "Error occurred on frame " << frameNumber << "
> ("
> > > > -                 << toPixelFormat(stream->format).toString() << ")";
> > > > + LOG(HAL, Error)
> > > > +         << "Error occurred on frame " << descriptor.frameNumber_
> << " ("
> > > > +         <<
> toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
> > > >
> > > >   notify.type = CAMERA3_MSG_ERROR;
> > > >   notify.message.error.error_stream = stream;
> > > > diff --git a/src/android/camera_device.h
> b/src/android/camera_device.h
> > > > index 95d77890..325fb967 100644
> > > > --- a/src/android/camera_device.h
> > > > +++ b/src/android/camera_device.h
> > > > @@ -125,9 +125,12 @@ private:
> > > >
> > > >   std::vector<Camera3StreamConfiguration> streamConfigurations_;
> > > >   std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > - std::vector<CameraStream> streams_;
> > > >
> > > > - std::mutex mutex_; /* Protect descriptors_ and requests_. */
> > > > + /* Avoid races by concurrent Camera HAL API calls. */
> > > > + std::mutex halMutex_;
> > > > +
> > > > + std::mutex mutex_; /* Protect streams_, descriptors_ and
> requests_. */
> > > > + std::vector<CameraStream> streams_;
> > > >   std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > >   std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
> > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
>

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index b3def04b..96c8d4a9 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -743,7 +743,7 @@  int CameraDevice::open(const hw_module_t *hardwareModule)
 
 void CameraDevice::close()
 {
-	streams_.clear();
+	std::scoped_lock<std::mutex> halLock(halMutex_);
 
 	stop();
 
@@ -758,12 +758,17 @@  void CameraDevice::stop()
 	worker_.stop();
 	camera_->stop();
 
-	descriptors_.clear();
+	{
+		std::scoped_lock<std::mutex> lock(mutex_);
+		descriptors_.clear();
+	}
+
 	running_ = false;
 }
 
 void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
 {
+	std::scoped_lock<std::mutex> halLock(halMutex_);
 	callbacks_ = callbacks;
 }
 
@@ -1643,8 +1648,7 @@  PixelFormat CameraDevice::toPixelFormat(int format) const
  */
 int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 {
-	/* Before any configuration attempt, stop the camera. */
-	stop();
+	std::scoped_lock<std::mutex> halLock(halMutex_);
 
 	if (stream_list->num_streams == 0) {
 		LOG(HAL, Error) << "No streams in configuration";
@@ -1656,6 +1660,9 @@  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 		return -EINVAL;
 #endif
 
+	/* Before any configuration attempt, stop the camera. */
+	stop();
+
 	/*
 	 * Generate an empty configuration, and construct a StreamConfiguration
 	 * for each camera3_stream to add to it.
@@ -1671,6 +1678,8 @@  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 	 * ensure the required entries are available without further
 	 * reallocation.
 	 */
+
+	std::scoped_lock<std::mutex> lock(mutex_);
 	streams_.clear();
 	streams_.reserve(stream_list->num_streams);
 
@@ -1907,6 +1916,8 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 	if (!isValidRequest(camera3Request))
 		return -EINVAL;
 
+	std::scoped_lock<std::mutex> halLock(halMutex_);
+
 	/* Start the camera if that's the first request we handle. */
 	if (!running_) {
 		worker_.start();
@@ -2022,29 +2033,26 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 
 void CameraDevice::requestComplete(Request *request)
 {
+	std::scoped_lock<std::mutex> lock(mutex_);
+
 	const Request::BufferMap &buffers = request->buffers();
 	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
 	std::unique_ptr<CameraMetadata> resultMetadata;
 
-	decltype(descriptors_)::node_type node;
-	std::unique_ptr<CaptureRequest> captureRequest;
-	{
-		std::scoped_lock<std::mutex> lock(mutex_);
-		auto requestIt = requests_.find(request->cookie());
-		if (requestIt == requests_.end()) {
-			LOG(HAL, Fatal)
-				<< "Unknown request: " << request->cookie();
-			return;
-		}
-		captureRequest = std::move(requestIt->second);
-		requests_.erase(requestIt);
+	auto requestIt = requests_.find(request->cookie());
+	if (requestIt == requests_.end()) {
+		LOG(HAL, Fatal)
+			<< "Unknown request: " << request->cookie();
+		return;
+	}
+	auto captureRequest = std::move(requestIt->second);
+	requests_.erase(requestIt);
 
-		auto it = descriptors_.find(request->cookie());
-		if (it == descriptors_.end())
-			return;
+	auto it = descriptors_.find(request->cookie());
+	if (it == descriptors_.end())
+		return;
 
-		node = descriptors_.extract(it);
-	}
+	auto node = descriptors_.extract(it);
 	Camera3RequestDescriptor &descriptor = node.mapped();
 
 	if (request->status() != Request::RequestComplete) {
@@ -2149,11 +2157,12 @@  void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
 	camera3_notify_msg_t notify = {};
 
 	/*
-	 * \todo Report and identify the stream number or configuration to
-	 * clarify the stream that failed.
+	 * \todo Report and identify the stream number or configuration
+	 * to clarify the stream that failed.
 	 */
-	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
-			<< toPixelFormat(stream->format).toString() << ")";
+	LOG(HAL, Error)
+		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
+		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
 
 	notify.type = CAMERA3_MSG_ERROR;
 	notify.message.error.error_stream = stream;
diff --git a/src/android/camera_device.h b/src/android/camera_device.h
index 95d77890..325fb967 100644
--- a/src/android/camera_device.h
+++ b/src/android/camera_device.h
@@ -125,9 +125,12 @@  private:
 
 	std::vector<Camera3StreamConfiguration> streamConfigurations_;
 	std::map<int, libcamera::PixelFormat> formatsMap_;
-	std::vector<CameraStream> streams_;
 
-	std::mutex mutex_; /* Protect descriptors_ and requests_. */
+	/* Avoid races by concurrent Camera HAL API calls. */
+	std::mutex halMutex_;
+
+	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
+	std::vector<CameraStream> streams_;
 	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
 	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;