[libcamera-devel,v2,1/2] android: CameraDevice: Prevent race due to concurrent HAL calls
diff mbox series

Message ID 20210423040738.1227220-2-hiroh@chromium.org
State Superseded
Headers show
Series
  • Support HAL3 API flush
Related show

Commit Message

Hirokazu Honda April 23, 2021, 4:07 a.m. UTC
HAL API functions might be called by different threads in Android
Camera HAL3 API, while process_capture_request() must be called
by a single thread. Furthermore, close() and flush() can be
called any time. Therefore, races to variables can occur among
the HAL API functions.
This prevents the races by enclosing HAL calls by mutex. It
might not be the best in terms of the performance, but the
simplicity is good as a first step and also further development.

Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
---
 src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
 src/android/camera_device.h   |  9 ++++-
 2 files changed, 50 insertions(+), 29 deletions(-)

Comments

Laurent Pinchart April 26, 2021, 12:30 a.m. UTC | #1
Hi Hiro,

Thank you for the patch.

On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> HAL API functions might be called by different threads in Android
> Camera HAL3 API, while process_capture_request() must be called
> by a single thread.

I'm not sure to follow you here. Do you mean that the HAL API requires
the camera service to call process_capture_request() from a single
thread ? Or do you mean that the libcamera HAL implementation assumes
that all calls come from the same thread ?

> Furthermore, close() and flush() can be
> called any time. Therefore, races to variables can occur among
> the HAL API functions.

I'd like to have a bit more details about calling contexts, and record
it in the commit message, or, actually better, in a comment at the top
of the file, as this is very important. There's unfortunately little
information about call contexts in the HAL documentation (camera3.h).
Here's what I've found:

     * flush() may be called concurrently to process_capture_request(), with the expectation that
     * process_capture_request will return quickly and the request submitted in that
     * process_capture_request call is treated like all other in-flight requests.  Due to
     * concurrency issues, it is possible that from the HAL's point of view, a
     * process_capture_request() call may be started after flush has been invoked but has not
     * returned yet. If such a call happens before flush() returns, the HAL should treat the new
     * capture request like other in-flight pending requests (see #4 below).

This means that flush() can be called from a different thread than
process_capture_request(). I think this makes sense, and we need to
handle that.

     * process_capture_request:
     *
     * Send a new capture request to the HAL. The HAL should not return from
     * this call until it is ready to accept the next request to process. Only
     * one call to process_capture_request() will be made at a time by the
     * framework, and the calls will all be from the same thread. The next call
     * to process_capture_request() will be made as soon as a new request and
     * its associated buffers are available. In a normal preview scenario, this
     * means the function will be called again by the framework almost
     * instantly.

There's only one call to process_capture_request() at a time, so we
don't need to serialize process_capture_request() calls.

That's pretty much all I've found. Is there any other documentation
you're aware of, information coming from discussions with the Android
camera team, or from reading the Android camera service implementation ?

> This prevents the races by enclosing HAL calls by mutex. It
> might not be the best in terms of the performance, but the
> simplicity is good as a first step and also further development.
> 
> Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> ---
>  src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
>  src/android/camera_device.h   |  9 ++++-
>  2 files changed, 50 insertions(+), 29 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index a71aee2f..c74dc0e7 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>  
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> -
> -	stop();
> +	std::scoped_lock<std::mutex> lock(mutex_);
>  
> -	camera_->release();
> +	stop(/*releaseCamera=*/true);

The camera is only released here. Can't we keep the camera_->release()
call here, to avoid the bool parameter to stop() ?

>  }
>  
> -void CameraDevice::stop()
> +void CameraDevice::stop(bool releaseCamera)
>  {
> -	if (!running_)
> +	streams_.clear();
> +
> +	if (!running_) {
> +		if (releaseCamera) {
> +			descriptors_.clear();
> +			camera_->release();
> +		}
>  		return;
> +	}
> +
> +	stopping_ = true;
>  
>  	worker_.stop();
>  	camera_->stop();
>  
>  	descriptors_.clear();
> +
>  	running_ = false;
> +	stopping_ = false;
> +
> +	if (releaseCamera)
> +		camera_->release();
>  }
>  
>  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
>  {
> +	std::scoped_lock<std::mutex> lock(mutex_);
>  	callbacks_ = callbacks;

This function is called by the HAL .initialize() operation. I really
don't think it could race with anything else. While it's not explicitly
documented, I don't see how it would be reasonable for the camera
service to call .initialize() from one thread, and other HAL functions
from different threads before .initialize() returns.

>  }
>  
> @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
>   */
>  const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
>  {
> +	std::scoped_lock<std::mutex> lock(mutex_);
> +

This function shouldn't affect the state of the camera, does it need to
be serialized ?

>  	auto it = requestTemplates_.find(type);
>  	if (it != requestTemplates_.end())
>  		return it->second->get();
> @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
>   */
>  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  {
> -	/* Before any configuration attempt, stop the camera. */
> -	stop();
> +	std::scoped_lock<std::mutex> lock(mutex_);

Same comment here, is this something that can race with other calls ?

This being said, I'm not opposed to locking the mutex here and in
CameraDevice::setCallbacks(). Locks are best documented as what data
they protect, and using them consistently with their documentation makes
it easier to check correct locking (this includes making it easier for
static code checkers to catch issues). As taking the lock here should
not cause performance issues (especially given that there should be no
concurrent call to this function), we can keep it for consistency.

>  
>  	if (stream_list->num_streams == 0) {
>  		LOG(HAL, Error) << "No streams in configuration";
> @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  		return -EINVAL;
>  #endif
>  
> +	/* Before any configuration attempt, stop the camera. */
> +	stop();
> +
>  	/*
>  	 * Generate an empty configuration, and construct a StreamConfiguration
>  	 * for each camera3_stream to add to it.
> @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  	}
>  
>  	/*
> -	 * Clear and remove any existing configuration from previous calls, and
> -	 * ensure the required entries are available without further
> +	 * Ensure the required entries are available without further
>  	 * reallocation.
>  	 */
> -	streams_.clear();
> +	ASSERT(streams_.empty());
>  	streams_.reserve(stream_list->num_streams);
>  
>  	std::vector<Camera3StreamConfig> streamConfigs;
> @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  	if (!isValidRequest(camera3Request))
>  		return -EINVAL;
>  
> +	std::scoped_lock<std::mutex> lock(mutex_);
> +
>  	/* Start the camera if that's the first request we handle. */
>  	if (!running_) {
>  		worker_.start();
> @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  
>  	worker_.queueRequest(descriptor.request_.get());
>  
> -	{
> -		std::scoped_lock<std::mutex> lock(mutex_);
> -		descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> -	}
> +	descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>  
>  	return 0;
>  }
>  
>  void CameraDevice::requestComplete(Request *request)
>  {
> +	if (stopping_)
> +		return;
> +
> +	std::scoped_lock<std::mutex> lock(mutex_);

The requestComplete signal is emitted from the CameraManager thread, and
must not block for a large amount of time. We already lock the mutex_ in
this function, but we currently use the mutex_ only to protect
descriptors_, which should never block for long. Extending mutex_ to
serialize the Camera calls is dangerous. Could we implement better
locking ? I think the first step would be to clearly explain what races
we need to handle, as mentioned above in the review of the commit
message. Race conditions are tricky to handle right, proper
documentation is a key.

I know we already have issues with JPEG compression that is performed
synchronously in this function. This needs to be fixed, but let's not
make it worse and introduce another issue that will then need to be
fixed too.

> +
>  	const Request::BufferMap &buffers = request->buffers();
>  	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
>  	std::unique_ptr<CameraMetadata> resultMetadata;
>  
> -	decltype(descriptors_)::node_type node;
> -	{
> -		std::scoped_lock<std::mutex> lock(mutex_);
> -		auto it = descriptors_.find(request->cookie());
> -		if (it == descriptors_.end()) {
> -			LOG(HAL, Fatal)
> -				<< "Unknown request: " << request->cookie();
> -			status = CAMERA3_BUFFER_STATUS_ERROR;
> -			return;
> -		}
> -
> -		node = descriptors_.extract(it);
> +	auto it = descriptors_.find(request->cookie());
> +	if (it == descriptors_.end()) {
> +		LOG(HAL, Debug)
> +			<< "Unknown request: " << request->cookie();
> +		status = CAMERA3_BUFFER_STATUS_ERROR;
> +		return;
>  	}
> +
> +	auto node = descriptors_.extract(it);
>  	Camera3RequestDescriptor &descriptor = node.mapped();
>  
>  	if (request->status() != Request::RequestComplete) {
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index c63e8e21..08553584 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -88,7 +88,7 @@ private:
>  		int androidFormat;
>  	};
>  
> -	void stop();
> +	void stop(bool releaseCamera = false);
>  
>  	int initializeStreamConfigurations();
>  	std::vector<libcamera::Size>
> @@ -127,7 +127,12 @@ private:
>  	std::map<int, libcamera::PixelFormat> formatsMap_;
>  	std::vector<CameraStream> streams_;
>  
> -	std::mutex mutex_; /* Protect descriptors_ */
> +	/*
> +	 * Protect descriptors_ and camera_, and also avoid races by concurrent
> +	 * Camera HAL API calls.
> +	 */
> +	std::mutex mutex_;
> +	std::atomic_bool stopping_;
>  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>  
>  	std::string maker_;
Hirokazu Honda April 26, 2021, 2:48 a.m. UTC | #2
Hi Laurent, thanks for the comments.

On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Hiro,
>
> Thank you for the patch.
>
> On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > HAL API functions might be called by different threads in Android
> > Camera HAL3 API, while process_capture_request() must be called
> > by a single thread.
>
> I'm not sure to follow you here. Do you mean that the HAL API requires
> the camera service to call process_capture_request() from a single
> thread ? Or do you mean that the libcamera HAL implementation assumes
> that all calls come from the same thread ?
>

I mean process_capture_request() only. The document says.
      * ... Only
      * one call to process_capture_request() will be made at a time by the
      * framework, and the calls will all be from the same thread.


> > Furthermore, close() and flush() can be
> > called any time. Therefore, races to variables can occur among
> > the HAL API functions.
>
> I'd like to have a bit more details about calling contexts, and record
> it in the commit message, or, actually better, in a comment at the top
> of the file, as this is very important. There's unfortunately little
> information about call contexts in the HAL documentation (camera3.h).
> Here's what I've found:
>
>      * flush() may be called concurrently to process_capture_request(), with the expectation that
>      * process_capture_request will return quickly and the request submitted in that
>      * process_capture_request call is treated like all other in-flight requests.  Due to
>      * concurrency issues, it is possible that from the HAL's point of view, a
>      * process_capture_request() call may be started after flush has been invoked but has not
>      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
>      * capture request like other in-flight pending requests (see #4 below).
>
> This means that flush() can be called from a different thread than
> process_capture_request(). I think this makes sense, and we need to
> handle that.
>
>      * process_capture_request:
>      *
>      * Send a new capture request to the HAL. The HAL should not return from
>      * this call until it is ready to accept the next request to process. Only
>      * one call to process_capture_request() will be made at a time by the
>      * framework, and the calls will all be from the same thread. The next call
>      * to process_capture_request() will be made as soon as a new request and
>      * its associated buffers are available. In a normal preview scenario, this
>      * means the function will be called again by the framework almost
>      * instantly.
>
> There's only one call to process_capture_request() at a time, so we
> don't need to serialize process_capture_request() calls.
>
> That's pretty much all I've found. Is there any other documentation
> you're aware of, information coming from discussions with the Android
> camera team, or from reading the Android camera service implementation ?
>

From Android camera team, flush() and close() can be called at any time.
Yes, there is no race among process_capture_request() calls. However,
the HAL functions must be serialized due to the any time of flush()
and close() call.

> > This prevents the races by enclosing HAL calls by mutex. It
> > might not be the best in terms of the performance, but the
> > simplicity is good as a first step and also further development.
> >
> > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > ---
> >  src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> >  src/android/camera_device.h   |  9 ++++-
> >  2 files changed, 50 insertions(+), 29 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index a71aee2f..c74dc0e7 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> >  void CameraDevice::close()
> >  {
> > -     streams_.clear();
> > -
> > -     stop();
> > +     std::scoped_lock<std::mutex> lock(mutex_);
> >
> > -     camera_->release();
> > +     stop(/*releaseCamera=*/true);
>
> The camera is only released here. Can't we keep the camera_->release()
> call here, to avoid the bool parameter to stop() ?
>

Thanks, it should be better. I will move in the next patch.

> >  }
> >
> > -void CameraDevice::stop()
> > +void CameraDevice::stop(bool releaseCamera)
> >  {
> > -     if (!running_)
> > +     streams_.clear();
> > +
> > +     if (!running_) {
> > +             if (releaseCamera) {
> > +                     descriptors_.clear();
> > +                     camera_->release();
> > +             }
> >               return;
> > +     }
> > +
> > +     stopping_ = true;
> >
> >       worker_.stop();
> >       camera_->stop();
> >
> >       descriptors_.clear();
> > +
> >       running_ = false;
> > +     stopping_ = false;
> > +
> > +     if (releaseCamera)
> > +             camera_->release();
> >  }
> >
> >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> >  {
> > +     std::scoped_lock<std::mutex> lock(mutex_);
> >       callbacks_ = callbacks;
>
> This function is called by the HAL .initialize() operation. I really
> don't think it could race with anything else. While it's not explicitly
> documented, I don't see how it would be reasonable for the camera
> service to call .initialize() from one thread, and other HAL functions
> from different threads before .initialize() returns.
>

Thanks. Yeah, I am sure this function will not be concurrently called
with other functions.
However, callbacks_ will be called in stop(), I wonder it is worth
having a seliralization here also thinking of consistency.
How do you think?

> >  }
> >
> > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> >   */
> >  const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> >  {
> > +     std::scoped_lock<std::mutex> lock(mutex_);
> > +
>
> This function shouldn't affect the state of the camera, does it need to
> be serialized ?
>

Thanks. Even thinking of the consistency, having a serialization in
the function is not reasonable.
I will remove mutex_ in this function.

> >       auto it = requestTemplates_.find(type);
> >       if (it != requestTemplates_.end())
> >               return it->second->get();
> > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> >   */
> >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >  {
> > -     /* Before any configuration attempt, stop the camera. */
> > -     stop();
> > +     std::scoped_lock<std::mutex> lock(mutex_);
>
> Same comment here, is this something that can race with other calls ?
>
> This being said, I'm not opposed to locking the mutex here and in
> CameraDevice::setCallbacks(). Locks are best documented as what data
> they protect, and using them consistently with their documentation makes
> it easier to check correct locking (this includes making it easier for
> static code checkers to catch issues). As taking the lock here should
> not cause performance issues (especially given that there should be no
> concurrent call to this function), we can keep it for consistency.
>

I think flush() and close() can be called during configureStreams() call.
I think it is mandatory to have a lock at least somewhere in
process_capture_request(). If we don't enclose
process_capture_request() by lock, handling flush() and concurrent
process_capture_request() might become a bit complicated (though I can
still do it, I actually implemented so initially but discarded because
of the complexity).

> >
> >       if (stream_list->num_streams == 0) {
> >               LOG(HAL, Error) << "No streams in configuration";
> > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >               return -EINVAL;
> >  #endif
> >
> > +     /* Before any configuration attempt, stop the camera. */
> > +     stop();
> > +
> >       /*
> >        * Generate an empty configuration, and construct a StreamConfiguration
> >        * for each camera3_stream to add to it.
> > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >       }
> >
> >       /*
> > -      * Clear and remove any existing configuration from previous calls, and
> > -      * ensure the required entries are available without further
> > +      * Ensure the required entries are available without further
> >        * reallocation.
> >        */
> > -     streams_.clear();
> > +     ASSERT(streams_.empty());
> >       streams_.reserve(stream_list->num_streams);
> >
> >       std::vector<Camera3StreamConfig> streamConfigs;
> > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >       if (!isValidRequest(camera3Request))
> >               return -EINVAL;
> >
> > +     std::scoped_lock<std::mutex> lock(mutex_);
> > +
> >       /* Start the camera if that's the first request we handle. */
> >       if (!running_) {
> >               worker_.start();
> > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >
> >       worker_.queueRequest(descriptor.request_.get());
> >
> > -     {
> > -             std::scoped_lock<std::mutex> lock(mutex_);
> > -             descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > -     }
> > +     descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> >
> >       return 0;
> >  }
> >
> >  void CameraDevice::requestComplete(Request *request)
> >  {
> > +     if (stopping_)
> > +             return;
> > +
> > +     std::scoped_lock<std::mutex> lock(mutex_);
>
> The requestComplete signal is emitted from the CameraManager thread, and
> must not block for a large amount of time. We already lock the mutex_ in
> this function, but we currently use the mutex_ only to protect
> descriptors_, which should never block for long. Extending mutex_ to
> serialize the Camera calls is dangerous. Could we implement better
> locking ? I think the first step would be to clearly explain what races
> we need to handle, as mentioned above in the review of the commit
> message. Race conditions are tricky to handle right, proper
> documentation is a key.
>
> I know we already have issues with JPEG compression that is performed
> synchronously in this function. This needs to be fixed, but let's not
> make it worse and introduce another issue that will then need to be
> fixed too.
>

I agree.
Since requestComplete() doesn't have to be serialized with HAL
functions, we don't need to have a lock from begin to end.
My first thought is coarse lock is a good first step as fine lock
might make the code complicated.
I will next try a fine lock in this function.

-Hiro

> > +
> >       const Request::BufferMap &buffers = request->buffers();
> >       camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> >       std::unique_ptr<CameraMetadata> resultMetadata;
> >
> > -     decltype(descriptors_)::node_type node;
> > -     {
> > -             std::scoped_lock<std::mutex> lock(mutex_);
> > -             auto it = descriptors_.find(request->cookie());
> > -             if (it == descriptors_.end()) {
> > -                     LOG(HAL, Fatal)
> > -                             << "Unknown request: " << request->cookie();
> > -                     status = CAMERA3_BUFFER_STATUS_ERROR;
> > -                     return;
> > -             }
> > -
> > -             node = descriptors_.extract(it);
> > +     auto it = descriptors_.find(request->cookie());
> > +     if (it == descriptors_.end()) {
> > +             LOG(HAL, Debug)
> > +                     << "Unknown request: " << request->cookie();
> > +             status = CAMERA3_BUFFER_STATUS_ERROR;
> > +             return;
> >       }
> > +
> > +     auto node = descriptors_.extract(it);
> >       Camera3RequestDescriptor &descriptor = node.mapped();
> >
> >       if (request->status() != Request::RequestComplete) {
> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > index c63e8e21..08553584 100644
> > --- a/src/android/camera_device.h
> > +++ b/src/android/camera_device.h
> > @@ -88,7 +88,7 @@ private:
> >               int androidFormat;
> >       };
> >
> > -     void stop();
> > +     void stop(bool releaseCamera = false);
> >
> >       int initializeStreamConfigurations();
> >       std::vector<libcamera::Size>
> > @@ -127,7 +127,12 @@ private:
> >       std::map<int, libcamera::PixelFormat> formatsMap_;
> >       std::vector<CameraStream> streams_;
> >
> > -     std::mutex mutex_; /* Protect descriptors_ */
> > +     /*
> > +      * Protect descriptors_ and camera_, and also avoid races by concurrent
> > +      * Camera HAL API calls.
> > +      */
> > +     std::mutex mutex_;
> > +     std::atomic_bool stopping_;
> >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> >
> >       std::string maker_;
>
> --
> Regards,
>
> Laurent Pinchart
Laurent Pinchart April 26, 2021, 6:47 a.m. UTC | #3
Hi Hiro,

On Mon, Apr 26, 2021 at 11:48:02AM +0900, Hirokazu Honda wrote:
> On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart wrote:
> > On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > > HAL API functions might be called by different threads in Android
> > > Camera HAL3 API, while process_capture_request() must be called
> > > by a single thread.
> >
> > I'm not sure to follow you here. Do you mean that the HAL API requires
> > the camera service to call process_capture_request() from a single
> > thread ? Or do you mean that the libcamera HAL implementation assumes
> > that all calls come from the same thread ?
> 
> I mean process_capture_request() only. The document says.
>       * ... Only
>       * one call to process_capture_request() will be made at a time by the
>       * framework, and the calls will all be from the same thread.
> 
> 
> > > Furthermore, close() and flush() can be
> > > called any time. Therefore, races to variables can occur among
> > > the HAL API functions.
> >
> > I'd like to have a bit more details about calling contexts, and record
> > it in the commit message, or, actually better, in a comment at the top
> > of the file, as this is very important. There's unfortunately little
> > information about call contexts in the HAL documentation (camera3.h).
> > Here's what I've found:
> >
> >      * flush() may be called concurrently to process_capture_request(), with the expectation that
> >      * process_capture_request will return quickly and the request submitted in that
> >      * process_capture_request call is treated like all other in-flight requests.  Due to
> >      * concurrency issues, it is possible that from the HAL's point of view, a
> >      * process_capture_request() call may be started after flush has been invoked but has not
> >      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> >      * capture request like other in-flight pending requests (see #4 below).
> >
> > This means that flush() can be called from a different thread than
> > process_capture_request(). I think this makes sense, and we need to
> > handle that.
> >
> >      * process_capture_request:
> >      *
> >      * Send a new capture request to the HAL. The HAL should not return from
> >      * this call until it is ready to accept the next request to process. Only
> >      * one call to process_capture_request() will be made at a time by the
> >      * framework, and the calls will all be from the same thread. The next call
> >      * to process_capture_request() will be made as soon as a new request and
> >      * its associated buffers are available. In a normal preview scenario, this
> >      * means the function will be called again by the framework almost
> >      * instantly.
> >
> > There's only one call to process_capture_request() at a time, so we
> > don't need to serialize process_capture_request() calls.
> >
> > That's pretty much all I've found. Is there any other documentation
> > you're aware of, information coming from discussions with the Android
> > camera team, or from reading the Android camera service implementation ?
> 
> From Android camera team, flush() and close() can be called at any time.
> Yes, there is no race among process_capture_request() calls. However,
> the HAL functions must be serialized due to the any time of flush()
> and close() call.

flush() and close() can race with process_capture_request(). But can
they race each other ? Can they race with configure_streams() ?

> > > This prevents the races by enclosing HAL calls by mutex. It
> > > might not be the best in terms of the performance, but the
> > > simplicity is good as a first step and also further development.
> > >
> > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > ---
> > >  src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> > >  src/android/camera_device.h   |  9 ++++-
> > >  2 files changed, 50 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index a71aee2f..c74dc0e7 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > >  void CameraDevice::close()
> > >  {
> > > -     streams_.clear();
> > > -
> > > -     stop();
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >
> > > -     camera_->release();
> > > +     stop(/*releaseCamera=*/true);
> >
> > The camera is only released here. Can't we keep the camera_->release()
> > call here, to avoid the bool parameter to stop() ?
> 
> Thanks, it should be better. I will move in the next patch.
> 
> > >  }
> > >
> > > -void CameraDevice::stop()
> > > +void CameraDevice::stop(bool releaseCamera)
> > >  {
> > > -     if (!running_)
> > > +     streams_.clear();
> > > +
> > > +     if (!running_) {
> > > +             if (releaseCamera) {
> > > +                     descriptors_.clear();
> > > +                     camera_->release();
> > > +             }
> > >               return;
> > > +     }
> > > +
> > > +     stopping_ = true;
> > >
> > >       worker_.stop();
> > >       camera_->stop();
> > >
> > >       descriptors_.clear();
> > > +
> > >       running_ = false;
> > > +     stopping_ = false;
> > > +
> > > +     if (releaseCamera)
> > > +             camera_->release();
> > >  }
> > >
> > >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > >  {
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >       callbacks_ = callbacks;
> >
> > This function is called by the HAL .initialize() operation. I really
> > don't think it could race with anything else. While it's not explicitly
> > documented, I don't see how it would be reasonable for the camera
> > service to call .initialize() from one thread, and other HAL functions
> > from different threads before .initialize() returns.
> 
> Thanks. Yeah, I am sure this function will not be concurrently called
> with other functions.
> However, callbacks_ will be called in stop(), I wonder it is worth
> having a seliralization here also thinking of consistency.
> How do you think?

As commented below, even if we don't need to serialize these calls with
a lock, doing so shouldn't hurt performances, and should make locking
verification easier, so I'm not opposed to it.

> > >  }
> > >
> > > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> > >   */
> > >  const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> > >  {
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> >
> > This function shouldn't affect the state of the camera, does it need to
> > be serialized ?
> 
> Thanks. Even thinking of the consistency, having a serialization in
> the function is not reasonable.
> I will remove mutex_ in this function.
> 
> > >       auto it = requestTemplates_.find(type);
> > >       if (it != requestTemplates_.end())
> > >               return it->second->get();
> > > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > >   */
> > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  {
> > > -     /* Before any configuration attempt, stop the camera. */
> > > -     stop();
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> >
> > Same comment here, is this something that can race with other calls ?
> >
> > This being said, I'm not opposed to locking the mutex here and in
> > CameraDevice::setCallbacks(). Locks are best documented as what data
> > they protect, and using them consistently with their documentation makes
> > it easier to check correct locking (this includes making it easier for
> > static code checkers to catch issues). As taking the lock here should
> > not cause performance issues (especially given that there should be no
> > concurrent call to this function), we can keep it for consistency.
> 
> I think flush() and close() can be called during configureStreams() call.

That's interesting. I would have expected those to be serialized. Is
there any documentation that explains this ?

> I think it is mandatory to have a lock at least somewhere in
> process_capture_request(). If we don't enclose
> process_capture_request() by lock, handling flush() and concurrent
> process_capture_request() might become a bit complicated (though I can
> still do it, I actually implemented so initially but discarded because
> of the complexity).
> 
> > >
> > >       if (stream_list->num_streams == 0) {
> > >               LOG(HAL, Error) << "No streams in configuration";
> > > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >               return -EINVAL;
> > >  #endif
> > >
> > > +     /* Before any configuration attempt, stop the camera. */
> > > +     stop();
> > > +
> > >       /*
> > >        * Generate an empty configuration, and construct a StreamConfiguration
> > >        * for each camera3_stream to add to it.
> > > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >       }
> > >
> > >       /*
> > > -      * Clear and remove any existing configuration from previous calls, and
> > > -      * ensure the required entries are available without further
> > > +      * Ensure the required entries are available without further
> > >        * reallocation.
> > >        */
> > > -     streams_.clear();
> > > +     ASSERT(streams_.empty());
> > >       streams_.reserve(stream_list->num_streams);
> > >
> > >       std::vector<Camera3StreamConfig> streamConfigs;
> > > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >       if (!isValidRequest(camera3Request))
> > >               return -EINVAL;
> > >
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> > >       /* Start the camera if that's the first request we handle. */
> > >       if (!running_) {
> > >               worker_.start();
> > > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >
> > >       worker_.queueRequest(descriptor.request_.get());
> > >
> > > -     {
> > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > -             descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > -     }
> > > +     descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > >
> > >       return 0;
> > >  }
> > >
> > >  void CameraDevice::requestComplete(Request *request)
> > >  {
> > > +     if (stopping_)
> > > +             return;
> > > +
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> >
> > The requestComplete signal is emitted from the CameraManager thread, and
> > must not block for a large amount of time. We already lock the mutex_ in
> > this function, but we currently use the mutex_ only to protect
> > descriptors_, which should never block for long. Extending mutex_ to
> > serialize the Camera calls is dangerous. Could we implement better
> > locking ? I think the first step would be to clearly explain what races
> > we need to handle, as mentioned above in the review of the commit
> > message. Race conditions are tricky to handle right, proper
> > documentation is a key.
> >
> > I know we already have issues with JPEG compression that is performed
> > synchronously in this function. This needs to be fixed, but let's not
> > make it worse and introduce another issue that will then need to be
> > fixed too.
> 
> I agree.
> Since requestComplete() doesn't have to be serialized with HAL
> functions, we don't need to have a lock from begin to end.
> My first thought is coarse lock is a good first step as fine lock
> might make the code complicated.
> I will next try a fine lock in this function.

Thank you. If it starts causing issues, please feel free to reach out to
discuss them before spending too much time on the implementation.

> > > +
> > >       const Request::BufferMap &buffers = request->buffers();
> > >       camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > >       std::unique_ptr<CameraMetadata> resultMetadata;
> > >
> > > -     decltype(descriptors_)::node_type node;
> > > -     {
> > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > -             auto it = descriptors_.find(request->cookie());
> > > -             if (it == descriptors_.end()) {
> > > -                     LOG(HAL, Fatal)
> > > -                             << "Unknown request: " << request->cookie();
> > > -                     status = CAMERA3_BUFFER_STATUS_ERROR;
> > > -                     return;
> > > -             }
> > > -
> > > -             node = descriptors_.extract(it);
> > > +     auto it = descriptors_.find(request->cookie());
> > > +     if (it == descriptors_.end()) {
> > > +             LOG(HAL, Debug)
> > > +                     << "Unknown request: " << request->cookie();
> > > +             status = CAMERA3_BUFFER_STATUS_ERROR;
> > > +             return;
> > >       }
> > > +
> > > +     auto node = descriptors_.extract(it);
> > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > >
> > >       if (request->status() != Request::RequestComplete) {
> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > index c63e8e21..08553584 100644
> > > --- a/src/android/camera_device.h
> > > +++ b/src/android/camera_device.h
> > > @@ -88,7 +88,7 @@ private:
> > >               int androidFormat;
> > >       };
> > >
> > > -     void stop();
> > > +     void stop(bool releaseCamera = false);
> > >
> > >       int initializeStreamConfigurations();
> > >       std::vector<libcamera::Size>
> > > @@ -127,7 +127,12 @@ private:
> > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > >       std::vector<CameraStream> streams_;
> > >
> > > -     std::mutex mutex_; /* Protect descriptors_ */
> > > +     /*
> > > +      * Protect descriptors_ and camera_, and also avoid races by concurrent
> > > +      * Camera HAL API calls.
> > > +      */
> > > +     std::mutex mutex_;
> > > +     std::atomic_bool stopping_;
> > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > >
> > >       std::string maker_;
Hirokazu Honda April 26, 2021, 7:51 a.m. UTC | #4
Hi Laurent, thanks for commenting.

On Mon, Apr 26, 2021 at 3:48 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Hiro,
>
> On Mon, Apr 26, 2021 at 11:48:02AM +0900, Hirokazu Honda wrote:
> > On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart wrote:
> > > On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > > > HAL API functions might be called by different threads in Android
> > > > Camera HAL3 API, while process_capture_request() must be called
> > > > by a single thread.
> > >
> > > I'm not sure to follow you here. Do you mean that the HAL API requires
> > > the camera service to call process_capture_request() from a single
> > > thread ? Or do you mean that the libcamera HAL implementation assumes
> > > that all calls come from the same thread ?
> >
> > I mean process_capture_request() only. The document says.
> >       * ... Only
> >       * one call to process_capture_request() will be made at a time by the
> >       * framework, and the calls will all be from the same thread.
> >
> >
> > > > Furthermore, close() and flush() can be
> > > > called any time. Therefore, races to variables can occur among
> > > > the HAL API functions.
> > >
> > > I'd like to have a bit more details about calling contexts, and record
> > > it in the commit message, or, actually better, in a comment at the top
> > > of the file, as this is very important. There's unfortunately little
> > > information about call contexts in the HAL documentation (camera3.h).
> > > Here's what I've found:
> > >
> > >      * flush() may be called concurrently to process_capture_request(), with the expectation that
> > >      * process_capture_request will return quickly and the request submitted in that
> > >      * process_capture_request call is treated like all other in-flight requests.  Due to
> > >      * concurrency issues, it is possible that from the HAL's point of view, a
> > >      * process_capture_request() call may be started after flush has been invoked but has not
> > >      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> > >      * capture request like other in-flight pending requests (see #4 below).
> > >
> > > This means that flush() can be called from a different thread than
> > > process_capture_request(). I think this makes sense, and we need to
> > > handle that.
> > >
> > >      * process_capture_request:
> > >      *
> > >      * Send a new capture request to the HAL. The HAL should not return from
> > >      * this call until it is ready to accept the next request to process. Only
> > >      * one call to process_capture_request() will be made at a time by the
> > >      * framework, and the calls will all be from the same thread. The next call
> > >      * to process_capture_request() will be made as soon as a new request and
> > >      * its associated buffers are available. In a normal preview scenario, this
> > >      * means the function will be called again by the framework almost
> > >      * instantly.
> > >
> > > There's only one call to process_capture_request() at a time, so we
> > > don't need to serialize process_capture_request() calls.
> > >
> > > That's pretty much all I've found. Is there any other documentation
> > > you're aware of, information coming from discussions with the Android
> > > camera team, or from reading the Android camera service implementation ?
> >
> > From Android camera team, flush() and close() can be called at any time.
> > Yes, there is no race among process_capture_request() calls. However,
> > the HAL functions must be serialized due to the any time of flush()
> > and close() call.
>
> flush() and close() can race with process_capture_request(). But can
> they race each other ? Can they race with configure_streams() ?
>

+Ricky Liang tells me flush() and close() can be called at any time.
I have no idea if flush() and close() race each other. But we should
protect other functions with them.
Ricky, is there any document stating this?

> > > > This prevents the races by enclosing HAL calls by mutex. It
> > > > might not be the best in terms of the performance, but the
> > > > simplicity is good as a first step and also further development.
> > > >
> > > > Signed-off-by: Hirokazu Honda <hiroh@chromium.org>
> > > > ---
> > > >  src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> > > >  src/android/camera_device.h   |  9 ++++-
> > > >  2 files changed, 50 insertions(+), 29 deletions(-)
> > > >
> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > index a71aee2f..c74dc0e7 100644
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > >
> > > >  void CameraDevice::close()
> > > >  {
> > > > -     streams_.clear();
> > > > -
> > > > -     stop();
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > >
> > > > -     camera_->release();
> > > > +     stop(/*releaseCamera=*/true);
> > >
> > > The camera is only released here. Can't we keep the camera_->release()
> > > call here, to avoid the bool parameter to stop() ?
> >
> > Thanks, it should be better. I will move in the next patch.
> >
> > > >  }
> > > >
> > > > -void CameraDevice::stop()
> > > > +void CameraDevice::stop(bool releaseCamera)
> > > >  {
> > > > -     if (!running_)
> > > > +     streams_.clear();
> > > > +
> > > > +     if (!running_) {
> > > > +             if (releaseCamera) {
> > > > +                     descriptors_.clear();
> > > > +                     camera_->release();
> > > > +             }
> > > >               return;
> > > > +     }
> > > > +
> > > > +     stopping_ = true;
> > > >
> > > >       worker_.stop();
> > > >       camera_->stop();
> > > >
> > > >       descriptors_.clear();
> > > > +
> > > >       running_ = false;
> > > > +     stopping_ = false;
> > > > +
> > > > +     if (releaseCamera)
> > > > +             camera_->release();
> > > >  }
> > > >
> > > >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > > >  {
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > >       callbacks_ = callbacks;
> > >
> > > This function is called by the HAL .initialize() operation. I really
> > > don't think it could race with anything else. While it's not explicitly
> > > documented, I don't see how it would be reasonable for the camera
> > > service to call .initialize() from one thread, and other HAL functions
> > > from different threads before .initialize() returns.
> >
> > Thanks. Yeah, I am sure this function will not be concurrently called
> > with other functions.
> > However, callbacks_ will be called in stop(), I wonder it is worth
> > having a seliralization here also thinking of consistency.
> > How do you think?
>
> As commented below, even if we don't need to serialize these calls with
> a lock, doing so shouldn't hurt performances, and should make locking
> verification easier, so I'm not opposed to it.
>
> > > >  }
> > > >
> > > > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> > > >   */
> > > >  const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> > > >  {
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > > +
> > >
> > > This function shouldn't affect the state of the camera, does it need to
> > > be serialized ?
> >
> > Thanks. Even thinking of the consistency, having a serialization in
> > the function is not reasonable.
> > I will remove mutex_ in this function.
> >
> > > >       auto it = requestTemplates_.find(type);
> > > >       if (it != requestTemplates_.end())
> > > >               return it->second->get();
> > > > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > >   */
> > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >  {
> > > > -     /* Before any configuration attempt, stop the camera. */
> > > > -     stop();
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >
> > > Same comment here, is this something that can race with other calls ?
> > >
> > > This being said, I'm not opposed to locking the mutex here and in
> > > CameraDevice::setCallbacks(). Locks are best documented as what data
> > > they protect, and using them consistently with their documentation makes
> > > it easier to check correct locking (this includes making it easier for
> > > static code checkers to catch issues). As taking the lock here should
> > > not cause performance issues (especially given that there should be no
> > > concurrent call to this function), we can keep it for consistency.
> >
> > I think flush() and close() can be called during configureStreams() call.
>
> That's interesting. I would have expected those to be serialized. Is
> there any documentation that explains this ?
>
> > I think it is mandatory to have a lock at least somewhere in
> > process_capture_request(). If we don't enclose
> > process_capture_request() by lock, handling flush() and concurrent
> > process_capture_request() might become a bit complicated (though I can
> > still do it, I actually implemented so initially but discarded because
> > of the complexity).
> >
> > > >
> > > >       if (stream_list->num_streams == 0) {
> > > >               LOG(HAL, Error) << "No streams in configuration";
> > > > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >               return -EINVAL;
> > > >  #endif
> > > >
> > > > +     /* Before any configuration attempt, stop the camera. */
> > > > +     stop();
> > > > +
> > > >       /*
> > > >        * Generate an empty configuration, and construct a StreamConfiguration
> > > >        * for each camera3_stream to add to it.
> > > > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >       }
> > > >
> > > >       /*
> > > > -      * Clear and remove any existing configuration from previous calls, and
> > > > -      * ensure the required entries are available without further
> > > > +      * Ensure the required entries are available without further
> > > >        * reallocation.
> > > >        */
> > > > -     streams_.clear();
> > > > +     ASSERT(streams_.empty());
> > > >       streams_.reserve(stream_list->num_streams);
> > > >
> > > >       std::vector<Camera3StreamConfig> streamConfigs;
> > > > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >       if (!isValidRequest(camera3Request))
> > > >               return -EINVAL;
> > > >
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > > +
> > > >       /* Start the camera if that's the first request we handle. */
> > > >       if (!running_) {
> > > >               worker_.start();
> > > > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >
> > > >       worker_.queueRequest(descriptor.request_.get());
> > > >
> > > > -     {
> > > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > > -             descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > > -     }
> > > > +     descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > >
> > > >       return 0;
> > > >  }
> > > >
> > > >  void CameraDevice::requestComplete(Request *request)
> > > >  {
> > > > +     if (stopping_)
> > > > +             return;
> > > > +
> > > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >
> > > The requestComplete signal is emitted from the CameraManager thread, and
> > > must not block for a large amount of time. We already lock the mutex_ in
> > > this function, but we currently use the mutex_ only to protect
> > > descriptors_, which should never block for long. Extending mutex_ to
> > > serialize the Camera calls is dangerous. Could we implement better
> > > locking ? I think the first step would be to clearly explain what races
> > > we need to handle, as mentioned above in the review of the commit
> > > message. Race conditions are tricky to handle right, proper
> > > documentation is a key.
> > >
> > > I know we already have issues with JPEG compression that is performed
> > > synchronously in this function. This needs to be fixed, but let's not
> > > make it worse and introduce another issue that will then need to be
> > > fixed too.
> >
> > I agree.
> > Since requestComplete() doesn't have to be serialized with HAL
> > functions, we don't need to have a lock from begin to end.
> > My first thought is coarse lock is a good first step as fine lock
> > might make the code complicated.
> > I will next try a fine lock in this function.
>
> Thank you. If it starts causing issues, please feel free to reach out to
> discuss them before spending too much time on the implementation.
>

While I tried improving the grain of lock, I found requestComplete()
can touch an invalid request.
This is caused because descriptors_ can be destroyed in stop(), but
requestComplete() needs to call request->cookie() in order to check
it.
I think the easiest way of avoiding this is to separate request from
descriptor and destroy it always in requestComplete().
Please see the next patch series.

-Hiro

> > > > +
> > > >       const Request::BufferMap &buffers = request->buffers();
> > > >       camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > > >       std::unique_ptr<CameraMetadata> resultMetadata;
> > > >
> > > > -     decltype(descriptors_)::node_type node;
> > > > -     {
> > > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > > -             auto it = descriptors_.find(request->cookie());
> > > > -             if (it == descriptors_.end()) {
> > > > -                     LOG(HAL, Fatal)
> > > > -                             << "Unknown request: " << request->cookie();
> > > > -                     status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > -                     return;
> > > > -             }
> > > > -
> > > > -             node = descriptors_.extract(it);
> > > > +     auto it = descriptors_.find(request->cookie());
> > > > +     if (it == descriptors_.end()) {
> > > > +             LOG(HAL, Debug)
> > > > +                     << "Unknown request: " << request->cookie();
> > > > +             status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > +             return;
> > > >       }
> > > > +
> > > > +     auto node = descriptors_.extract(it);
> > > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > > >
> > > >       if (request->status() != Request::RequestComplete) {
> > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > index c63e8e21..08553584 100644
> > > > --- a/src/android/camera_device.h
> > > > +++ b/src/android/camera_device.h
> > > > @@ -88,7 +88,7 @@ private:
> > > >               int androidFormat;
> > > >       };
> > > >
> > > > -     void stop();
> > > > +     void stop(bool releaseCamera = false);
> > > >
> > > >       int initializeStreamConfigurations();
> > > >       std::vector<libcamera::Size>
> > > > @@ -127,7 +127,12 @@ private:
> > > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > > >       std::vector<CameraStream> streams_;
> > > >
> > > > -     std::mutex mutex_; /* Protect descriptors_ */
> > > > +     /*
> > > > +      * Protect descriptors_ and camera_, and also avoid races by concurrent
> > > > +      * Camera HAL API calls.
> > > > +      */
> > > > +     std::mutex mutex_;
> > > > +     std::atomic_bool stopping_;
> > > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > >
> > > >       std::string maker_;
>
> --
> Regards,
>
> Laurent Pinchart

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index a71aee2f..c74dc0e7 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -748,27 +748,40 @@  int CameraDevice::open(const hw_module_t *hardwareModule)
 
 void CameraDevice::close()
 {
-	streams_.clear();
-
-	stop();
+	std::scoped_lock<std::mutex> lock(mutex_);
 
-	camera_->release();
+	stop(/*releaseCamera=*/true);
 }
 
-void CameraDevice::stop()
+void CameraDevice::stop(bool releaseCamera)
 {
-	if (!running_)
+	streams_.clear();
+
+	if (!running_) {
+		if (releaseCamera) {
+			descriptors_.clear();
+			camera_->release();
+		}
 		return;
+	}
+
+	stopping_ = true;
 
 	worker_.stop();
 	camera_->stop();
 
 	descriptors_.clear();
+
 	running_ = false;
+	stopping_ = false;
+
+	if (releaseCamera)
+		camera_->release();
 }
 
 void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
 {
+	std::scoped_lock<std::mutex> lock(mutex_);
 	callbacks_ = callbacks;
 }
 
@@ -1581,6 +1594,8 @@  std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
  */
 const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
 {
+	std::scoped_lock<std::mutex> lock(mutex_);
+
 	auto it = requestTemplates_.find(type);
 	if (it != requestTemplates_.end())
 		return it->second->get();
@@ -1648,8 +1663,7 @@  PixelFormat CameraDevice::toPixelFormat(int format) const
  */
 int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 {
-	/* Before any configuration attempt, stop the camera. */
-	stop();
+	std::scoped_lock<std::mutex> lock(mutex_);
 
 	if (stream_list->num_streams == 0) {
 		LOG(HAL, Error) << "No streams in configuration";
@@ -1661,6 +1675,9 @@  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 		return -EINVAL;
 #endif
 
+	/* Before any configuration attempt, stop the camera. */
+	stop();
+
 	/*
 	 * Generate an empty configuration, and construct a StreamConfiguration
 	 * for each camera3_stream to add to it.
@@ -1672,11 +1689,10 @@  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 	}
 
 	/*
-	 * Clear and remove any existing configuration from previous calls, and
-	 * ensure the required entries are available without further
+	 * Ensure the required entries are available without further
 	 * reallocation.
 	 */
-	streams_.clear();
+	ASSERT(streams_.empty());
 	streams_.reserve(stream_list->num_streams);
 
 	std::vector<Camera3StreamConfig> streamConfigs;
@@ -1912,6 +1928,8 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 	if (!isValidRequest(camera3Request))
 		return -EINVAL;
 
+	std::scoped_lock<std::mutex> lock(mutex_);
+
 	/* Start the camera if that's the first request we handle. */
 	if (!running_) {
 		worker_.start();
@@ -2015,33 +2033,31 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 
 	worker_.queueRequest(descriptor.request_.get());
 
-	{
-		std::scoped_lock<std::mutex> lock(mutex_);
-		descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
-	}
+	descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
 
 	return 0;
 }
 
 void CameraDevice::requestComplete(Request *request)
 {
+	if (stopping_)
+		return;
+
+	std::scoped_lock<std::mutex> lock(mutex_);
+
 	const Request::BufferMap &buffers = request->buffers();
 	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
 	std::unique_ptr<CameraMetadata> resultMetadata;
 
-	decltype(descriptors_)::node_type node;
-	{
-		std::scoped_lock<std::mutex> lock(mutex_);
-		auto it = descriptors_.find(request->cookie());
-		if (it == descriptors_.end()) {
-			LOG(HAL, Fatal)
-				<< "Unknown request: " << request->cookie();
-			status = CAMERA3_BUFFER_STATUS_ERROR;
-			return;
-		}
-
-		node = descriptors_.extract(it);
+	auto it = descriptors_.find(request->cookie());
+	if (it == descriptors_.end()) {
+		LOG(HAL, Debug)
+			<< "Unknown request: " << request->cookie();
+		status = CAMERA3_BUFFER_STATUS_ERROR;
+		return;
 	}
+
+	auto node = descriptors_.extract(it);
 	Camera3RequestDescriptor &descriptor = node.mapped();
 
 	if (request->status() != Request::RequestComplete) {
diff --git a/src/android/camera_device.h b/src/android/camera_device.h
index c63e8e21..08553584 100644
--- a/src/android/camera_device.h
+++ b/src/android/camera_device.h
@@ -88,7 +88,7 @@  private:
 		int androidFormat;
 	};
 
-	void stop();
+	void stop(bool releaseCamera = false);
 
 	int initializeStreamConfigurations();
 	std::vector<libcamera::Size>
@@ -127,7 +127,12 @@  private:
 	std::map<int, libcamera::PixelFormat> formatsMap_;
 	std::vector<CameraStream> streams_;
 
-	std::mutex mutex_; /* Protect descriptors_ */
+	/*
+	 * Protect descriptors_ and camera_, and also avoid races by concurrent
+	 * Camera HAL API calls.
+	 */
+	std::mutex mutex_;
+	std::atomic_bool stopping_;
 	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
 
 	std::string maker_;