[libcamera-devel,v3,8/8] android: Implement flush() camera operation

Message ID 20210521154227.60186-9-jacopo@jmondi.org
State Superseded
Series
  • Implement flush() camera operation

Commit Message

Jacopo Mondi May 21, 2021, 3:42 p.m. UTC
Implement the flush() camera operation in the CameraDevice class
and make it available to the camera framework by implementing the
operation wrapper in camera_ops.cpp.

The flush() implementation stops the Camera and the worker thread and
waits for all in-flight requests to be returned. Stopping the Camera
forces all Requests already queued to be returned immediately in error
state. As flush() has to wait until all of them have been returned, make it
wait on a newly introduced condition variable which is notified by the
request completion handler when the queue of pending requests has been
exhausted.

As flush() can race with processCaptureRequest() protect the requests
queueing by introducing a new CameraState::CameraFlushing state that
processCaptureRequest() inspects before queuing the Request to the
Camera. If flush() has been called while processCaptureRequest() was
executing, return the current Request immediately in error state.

Protect potentially concurrent calls to close() and configureStreams()
by inspecting the CameraState, and force a wait for any flush() call
to complete before proceeding.

Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
---
 src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
 src/android/camera_device.h   |  9 +++-
 src/android/camera_ops.cpp    |  8 +++-
 3 files changed, 100 insertions(+), 7 deletions(-)

Comments

Niklas Söderlund May 22, 2021, 9:55 a.m. UTC | #1
Hi Jacopo,

Thanks for your work.

On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> Implement the flush() camera operation in the CameraDevice class
> and make it available to the camera framework by implementing the
> operation wrapper in camera_ops.cpp.
> 
> The flush() implementation stops the Camera and the worker thread and
> waits for all in-flight requests to be returned. Stopping the Camera
> forces all Requests already queued to be returned immediately in error
> state. As flush() has to wait until all of them have been returned, make it
> wait on a newly introduced condition variable which is notified by the
> request completion handler when the queue of pending requests has been
> exhausted.
> 
> As flush() can race with processCaptureRequest() protect the requests
> queueing by introducing a new CameraState::CameraFlushing state that
> processCaptureRequest() inspects before queuing the Request to the
> Camera. If flush() has been called while processCaptureRequest() was
> executing, return the current Request immediately in error state.
> 
> Protect potentially concurrent calls to close() and configureStreams()
> by inspecting the CameraState, and force a wait for any flush() call
> to complete before proceeding.
> 
> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> ---
>  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
>  src/android/camera_device.h   |  9 +++-
>  src/android/camera_ops.cpp    |  8 +++-
>  3 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 3fce14035718..899afaa49439 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>  
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> +	MutexLocker cameraLock(cameraMutex_);
> +	if (state_ == CameraFlushing) {
> +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> +		camera_->release();
>  
> +		return;
> +	}
> +
> +	streams_.clear();
>  	stop();
>  
>  	camera_->release();
>  }
>  
> -void CameraDevice::stop()
> +/*
> + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> + * until all the in-flight requests have been returned before setting the
> + * camera state to stopped.
> + *
> + * Once flushing is done it unlocks concurrent calls to camera close() and
> + * configureStreams().
> + */
> +void CameraDevice::flush()
>  {
> +	{
> +		MutexLocker cameraLock(cameraMutex_);
> +
> +		if (state_ != CameraRunning)
> +			return;
> +
> +		worker_.stop();
> +		camera_->stop();
> +		state_ = CameraFlushing;
> +	}
> +
> +	/*
> +	 * Now wait for all the in-flight requests to be completed before
> +	 * continuing. Stopping the Camera guarantees that all in-flight
> +	 * requests are completed in error state.
> +	 */
> +	{
> +		MutexLocker requestsLock(requestsMutex_);
> +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> +	}

I'm still uneasy about releasing the cameraMutex_ for this section. In
patch 6/8 you add it to protect the state_ variable but here it's
ignored. I see the ASSERT() added to stop(), but the pattern of taking
the lock, checking state_, releasing the lock to do some work, then
retaking the lock and updating state_ feels like a bad idea. Maybe I'm
missing something and this is not a real problem; if so, maybe we can
capture that in the comment here?

> +
> +	/*
> +	 * Set state to stopped and unlock close() or configureStreams() that
> +	 * might be waiting for flush to be completed.
> +	 */
>  	MutexLocker cameraLock(cameraMutex_);
> +	state_ = CameraStopped;
> +	flushed_.notify_one();
> +}
> +
> +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> +void CameraDevice::stop()
> +{
> +	ASSERT(state_ != CameraFlushing);
> +
>  	if (state_ == CameraStopped)
>  		return;
>  
> @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
>   */
>  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  {
> -	/* Before any configuration attempt, stop the camera. */
> -	stop();
> +	{
> +		/*
> +		 * If a flush is in progress, wait for it to complete and to
> +		 * stop the camera, otherwise before any new configuration
> +		 * attempt we have to stop the camera explictely.
> +		 */
> +		MutexLocker cameraLock(cameraMutex_);
> +		if (state_ == CameraFlushing)
> +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> +		else
> +			stop();
> +	}
>  
>  	if (stream_list->num_streams == 0) {
>  		LOG(HAL, Error) << "No streams in configuration";
> @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * Just before queuing the request, make sure flush() has not
> +	 * been called after this function has been executed. In that
> +	 * case, immediately return the request with errors.
> +	 */
> +	MutexLocker cameraLock(cameraMutex_);
> +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> +			buffer.release_fence = buffer.acquire_fence;
> +		}
> +
> +		notifyError(descriptor.frameNumber_,
> +			    descriptor.buffers_[0].stream,
> +			    CAMERA3_MSG_ERROR_REQUEST);
> +
> +		return 0;
> +	}
> +
>  	worker_.queueRequest(descriptor.request_.get());
>  
>  	{
> @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
>  			return;
>  		}
>  
> +		/* Release flush if all the pending requests have been completed. */
> +		if (descriptors_.empty())
> +			flushing_.notify_one();
> +
>  		node = descriptors_.extract(it);
>  	}
>  	Camera3RequestDescriptor &descriptor = node.mapped();
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index 7cf8e8370387..e1b3bf7d30f2 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -7,6 +7,7 @@
>  #ifndef __ANDROID_CAMERA_DEVICE_H__
>  #define __ANDROID_CAMERA_DEVICE_H__
>  
> +#include <condition_variable>
>  #include <map>
>  #include <memory>
>  #include <mutex>
> @@ -42,6 +43,7 @@ public:
>  
>  	int open(const hw_module_t *hardwareModule);
>  	void close();
> +	void flush();
>  
>  	unsigned int id() const { return id_; }
>  	camera3_device_t *camera3Device() { return &camera3Device_; }
> @@ -92,6 +94,7 @@ private:
>  	enum State {
>  		CameraStopped,
>  		CameraRunning,
> +		CameraFlushing,
>  	};
>  
>  	void stop();
> @@ -120,8 +123,9 @@ private:
>  
>  	CameraWorker worker_;
>  
> -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
>  	State state_;
> +	std::condition_variable flushed_;
>  
>  	std::shared_ptr<libcamera::Camera> camera_;
>  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> @@ -134,8 +138,9 @@ private:
>  	std::map<int, libcamera::PixelFormat> formatsMap_;
>  	std::vector<CameraStream> streams_;
>  
> -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
>  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> +	std::condition_variable flushing_;
>  
>  	std::string maker_;
>  	std::string model_;
> diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> index 696e80436821..8a3cfa175ff5 100644
> --- a/src/android/camera_ops.cpp
> +++ b/src/android/camera_ops.cpp
> @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
>  {
>  }
>  
> -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> +static int hal_dev_flush(const struct camera3_device *dev)
>  {
> +	if (!dev)
> +		return -EINVAL;
> +
> +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> +	camera->flush();
> +
>  	return 0;
>  }
>  
> -- 
> 2.31.1
>
Jacopo Mondi May 23, 2021, 2:22 p.m. UTC | #2
Hi Niklas,

On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> Hi Jacopo,
>
> Thanks for your work.
>
> On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > Implement the flush() camera operation in the CameraDevice class
> > and make it available to the camera framework by implementing the
> > operation wrapper in camera_ops.cpp.
> >
> > The flush() implementation stops the Camera and the worker thread and
> > waits for all in-flight requests to be returned. Stopping the Camera
> > forces all Requests already queued to be returned immediately in error
> > state. As flush() has to wait until all of them have been returned, make it
> > wait on a newly introduced condition variable which is notified by the
> > request completion handler when the queue of pending requests has been
> > exhausted.
> >
> > As flush() can race with processCaptureRequest() protect the requests
> > queueing by introducing a new CameraState::CameraFlushing state that
> > processCaptureRequest() inspects before queuing the Request to the
> > Camera. If flush() has been called while processCaptureRequest() was
> > executing, return the current Request immediately in error state.
> >
> > Protect potentially concurrent calls to close() and configureStreams()
> > by inspecting the CameraState, and force a wait for any flush() call
> > to complete before proceeding.
> >
> > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > ---
> >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> >  src/android/camera_device.h   |  9 +++-
> >  src/android/camera_ops.cpp    |  8 +++-
> >  3 files changed, 100 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index 3fce14035718..899afaa49439 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> >  void CameraDevice::close()
> >  {
> > -	streams_.clear();
> > +	MutexLocker cameraLock(cameraMutex_);
> > +	if (state_ == CameraFlushing) {
> > +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > +		camera_->release();
> >
> > +		return;
> > +	}
> > +
> > +	streams_.clear();
> >  	stop();
> >
> >  	camera_->release();
> >  }
> >
> > -void CameraDevice::stop()
> > +/*
> > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> > + * until all the in-flight requests have been returned before setting the
> > + * camera state to stopped.
> > + *
> > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > + * configureStreams().
> > + */
> > +void CameraDevice::flush()
> >  {
> > +	{
> > +		MutexLocker cameraLock(cameraMutex_);
> > +
> > +		if (state_ != CameraRunning)
> > +			return;
> > +
> > +		worker_.stop();
> > +		camera_->stop();
> > +		state_ = CameraFlushing;
> > +	}
> > +
> > +	/*
> > +	 * Now wait for all the in-flight requests to be completed before
> > +	 * continuing. Stopping the Camera guarantees that all in-flight
> > +	 * requests are completed in error state.
> > +	 */
> > +	{
> > +		MutexLocker requestsLock(requestsMutex_);
> > +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > +	}
>
> I'm still uneasy about releasing the cameraMutex_ for this section. In
> patch 6/8 you add it to protect the state_ variable but here it's

I'm not changing state_ without the mutex acquired, am I ?

> ignored. I see the ASSERT() added to stop() but the patter of taking the
> lock checking state_, releasing the lock and do some work, retake the
> lock and update state_ feels like a bad idea. Maybe I'm missing

How so, apart from the fact it feels a bit unusual, I concur ?

If I held the mutex for the whole duration of flush(), no other
concurrent method could proceed until all the queued requests have
been completed. By releasing it while flush() waits for the flushing_
condition to be signaled, processCaptureRequest() can proceed and
immediately return the newly queued requests in error state by
detecting state_ == CameraFlushing, which signals that a flush is in
progress. Otherwise it would have to wait for flush() to end. But then
we would be back to a situation where we could serialize all calls and
that's it: a single mutex held for the whole duration of all
operations would be enough.

If it were only for close() or configureStreams() we could have locked
for the whole duration of flush(), as they wait for flush() to complete
before proceeding anyway (by waiting on the flushed_ condition, which
is signaled below).
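
To make the argument concrete, here is a minimal standalone sketch of
the pattern, not the HAL code itself. TwoPhaseDevice, FlushState and
the pending_ counter are hypothetical stand-ins for CameraDevice,
state_ and descriptors_. The state mutex is only held to flip the
state, so a request arriving during the wait is rejected immediately
instead of blocking on flush():

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

/* Hypothetical three-state model mirroring Stopped/Running/Flushing. */
enum class FlushState { Stopped, Running, Flushing };

class TwoPhaseDevice
{
public:
	/* Returns true if the request was queued, false if it was rejected. */
	bool processCaptureRequest()
	{
		std::lock_guard<std::mutex> stateLock(stateMutex_);
		if (state_ != FlushState::Running)
			return false; /* Fail fast, do not wait for flush(). */

		std::lock_guard<std::mutex> requestsLock(requestsMutex_);
		++pending_;
		return true;
	}

	/* Completion handler: wakes flush() once nothing is pending. */
	void requestComplete()
	{
		bool drained;
		{
			std::lock_guard<std::mutex> requestsLock(requestsMutex_);
			drained = (--pending_ == 0);
		}
		if (drained)
			drained_.notify_all();
	}

	void flush()
	{
		{
			std::lock_guard<std::mutex> stateLock(stateMutex_);
			if (state_ != FlushState::Running)
				return;
			state_ = FlushState::Flushing;
		}

		/* stateMutex_ is released here so other calls can inspect state_. */
		{
			std::unique_lock<std::mutex> requestsLock(requestsMutex_);
			drained_.wait(requestsLock, [&] { return pending_ == 0; });
		}

		std::lock_guard<std::mutex> stateLock(stateMutex_);
		state_ = FlushState::Stopped;
	}

	FlushState state()
	{
		std::lock_guard<std::mutex> stateLock(stateMutex_);
		return state_;
	}

private:
	std::mutex stateMutex_; /* Protects state_. */
	FlushState state_ = FlushState::Running;

	std::mutex requestsMutex_; /* Protects pending_. */
	unsigned int pending_ = 0;
	std::condition_variable drained_;
};
```

In this sketch the only thread that leaves the Flushing state is the
one that entered it, which is what makes the unlock/relock sequence
safe under these assumptions.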

> something and this is not a real problem, if so maybe we can capture
> that in the comment here?
>
> > +
> > +	/*
> > +	 * Set state to stopped and unlock close() or configureStreams() that
> > +	 * might be waiting for flush to be completed.
> > +	 */
> >  	MutexLocker cameraLock(cameraMutex_);
> > +	state_ = CameraStopped;
> > +	flushed_.notify_one();
> > +}
> > +
> > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > +void CameraDevice::stop()
> > +{
> > +	ASSERT(state_ != CameraFlushing);
> > +
> >  	if (state_ == CameraStopped)
> >  		return;
> >
> > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> >   */
> >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> >  {
> > -	/* Before any configuration attempt, stop the camera. */
> > -	stop();
> > +	{
> > +		/*
> > +		 * If a flush is in progress, wait for it to complete and to
> > +		 * stop the camera, otherwise before any new configuration
> > +		 * attempt we have to stop the camera explictely.
> > +		 */
> > +		MutexLocker cameraLock(cameraMutex_);
> > +		if (state_ == CameraFlushing)
> > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > +		else
> > +			stop();
> > +	}
> >
> >  	if (stream_list->num_streams == 0) {
> >  		LOG(HAL, Error) << "No streams in configuration";
> > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >  	if (ret)
> >  		return ret;
> >
> > +	/*
> > +	 * Just before queuing the request, make sure flush() has not
> > +	 * been called after this function has been executed. In that
> > +	 * case, immediately return the request with errors.
> > +	 */
> > +	MutexLocker cameraLock(cameraMutex_);
> > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > +			buffer.release_fence = buffer.acquire_fence;
> > +		}
> > +
> > +		notifyError(descriptor.frameNumber_,
> > +			    descriptor.buffers_[0].stream,
> > +			    CAMERA3_MSG_ERROR_REQUEST);
> > +
> > +		return 0;
> > +	}
> > +
> >  	worker_.queueRequest(descriptor.request_.get());
> >
> >  	{
> > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> >  			return;
> >  		}
> >
> > +		/* Release flush if all the pending requests have been completed. */
> > +		if (descriptors_.empty())
> > +			flushing_.notify_one();
> > +
> >  		node = descriptors_.extract(it);
> >  	}
> >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > index 7cf8e8370387..e1b3bf7d30f2 100644
> > --- a/src/android/camera_device.h
> > +++ b/src/android/camera_device.h
> > @@ -7,6 +7,7 @@
> >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> >  #define __ANDROID_CAMERA_DEVICE_H__
> >
> > +#include <condition_variable>
> >  #include <map>
> >  #include <memory>
> >  #include <mutex>
> > @@ -42,6 +43,7 @@ public:
> >
> >  	int open(const hw_module_t *hardwareModule);
> >  	void close();
> > +	void flush();
> >
> >  	unsigned int id() const { return id_; }
> >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > @@ -92,6 +94,7 @@ private:
> >  	enum State {
> >  		CameraStopped,
> >  		CameraRunning,
> > +		CameraFlushing,
> >  	};
> >
> >  	void stop();
> > @@ -120,8 +123,9 @@ private:
> >
> >  	CameraWorker worker_;
> >
> > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> >  	State state_;
> > +	std::condition_variable flushed_;
> >
> >  	std::shared_ptr<libcamera::Camera> camera_;
> >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > @@ -134,8 +138,9 @@ private:
> >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> >  	std::vector<CameraStream> streams_;
> >
> > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > +	std::condition_variable flushing_;
> >
> >  	std::string maker_;
> >  	std::string model_;
> > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > index 696e80436821..8a3cfa175ff5 100644
> > --- a/src/android/camera_ops.cpp
> > +++ b/src/android/camera_ops.cpp
> > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> >  {
> >  }
> >
> > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > +static int hal_dev_flush(const struct camera3_device *dev)
> >  {
> > +	if (!dev)
> > +		return -EINVAL;
> > +
> > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > +	camera->flush();
> > +
> >  	return 0;
> >  }
> >
> > --
> > 2.31.1
> >
>
> --
> Regards,
> Niklas Söderlund
Laurent Pinchart May 23, 2021, 6:50 p.m. UTC | #3
Hi Jacopo,

Thank you for the patch.

On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > Implement the flush() camera operation in the CameraDevice class
> > > and make it available to the camera framework by implementing the
> > > operation wrapper in camera_ops.cpp.
> > >
> > > The flush() implementation stops the Camera and the worker thread and
> > > waits for all in-flight requests to be returned. Stopping the Camera
> > > forces all Requests already queued to be returned immediately in error
> > > state. As flush() has to wait until all of them have been returned, make it
> > > wait on a newly introduced condition variable which is notified by the
> > > request completion handler when the queue of pending requests has been
> > > exhausted.
> > >
> > > As flush() can race with processCaptureRequest() protect the requests
> > > queueing by introducing a new CameraState::CameraFlushing state that
> > > processCaptureRequest() inspects before queuing the Request to the
> > > Camera. If flush() has been called while processCaptureRequest() was
> > > executing, return the current Request immediately in error state.
> > >
> > > Protect potentially concurrent calls to close() and configureStreams()

Can this happen ? Quoting camera3.h,

 * 12. Alternatively, the framework may call camera3_device_t->common->close()
 *    to end the camera session. This may be called at any time when no other
 *    calls from the framework are active, although the call may block until all
 *    in-flight captures have completed (all results returned, all buffers
 *    filled). After the close call returns, no more calls to the
 *    camera3_callback_ops_t functions are allowed from the HAL. Once the
 *    close() call is underway, the framework may not call any other HAL device
 *    functions.

The important part is "when no other calls from the framework are
active". I don't think we need to handle close() racing with anything
else than process_capture_request().

> > > by inspecting the CameraState, and force a wait for any flush() call
> > > to complete before proceeding.
> > >
> > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > ---
> > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > >  src/android/camera_device.h   |  9 +++-
> > >  src/android/camera_ops.cpp    |  8 +++-
> > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index 3fce14035718..899afaa49439 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > >  void CameraDevice::close()
> > >  {
> > > -	streams_.clear();
> > > +	MutexLocker cameraLock(cameraMutex_);

I'd add a blank line here.

> > > +	if (state_ == CameraFlushing) {

As mentioned above, I don't think you need to protect against close()
and flush() racing each other.

> > > +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > +		camera_->release();
> > >
> > > +		return;
> > > +	}
> > > +
> > > +	streams_.clear();
> > >  	stop();
> > >
> > >  	camera_->release();
> > >  }
> > >
> > > -void CameraDevice::stop()
> > > +/*
> > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait

s/wait/waits/

> > > + * until all the in-flight requests have been returned before setting the
> > > + * camera state to stopped.
> > > + *
> > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > + * configureStreams().
> > > + */
> > > +void CameraDevice::flush()
> > >  {
> > > +	{
> > > +		MutexLocker cameraLock(cameraMutex_);
> > > +
> > > +		if (state_ != CameraRunning)
> > > +			return;
> > > +
> > > +		worker_.stop();
> > > +		camera_->stop();
> > > +		state_ = CameraFlushing;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Now wait for all the in-flight requests to be completed before
> > > +	 * continuing. Stopping the Camera guarantees that all in-flight
> > > +	 * requests are completed in error state.

Do we need to wait ? Camera::stop() guarantees that all requests
complete synchronously with the stop() call.

Partly answering myself here, we'll have to wait for post-processing
tasks to complete once we process them in a separate thread, but that
will likely be handled by Thread::wait(). I don't think you need a
condition variable here. If I'm not mistaken, this should simplify the
implementation.
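
A rough standalone sketch of what the simplified flow could look like,
assuming stop() really does complete every queued request before it
returns. FakeCamera, SyncStopDevice and DeviceState are made-up names
for illustration only and do not reflect the libcamera classes:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

/*
 * Hypothetical stand-in for the camera: stop() hands back every queued
 * request through the completion callback before it returns.
 */
class FakeCamera
{
public:
	std::function<void(int)> requestCompleted;

	void queueRequest(int id) { queued_.push_back(id); }

	void stop()
	{
		std::vector<int> cancelled;
		cancelled.swap(queued_);
		for (int id : cancelled)
			requestCompleted(id);
	}

private:
	std::vector<int> queued_;
};

enum class DeviceState { Stopped, Running, Flushing };

class SyncStopDevice
{
public:
	SyncStopDevice()
	{
		camera_.requestCompleted = [this](int id) { completed_.push_back(id); };
	}

	/* Returns false when a flush is in progress or the device is stopped. */
	bool queue(int id)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != DeviceState::Running)
			return false;

		camera_.queueRequest(id);
		return true;
	}

	void flush()
	{
		{
			std::lock_guard<std::mutex> lock(mutex_);
			if (state_ != DeviceState::Running)
				return;
			state_ = DeviceState::Flushing;
		}

		/* All completions are delivered before stop() returns: no wait. */
		camera_.stop();

		std::lock_guard<std::mutex> lock(mutex_);
		state_ = DeviceState::Stopped;
	}

	DeviceState state()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return state_;
	}

	std::size_t completedCount() const { return completed_.size(); }

private:
	std::mutex mutex_; /* Protects state_. */
	DeviceState state_ = DeviceState::Running;
	FakeCamera camera_;
	std::vector<int> completed_;
};
```

With a synchronous stop() the condition variable and the second mutex
disappear from this sketch; only the state check that makes concurrent
queue() calls fail fast remains.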

> > > +	 */
> > > +	{
> > > +		MutexLocker requestsLock(requestsMutex_);
> > > +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > +	}
> >
> > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > patch 6/8 you add it to protect the state_ variable but here it's
> 
> I'm not changing state_ without the mutex acquired, am I ?
> 
> > ignored. I see the ASSERT() added to stop() but the patter of taking the
> > lock checking state_, releasing the lock and do some work, retake the
> > lock and update state_ feels like a bad idea. Maybe I'm missing
> 
> How so, apart from the fact it feels a bit unusual, I concur ?
> 
> If I keep the held the mutex for the whole duration of flush no other
> concurrent method can proceed until all the queued requests have not
> been completed. While flush waits for the flushing_ condition to be
> signaled, processCaptureRequest() can proceed and immediately return
> the newly queued requests in error state by detecting state_ ==
> CameraFlushing which signals that flush in is progress.
> Otherwise it would have had to wait for flush to end. But then we're back
> to a situation where we could serialize all calls and that's it, we
> would be done with a single mutex to be held for the whole duration of
> all operations.
> 
> If it only was for close() or configureStreams() we could have locked
> for the whole duration of flush(), as they anyway wait for flush to
> complete before proceeding (by waiting on the flushed_ condition here
> below signaled).
> 
> > something and this is not a real problem, if so maybe we can capture
> > that in the comment here?
> >
> > > +
> > > +	/*
> > > +	 * Set state to stopped and unlock close() or configureStreams() that
> > > +	 * might be waiting for flush to be completed.
> > > +	 */
> > >  	MutexLocker cameraLock(cameraMutex_);
> > > +	state_ = CameraStopped;
> > > +	flushed_.notify_one();

You should drop the lock before calling notify_one(). Otherwise you'll
wake up the task waiting on flushed_, which will try to lock
cameraMutex_, which will block immediately. The scheduler will have to
reschedule this task for the function to return and the lock to be
released before the waiter can proceed. That works, but isn't very
efficient.

	{
		MutexLocker cameraLock(cameraMutex_);
		state_ = CameraStopped;
	}

	flushed_.notify_one();
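
For reference, a self-contained version of the same idea including the
waiter side. StopSignal and its members are hypothetical names used
only for this sketch, with a plain bool standing in for the
state_ == CameraStopped check:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class StopSignal
{
public:
	void markStopped()
	{
		{
			std::lock_guard<std::mutex> lock(mutex_);
			stopped_ = true;
		}

		/* The lock is already released: a woken waiter can take it at once. */
		cv_.notify_one();
	}

	/* Blocks until markStopped() has been called. */
	void waitStopped()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		cv_.wait(lock, [&] { return stopped_; });
	}

	bool stopped()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return stopped_;
	}

private:
	std::mutex mutex_; /* Protects stopped_. */
	bool stopped_ = false;
	std::condition_variable cv_;
};
```

The predicate is still evaluated under the mutex, so notifying after
the unlock cannot lose a wakeup. The one thing to watch is that the
object must outlive the notify_one() call.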

> > > +}
> > > +
> > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > +void CameraDevice::stop()
> > > +{
> > > +	ASSERT(state_ != CameraFlushing);
> > > +
> > >  	if (state_ == CameraStopped)
> > >  		return;
> > >
> > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > >   */
> > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  {
> > > -	/* Before any configuration attempt, stop the camera. */
> > > -	stop();
> > > +	{
> > > +		/*
> > > +		 * If a flush is in progress, wait for it to complete and to
> > > +		 * stop the camera, otherwise before any new configuration
> > > +		 * attempt we have to stop the camera explictely.
> > > +		 */

Same here, I don't think flush() and configure_streams() can race each
other. I believe the only possible race to be between flush() and
process_capture_request().

> > > +		MutexLocker cameraLock(cameraMutex_);
> > > +		if (state_ == CameraFlushing)
> > > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > +		else
> > > +			stop();
> > > +	}
> > >
> > >  	if (stream_list->num_streams == 0) {
> > >  		LOG(HAL, Error) << "No streams in configuration";
> > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >  	if (ret)
> > >  		return ret;
> > >
> > > +	/*
> > > +	 * Just before queuing the request, make sure flush() has not
> > > +	 * been called after this function has been executed. In that
> > > +	 * case, immediately return the request with errors.
> > > +	 */
> > > +	MutexLocker cameraLock(cameraMutex_);
> > > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > +			buffer.release_fence = buffer.acquire_fence;
> > > +		}
> > > +
> > > +		notifyError(descriptor.frameNumber_,
> > > +			    descriptor.buffers_[0].stream,

As commented on a previous patch, I think you should pass nullptr for
the stream here.

> > > +			    CAMERA3_MSG_ERROR_REQUEST);
> > > +
> > > +		return 0;
> > > +	}
> > > +
> > >  	worker_.queueRequest(descriptor.request_.get());
> > >
> > >  	{
> > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > >  			return;
> > >  		}
> > >
> > > +		/* Release flush if all the pending requests have been completed. */
> > > +		if (descriptors_.empty())
> > > +			flushing_.notify_one();

This will never happen, as you can only get here if descriptors_.find()
has found the descriptor. Did you mean to do this after the extract()
call below ?
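
A standalone sketch of the ordering I have in mind, with the emptiness
check moved after the extract() call. PendingRequests, the string
payload and the drainedEvents_ counter are hypothetical and only there
to make the sketch observable; it needs C++17 for std::map::extract():

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

class PendingRequests
{
public:
	void add(std::uint64_t cookie, std::string name)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		descriptors_.emplace(cookie, std::move(name));
	}

	/* Removes and returns the descriptor, or nullopt if it is unknown. */
	std::optional<std::string> complete(std::uint64_t cookie)
	{
		std::optional<std::string> name;
		bool drained;
		{
			std::lock_guard<std::mutex> lock(mutex_);
			auto it = descriptors_.find(cookie);
			if (it == descriptors_.end())
				return std::nullopt;

			/* Until the extract, the map holds at least this entry. */
			auto node = descriptors_.extract(it);
			name = std::move(node.mapped());

			/* Only now can the map be empty. */
			drained = descriptors_.empty();
			if (drained)
				++drainedEvents_;
		}

		if (drained)
			drained_.notify_all();

		return name;
	}

	/* Blocks until no descriptor is pending. */
	void waitDrained()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		drained_.wait(lock, [&] { return descriptors_.empty(); });
	}

	unsigned int drainedEvents()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return drainedEvents_;
	}

private:
	std::mutex mutex_; /* Protects descriptors_ and drainedEvents_. */
	std::map<std::uint64_t, std::string> descriptors_;
	unsigned int drainedEvents_ = 0;
	std::condition_variable drained_;
};
```

Completing the last pending entry is the only path that notifies in
this sketch; checking before the extract would never see an empty map.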

> > > +
> > >  		node = descriptors_.extract(it);
> > >  	}
> > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > --- a/src/android/camera_device.h
> > > +++ b/src/android/camera_device.h
> > > @@ -7,6 +7,7 @@
> > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > >  #define __ANDROID_CAMERA_DEVICE_H__
> > >
> > > +#include <condition_variable>
> > >  #include <map>
> > >  #include <memory>
> > >  #include <mutex>
> > > @@ -42,6 +43,7 @@ public:
> > >
> > >  	int open(const hw_module_t *hardwareModule);
> > >  	void close();
> > > +	void flush();
> > >
> > >  	unsigned int id() const { return id_; }
> > >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > > @@ -92,6 +94,7 @@ private:
> > >  	enum State {
> > >  		CameraStopped,
> > >  		CameraRunning,
> > > +		CameraFlushing,
> > >  	};
> > >
> > >  	void stop();
> > > @@ -120,8 +123,9 @@ private:
> > >
> > >  	CameraWorker worker_;
> > >
> > > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > >  	State state_;
> > > +	std::condition_variable flushed_;
> > >
> > >  	std::shared_ptr<libcamera::Camera> camera_;
> > >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > @@ -134,8 +138,9 @@ private:
> > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > >  	std::vector<CameraStream> streams_;
> > >
> > > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > +	std::condition_variable flushing_;
> > >
> > >  	std::string maker_;
> > >  	std::string model_;
> > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > index 696e80436821..8a3cfa175ff5 100644
> > > --- a/src/android/camera_ops.cpp
> > > +++ b/src/android/camera_ops.cpp
> > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > >  {
> > >  }
> > >
> > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > +static int hal_dev_flush(const struct camera3_device *dev)
> > >  {
> > > +	if (!dev)
> > > +		return -EINVAL;
> > > +
> > > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > +	camera->flush();
> > > +
> > >  	return 0;
> > >  }
> > >
Jacopo Mondi May 24, 2021, 7:47 a.m. UTC | #4
Hi Laurent,

On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> Thank you for the patch.
>
> On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > Implement the flush() camera operation in the CameraDevice class
> > > > and make it available to the camera framework by implementing the
> > > > operation wrapper in camera_ops.cpp.
> > > >
> > > > The flush() implementation stops the Camera and the worker thread and
> > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > forces all Requests already queued to be returned immediately in error
> > > > state. As flush() has to wait until all of them have been returned, make it
> > > > wait on a newly introduced condition variable which is notified by the
> > > > request completion handler when the queue of pending requests has been
> > > > exhausted.
> > > >
> > > > As flush() can race with processCaptureRequest() protect the requests
> > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > processCaptureRequest() inspects before queuing the Request to the
> > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > executing, return the current Request immediately in error state.
> > > >
> > > > Protect potentially concurrent calls to close() and configureStreams()
>
> Can this happen ? Quoting camera3.h,
>
>  * 12. Alternatively, the framework may call camera3_device_t->common->close()
>  *    to end the camera session. This may be called at any time when no other
>  *    calls from the framework are active, although the call may block until all
>  *    in-flight captures have completed (all results returned, all buffers
>  *    filled). After the close call returns, no more calls to the
>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
>  *    close() call is underway, the framework may not call any other HAL device
>  *    functions.
>
> The important part is "when no other calls from the framework are
> active". I don't think we need to handle close() racing with anything
> else than process_capture_request().

I've been discussing this with Hiro during v1, as initially I didn't
consider close() and configureStreams().

https://patchwork.libcamera.org/patch/12248/#16884

I initially only considered processCaptureRequest() as a potential
race, but the cros camera team suggested otherwise.


>
> > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > to complete before proceeding.
> > > >
> > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > ---
> > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > >  src/android/camera_device.h   |  9 +++-
> > > >  src/android/camera_ops.cpp    |  8 +++-
> > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > index 3fce14035718..899afaa49439 100644
> > > > --- a/src/android/camera_device.cpp
> > > > +++ b/src/android/camera_device.cpp
> > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > >
> > > >  void CameraDevice::close()
> > > >  {
> > > > -	streams_.clear();
> > > > +	MutexLocker cameraLock(cameraMutex_);
>
> I'd add a blank line here.
>
> > > > +	if (state_ == CameraFlushing) {
>
> As mentioned above, I don't think you need to protect against close()
> and flush() racing each other.
>
> > > > +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > +		camera_->release();
> > > >
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	streams_.clear();
> > > >  	stop();
> > > >
> > > >  	camera_->release();
> > > >  }
> > > >
> > > > -void CameraDevice::stop()
> > > > +/*
> > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
>
> s/wait/waits/
>
> > > > + * until all the in-flight requests have been returned before setting the
> > > > + * camera state to stopped.
> > > > + *
> > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > + * configureStreams().
> > > > + */
> > > > +void CameraDevice::flush()
> > > >  {
> > > > +	{
> > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > +
> > > > +		if (state_ != CameraRunning)
> > > > +			return;
> > > > +
> > > > +		worker_.stop();
> > > > +		camera_->stop();
> > > > +		state_ = CameraFlushing;
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * Now wait for all the in-flight requests to be completed before
> > > > +	 * continuing. Stopping the Camera guarantees that all in-flight
> > > > +	 * requests are completed in error state.
>
> Do we need to wait ? Camera::stop() guarantees that all requests
> complete synchronously with the stop() call.

I didn't read the API that way... I thought that after stop() we would
receive a sequence of failed requests. Actually, I don't see anything in
camera.cpp or pipeline_handler.cpp that suggests stop() is synchronous,
apart from an assertion in Camera::stop().

>
> Partly answering myself here, we'll have to wait for post-processing
> tasks to complete once we'll process them in a separate thread, but that
> will likely be handled by Thread::wait(). I don't think you need a
> condition variable here. If I'm not mistaken, this should simplify the
> implementation.

If Camera::stop() is synchronous, we indeed don't need to wait.

>
> > > > +	 */
> > > > +	{
> > > > +		MutexLocker requestsLock(requestsMutex_);
> > > > +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > +	}
> > >
> > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > patch 6/8 you add it to protect the state_ variable but here it's
> >
> > I'm not changing state_ without the mutex acquired, am I ?
> >
> > > ignored. I see the ASSERT() added to stop() but the pattern of taking the
> > > lock, checking state_, releasing the lock to do some work, then retaking the
> > > lock and updating state_ feels like a bad idea. Maybe I'm missing
> >
> > How so, apart from the fact it feels a bit unusual, I concur ?
> >
> > If I kept the mutex held for the whole duration of flush(), no other
> > concurrent method could proceed until all the queued requests have
> > been completed. While flush() waits for the flushing_ condition to be
> > signaled, processCaptureRequest() can proceed and immediately return
> > the newly queued requests in error state by detecting state_ ==
> > CameraFlushing, which signals that a flush is in progress.
> > Otherwise it would have to wait for flush() to end. But then we're back
> > to a situation where we could serialize all calls and that's it, we
> > would be done with a single mutex held for the whole duration of
> > all operations.
> >
> > If it was only for close() or configureStreams() we could have locked
> > for the whole duration of flush(), as they wait for flush() to
> > complete before proceeding anyway (by waiting on the flushed_ condition
> > signaled below).
> >
> > > something and this is not a real problem, if so maybe we can capture
> > > that in the comment here?
> > >
> > > > +
> > > > +	/*
> > > > +	 * Set state to stopped and unlock close() or configureStreams() that
> > > > +	 * might be waiting for flush to be completed.
> > > > +	 */
> > > >  	MutexLocker cameraLock(cameraMutex_);
> > > > +	state_ = CameraStopped;
> > > > +	flushed_.notify_one();
>
> You should drop the lock before calling notify_one(). Otherwise you'll
> wake up the task waiting on flushed_, which will try to lock
> cameraMutex_, which will block immediately. The scheduler will have to
> reschedule this task for the function to return and the lock to be
> released before the waiter can proceed. That works, but isn't very
> efficient.

Weird, the cppreference examples for notify_one always show the
caller holding the mutex, but I see your point and it seems
correct.

>
> 	{
> 		MutexLocker cameraLock(cameraMutex_);
> 		state_ = CameraStopped;
> 	}
>
> 	flushed_.notify_one();
>

So I could change to this, unless I have to drop this part
completely because we consider that close() and configureStreams()
cannot race with flush()...
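
For reference, a minimal self-contained sketch of the unlock-then-notify
pattern, using plain std::mutex instead of the libcamera Mutex/MutexLocker
wrappers. The Device struct and the finishFlush(), waitForFlush() and
runOnce() names are made up for illustration and are not part of the patch;
the waiter's predicate assumes it should wake up once the state is Stopped.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

/* Hypothetical stand-in for the CameraDevice state handling. */
enum class State { Stopped, Running, Flushing };

struct Device {
	std::mutex mutex;
	std::condition_variable flushed;
	State state = State::Flushing;

	/* Flushing side: update the state, then notify without the lock. */
	void finishFlush()
	{
		{
			std::lock_guard<std::mutex> lock(mutex);
			state = State::Stopped;
		}

		/* The mutex is released here, the waiter can take it at once. */
		flushed.notify_one();
	}

	/* Waiting side, e.g. a close() or configureStreams() style caller. */
	State waitForFlush()
	{
		std::unique_lock<std::mutex> lock(mutex);
		flushed.wait(lock, [&] { return state == State::Stopped; });
		return state;
	}
};

/* Run one waiter against one notifier and report what the waiter saw. */
State runOnce()
{
	Device dev;
	State seen = State::Running;

	std::thread waiter([&] { seen = dev.waitForFlush(); });
	dev.finishFlush();
	waiter.join();

	return seen;
}
```

Because the waiter uses the predicate form of wait(), it does not matter
whether finishFlush() runs before or after the waiter takes the mutex.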

> > > > +}
> > > > +
> > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > +void CameraDevice::stop()
> > > > +{
> > > > +	ASSERT(state_ != CameraFlushing);
> > > > +
> > > >  	if (state_ == CameraStopped)
> > > >  		return;
> > > >
> > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > >   */
> > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > >  {
> > > > -	/* Before any configuration attempt, stop the camera. */
> > > > -	stop();
> > > > +	{
> > > > +		/*
> > > > +		 * If a flush is in progress, wait for it to complete and to
> > > > +		 * stop the camera, otherwise before any new configuration
> > > > +		 * attempt we have to stop the camera explictely.
> > > > +		 */
>
> Same here, I don't think flush() and configure_streams() can race each
> other. I believe the only possible race to be between flush() and
> process_capture_request().
>

Ditto.

> > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > +		if (state_ == CameraFlushing)
> > > > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > +		else
> > > > +			stop();
> > > > +	}
> > > >
> > > >  	if (stream_list->num_streams == 0) {
> > > >  		LOG(HAL, Error) << "No streams in configuration";
> > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > > +	/*
> > > > +	 * Just before queuing the request, make sure flush() has not
> > > > +	 * been called after this function has been executed. In that
> > > > +	 * case, immediately return the request with errors.
> > > > +	 */
> > > > +	MutexLocker cameraLock(cameraMutex_);
> > > > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > +			buffer.release_fence = buffer.acquire_fence;
> > > > +		}
> > > > +
> > > > +		notifyError(descriptor.frameNumber_,
> > > > +			    descriptor.buffers_[0].stream,
>
> As commented on a previous patch, I think you should pass nullptr for
> the stream here.
>

The "S6. Error management:" section of the camera3.h header does not
mention that. Where does your suggestion come from ? I don't find
any reference in the review of [1/8].


> > > > +			    CAMERA3_MSG_ERROR_REQUEST);
> > > > +
> > > > +		return 0;
> > > > +	}
> > > > +
> > > >  	worker_.queueRequest(descriptor.request_.get());
> > > >
> > > >  	{
> > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > >  			return;
> > > >  		}
> > > >
> > > > +		/* Release flush if all the pending requests have been completed. */
> > > > +		if (descriptors_.empty())
> > > > +			flushing_.notify_one();
>
> This will never happen, as you can only get here if descriptors_.find()
> has found the descriptor. Did you mean to do this after the extract()
> call below ?

Ugh. This works only because Camera::stop() is synchronous then ?
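
To illustrate the ordering issue, a small sketch assuming the notification
is meant to fire once the last descriptor has been removed. Requests and
complete() are hypothetical names, an int stands in for
Camera3RequestDescriptor, and the emptiness check is done after extract():

```cpp
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>

struct Requests {
	std::mutex mutex;
	std::condition_variable flushing;
	/* The int value is a placeholder for the real descriptor type. */
	std::map<uint64_t, int> descriptors;

	/* Returns true if this completion drained the map and notified. */
	bool complete(uint64_t cookie)
	{
		bool drained = false;

		{
			std::lock_guard<std::mutex> lock(mutex);

			auto it = descriptors.find(cookie);
			if (it == descriptors.end())
				return false;

			/* Remove the entry first, then test for emptiness. */
			[[maybe_unused]] auto node = descriptors.extract(it);
			drained = descriptors.empty();
		}

		/* Notify outside of the lock, and only when nothing is left. */
		if (drained)
			flushing.notify_one();

		return drained;
	}
};
```

Checking empty() before extract() can never succeed, as the entry that
was just found is still in the map at that point.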

>
> > > > +
> > > >  		node = descriptors_.extract(it);
> > > >  	}
> > > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > --- a/src/android/camera_device.h
> > > > +++ b/src/android/camera_device.h
> > > > @@ -7,6 +7,7 @@
> > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > >
> > > > +#include <condition_variable>
> > > >  #include <map>
> > > >  #include <memory>
> > > >  #include <mutex>
> > > > @@ -42,6 +43,7 @@ public:
> > > >
> > > >  	int open(const hw_module_t *hardwareModule);
> > > >  	void close();
> > > > +	void flush();
> > > >
> > > >  	unsigned int id() const { return id_; }
> > > >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > @@ -92,6 +94,7 @@ private:
> > > >  	enum State {
> > > >  		CameraStopped,
> > > >  		CameraRunning,
> > > > +		CameraFlushing,
> > > >  	};
> > > >
> > > >  	void stop();
> > > > @@ -120,8 +123,9 @@ private:
> > > >
> > > >  	CameraWorker worker_;
> > > >
> > > > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > >  	State state_;
> > > > +	std::condition_variable flushed_;
> > > >
> > > >  	std::shared_ptr<libcamera::Camera> camera_;
> > > >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > @@ -134,8 +138,9 @@ private:
> > > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > > >  	std::vector<CameraStream> streams_;
> > > >
> > > > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > +	std::condition_variable flushing_;
> > > >
> > > >  	std::string maker_;
> > > >  	std::string model_;
> > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > index 696e80436821..8a3cfa175ff5 100644
> > > > --- a/src/android/camera_ops.cpp
> > > > +++ b/src/android/camera_ops.cpp
> > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > >  {
> > > >  }
> > > >
> > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > >  {
> > > > +	if (!dev)
> > > > +		return -EINVAL;
> > > > +
> > > > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > +	camera->flush();
> > > > +
> > > >  	return 0;
> > > >  }
> > > >
>
> --
> Regards,
>
> Laurent Pinchart
Hirokazu Honda May 25, 2021, 2:49 a.m. UTC | #5
Hi Jacopo, thank you for the patch.

On Mon, May 24, 2021 at 4:47 PM Jacopo Mondi <jacopo@jmondi.org> wrote:

> Hi Laurent,
>
> On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > Hi Jacopo,
> >
> > Thank you for the patch.
> >
> > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > and make it available to the camera framework by implementing the
> > > > > operation wrapper in camera_ops.cpp.
> > > > >
> > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > forces all Requests already queued to be returned immediately in error
> > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > wait on a newly introduced condition variable which is notified by the
> > > > > request completion handler when the queue of pending requests has been
> > > > > exhausted.
> > > > >
> > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > executing, return the current Request immediately in error state.
> > > > >
> > > > > Protect potentially concurrent calls to close() and configureStreams()
> >
> > Can this happen ? Quoting camera3.h,
> >
> >  * 12. Alternatively, the framework may call camera3_device_t->common->close()
> >  *    to end the camera session. This may be called at any time when no other
> >  *    calls from the framework are active, although the call may block until all
> >  *    in-flight captures have completed (all results returned, all buffers
> >  *    filled). After the close call returns, no more calls to the
> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> >  *    close() call is underway, the framework may not call any other HAL device
> >  *    functions.
> >
> > The important part is "when no other calls from the framework are
> > active". I don't think we need to handle close() racing with anything
> > else than process_capture_request().
>
> I've been discussing this with Hiro during v1, as initially I didn't
> consider close() and configureStreams().
>
> https://patchwork.libcamera.org/patch/12248/#16884
>
> I initially only considered processCaptureRequest() as a potential
> race, but got suggested differently by the cros camera team.
>
>
> >
> > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > to complete before proceeding.
> > > > >
> > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > ---
> > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > >  src/android/camera_device.h   |  9 +++-
> > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > index 3fce14035718..899afaa49439 100644
> > > > > --- a/src/android/camera_device.cpp
> > > > > +++ b/src/android/camera_device.cpp
> > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > >
> > > > >  void CameraDevice::close()
> > > > >  {
> > > > > -       streams_.clear();
> > > > > +       MutexLocker cameraLock(cameraMutex_);
> >
> > I'd add a blank line here.
> >
> > > > > +       if (state_ == CameraFlushing) {
> >
> > As mentioned above, I don't think you need to protect against close()
> > and flush() racing each other.
> >
> > > > > +               flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > +               camera_->release();
> > > > >
> > > > > +               return;
> > > > > +       }
> > > > > +
> > > > > +       streams_.clear();
> > > > >         stop();
> > > > >
> > > > >         camera_->release();
> > > > >  }
> > > > >
> > > > > -void CameraDevice::stop()
> > > > > +/*
> > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> >
> > s/wait/waits/
> >
> > > > > + * until all the in-flight requests have been returned before setting the
> > > > > + * camera state to stopped.
> > > > > + *
> > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > + * configureStreams().
> > > > > + */
> > > > > +void CameraDevice::flush()
> > > > >  {
> > > > > +       {
> > > > > +               MutexLocker cameraLock(cameraMutex_);
> > > > > +
> > > > > +               if (state_ != CameraRunning)
> > > > > +                       return;
> > > > > +
> > > > > +               worker_.stop();
> > > > > +               camera_->stop();
> > > > > +               state_ = CameraFlushing;
> > > > > +       }
> > > > > +
> > > > > +       /*
> > > > > +        * Now wait for all the in-flight requests to be completed before
> > > > > +        * continuing. Stopping the Camera guarantees that all in-flight
> > > > > +        * requests are completed in error state.
> >
> > Do we need to wait ? Camera::stop() guarantees that all requests
> > complete synchronously with the stop() call.
>
> I didn't get the API that way... I thought after stop we would receive
> a sequence of failed requests... Actually I don't see anything that
> suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> in Camera::stop()
>
> >
> > Partly answering myself here, we'll have to wait for post-processing
> > tasks to complete once we'll process them in a separate thread, but that
> > will likely be handled by Thread::wait(). I don't think you need a
> > condition variable here. If I'm not mistaken, this should simplify the
> > implementation.
>
> If Camera::stop() is synchronous we don't need to wait indeed
>
> >
> > > > > +        */
> > > > > +       {
> > > > > +               MutexLocker requestsLock(requestsMutex_);
> > > > > +               flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > > +       }
> > > >
> > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > patch 6/8 you add it to protect the state_ variable but here it's
> > >
> > > I'm not changing state_ without the mutex acquired, am I ?
> > >
> > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the
> > > > lock, checking state_, releasing the lock to do some work, then retaking the
> > > > lock and updating state_ feels like a bad idea. Maybe I'm missing
> > >
> > > How so, apart from the fact it feels a bit unusual, I concur ?
> > >
> > > If I kept the mutex held for the whole duration of flush(), no other
> > > concurrent method could proceed until all the queued requests have
> > > been completed. While flush() waits for the flushing_ condition to be
> > > signaled, processCaptureRequest() can proceed and immediately return
> > > the newly queued requests in error state by detecting state_ ==
> > > CameraFlushing, which signals that a flush is in progress.
> > > Otherwise it would have to wait for flush() to end. But then we're back
> > > to a situation where we could serialize all calls and that's it, we
> > > would be done with a single mutex held for the whole duration of
> > > all operations.
> > >
> > > If it was only for close() or configureStreams() we could have locked
> > > for the whole duration of flush(), as they wait for flush() to
> > > complete before proceeding anyway (by waiting on the flushed_ condition
> > > signaled below).
> > >
> > > > something and this is not a real problem, if so maybe we can capture
> > > > that in the comment here?
> > > >
> > > > > +
> > > > > +       /*
> > > > > +        * Set state to stopped and unlock close() or configureStreams() that
> > > > > +        * might be waiting for flush to be completed.
> > > > > +        */
> > > > >         MutexLocker cameraLock(cameraMutex_);
> > > > > +       state_ = CameraStopped;
> > > > > +       flushed_.notify_one();
> >
> > You should drop the lock before calling notify_one(). Otherwise you'll
> > wake up the task waiting on flushed_, which will try to lock
> > cameraMutex_, which will block immediately. The scheduler will have to
> > reschedule this task for the function to return and the lock to be
> > released before the waiter can proceed. That works, but isn't very
> > efficient.
>
> Weird, the cpp reference shows example about notify_one where the
> caller always has the mutex held locked, but I see your point and
> seems correct..
>
>
This is correct.
https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> The notifying thread does not need to hold the lock on the same mutex as
> the one held by the waiting thread(s);

I knew that, but I didn't point it out because notifying with the lock
held is not incorrect.
Given Laurent's explanation, yeah, we should definitely avoid that. TIL.


> >
> >       {
> >               MutexLocker cameraLock(cameraMutex_);
> >               state_ = CameraStopped;
> >       }
> >
> >       flushed_.notify_one();
> >
>
> So I could change to this one, if I don't have to drop this part
> completely if we consider close() and configureStreams() not as
> possible races...
>
> > > > > +}
> > > > > +
> > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > +void CameraDevice::stop()
> > > > > +{
> > > > > +       ASSERT(state_ != CameraFlushing);
> > > > > +
> > > > >         if (state_ == CameraStopped)
> > > > >                 return;
> > > > >
> > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > >   */
> > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > >  {
> > > > > -       /* Before any configuration attempt, stop the camera. */
> > > > > -       stop();
> > > > > +       {
> > > > > +               /*
> > > > > +                * If a flush is in progress, wait for it to complete and to
> > > > > +                * stop the camera, otherwise before any new configuration
> > > > > +                * attempt we have to stop the camera explictely.
> > > > > +                */
> >
> > Same here, I don't think flush() and configure_streams() can race each
> > other. I believe the only possible race to be between flush() and
> > process_capture_request().
> >
>
> Ditto.
>
> > > > > +               MutexLocker cameraLock(cameraMutex_);
> > > > > +               if (state_ == CameraFlushing)
> > > > > +                       flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > +               else
> > > > > +                       stop();
> > > > > +       }
> > > > >
> > > > >         if (stream_list->num_streams == 0) {
> > > > >                 LOG(HAL, Error) << "No streams in configuration";
> > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > >         if (ret)
> > > > >                 return ret;
> > > > >
> > > > > +       /*
> > > > > +        * Just before queuing the request, make sure flush() has not
> > > > > +        * been called after this function has been executed. In that
> > > > > +        * case, immediately return the request with errors.
> > > > > +        */
> > > > > +       MutexLocker cameraLock(cameraMutex_);
> > > > > +       if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > +               for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > +                       buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > +                       buffer.release_fence = buffer.acquire_fence;
> > > > > +               }
> > > > > +
> > > > > +               notifyError(descriptor.frameNumber_,
> > > > > +                           descriptor.buffers_[0].stream,
> >
> > As commented on a previous patch, I think you should pass nullptr for
> > the stream here.
> >
>
> The "S6. Error management:" section of the camera3.h header does not
> mention that, not the ? where does you suggestion come from ? I don't find
> any reference in the review of [1/8]
>
>
> > > > > +                           CAMERA3_MSG_ERROR_REQUEST);
> > > > > +
> > > > > +               return 0;
> > > > > +       }
> > > > > +
> > > > >         worker_.queueRequest(descriptor.request_.get());
> > > > >
> > > > >         {
> > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > >                         return;
> > > > >                 }
> > > > >
> > > > > +               /* Release flush if all the pending requests have been completed. */
> > > > > +               if (descriptors_.empty())
> > > > > +                       flushing_.notify_one();
> >
> > This will never happen, as you can only get here if descriptors_.find()
> > has found the descriptor. Did you mean to do this after the extract()
> > call below ?
>
> Ugh. This works only because Camera::stop() is synchronous then ?
>
>
Ah, good catch!

With this fix, the code looks good to me.
I am happy to keep taking part in the discussion.

Reviewed-by: Hirokazu Honda <hiroh@chromium.org>


> >
> > > > > +
> > > > >                 node = descriptors_.extract(it);
> > > > >         }
> > > > >         Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > --- a/src/android/camera_device.h
> > > > > +++ b/src/android/camera_device.h
> > > > > @@ -7,6 +7,7 @@
> > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > >
> > > > > +#include <condition_variable>
> > > > >  #include <map>
> > > > >  #include <memory>
> > > > >  #include <mutex>
> > > > > @@ -42,6 +43,7 @@ public:
> > > > >
> > > > >         int open(const hw_module_t *hardwareModule);
> > > > >         void close();
> > > > > +       void flush();
> > > > >
> > > > >         unsigned int id() const { return id_; }
> > > > >         camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > @@ -92,6 +94,7 @@ private:
> > > > >         enum State {
> > > > >                 CameraStopped,
> > > > >                 CameraRunning,
> > > > > +               CameraFlushing,
> > > > >         };
> > > > >
> > > > >         void stop();
> > > > > @@ -120,8 +123,9 @@ private:
> > > > >
> > > > >         CameraWorker worker_;
> > > > >
> > > > > -       libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > +       libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > >         State state_;
> > > > > +       std::condition_variable flushed_;
> > > > >
> > > > >         std::shared_ptr<libcamera::Camera> camera_;
> > > > >         std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > @@ -134,8 +138,9 @@ private:
> > > > >         std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > >         std::vector<CameraStream> streams_;
> > > > >
> > > > > -       libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > +       libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > >         std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > +       std::condition_variable flushing_;
> > > > >
> > > > >         std::string maker_;
> > > > >         std::string model_;
> > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > --- a/src/android/camera_ops.cpp
> > > > > +++ b/src/android/camera_ops.cpp
> > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > >  {
> > > > >  }
> > > > >
> > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > >  {
> > > > > +       if (!dev)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > +       camera->flush();
> > > > > +
> > > > >         return 0;
> > > > >  }
> > > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
>
Laurent Pinchart May 27, 2021, 2:26 a.m. UTC | #6
Hi Jacopo,

(expanding the CC list to finalize the race conditions discussion)

On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > and make it available to the camera framework by implementing the
> > > > > operation wrapper in camera_ops.cpp.
> > > > >
> > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > forces all Requests already queued to be returned immediately in error
> > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > wait on a newly introduced condition variable which is notified by the
> > > > > request completion handler when the queue of pending requests has been
> > > > > exhausted.
> > > > >
> > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > executing, return the current Request immediately in error state.
> > > > >
> > > > > Protect potentially concurrent calls to close() and configureStreams()
> >
> > Can this happen ? Quoting camera3.h,
> >
> >  * 12. Alternatively, the framework may call camera3_device_t->common->close()
> >  *    to end the camera session. This may be called at any time when no other
> >  *    calls from the framework are active, although the call may block until all
> >  *    in-flight captures have completed (all results returned, all buffers
> >  *    filled). After the close call returns, no more calls to the
> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> >  *    close() call is underway, the framework may not call any other HAL device
> >  *    functions.
> >
> > The important part is "when no other calls from the framework are
> > active". I don't think we need to handle close() racing with anything
> > else than process_capture_request().
> 
> I've been discussing this with Hiro during v1, as initially I didn't
> consider close() and configureStreams().
> 
> https://patchwork.libcamera.org/patch/12248/#16884
> 
> I initially only considered processCaptureRequest() as a potential
> race, but got suggested differently by the cros camera team.

Let's try to get to the bottom of this.

Section S2 ("Startup and general expected operation sequence") states:

 * 12. Alternatively, the framework may call camera3_device_t->common->close()
 *    to end the camera session. This may be called at any time when no other
 *    calls from the framework are active, although the call may block until all
 *    in-flight captures have completed (all results returned, all buffers
 *    filled). After the close call returns, no more calls to the
 *    camera3_callback_ops_t functions are allowed from the HAL. Once the
 *    close() call is underway, the framework may not call any other HAL device
 *    functions.

There can be in-flight requests when .close() is called, but it can't be
called concurrently with any other call. There's thus no race condition
to protect against.

The .configure_streams() documentation states:

     * Preconditions:
     *
     * The framework will only call this method when no captures are being
     * processed. That is, all results have been returned to the framework, and
     * all in-flight input and output buffers have been returned and their
     * release sync fences have been signaled by the HAL. The framework will not
     * submit new requests for capture while the configure_streams() call is
     * underway.

This clearly forbids calling .configure_streams() and
.process_capture_request() concurrently.

The .flush() documentation states:

     * Flush all currently in-process captures and all buffers in the pipeline
     * on the given device. The framework will use this to dump all state as
     * quickly as possible in order to prepare for a configure_streams() call.

I interpret this as at least a very strong hint that .flush() and
.configure_streams() can't be called concurrently :-)

If anyone disagrees, I'd like compelling evidence that those races can
occur.
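
If those readings hold, the only window left to guard is .flush() racing
with .process_capture_request(). A minimal sketch of that reduced scheme,
assuming a single mutex-protected state (FlushGuardSketch and its methods
are hypothetical stand-ins for illustration, not the CameraDevice API):

```cpp
#include <cassert>
#include <mutex>

/* Hypothetical, reduced stand-in for CameraDevice; not the real class. */
class FlushGuardSketch
{
public:
	enum State { Stopped, Running, Flushing };

	void start()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		state_ = Running;
	}

	/*
	 * Returns true if the request may be queued to the camera, false if
	 * it has to be returned to the framework in error state.
	 */
	bool tryQueue()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != Running)
			return false;

		++queued_;
		return true;
	}

	/* Entered by flush(): requests arriving from now on are rejected. */
	void beginFlush()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ == Running)
			state_ = Flushing;
	}

	/* Left by flush() once the camera has been stopped. */
	void endFlush()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		state_ = Stopped;
	}

	unsigned int queued() const
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return queued_;
	}

private:
	mutable std::mutex mutex_;
	State state_ = Stopped;
	unsigned int queued_ = 0;
};
```

In this sketch close() and configure_streams() need no flushing-specific
handling at all, which is the simplification being argued for.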

> > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > to complete before proceeding.
> > > > >
> > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > ---
> > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > >  src/android/camera_device.h   |  9 +++-
> > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > index 3fce14035718..899afaa49439 100644
> > > > > --- a/src/android/camera_device.cpp
> > > > > +++ b/src/android/camera_device.cpp
> > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > >
> > > > >  void CameraDevice::close()
> > > > >  {
> > > > > -	streams_.clear();
> > > > > +	MutexLocker cameraLock(cameraMutex_);
> >
> > I'd add a blank line here.
> >
> > > > > +	if (state_ == CameraFlushing) {
> >
> > As mentioned above, I don't think you need to protect against close()
> > and flush() racing each other.
> >
> > > > > +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > +		camera_->release();
> > > > >
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	streams_.clear();
> > > > >  	stop();
> > > > >
> > > > >  	camera_->release();
> > > > >  }
> > > > >
> > > > > -void CameraDevice::stop()
> > > > > +/*
> > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> >
> > s/wait/waits/
> >
> > > > > + * until all the in-flight requests have been returned before setting the
> > > > > + * camera state to stopped.
> > > > > + *
> > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > + * configureStreams().
> > > > > + */
> > > > > +void CameraDevice::flush()
> > > > >  {
> > > > > +	{
> > > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > > +
> > > > > +		if (state_ != CameraRunning)
> > > > > +			return;
> > > > > +
> > > > > +		worker_.stop();
> > > > > +		camera_->stop();
> > > > > +		state_ = CameraFlushing;
> > > > > +	}
> > > > > +
> > > > > +	/*
> > > > > +	 * Now wait for all the in-flight requests to be completed before
> > > > > +	 * continuing. Stopping the Camera guarantees that all in-flight
> > > > > +	 * requests are completed in error state.
> >
> > Do we need to wait ? Camera::stop() guarantees that all requests
> > complete synchronously with the stop() call.
> 
> I didn't get the API that way... I thought after stop we would receive
> a sequence of failed requests... Actually I don't see anything that
> suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> in Camera::stop()

The Camera::stop() documentation states:

 * This method stops capturing and processing requests immediately. All pending
 * requests are cancelled and complete synchronously in an error state.

Is this ambiguous ?
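
To illustrate what that guarantee buys, here is a small sketch, assuming a
camera stand-in whose stop() invokes the completion handler for every
pending request before it returns. SyncStopCamera and flushSketch() are
made-up names for illustration, not libcamera API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

/* Hypothetical stand-in for a camera whose stop() completes synchronously. */
class SyncStopCamera
{
public:
	using Handler = std::function<void(uint64_t cookie, bool cancelled)>;

	explicit SyncStopCamera(Handler handler)
		: handler_(std::move(handler))
	{
	}

	void queue(uint64_t cookie) { pending_.push_back(cookie); }

	/* Cancel every pending request before returning to the caller. */
	void stop()
	{
		std::vector<uint64_t> pending;
		pending.swap(pending_);

		for (uint64_t cookie : pending)
			handler_(cookie, true);
	}

private:
	Handler handler_;
	std::vector<uint64_t> pending_;
};

/* Returns the number of descriptors left once stop() has returned. */
std::size_t flushSketch()
{
	std::map<uint64_t, int> descriptors{ { 1, 0 }, { 2, 0 }, { 3, 0 } };

	/* The completion handler drops the descriptor, as requestComplete() does. */
	SyncStopCamera camera([&](uint64_t cookie, bool) {
		descriptors.erase(cookie);
	});

	for (const auto &entry : descriptors)
		camera.queue(entry.first);

	camera.stop();

	/* No condition variable needed: the map is already empty here. */
	return descriptors.size();
}
```

Under that assumption flushSketch() returns 0, so nothing is left to wait
for once stop() has returned.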

> > Partly answering myself here, we'll have to wait for post-processing
> > tasks to complete once we'll process them in a separate thread, but that
> > will likely be handled by Thread::wait(). I don't think you need a
> > condition variable here. If I'm not mistaken, this should simplify the
> > implementation.
> 
> If Camera::stop() is synchronous we don't need to wait indeed
> 
> > > > > +	 */
> > > > > +	{
> > > > > +		MutexLocker requestsLock(requestsMutex_);
> > > > > +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > > +	}
> > > >
> > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > patch 6/8 you add it to protect the state_ variable but here it's
> > >
> > > I'm not changing state_ without the mutex acquired, am I ?
> > >
> > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the
> > > > lock checking state_, releasing the lock and do some work, retake the
> > > > lock and update state_ feels like a bad idea. Maybe I'm missing
> > >
> > > How so, apart from the fact it feels a bit unusual, I concur ?
> > >
> > > If I keep the mutex held for the whole duration of flush no other
> > > concurrent method can proceed until all the queued requests have
> > > been completed. While flush waits for the flushing_ condition to be
> > > signaled, processCaptureRequest() can proceed and immediately return
> > > the newly queued requests in error state by detecting state_ ==
> > > CameraFlushing which signals that flush is in progress.
> > > Otherwise it would have had to wait for flush to end. But then we're back
> > > to a situation where we could serialize all calls and that's it, we
> > > would be done with a single mutex to be held for the whole duration of
> > > all operations.
> > >
> > > If it only was for close() or configureStreams() we could have locked
> > > for the whole duration of flush(), as they anyway wait for flush to
> > > complete before proceeding (by waiting on the flushed_ condition here
> > > below signaled).
> > >
> > > > something and this is not a real problem, if so maybe we can capture
> > > > that in the comment here?
> > > >
> > > > > +
> > > > > +	/*
> > > > > +	 * Set state to stopped and unlock close() or configureStreams() that
> > > > > +	 * might be waiting for flush to be completed.
> > > > > +	 */
> > > > >  	MutexLocker cameraLock(cameraMutex_);
> > > > > +	state_ = CameraStopped;
> > > > > +	flushed_.notify_one();
> >
> > You should drop the lock before calling notify_one(). Otherwise you'll
> > wake up the task waiting on flushed_, which will try to lock
> > cameraMutex_, which will block immediately. The scheduler will have to
> > reschedule this task for the function to return and the lock to be
> > released before the waiter can proceed. That works, but isn't very
> > efficient.
> 
> Weird, the cpp reference shows example about notify_one where the
> caller always has the mutex held locked, but I see your point and
> seems correct..

I'm looking at
https://en.cppreference.com/w/cpp/thread/condition_variable and
https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
and both calls to notify_one() in the example are made without the lock
held, aren't they ?
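
For reference, a self-contained sketch of that pattern with the standard
library only (notifyUnlockedSketch() and its variables are illustrative
names, not code from the patch): the state is updated with the mutex held,
and notify_one() is called after the lock has been dropped.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

/* Returns true once the waiter thread has observed the state change. */
bool notifyUnlockedSketch()
{
	std::mutex mutex;
	std::condition_variable flushed;
	bool stopped = false;
	bool observed = false;

	std::thread waiter([&] {
		std::unique_lock<std::mutex> lock(mutex);
		/* The predicate makes a notification sent before wait() harmless. */
		flushed.wait(lock, [&] { return stopped; });
		observed = true;
	});

	{
		/* Update the shared state with the mutex held... */
		std::lock_guard<std::mutex> lock(mutex);
		stopped = true;
	}

	/* ...and notify once it is released, so the waiter can lock right away. */
	flushed.notify_one();

	waiter.join();
	return observed;
}
```

Notifying with the lock held would be just as correct, only potentially
less efficient, as described above.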

> >
> > 	{
> > 		MutexLocker cameraLock(cameraMutex_);
> > 		state_ = CameraStopped;
> > 	}
> >
> > 	flushed_.notify_one();
> >
> 
> So I could change to this one, if I don't have to drop this part
> completely if we consider close() and configureStreams() not as
> possible races...
> 
> > > > > +}
> > > > > +
> > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > +void CameraDevice::stop()
> > > > > +{
> > > > > +	ASSERT(state_ != CameraFlushing);
> > > > > +
> > > > >  	if (state_ == CameraStopped)
> > > > >  		return;
> > > > >
> > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > >   */
> > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > >  {
> > > > > -	/* Before any configuration attempt, stop the camera. */
> > > > > -	stop();
> > > > > +	{
> > > > > +		/*
> > > > > +		 * If a flush is in progress, wait for it to complete and to
> > > > > +		 * stop the camera, otherwise before any new configuration
> > > > > +		 * attempt we have to stop the camera explictely.
> > > > > +		 */
> >
> > Same here, I don't think flush() and configure_streams() can race each
> > other. I believe the only possible race to be between flush() and
> > process_capture_request().
> 
> Ditto.
> 
> > > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > > +		if (state_ == CameraFlushing)
> > > > > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > +		else
> > > > > +			stop();
> > > > > +	}
> > > > >
> > > > >  	if (stream_list->num_streams == 0) {
> > > > >  		LOG(HAL, Error) << "No streams in configuration";
> > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > >  	if (ret)
> > > > >  		return ret;
> > > > >
> > > > > +	/*
> > > > > +	 * Just before queuing the request, make sure flush() has not
> > > > > +	 * been called after this function has been executed. In that
> > > > > +	 * case, immediately return the request with errors.
> > > > > +	 */
> > > > > +	MutexLocker cameraLock(cameraMutex_);
> > > > > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > +			buffer.release_fence = buffer.acquire_fence;
> > > > > +		}
> > > > > +
> > > > > +		notifyError(descriptor.frameNumber_,
> > > > > +			    descriptor.buffers_[0].stream,
> >
> > As commented on a previous patch, I think you should pass nullptr for
> > the stream here.
> 
> The "S6. Error management:" section of the camera3.h header does not
> mention that, not the ?

Indeed, that section doesn't mention the camera3_error_msg::error_stream
field at all. The field is documented in the structure as

    /**
     * Pointer to the stream that had a failure. NULL if the stream isn't
     * applicable to the error.
     */

The question is thus when the stream is applicable to the error. The
documentation of enum camera3_error_msg_code mentions error_stream in
the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
the device, the request or the result metadata, which are not specific
to a stream.
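
A minimal sketch of that rule, using stand-in types that mirror only the
fields discussed here (StreamSketch, ErrorMsgSketch and makeErrorSketch()
are hypothetical, they are not the camera3.h definitions nor the
notifyError() helper from this series):

```cpp
#include <cassert>
#include <cstdint>

/* Stand-ins mirroring only the camera3.h concepts discussed here. */
struct StreamSketch {
};

enum ErrorCodeSketch {
	ErrorDevice,
	ErrorRequest,
	ErrorResult,
	ErrorBuffer,
};

struct ErrorMsgSketch {
	uint32_t frameNumber;
	StreamSketch *errorStream;
	ErrorCodeSketch errorCode;
};

/* Only buffer errors are tied to a stream; every other code gets nullptr. */
ErrorMsgSketch makeErrorSketch(uint32_t frameNumber, StreamSketch *stream,
			       ErrorCodeSketch code)
{
	return { frameNumber, code == ErrorBuffer ? stream : nullptr, code };
}
```

With such a helper a request error raised from processCaptureRequest()
would carry a null stream regardless of which buffer comes first in the
descriptor.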

> where does your suggestion come from ? I don't find any reference in
> the review of [1/8]

'[PATCH v3 1/8] android: Rework request completion notification'
(YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)

> > > > > +			    CAMERA3_MSG_ERROR_REQUEST);
> > > > > +
> > > > > +		return 0;
> > > > > +	}
> > > > > +
> > > > >  	worker_.queueRequest(descriptor.request_.get());
> > > > >
> > > > >  	{
> > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > >  			return;
> > > > >  		}
> > > > >
> > > > > +		/* Release flush if all the pending requests have been completed. */
> > > > > +		if (descriptors_.empty())
> > > > > +			flushing_.notify_one();
> >
> > This will never happen, as you can only get here if descriptors_.find()
> > has found the descriptor. Did you mean to do this after the extract()
> > call below ?
> 
> Ugh. This works only because Camera::stop() is synchronous then ?

I believe so.
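
Should the check be kept for some reason, this is roughly the ordering I
had in mind, as a sketch on a plain std::map (completeSketch() is a
made-up helper, not the requestComplete() implementation): the emptiness
test only makes sense once the descriptor has been extracted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

/*
 * Returns true if the completed request was the last pending one, in
 * which case a waiter on the flushing condition would have to be notified.
 */
bool completeSketch(std::map<uint64_t, int> &descriptors, uint64_t cookie)
{
	auto it = descriptors.find(cookie);
	if (it == descriptors.end())
		return false;

	/*
	 * Testing empty() at this point can never succeed, as the entry
	 * pointed to by 'it' is still stored in the map.
	 */
	auto node = descriptors.extract(it);

	/* The node now owns the descriptor, the map may finally be empty. */
	return descriptors.empty();
}
```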

> > > > > +
> > > > >  		node = descriptors_.extract(it);
> > > > >  	}
> > > > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > --- a/src/android/camera_device.h
> > > > > +++ b/src/android/camera_device.h
> > > > > @@ -7,6 +7,7 @@
> > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > >
> > > > > +#include <condition_variable>
> > > > >  #include <map>
> > > > >  #include <memory>
> > > > >  #include <mutex>
> > > > > @@ -42,6 +43,7 @@ public:
> > > > >
> > > > >  	int open(const hw_module_t *hardwareModule);
> > > > >  	void close();
> > > > > +	void flush();
> > > > >
> > > > >  	unsigned int id() const { return id_; }
> > > > >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > @@ -92,6 +94,7 @@ private:
> > > > >  	enum State {
> > > > >  		CameraStopped,
> > > > >  		CameraRunning,
> > > > > +		CameraFlushing,
> > > > >  	};
> > > > >
> > > > >  	void stop();
> > > > > @@ -120,8 +123,9 @@ private:
> > > > >
> > > > >  	CameraWorker worker_;
> > > > >
> > > > > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > >  	State state_;
> > > > > +	std::condition_variable flushed_;
> > > > >
> > > > >  	std::shared_ptr<libcamera::Camera> camera_;
> > > > >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > @@ -134,8 +138,9 @@ private:
> > > > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > >  	std::vector<CameraStream> streams_;
> > > > >
> > > > > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > +	std::condition_variable flushing_;
> > > > >
> > > > >  	std::string maker_;
> > > > >  	std::string model_;
> > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > --- a/src/android/camera_ops.cpp
> > > > > +++ b/src/android/camera_ops.cpp
> > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > >  {
> > > > >  }
> > > > >
> > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > >  {
> > > > > +	if (!dev)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > +	camera->flush();
> > > > > +
> > > > >  	return 0;
> > > > >  }
> > > > >
Tomasz Figa May 27, 2021, 2:49 a.m. UTC | #7
Hi Laurent,

On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Jacopo,
>
> (expanding the CC list to finalize the race conditions discussion)
>
> On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > > and make it available to the camera framework by implementing the
> > > > > > operation wrapper in camera_ops.cpp.
> > > > > >
> > > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > > forces all Requests already queued to be returned immediately in error
> > > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > > wait on a newly introduced condition variable which is notified by the
> > > > > > request completion handler when the queue of pending requests has been
> > > > > > exhausted.
> > > > > >
> > > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > > executing, return the current Request immediately in error state.
> > > > > >
> > > > > > Protect potentially concurrent calls to close() and configureStreams()
> > >
> > > Can this happen ? Quoting camera3.h,
> > >
> > >  * 12. Alternatively, the framework may call camera3_device_t->common->close()
> > >  *    to end the camera session. This may be called at any time when no other
> > >  *    calls from the framework are active, although the call may block until all
> > >  *    in-flight captures have completed (all results returned, all buffers
> > >  *    filled). After the close call returns, no more calls to the
> > >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> > >  *    close() call is underway, the framework may not call any other HAL device
> > >  *    functions.
> > >
> > > The important part is "when no other calls from the framework are
> > > active". I don't think we need to handle close() racing with anything
> > > else than process_capture_request().
> >
> > I've been discussing this with Hiro during v1, as initially I didn't
> > consider close() and configureStreams().
> >
> > https://patchwork.libcamera.org/patch/12248/#16884
> >
> > I initially only considered processCaptureRequest() as a potential
> > race, but got suggested differently by the cros camera team.
>
> Let's try to get to the bottom of this.
>
> Section S2 ("Startup and general expected operation sequence") states:
>
>  * 12. Alternatively, the framework may call camera3_device_t->common->close()
>  *    to end the camera session. This may be called at any time when no other
>  *    calls from the framework are active, although the call may block until all
>  *    in-flight captures have completed (all results returned, all buffers
>  *    filled). After the close call returns, no more calls to the
>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
>  *    close() call is underway, the framework may not call any other HAL device
>  *    functions.
>
> There can be in-flight requests when .close() is called, but it can't be
> called concurrently with any other call. There's thus no race condition
> to protect against.

Note that camera3.h is considered outdated, as the interface updates
have been reflected only in the HIDL version [1]. However, I don't see
a HIDL counterpart of the section you mentioned.

[1] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/

>
> The .configure_streams() documentation states:
>
>      * Preconditions:
>      *
>      * The framework will only call this method when no captures are being
>      * processed. That is, all results have been returned to the framework, and
>      * all in-flight input and output buffers have been returned and their
>      * release sync fences have been signaled by the HAL. The framework will not
>      * submit new requests for capture while the configure_streams() call is
>      * underway.
>
> This clearly forbids calling .configure_streams() and
> .process_capture_request() concurrently.
>
> The .flush() documentation states:
>
>      * Flush all currently in-process captures and all buffers in the pipeline
>      * on the given device. The framework will use this to dump all state as
>      * quickly as possible in order to prepare for a configure_streams() call.
>
> I interpret this as at least a very strong hint that .flush() and
> .configure_streams() can't be called concurrently :-)
>
> If anyone disagrees, I'd like compelling evidence that those races can
> occur.

Indeed, the newest documentation [2] seems to be stating the same.

[2] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107

My reading of the documentation is the same as Laurent's. Hiro, did
you hear something contradictory from the Android framework team?

Best regards,
Tomasz

>
> > > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > > to complete before proceeding.
> > > > > >
> > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > ---
> > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > > >  src/android/camera_device.h   |  9 +++-
> > > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > index 3fce14035718..899afaa49439 100644
> > > > > > --- a/src/android/camera_device.cpp
> > > > > > +++ b/src/android/camera_device.cpp
> > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > >
> > > > > >  void CameraDevice::close()
> > > > > >  {
> > > > > > -     streams_.clear();
> > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > >
> > > I'd add a blank line here.
> > >
> > > > > > +     if (state_ == CameraFlushing) {
> > >
> > > As mentioned above, I don't think you need to protect against close()
> > > and flush() racing each other.
> > >
> > > > > > +             flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +             camera_->release();
> > > > > >
> > > > > > +             return;
> > > > > > +     }
> > > > > > +
> > > > > > +     streams_.clear();
> > > > > >       stop();
> > > > > >
> > > > > >       camera_->release();
> > > > > >  }
> > > > > >
> > > > > > -void CameraDevice::stop()
> > > > > > +/*
> > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> > >
> > > s/wait/waits/
> > >
> > > > > > + * until all the in-flight requests have been returned before setting the
> > > > > > + * camera state to stopped.
> > > > > > + *
> > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > > + * configureStreams().
> > > > > > + */
> > > > > > +void CameraDevice::flush()
> > > > > >  {
> > > > > > +     {
> > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > +
> > > > > > +             if (state_ != CameraRunning)
> > > > > > +                     return;
> > > > > > +
> > > > > > +             worker_.stop();
> > > > > > +             camera_->stop();
> > > > > > +             state_ = CameraFlushing;
> > > > > > +     }
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Now wait for all the in-flight requests to be completed before
> > > > > > +      * continuing. Stopping the Camera guarantees that all in-flight
> > > > > > +      * requests are completed in error state.
> > >
> > > Do we need to wait ? Camera::stop() guarantees that all requests
> > > complete synchronously with the stop() call.
> >
> > I didn't get the API that way... I thought after stop we would receive
> > a sequence of failed requests... Actually I don't see anything that
> > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> > in Camera::stop()
>
> The Camera::stop() documentation states:
>
>  * This method stops capturing and processing requests immediately. All pending
>  * requests are cancelled and complete synchronously in an error state.
>
> Is this ambiguous ?
>
> > > Partly answering myself here, we'll have to wait for post-processing
> > > tasks to complete once we'll process them in a separate thread, but that
> > > will likely be handled by Thread::wait(). I don't think you need a
> > > condition variable here. If I'm not mistaken, this should simplify the
> > > implementation.
> >
> > If Camera::stop() is synchronous we don't need to wait indeed
> >
> > > > > > +      */
> > > > > > +     {
> > > > > > +             MutexLocker requestsLock(requestsMutex_);
> > > > > > +             flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > > > +     }
> > > > >
> > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > > patch 6/8 you add it to protect the state_ variable but here it's
> > > >
> > > > I'm not changing state_ without the mutex acquired, am I ?
> > > >
> > > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the
> > > > > lock checking state_, releasing the lock and do some work, retake the
> > > > > lock and update state_ feels like a bad idea. Maybe I'm missing
> > > >
> > > > How so, apart from the fact it feels a bit unusual, I concur ?
> > > >
> > > > If I keep the mutex held for the whole duration of flush no other
> > > > concurrent method can proceed until all the queued requests have
> > > > been completed. While flush waits for the flushing_ condition to be
> > > > signaled, processCaptureRequest() can proceed and immediately return
> > > > the newly queued requests in error state by detecting state_ ==
> > > > CameraFlushing which signals that flush is in progress.
> > > > Otherwise it would have had to wait for flush to end. But then we're back
> > > > to a situation where we could serialize all calls and that's it, we
> > > > would be done with a single mutex to be held for the whole duration of
> > > > all operations.
> > > >
> > > > If it only was for close() or configureStreams() we could have locked
> > > > for the whole duration of flush(), as they anyway wait for flush to
> > > > complete before proceeding (by waiting on the flushed_ condition here
> > > > below signaled).
> > > >
> > > > > something and this is not a real problem, if so maybe we can capture
> > > > > that in the comment here?
> > > > >
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Set state to stopped and unlock close() or configureStreams() that
> > > > > > +      * might be waiting for flush to be completed.
> > > > > > +      */
> > > > > >       MutexLocker cameraLock(cameraMutex_);
> > > > > > +     state_ = CameraStopped;
> > > > > > +     flushed_.notify_one();
> > >
> > > You should drop the lock before calling notify_one(). Otherwise you'll
> > > wake up the task waiting on flushed_, which will try to lock
> > > cameraMutex_, which will block immediately. The scheduler will have to
> > > reschedule this task for the function to return and the lock to be
> > > released before the waiter can proceed. That works, but isn't very
> > > efficient.
> >
> > Weird, the cpp reference shows example about notify_one where the
> > caller always has the mutex held locked, but I see your point and
> > seems correct..
>
> I'm looking at
> https://en.cppreference.com/w/cpp/thread/condition_variable and
> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> and both calls to notify_one() in the example are made without the lock
> held, aren't they ?
>
> > >
> > >     {
> > >             MutexLocker cameraLock(cameraMutex_);
> > >             state_ = CameraStopped;
> > >     }
> > >
> > >     flushed_.notify_one();
> > >
> >
> > So I could change to this one, if I don't have to drop this part
> > completely if we consider close() and configureStreams() not as
> > possible races...
> >
> > > > > > +}
> > > > > > +
> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > > +void CameraDevice::stop()
> > > > > > +{
> > > > > > +     ASSERT(state_ != CameraFlushing);
> > > > > > +
> > > > > >       if (state_ == CameraStopped)
> > > > > >               return;
> > > > > >
> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > > >   */
> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > >  {
> > > > > > -     /* Before any configuration attempt, stop the camera. */
> > > > > > -     stop();
> > > > > > +     {
> > > > > > +             /*
> > > > > > +              * If a flush is in progress, wait for it to complete and to
> > > > > > +              * stop the camera, otherwise before any new configuration
> > > > > > +              * attempt we have to stop the camera explictely.
> > > > > > +              */
> > >
> > > Same here, I don't think flush() and configure_streams() can race each
> > > other. I believe the only possible race to be between flush() and
> > > process_capture_request().
> >
> > Ditto.
> >
> > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > +             if (state_ == CameraFlushing)
> > > > > > +                     flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +             else
> > > > > > +                     stop();
> > > > > > +     }
> > > > > >
> > > > > >       if (stream_list->num_streams == 0) {
> > > > > >               LOG(HAL, Error) << "No streams in configuration";
> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > >       if (ret)
> > > > > >               return ret;
> > > > > >
> > > > > > +     /*
> > > > > > +      * Just before queuing the request, make sure flush() has not
> > > > > > +      * been called after this function has been executed. In that
> > > > > > +      * case, immediately return the request with errors.
> > > > > > +      */
> > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > > +             for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > > +                     buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > +                     buffer.release_fence = buffer.acquire_fence;
> > > > > > +             }
> > > > > > +
> > > > > > +             notifyError(descriptor.frameNumber_,
> > > > > > +                         descriptor.buffers_[0].stream,
> > >
> > > As commented on a previous patch, I think you should pass nullptr for
> > > the stream here.
> >
> > The "S6. Error management:" section of the camera3.h header does not
> > mention that, not the ?
>
> Indeed, that section doesn't mention the camera3_error_msg::error_stream
> field at all. The field is documented in the structure as
>
>     /**
>      * Pointer to the stream that had a failure. NULL if the stream isn't
>      * applicable to the error.
>      */
>
> The question is thus when the stream is applicable to the error. The
> documentation of enum camera3_error_msg_code mentions error_stream in
> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> the device, the request or the result metadata, which are not specific
> to a stream.
>
> > where does your suggestion come from ? I don't find any reference in
> > the review of [1/8]
>
> '[PATCH v3 1/8] android: Rework request completion notification'
> (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)
>
> > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);
> > > > > > +
> > > > > > +             return 0;
> > > > > > +     }
> > > > > > +
> > > > > >       worker_.queueRequest(descriptor.request_.get());
> > > > > >
> > > > > >       {
> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > > >                       return;
> > > > > >               }
> > > > > >
> > > > > > +             /* Release flush if all the pending requests have been completed. */
> > > > > > +             if (descriptors_.empty())
> > > > > > +                     flushing_.notify_one();
> > >
> > > This will never happen, as you can only get here if descriptors_.find()
> > > has found the descriptor. Did you mean to do this after the extract()
> > > call below ?
> >
> > Ugh. This works only because Camera::stop() is synchronous then ?
>
> I believe so.
>
> > > > > > +
> > > > > >               node = descriptors_.extract(it);
> > > > > >       }
> > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > --- a/src/android/camera_device.h
> > > > > > +++ b/src/android/camera_device.h
> > > > > > @@ -7,6 +7,7 @@
> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > >
> > > > > > +#include <condition_variable>
> > > > > >  #include <map>
> > > > > >  #include <memory>
> > > > > >  #include <mutex>
> > > > > > @@ -42,6 +43,7 @@ public:
> > > > > >
> > > > > >       int open(const hw_module_t *hardwareModule);
> > > > > >       void close();
> > > > > > +     void flush();
> > > > > >
> > > > > >       unsigned int id() const { return id_; }
> > > > > >       camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > > @@ -92,6 +94,7 @@ private:
> > > > > >       enum State {
> > > > > >               CameraStopped,
> > > > > >               CameraRunning,
> > > > > > +             CameraFlushing,
> > > > > >       };
> > > > > >
> > > > > >       void stop();
> > > > > > @@ -120,8 +123,9 @@ private:
> > > > > >
> > > > > >       CameraWorker worker_;
> > > > > >
> > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > > >       State state_;
> > > > > > +     std::condition_variable flushed_;
> > > > > >
> > > > > >       std::shared_ptr<libcamera::Camera> camera_;
> > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > @@ -134,8 +138,9 @@ private:
> > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > >       std::vector<CameraStream> streams_;
> > > > > >
> > > > > > -     libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > > +     libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > > +     std::condition_variable flushing_;
> > > > > >
> > > > > >       std::string maker_;
> > > > > >       std::string model_;
> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > --- a/src/android/camera_ops.cpp
> > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > > >  {
> > > > > >  }
> > > > > >
> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > >  {
> > > > > > +     if (!dev)
> > > > > > +             return -EINVAL;
> > > > > > +
> > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > > +     camera->flush();
> > > > > > +
> > > > > >       return 0;
> > > > > >  }
> > > > > >
>
> --
> Regards,
>
> Laurent Pinchart
Hirokazu Honda May 27, 2021, 3:59 a.m. UTC | #8
Hi Laurent and Tomasz.

On Thu, May 27, 2021 at 11:49 AM Tomasz Figa <tfiga@chromium.org> wrote:

> Hi Laurent,
>
> On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >
> > Hi Jacopo,
> >
> > (expanding the CC list to finalize the race conditions discussion)
> >
> > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > > > > and make it available to the camera framework by implementing the
> > > > > > > > operation wrapper in camera_ops.cpp.
> > > > > > > >
> > > > > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > > > > forces all Requests already queued to be returned immediately in error
> > > > > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > > > > wait on a newly introduced condition variable which is notified by the
> > > > > > > > request completion handler when the queue of pending requests has been
> > > > > > > > exhausted.
> > > > > > > >
> > > > > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > > > > executing, return the current Request immediately in error state.
> > > > > > > >
> > > > > > > > Protect potentially concurrent calls to close() and configureStreams()
> > > >
> > > > Can this happen ? Quoting camera3.h,
> > > >
> > > >  * 12. Alternatively, the framework may call
> camera3_device_t->common->close()
> > > >  *    to end the camera session. This may be called at any time when
> no other
> > > >  *    calls from the framework are active, although the call may
> block until all
> > > >  *    in-flight captures have completed (all results returned, all
> buffers
> > > >  *    filled). After the close call returns, no more calls to the
> > > >  *    camera3_callback_ops_t functions are allowed from the HAL.
> Once the
> > > >  *    close() call is underway, the framework may not call any other
> HAL device
> > > >  *    functions.
> > > >
> > > > The important part is "when no other calss from the framework are
> > > > active". I don't think we need to handle close() racing with anything
> > > > else than process_capture_request().
> > >
> > > I've been discussing this with Hiro during v1, as initially I didn't
> > > consider close() and configureStreams().
> > >
> > > https://patchwork.libcamera.org/patch/12248/#16884
> > >
> > > I initially only considered processCaptureRequest() as a potential
> > > race, but got suggested differently by the cros camera team.
> >
> > Let's try to get to the bottom of this.
> >
> > Section S2 ("Startup and general expected operation sequence") states:
> >
> >  * 12. Alternatively, the framework may call
> camera3_device_t->common->close()
> >  *    to end the camera session. This may be called at any time when no
> other
> >  *    calls from the framework are active, although the call may block
> until all
> >  *    in-flight captures have completed (all results returned, all
> buffers
> >  *    filled). After the close call returns, no more calls to the
> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> >  *    close() call is underway, the framework may not call any other HAL
> device
> >  *    functions.
> >
> > There can be in-flight requests when .close() is called, but it can't be
> > called concurrently with any other call. There's thus no race condition
> > to protect against.
>
> Note that camera3.h is considered outdated, as the interface updates
> have been reflected only in the HIDL version [1]. However, I don't see
> a HIDL counterpart of the section you mentioned.
>
> [1]
> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/
>
> >
> > The .configure_streams() documentation states:
> >
> >      * Preconditions:
> >      *
> > >      * The framework will only call this method when no captures are being
> > >      * processed. That is, all results have been returned to the framework, and
> > >      * all in-flight input and output buffers have been returned and their
> > >      * release sync fences have been signaled by the HAL. The framework will not
> > >      * submit new requests for capture while the configure_streams() call is
> > >      * underway.
> >
> > This clearly forbids calling .configure_streams() and
> > .process_capture_request() concurrently.
> >
> > The .flush() documentation states:
> >
> > >      * Flush all currently in-process captures and all buffers in the pipeline
> > >      * on the given device. The framework will use this to dump all state as
> > >      * quickly as possible in order to prepare for a configure_streams() call.
> >
> > I interpret this as at least a very strong hint that .flush() and
> > .configure_streams() can't be called concurrently :-)
> >
> > If anyone disagrees, I'd like compelling evidence that those races can
> > occur.
>
> Indeed, the newest documentation [2] seems to be stating the same.
>
> [2]
> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107
>
> My reading of the documentation is the same as Laurent's. Hiro, did
> you hear something contradictory from the Android framework team?
>
>
I heard from Ricky that flush() and close() can be called at any time.
Ricky, have we asked the Android framework team to specify this in the
documentation?

Additionally, in the mail discussion with the Android framework team, I
found that they told us:
> We also guarantee we won't call configureStreams while
> processCaptureRequest is being called (or if there are any capture requests
> that haven't yet been completed, for that matter).
So some of the protection in Jacopo's code can probably be dropped. Sorry,
Jacopo, for the rework.
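
To make the remaining case concrete, here is a minimal sketch of the only
race left to guard if the framework guarantees above hold, namely flush()
against processCaptureRequest(). ToyDevice and all of its members are
hypothetical stand-ins, not the CameraDevice code, and the sketch assumes
that stopping the camera fails all queued requests synchronously, so one
mutex can be held across the whole flush.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

/* Hypothetical stand-in for the HAL device; not the CameraDevice API. */
class ToyDevice
{
public:
	enum class State { Stopped, Running };

	void start()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		state_ = State::Running;
	}

	/* Returns true if the request was queued, false if it was failed. */
	bool processCaptureRequest(int frameNumber)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != State::Running) {
			/* flush() got there first: return the request in error. */
			failed_.push_back(frameNumber);
			return false;
		}

		queued_.push_back(frameNumber);
		return true;
	}

	void flush()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != State::Running)
			return;

		/* Stand-in for a synchronous stop: queued requests fail here. */
		failed_.insert(failed_.end(), queued_.begin(), queued_.end());
		queued_.clear();
		state_ = State::Stopped;
	}

	std::size_t queuedCount()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return queued_.size();
	}

	std::size_t failedCount()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return failed_.size();
	}

private:
	std::mutex mutex_;
	State state_ = State::Stopped;
	std::vector<int> queued_;
	std::vector<int> failed_;
};
```

With the state check and the queueing done under the same lock, a request
either lands before flush() and is failed by it, or arrives after it and is
failed immediately. In this toy model no separate flushing state or
condition variable is needed.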

-Hiro


> Best regards,
> Tomasz
>
> >
> > > > > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > > > > to complete before proceeding.
> > > > > > > >
> > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > > > ---
> > > > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > > > > >  src/android/camera_device.h   |  9 +++-
> > > > > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > > > index 3fce14035718..899afaa49439 100644
> > > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > > >
> > > > > > >  void CameraDevice::close()
> > > > > > >  {
> > > > > > > -     streams_.clear();
> > > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > >
> > > > I'd add a blank line here.
> > > >
> > > > > > > +     if (state_ == CameraFlushing) {
> > > >
> > > > As mentioned above, I don't think you need to protect against close()
> > > > and flush() racing each other.
> > > >
> > > > > > > +             flushed_.wait(cameraLock, [&] { return state_ !=
> CameraStopped; });
> > > > > > > +             camera_->release();
> > > > > > >
> > > > > > > +             return;
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     streams_.clear();
> > > > > > >       stop();
> > > > > > >
> > > > > > >       camera_->release();
> > > > > > >  }
> > > > > > >
> > > > > > > -void CameraDevice::stop()
> > > > > > > +/*
> > > > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> > > > >
> > > > > s/wait/waits/
> > > > >
> > > > > > > > + * until all the in-flight requests have been returned before setting the
> > > > > > > > + * camera state to stopped.
> > > > > > > > + *
> > > > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > > > > + * configureStreams().
> > > > > > > + */
> > > > > > > +void CameraDevice::flush()
> > > > > > >  {
> > > > > > > +     {
> > > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > > +
> > > > > > > +             if (state_ != CameraRunning)
> > > > > > > +                     return;
> > > > > > > +
> > > > > > > +             worker_.stop();
> > > > > > > +             camera_->stop();
> > > > > > > +             state_ = CameraFlushing;
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     /*
> > > > > > > > +      * Now wait for all the in-flight requests to be completed before
> > > > > > > > +      * continuing. Stopping the Camera guarantees that all in-flight
> > > > > > > > +      * requests are completed in error state.
> > > >
> > > > Do we need to wait ? Camera::stop() guarantees that all requests
> > > > complete synchronously with the stop() call.
> > >
> > > I didn't get the API that way... I thought after stop we would receive
> > > a sequence of failed requests... Actually I don't see anything that
> > > > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> > > > in Camera::stop()
> > >
> > > The camera::stop() documentation states
> > >
> > >  * This method stops capturing and processing requests immediately. All pending
> > >  * requests are cancelled and complete synchronously in an error state.
> >
> > Is this ambiguous ?
>

Perhaps "and complete synchronously" should be rephrased as "and
Camera::requestComplete() is executed for all of them before this returns"?
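
As an illustration of the reading proposed above, the sketch below models
a stop() that runs the completion handler of every pending request before
it returns. ToyCamera, its Handler type and completionsDuringStop() are
hypothetical names used only for this example; this is not the libcamera
API, just a toy model of the documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

/* Hypothetical toy model; not the libcamera Camera class. */
class ToyCamera
{
public:
	using Handler = std::function<void(int id, bool cancelled)>;

	explicit ToyCamera(Handler handler)
		: handler_(std::move(handler))
	{
	}

	void queueRequest(int id) { pending_.push_back(id); }

	/*
	 * Cancel every pending request and run its completion handler
	 * before returning to the caller.
	 */
	void stop()
	{
		while (!pending_.empty()) {
			int id = pending_.front();
			pending_.pop_front();
			handler_(id, true);
		}
	}

	std::size_t pending() const { return pending_.size(); }

private:
	Handler handler_;
	std::deque<int> pending_;
};

/* Returns how many handlers had run by the time stop() returned. */
inline int completionsDuringStop(int requests)
{
	int completed = 0;
	ToyCamera camera([&](int, bool cancelled) {
		if (cancelled)
			++completed;
	});

	for (int i = 0; i < requests; ++i)
		camera.queueRequest(i);

	camera.stop();
	return completed;
}
```

Under that reading a caller such as flush() has nothing left to wait for
once stop() returns, which is why the condition variable would become
unnecessary.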

>
> > > > > Partly answering myself here, we'll have to wait for post-processing
> > > > > tasks to complete once we'll process them in a separate thread, but that
> > > > > will likely be handled by Thread::wait(). I don't think you need a
> > > > > condition variable here. If I'm not mistaken, this should simplify the
> > > > > implementation.
> > >
> > > If Camera::stop() is synchronous we don't need to wait indeed
> > >
> > > > > > > +      */
> > > > > > > +     {
> > > > > > > +             MutexLocker requestsLock(requestsMutex_);
> > > > > > > +             flushing_.wait(requestsLock, [&] { return
> descriptors_.empty(); });
> > > > > > > +     }
> > > > > >
> > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > > > > patch 6/8 you add it to protect the state_ variable but here it's
> > > > > >
> > > > > > I'm not changing state_ without the mutex acquired, am I ?
> > > > > >
> > > > > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the
> > > > > > > lock checking state_, releasing the lock and do some work, retake the
> > > > > > > lock and update state_ feels like a bad idea. Maybe I'm missing
> > > > > >
> > > > > > How so, apart from the fact it feels a bit unusual, I concur ?
> > > > > >
> > > > > > If I keep the mutex held for the whole duration of flush no other
> > > > > > concurrent method can proceed until all the queued requests have
> > > > > > been completed. While flush waits for the flushing_ condition to be
> > > > > > signaled, processCaptureRequest() can proceed and immediately return
> > > > > > the newly queued requests in error state by detecting state_ ==
> > > > > > CameraFlushing which signals that flush is in progress.
> > > > > > Otherwise it would have had to wait for flush to end. But then we're back
> > > > > > to a situation where we could serialize all calls and that's it, we
> > > > > > would be done with a single mutex to be held for the whole duration of
> > > > > > all operations.
> > > > > >
> > > > > > If it only was for close() or configureStreams() we could have locked
> > > > > > for the whole duration of flush(), as they anyway wait for flush to
> > > > > > complete before proceeding (by waiting on the flushed_ condition here
> > > > > > below signaled).
> > > > > >
> > > > > > > something and this is not a real problem, if so maybe we can capture
> > > > > > > that in the comment here?
> > > > > >
> > > > > > > +
> > > > > > > +     /*
> > > > > > > > +      * Set state to stopped and unlock close() or configureStreams() that
> > > > > > > > +      * might be waiting for flush to be completed.
> > > > > > > +      */
> > > > > > >       MutexLocker cameraLock(cameraMutex_);
> > > > > > > +     state_ = CameraStopped;
> > > > > > > +     flushed_.notify_one();
> > > >
> > > > > You should drop the lock before calling notify_one(). Otherwise you'll
> > > > > wake up the task waiting on flushed_, which will try to lock
> > > > > cameraMutex_, which will block immediately. The scheduler will have to
> > > > > reschedule this task for the function to return and the lock to be
> > > > > released before the waiter can proceed. That works, but isn't very
> > > > > efficient.
> > >
> > > Weird, the cpp reference shows example about notify_one where the
> > > caller always has the mutex held locked, but I see your point and
> > > seems correct..
> >
> > I'm looking at
> > https://en.cppreference.com/w/cpp/thread/condition_variable and
> > https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> > and both calls to notify_one() in the example are made without the lock
> > held, aren't they ?
> >
> > > >
> > > >     {
> > > >             MutexLocker cameraLock(cameraMutex_);
> > > >             state_ = CameraStopped;
> > > >     }
> > > >
> > > >     flushed_.notify_one();
> > > >
> > >
> > > So I could change to this one, if I don't have to drop this part
> > > completely if we consider close() and configureStreams() not as
> > > possible races...
> > >
> > > > > > > +}
> > > > > > > +
> > > > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > > > +void CameraDevice::stop()
> > > > > > > +{
> > > > > > > +     ASSERT(state_ != CameraFlushing);
> > > > > > > +
> > > > > > >       if (state_ == CameraStopped)
> > > > > > >               return;
> > > > > > >
> > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat
> CameraDevice::toPixelFormat(int format) const
> > > > > > >   */
> > > > > > >  int
> CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > > >  {
> > > > > > > -     /* Before any configuration attempt, stop the camera. */
> > > > > > > -     stop();
> > > > > > > +     {
> > > > > > > +             /*
> > > > > > > +              * If a flush is in progress, wait for it to
> complete and to
> > > > > > > +              * stop the camera, otherwise before any new
> configuration
> > > > > > > +              * attempt we have to stop the camera explictely.
> > > > > > > +              */
> > > >
> > > > > Same here, I don't think flush() and configure_streams() can race each
> > > > > other. I believe the only possible race to be between flush() and
> > > > process_capture_request().
> > >
> > > Ditto.
> > >
> > > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > > +             if (state_ == CameraFlushing)
> > > > > > > +                     flushed_.wait(cameraLock, [&] { return
> state_ != CameraStopped; });
> > > > > > > +             else
> > > > > > > +                     stop();
> > > > > > > +     }
> > > > > > >
> > > > > > >       if (stream_list->num_streams == 0) {
> > > > > > >               LOG(HAL, Error) << "No streams in configuration";
> > > > > > > @@ -1950,6 +2009,25 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > > >       if (ret)
> > > > > > >               return ret;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Just before queuing the request, make sure flush()
> has not
> > > > > > > +      * been called after this function has been executed. In
> that
> > > > > > > +      * case, immediately return the request with errors.
> > > > > > > +      */
> > > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped)
> {
> > > > > > > +             for (camera3_stream_buffer_t &buffer :
> descriptor.buffers_) {
> > > > > > > +                     buffer.status =
> CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > > +                     buffer.release_fence =
> buffer.acquire_fence;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             notifyError(descriptor.frameNumber_,
> > > > > > > +                         descriptor.buffers_[0].stream,
> > > >
> > > > As commented on a previous patch, I think you should pass nullptr for
> > > > the stream here.
> > >
> > > The "S6. Error management:" section of the camera3.h header does not
> > > mention that, not the ?
> >
> > Indeed, that section doesn't mention the camera3_error_msg::error_stream
> > field at all. The field is documented in the structure as
> >
> >     /**
> >      * Pointer to the stream that had a failure. NULL if the stream isn't
> >      * applicable to the error.
> >      */
> >
> > The question is thus when the stream is applicable to the error. The
> > documentation of enum camera3_error_msg_code mentions error_stream in
> > the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> > the device, the request or the result metadata, which are not specific
> > to a stream.
> >
> > > where does your suggestion come from ? I don't find any reference in
> > > the review of [1/8]
> >
> > '[PATCH v3 1/8] android: Rework request completion notification'
> > (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)
> >
> > > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);
> > > > > > > +
> > > > > > > +             return 0;
> > > > > > > +     }
> > > > > > > +
> > > > > > >       worker_.queueRequest(descriptor.request_.get());
> > > > > > >
> > > > > > >       {
> > > > > > > @@ -1979,6 +2057,10 @@ void
> CameraDevice::requestComplete(Request *request)
> > > > > > >                       return;
> > > > > > >               }
> > > > > > >
> > > > > > > +             /* Release flush if all the pending requests
> have been completed. */
> > > > > > > +             if (descriptors_.empty())
> > > > > > > +                     flushing_.notify_one();
> > > >
> > > > > This will never happen, as you can only get here if descriptors_.find()
> > > > has found the descriptor. Did you mean to do this after the extract()
> > > > call below ?
> > >
> > > Ugh. This works only because Camera::stop() is synchronous then ?
> >
> > I believe so.
> >
> > > > > > > +
> > > > > > >               node = descriptors_.extract(it);
> > > > > > >       }
> > > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > > --- a/src/android/camera_device.h
> > > > > > > +++ b/src/android/camera_device.h
> > > > > > > @@ -7,6 +7,7 @@
> > > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > > >
> > > > > > > +#include <condition_variable>
> > > > > > >  #include <map>
> > > > > > >  #include <memory>
> > > > > > >  #include <mutex>
> > > > > > > @@ -42,6 +43,7 @@ public:
> > > > > > >
> > > > > > >       int open(const hw_module_t *hardwareModule);
> > > > > > >       void close();
> > > > > > > +     void flush();
> > > > > > >
> > > > > > >       unsigned int id() const { return id_; }
> > > > > > > >       camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > > > @@ -92,6 +94,7 @@ private:
> > > > > > >       enum State {
> > > > > > >               CameraStopped,
> > > > > > >               CameraRunning,
> > > > > > > +             CameraFlushing,
> > > > > > >       };
> > > > > > >
> > > > > > >       void stop();
> > > > > > > @@ -120,8 +123,9 @@ private:
> > > > > > >
> > > > > > >       CameraWorker worker_;
> > > > > > >
> > > > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > > > >       State state_;
> > > > > > > +     std::condition_variable flushed_;
> > > > > > >
> > > > > > >       std::shared_ptr<libcamera::Camera> camera_;
> > > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > > @@ -134,8 +138,9 @@ private:
> > > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > > >       std::vector<CameraStream> streams_;
> > > > > > >
> > > > > > > > -     libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > > > > +     libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > > > > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > > > +     std::condition_variable flushing_;
> > > > > > >
> > > > > > >       std::string maker_;
> > > > > > >       std::string model_;
> > > > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > > > --- a/src/android/camera_ops.cpp
> > > > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > > > > >  {
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > > > >  {
> > > > > > > > +     if (!dev)
> > > > > > > > +             return -EINVAL;
> > > > > > > > +
> > > > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > > > +     camera->flush();
> > > > > > > +
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
>
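
Two of the review points quoted above concern ordering: the emptiness
check in requestComplete() runs before the descriptor is extracted, and
notify_one() is called with the lock still held. A minimal sketch of the
suggested ordering follows, assuming a hypothetical PendingTracker class
loosely modelled on descriptors_ and flushing_; none of these names come
from the patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

/* Hypothetical tracker; not the CameraDevice implementation. */
class PendingTracker
{
public:
	void add(uint64_t cookie)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		pending_[cookie] = 0;
	}

	/* Returns true if this completion drained the map and notified. */
	bool complete(uint64_t cookie)
	{
		bool drained;

		{
			std::lock_guard<std::mutex> lock(mutex_);
			auto it = pending_.find(cookie);
			if (it == pending_.end())
				return false;

			pending_.erase(it);
			/* Check emptiness only after the entry is gone. */
			drained = pending_.empty();
		}

		/* Notify with the lock released so a waiter can take it. */
		if (drained)
			drained_.notify_all();

		return drained;
	}

	/* Blocks until no request is pending. */
	void waitDrained()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		drained_.wait(lock, [this] { return pending_.empty(); });
	}

	std::size_t size()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return pending_.size();
	}

private:
	std::mutex mutex_;
	std::condition_variable drained_;
	std::map<uint64_t, int> pending_;
};
```

In a real flush path one thread would block in waitDrained() while the
completion handler calls complete(). Because the predicate is evaluated
under the mutex, notifying after unlocking does not lose a wakeup, provided
the tracker outlives the notifying call.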
Jacopo Mondi May 27, 2021, 7:46 a.m. UTC | #9
Hi Laurent,
   thanks for the detailed answer

On Thu, May 27, 2021 at 05:26:51AM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> (expanding the CC list to finalize the race conditions discussion)
>
> On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > > and make it available to the camera framework by implementing the
> > > > > > operation wrapper in camera_ops.cpp.
> > > > > >
> > > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > > forces all Requests already queued to be returned immediately in error
> > > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > > wait on a newly introduced condition variable which is notified by the
> > > > > > request completion handler when the queue of pending requests has been
> > > > > > exhausted.
> > > > > >
> > > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > > executing, return the current Request immediately in error state.
> > > > > >
> > > > > > Protect potentially concurrent calls to close() and configureStreams()
> > >
> > > Can this happen ? Quoting camera3.h,
> > >
> > >  * 12. Alternatively, the framework may call camera3_device_t->common->close()
> > >  *    to end the camera session. This may be called at any time when no other
> > >  *    calls from the framework are active, although the call may block until all
> > >  *    in-flight captures have completed (all results returned, all buffers
> > >  *    filled). After the close call returns, no more calls to the
> > >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> > >  *    close() call is underway, the framework may not call any other HAL device
> > >  *    functions.
> > >
> > > The important part is "when no other calls from the framework are
> > > active". I don't think we need to handle close() racing with anything
> > > else than process_capture_request().
> >
> > I've been discussing this with Hiro during v1, as initially I didn't
> > consider close() and configureStreams().
> >
> > https://patchwork.libcamera.org/patch/12248/#16884
> >
> > I initially only considered processCaptureRequest() as a potential
> > race, but got suggested differently by the cros camera team.
>
> Let's try to get to the bottom of this.
>
> Section S2 ("Startup and general expected operation sequence") states:
>
>  * 12. Alternatively, the framework may call camera3_device_t->common->close()
>  *    to end the camera session. This may be called at any time when no other
>  *    calls from the framework are active, although the call may block until all
>  *    in-flight captures have completed (all results returned, all buffers
>  *    filled). After the close call returns, no more calls to the
>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
>  *    close() call is underway, the framework may not call any other HAL device
>  *    functions.
>
> There can be in-flight requests when .close() is called, but it can't be
> called concurrently with any other call. There's thus no race condition
> to protect against.
>
> The .configure_streams() documentation states:
>
>      * Preconditions:
>      *
>      * The framework will only call this method when no captures are being
>      * processed. That is, all results have been returned to the framework, and
>      * all in-flight input and output buffers have been returned and their
>      * release sync fences have been signaled by the HAL. The framework will not
>      * submit new requests for capture while the configure_streams() call is
>      * underway.
>
> This clearly forbids calling .configure_streams() and
> .process_capture_request() concurrently.
>
> The .flush() documentation states:
>
>      * Flush all currently in-process captures and all buffers in the pipeline
>      * on the given device. The framework will use this to dump all state as
>      * quickly as possible in order to prepare for a configure_streams() call.
>
> I interpret this as at least a very strong hint that .flush() and
> .configure_streams() can't be called concurrently :-)
>
> If anyone disagrees, I'd like compelling evidence that those races can
> occur.
>
> > > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > > to complete before proceeding.
> > > > > >
> > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > > ---
> > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > > >  src/android/camera_device.h   |  9 +++-
> > > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > index 3fce14035718..899afaa49439 100644
> > > > > > --- a/src/android/camera_device.cpp
> > > > > > +++ b/src/android/camera_device.cpp
> > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > >
> > > > > >  void CameraDevice::close()
> > > > > >  {
> > > > > > -	streams_.clear();
> > > > > > +	MutexLocker cameraLock(cameraMutex_);
> > >
> > > I'd add a blank line here.
> > >
> > > > > > +	if (state_ == CameraFlushing) {
> > >
> > > As mentioned above, I don't think you need to protect against close()
> > > and flush() racing each other.
> > >
> > > > > > +		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +		camera_->release();
> > > > > >
> > > > > > +		return;
> > > > > > +	}
> > > > > > +
> > > > > > +	streams_.clear();
> > > > > >  	stop();
> > > > > >
> > > > > >  	camera_->release();
> > > > > >  }
> > > > > >
> > > > > > -void CameraDevice::stop()
> > > > > > +/*
> > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> > >
> > > s/wait/waits/
> > >
> > > > > > + * until all the in-flight requests have been returned before setting the
> > > > > > + * camera state to stopped.
> > > > > > + *
> > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > > + * configureStreams().
> > > > > > + */
> > > > > > +void CameraDevice::flush()
> > > > > >  {
> > > > > > +	{
> > > > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > > > +
> > > > > > +		if (state_ != CameraRunning)
> > > > > > +			return;
> > > > > > +
> > > > > > +		worker_.stop();
> > > > > > +		camera_->stop();
> > > > > > +		state_ = CameraFlushing;
> > > > > > +	}
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Now wait for all the in-flight requests to be completed before
> > > > > > +	 * continuing. Stopping the Camera guarantees that all in-flight
> > > > > > +	 * requests are completed in error state.
> > >
> > > Do we need to wait ? Camera::stop() guarantees that all requests
> > > complete synchronously with the stop() call.
> >
> > I didn't get the API that way... I thought after stop we would receive
> > a sequence of failed requests... Actually I don't see anything that
> > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> > in Camera::stop()
>
> The camera::stop() documentation states
>
>  * This method stops capturing and processing requests immediately. All pending
>  * requests are cancelled and complete synchronously in an error state.
>
> Is this ambiguous ?
>

I admit I haven't looked at the documentation, only at the code paths.

> > > Partly answering myself here, we'll have to wait for post-processing
> > > tasks to complete once we'll process them in a separate thread, but that
> > > will likely be handled by Thread::wait(). I don't think you need a
> > > condition variable here. If I'm not mistaken, this should simplify the
> > > implementation.
> >
> > If Camera::stop() is synchronous we don't need to wait indeed
> >
> > > > > > +	 */
> > > > > > +	{
> > > > > > +		MutexLocker requestsLock(requestsMutex_);
> > > > > > +		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > > > +	}
> > > > >
> > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > > patch 6/8 you add it to protect the state_ variable but here it's
> > > >
> > > > I'm not changing state_ without the mutex acquired, am I ?
> > > >
> > > > > > ignored. I see the ASSERT() added to stop() but the pattern of taking
> > > > > > the lock, checking state_, releasing the lock to do some work, then
> > > > > > retaking the lock to update state_ feels like a bad idea. Maybe I'm missing
> > > >
> > > > How so, apart from the fact it feels a bit unusual, I concur ?
> > > >
> > > > If I keep the mutex held for the whole duration of flush, no other
> > > > concurrent method can proceed until all the queued requests have
> > > > been completed. While flush waits for the flushing_ condition to be
> > > > signaled, processCaptureRequest() can proceed and immediately return
> > > > the newly queued requests in error state by detecting state_ ==
> > > > CameraFlushing, which signals that a flush is in progress.
> > > > Otherwise it would have had to wait for flush to end. But then we're back
> > > > to a situation where we could serialize all calls and that's it, we
> > > > would be done with a single mutex to be held for the whole duration of
> > > > all operations.
> > > >
> > > > If it only was for close() or configureStreams() we could have locked
> > > > for the whole duration of flush(), as they anyway wait for flush to
> > > > complete before proceeding (by waiting on the flushed_ condition here
> > > > below signaled).
> > > >
> > > > > something and this is not a real problem, if so maybe we can capture
> > > > > that in the comment here?
> > > > >
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Set state to stopped and unlock close() or configureStreams() that
> > > > > > +	 * might be waiting for flush to be completed.
> > > > > > +	 */
> > > > > >  	MutexLocker cameraLock(cameraMutex_);
> > > > > > +	state_ = CameraStopped;
> > > > > > +	flushed_.notify_one();
> > >
> > > You should drop the lock before calling notify_one(). Otherwise you'll
> > > wake up the task waiting on flushed_, which will try to lock
> > > cameraMutex_, which will block immediately. The scheduler will have to
> > > reschedule this task for the function to return and the lock to be
> > > released before the waiter can proceed. That works, but isn't very
> > > efficient.
> >
> > Weird, the cpp reference shows examples of notify_one where the
> > caller always holds the mutex, but I see your point and it
> > seems correct..
>
> I'm looking at
> https://en.cppreference.com/w/cpp/thread/condition_variable and
> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> and both calls to notify_one() in the example are made without the lock
> held, aren't they ?

o_0 I would swear I've seen different producer/consumer examples in the
documentation. As I assume they haven't changed overnight, it's
clearly my imagination..
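
For reference, a minimal standalone sketch of the pattern suggested above:
update the state with the mutex held, drop the lock, then notify. The
FlushGate type and its members are hypothetical stand-ins built on plain
std::mutex, not the CameraDevice code. The waiter's predicate here is
"state == Stopped", which is what I assume the wait is meant to express.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

/* Hypothetical stand-in for the camera state machine. */
enum class State { Running, Flushing, Stopped };

struct FlushGate {
	std::mutex mutex;
	std::condition_variable flushed;
	State state = State::Flushing;

	/* Publish the new state under the lock, then notify with the lock released. */
	void markStopped()
	{
		{
			std::lock_guard<std::mutex> lock(mutex);
			state = State::Stopped;
		}

		flushed.notify_one();
	}

	/* Block until the state is Stopped; the predicate absorbs spurious wakeups. */
	void waitStopped()
	{
		std::unique_lock<std::mutex> lock(mutex);
		flushed.wait(lock, [&] { return state == State::Stopped; });
	}
};

/* Returns true once a waiter thread has observed the Stopped state. */
bool runFlushGateDemo()
{
	FlushGate gate;
	bool observed = false;

	std::thread waiter([&] {
		gate.waitStopped();
		observed = true;
	});

	gate.markStopped();
	waiter.join();

	return observed;
}
```

Calling runFlushGateDemo() from a test or main() should return true whether
the waiter starts before or after markStopped() runs, since the predicate is
checked before blocking.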

>
> > >
> > > 	{
> > > 		MutexLocker cameraLock(cameraMutex_);
> > > 		state_ = CameraStopped;
> > > 	}
> > >
> > > 	flushed_.notify_one();
> > >
> >
> > So I could change to this one, if I don't have to drop this part
> > completely if we consider close() and configureStreams() not as
> > possible races...
> >
> > > > > > +}
> > > > > > +
> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > > +void CameraDevice::stop()
> > > > > > +{
> > > > > > +	ASSERT(state_ != CameraFlushing);
> > > > > > +
> > > > > >  	if (state_ == CameraStopped)
> > > > > >  		return;
> > > > > >
> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > > >   */
> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > >  {
> > > > > > -	/* Before any configuration attempt, stop the camera. */
> > > > > > -	stop();
> > > > > > +	{
> > > > > > +		/*
> > > > > > +		 * If a flush is in progress, wait for it to complete and to
> > > > > > +		 * stop the camera, otherwise before any new configuration
> > > > > > +		 * attempt we have to stop the camera explictely.
> > > > > > +		 */
> > >
> > > Same here, I don't think flush() and configure_streams() can race each
> > > other. I believe the only possible race to be between flush() and
> > > process_capture_request().
> >
> > Ditto.
> >
> > > > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > > > +		if (state_ == CameraFlushing)
> > > > > > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +		else
> > > > > > +			stop();
> > > > > > +	}
> > > > > >
> > > > > >  	if (stream_list->num_streams == 0) {
> > > > > >  		LOG(HAL, Error) << "No streams in configuration";
> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > >  	if (ret)
> > > > > >  		return ret;
> > > > > >
> > > > > > +	/*
> > > > > > +	 * Just before queuing the request, make sure flush() has not
> > > > > > +	 * been called after this function has been executed. In that
> > > > > > +	 * case, immediately return the request with errors.
> > > > > > +	 */
> > > > > > +	MutexLocker cameraLock(cameraMutex_);
> > > > > > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > +			buffer.release_fence = buffer.acquire_fence;
> > > > > > +		}
> > > > > > +
> > > > > > +		notifyError(descriptor.frameNumber_,
> > > > > > +			    descriptor.buffers_[0].stream,
> > >
> > > As commented on a previous patch, I think you should pass nullptr for
> > > the stream here.
> >
> > The "S6. Error management:" section of the camera3.h header does not
> > mention that, does it ?
>
> Indeed, that section doesn't mention the camera3_error_msg::error_stream
> field at all. The field is documented in the structure as
>
>     /**
>      * Pointer to the stream that had a failure. NULL if the stream isn't
>      * applicable to the error.
>      */
>
> The question is thus when the stream is applicable to the error. The
> documentation of enum camera3_error_msg_code mentions error_stream in
> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> the device, the request or the result metadata, which are not specific
> to a stream.
>

Ack, thanks for clarifying
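
To check my reading of the rule above, a small sketch: the stream pointer is
only passed along for buffer errors, and is null otherwise. Stream, ErrorCode
and errorStreamFor() are hypothetical stand-ins for illustration, not the
camera3.h types or an existing helper.

```cpp
#include <cassert>

/* Hypothetical stand-ins for the camera3.h stream and error code types. */
struct Stream {};
enum class ErrorCode { Device, Request, Result, Buffer };

/*
 * Return the stream to report with an error: only buffer errors are tied
 * to a specific stream, every other error reports no stream.
 */
const Stream *errorStreamFor(ErrorCode code, const Stream *stream)
{
	return code == ErrorCode::Buffer ? stream : nullptr;
}
```

With this, a request error would be notified with a null stream even when the
descriptor has buffers attached.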

> > Where does your suggestion come from ? I don't find any reference in
> > the review of [1/8]
>
> '[PATCH v3 1/8] android: Rework request completion notification'
> (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)
>
> > > > > > +			    CAMERA3_MSG_ERROR_REQUEST);
> > > > > > +
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +
> > > > > >  	worker_.queueRequest(descriptor.request_.get());
> > > > > >
> > > > > >  	{
> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > > >  			return;
> > > > > >  		}
> > > > > >
> > > > > > +		/* Release flush if all the pending requests have been completed. */
> > > > > > +		if (descriptors_.empty())
> > > > > > +			flushing_.notify_one();
> > >
> > > This will never happen, as you can only get here if descriptors_.find()
> > > has found the descriptor. Did you mean to do this after the extract()
> > > call below ?
> >
> > Ugh. This works only because Camera::stop() is synchronous then ?
>
> I believe so.
>
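
If the condition variable were kept, the ordering pointed out above could be
sketched as follows: remove the entry first, then test for emptiness, then
notify with the lock released. PendingRequests and complete() are
hypothetical names, and the map holds strings rather than
Camera3RequestDescriptor to keep the example standalone (std::map::extract()
needs C++17).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

/* Hypothetical stand-in for the descriptors_ map and its condition variable. */
struct PendingRequests {
	std::mutex mutex;
	std::condition_variable drained;
	std::map<uint64_t, std::string> descriptors;

	/*
	 * Remove the entry for cookie. Returns true only if that removal
	 * emptied the map, in which case a waiter is notified.
	 */
	bool complete(uint64_t cookie)
	{
		bool empty;

		{
			std::lock_guard<std::mutex> lock(mutex);

			auto it = descriptors.find(cookie);
			if (it == descriptors.end())
				return false;

			/* The extracted node would be processed after this point. */
			[[maybe_unused]] auto node = descriptors.extract(it);

			/* Check only after the entry is gone, or this is never true. */
			empty = descriptors.empty();
		}

		if (empty)
			drained.notify_one();

		return empty;
	}
};
```

Checking empty() before the extract() can never succeed, because finding the
entry already proves the map holds at least one element.
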
> > > > > > +
> > > > > >  		node = descriptors_.extract(it);
> > > > > >  	}
> > > > > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > --- a/src/android/camera_device.h
> > > > > > +++ b/src/android/camera_device.h
> > > > > > @@ -7,6 +7,7 @@
> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > >
> > > > > > +#include <condition_variable>
> > > > > >  #include <map>
> > > > > >  #include <memory>
> > > > > >  #include <mutex>
> > > > > > @@ -42,6 +43,7 @@ public:
> > > > > >
> > > > > >  	int open(const hw_module_t *hardwareModule);
> > > > > >  	void close();
> > > > > > +	void flush();
> > > > > >
> > > > > >  	unsigned int id() const { return id_; }
> > > > > >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > > @@ -92,6 +94,7 @@ private:
> > > > > >  	enum State {
> > > > > >  		CameraStopped,
> > > > > >  		CameraRunning,
> > > > > > +		CameraFlushing,
> > > > > >  	};
> > > > > >
> > > > > >  	void stop();
> > > > > > @@ -120,8 +123,9 @@ private:
> > > > > >
> > > > > >  	CameraWorker worker_;
> > > > > >
> > > > > > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > > >  	State state_;
> > > > > > +	std::condition_variable flushed_;
> > > > > >
> > > > > >  	std::shared_ptr<libcamera::Camera> camera_;
> > > > > >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > @@ -134,8 +138,9 @@ private:
> > > > > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > >  	std::vector<CameraStream> streams_;
> > > > > >
> > > > > > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > > +	std::condition_variable flushing_;
> > > > > >
> > > > > >  	std::string maker_;
> > > > > >  	std::string model_;
> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > --- a/src/android/camera_ops.cpp
> > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > > >  {
> > > > > >  }
> > > > > >
> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > >  {
> > > > > > +	if (!dev)
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > > +	camera->flush();
> > > > > > +
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
>
> --
> Regards,
>
> Laurent Pinchart

Patch

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 3fce14035718..899afaa49439 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -750,16 +750,65 @@  int CameraDevice::open(const hw_module_t *hardwareModule)
 
 void CameraDevice::close()
 {
-	streams_.clear();
+	MutexLocker cameraLock(cameraMutex_);
+	if (state_ == CameraFlushing) {
+		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
+		camera_->release();
 
+		return;
+	}
+
+	streams_.clear();
 	stop();
 
 	camera_->release();
 }
 
-void CameraDevice::stop()
+/*
+ * Flush is similar to stop() but sets the camera state to 'flushing' and wait
+ * until all the in-flight requests have been returned before setting the
+ * camera state to stopped.
+ *
+ * Once flushing is done it unlocks concurrent calls to camera close() and
+ * configureStreams().
+ */
+void CameraDevice::flush()
 {
+	{
+		MutexLocker cameraLock(cameraMutex_);
+
+		if (state_ != CameraRunning)
+			return;
+
+		worker_.stop();
+		camera_->stop();
+		state_ = CameraFlushing;
+	}
+
+	/*
+	 * Now wait for all the in-flight requests to be completed before
+	 * continuing. Stopping the Camera guarantees that all in-flight
+	 * requests are completed in error state.
+	 */
+	{
+		MutexLocker requestsLock(requestsMutex_);
+		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
+	}
+
+	/*
+	 * Set state to stopped and unlock close() or configureStreams() that
+	 * might be waiting for flush to be completed.
+	 */
 	MutexLocker cameraLock(cameraMutex_);
+	state_ = CameraStopped;
+	flushed_.notify_one();
+}
+
+/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
+void CameraDevice::stop()
+{
+	ASSERT(state_ != CameraFlushing);
+
 	if (state_ == CameraStopped)
 		return;
 
@@ -1581,8 +1630,18 @@  PixelFormat CameraDevice::toPixelFormat(int format) const
  */
 int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 {
-	/* Before any configuration attempt, stop the camera. */
-	stop();
+	{
+		/*
+		 * If a flush is in progress, wait for it to complete and to
+		 * stop the camera, otherwise before any new configuration
+		 * attempt we have to stop the camera explictely.
+		 */
+		MutexLocker cameraLock(cameraMutex_);
+		if (state_ == CameraFlushing)
+			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
+		else
+			stop();
+	}
 
 	if (stream_list->num_streams == 0) {
 		LOG(HAL, Error) << "No streams in configuration";
@@ -1950,6 +2009,25 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 	if (ret)
 		return ret;
 
+	/*
+	 * Just before queuing the request, make sure flush() has not
+	 * been called after this function has been executed. In that
+	 * case, immediately return the request with errors.
+	 */
+	MutexLocker cameraLock(cameraMutex_);
+	if (state_ == CameraFlushing || state_ == CameraStopped) {
+		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
+			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
+			buffer.release_fence = buffer.acquire_fence;
+		}
+
+		notifyError(descriptor.frameNumber_,
+			    descriptor.buffers_[0].stream,
+			    CAMERA3_MSG_ERROR_REQUEST);
+
+		return 0;
+	}
+
 	worker_.queueRequest(descriptor.request_.get());
 
 	{
@@ -1979,6 +2057,10 @@  void CameraDevice::requestComplete(Request *request)
 			return;
 		}
 
+		/* Release flush if all the pending requests have been completed. */
+		if (descriptors_.empty())
+			flushing_.notify_one();
+
 		node = descriptors_.extract(it);
 	}
 	Camera3RequestDescriptor &descriptor = node.mapped();
diff --git a/src/android/camera_device.h b/src/android/camera_device.h
index 7cf8e8370387..e1b3bf7d30f2 100644
--- a/src/android/camera_device.h
+++ b/src/android/camera_device.h
@@ -7,6 +7,7 @@ 
 #ifndef __ANDROID_CAMERA_DEVICE_H__
 #define __ANDROID_CAMERA_DEVICE_H__
 
+#include <condition_variable>
 #include <map>
 #include <memory>
 #include <mutex>
@@ -42,6 +43,7 @@  public:
 
 	int open(const hw_module_t *hardwareModule);
 	void close();
+	void flush();
 
 	unsigned int id() const { return id_; }
 	camera3_device_t *camera3Device() { return &camera3Device_; }
@@ -92,6 +94,7 @@  private:
 	enum State {
 		CameraStopped,
 		CameraRunning,
+		CameraFlushing,
 	};
 
 	void stop();
@@ -120,8 +123,9 @@  private:
 
 	CameraWorker worker_;
 
-	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
+	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
 	State state_;
+	std::condition_variable flushed_;
 
 	std::shared_ptr<libcamera::Camera> camera_;
 	std::unique_ptr<libcamera::CameraConfiguration> config_;
@@ -134,8 +138,9 @@  private:
 	std::map<int, libcamera::PixelFormat> formatsMap_;
 	std::vector<CameraStream> streams_;
 
-	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
+	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
 	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
+	std::condition_variable flushing_;
 
 	std::string maker_;
 	std::string model_;
diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
index 696e80436821..8a3cfa175ff5 100644
--- a/src/android/camera_ops.cpp
+++ b/src/android/camera_ops.cpp
@@ -66,8 +66,14 @@  static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
 {
 }
 
-static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
+static int hal_dev_flush(const struct camera3_device *dev)
 {
+	if (!dev)
+		return -EINVAL;
+
+	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
+	camera->flush();
+
 	return 0;
 }