Message ID | 20210521154227.60186-9-jacopo@jmondi.org |
---|---|
State | Superseded |
Hi Jacopo, Thanks for your work. On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > Implement the flush() camera operation in the CameraDevice class > and make it available to the camera framework by implementing the > operation wrapper in camera_ops.cpp. > > The flush() implementation stops the Camera and the worker thread and > waits for all in-flight requests to be returned. Stopping the Camera > forces all Requests already queued to be returned immediately in error > state. As flush() has to wait until all of them have been returned, make it > wait on a newly introduced condition variable which is notified by the > request completion handler when the queue of pending requests has been > exhausted. > > As flush() can race with processCaptureRequest() protect the requests > queueing by introducing a new CameraState::CameraFlushing state that > processCaptureRequest() inspects before queuing the Request to the > Camera. If flush() has been called while processCaptureRequest() was > executing, return the current Request immediately in error state. > > Protect potentially concurrent calls to close() and configureStreams() > by inspecting the CameraState, and force a wait for any flush() call > to complete before proceeding. 
> > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > --- > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > src/android/camera_device.h | 9 +++- > src/android/camera_ops.cpp | 8 +++- > 3 files changed, 100 insertions(+), 7 deletions(-) > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index 3fce14035718..899afaa49439 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > void CameraDevice::close() > { > - streams_.clear(); > + MutexLocker cameraLock(cameraMutex_); > + if (state_ == CameraFlushing) { > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > + camera_->release(); > > + return; > + } > + > + streams_.clear(); > stop(); > > camera_->release(); > } > > -void CameraDevice::stop() > +/* > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait > + * until all the in-flight requests have been returned before setting the > + * camera state to stopped. > + * > + * Once flushing is done it unlocks concurrent calls to camera close() and > + * configureStreams(). > + */ > +void CameraDevice::flush() > { > + { > + MutexLocker cameraLock(cameraMutex_); > + > + if (state_ != CameraRunning) > + return; > + > + worker_.stop(); > + camera_->stop(); > + state_ = CameraFlushing; > + } > + > + /* > + * Now wait for all the in-flight requests to be completed before > + * continuing. Stopping the Camera guarantees that all in-flight > + * requests are completed in error state. > + */ > + { > + MutexLocker requestsLock(requestsMutex_); > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > + } I'm still uneasy about releasing the cameraMutex_ for this section. In patch 6/8 you add it to protect the state_ variable but here it's ignored. 
I see the ASSERT() added to stop(), but the pattern of taking the lock, checking state_, releasing the lock, doing some work, then retaking the lock and updating state_ feels like a bad idea. Maybe I'm missing something and this is not a real problem; if so, maybe we can capture that in the comment here?

> + > + /* > + * Set state to stopped and unlock close() or configureStreams() that > + * might be waiting for flush to be completed. > + */ > MutexLocker cameraLock(cameraMutex_); > + state_ = CameraStopped; > + flushed_.notify_one(); > +} > + > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */ > +void CameraDevice::stop() > +{ > + ASSERT(state_ != CameraFlushing); > + > if (state_ == CameraStopped) > return; > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > */ > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > { > - /* Before any configuration attempt, stop the camera. */ > - stop(); > + { > + /* > + * If a flush is in progress, wait for it to complete and to > + * stop the camera, otherwise before any new configuration > + * attempt we have to stop the camera explictely. > + */ > + MutexLocker cameraLock(cameraMutex_); > + if (state_ == CameraFlushing) > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > + else > + stop(); > + } > > if (stream_list->num_streams == 0) { > LOG(HAL, Error) << "No streams in configuration"; > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > if (ret) > return ret; > > + /* > + * Just before queuing the request, make sure flush() has not > + * been called after this function has been executed. In that > + * case, immediately return the request with errors. 
> + */ > + MutexLocker cameraLock(cameraMutex_); > + if (state_ == CameraFlushing || state_ == CameraStopped) { > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > + buffer.release_fence = buffer.acquire_fence; > + } > + > + notifyError(descriptor.frameNumber_, > + descriptor.buffers_[0].stream, > + CAMERA3_MSG_ERROR_REQUEST); > + > + return 0; > + } > + > worker_.queueRequest(descriptor.request_.get()); > > { > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > return; > } > > + /* Release flush if all the pending requests have been completed. */ > + if (descriptors_.empty()) > + flushing_.notify_one(); > + > node = descriptors_.extract(it); > } > Camera3RequestDescriptor &descriptor = node.mapped(); > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > index 7cf8e8370387..e1b3bf7d30f2 100644 > --- a/src/android/camera_device.h > +++ b/src/android/camera_device.h > @@ -7,6 +7,7 @@ > #ifndef __ANDROID_CAMERA_DEVICE_H__ > #define __ANDROID_CAMERA_DEVICE_H__ > > +#include <condition_variable> > #include <map> > #include <memory> > #include <mutex> > @@ -42,6 +43,7 @@ public: > > int open(const hw_module_t *hardwareModule); > void close(); > + void flush(); > > unsigned int id() const { return id_; } > camera3_device_t *camera3Device() { return &camera3Device_; } > @@ -92,6 +94,7 @@ private: > enum State { > CameraStopped, > CameraRunning, > + CameraFlushing, > }; > > void stop(); > @@ -120,8 +123,9 @@ private: > > CameraWorker worker_; > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. 
*/ > State state_; > + std::condition_variable flushed_; > > std::shared_ptr<libcamera::Camera> camera_; > std::unique_ptr<libcamera::CameraConfiguration> config_; > @@ -134,8 +138,9 @@ private: > std::map<int, libcamera::PixelFormat> formatsMap_; > std::vector<CameraStream> streams_; > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */ > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > + std::condition_variable flushing_; > > std::string maker_; > std::string model_; > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > index 696e80436821..8a3cfa175ff5 100644 > --- a/src/android/camera_ops.cpp > +++ b/src/android/camera_ops.cpp > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > { > } > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > +static int hal_dev_flush(const struct camera3_device *dev) > { > + if (!dev) > + return -EINVAL; > + > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > + camera->flush(); > + > return 0; > } > > -- > 2.31.1 >
Hi Niklas, On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > Hi Jacopo, > > Thanks for your work. > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > Implement the flush() camera operation in the CameraDevice class > > and make it available to the camera framework by implementing the > > operation wrapper in camera_ops.cpp. > > > > The flush() implementation stops the Camera and the worker thread and > > waits for all in-flight requests to be returned. Stopping the Camera > > forces all Requests already queued to be returned immediately in error > > state. As flush() has to wait until all of them have been returned, make it > > wait on a newly introduced condition variable which is notified by the > > request completion handler when the queue of pending requests has been > > exhausted. > > > > As flush() can race with processCaptureRequest() protect the requests > > queueing by introducing a new CameraState::CameraFlushing state that > > processCaptureRequest() inspects before queuing the Request to the > > Camera. If flush() has been called while processCaptureRequest() was > > executing, return the current Request immediately in error state. > > > > Protect potentially concurrent calls to close() and configureStreams() > > by inspecting the CameraState, and force a wait for any flush() call > > to complete before proceeding. 
> > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > --- > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > src/android/camera_device.h | 9 +++- > > src/android/camera_ops.cpp | 8 +++- > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > index 3fce14035718..899afaa49439 100644 > > --- a/src/android/camera_device.cpp > > +++ b/src/android/camera_device.cpp > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > void CameraDevice::close() > > { > > - streams_.clear(); > > + MutexLocker cameraLock(cameraMutex_); > > + if (state_ == CameraFlushing) { > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > + camera_->release(); > > > > + return; > > + } > > + > > + streams_.clear(); > > stop(); > > > > camera_->release(); > > } > > > > -void CameraDevice::stop() > > +/* > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait > > + * until all the in-flight requests have been returned before setting the > > + * camera state to stopped. > > + * > > + * Once flushing is done it unlocks concurrent calls to camera close() and > > + * configureStreams(). > > + */ > > +void CameraDevice::flush() > > { > > + { > > + MutexLocker cameraLock(cameraMutex_); > > + > > + if (state_ != CameraRunning) > > + return; > > + > > + worker_.stop(); > > + camera_->stop(); > > + state_ = CameraFlushing; > > + } > > + > > + /* > > + * Now wait for all the in-flight requests to be completed before > > + * continuing. Stopping the Camera guarantees that all in-flight > > + * requests are completed in error state. > > + */ > > + { > > + MutexLocker requestsLock(requestsMutex_); > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > + } > > I'm still uneasy about releasing the cameraMutex_ for this section. 
In > patch 6/8 you add it to protect the state_ variable but here it's

I'm not changing state_ without the mutex acquired, am I?

> ignored. I see the ASSERT() added to stop() but the patter of taking the > lock checking state_, releasing the lock and do some work, retake the > lock and update state_ feels like a bad idea. Maybe I'm missing

How so, apart from the fact that it feels a bit unusual (I concur on that)?

If I hold the mutex for the whole duration of flush(), no other concurrent method can proceed until all the queued requests have been completed. While flush() waits for the flushing_ condition to be signaled, processCaptureRequest() can proceed and immediately return the newly queued requests in error state by detecting state_ == CameraFlushing, which signals that a flush is in progress. Otherwise it would have had to wait for flush() to end. But then we're back to a situation where we could serialize all calls and that's it: we would be done with a single mutex held for the whole duration of all operations.

If it were only for close() or configureStreams(), we could have locked for the whole duration of flush(), as they wait for the flush to complete before proceeding anyway (by waiting on the flushed_ condition signaled below).

> something and this is not a real problem, if so maybe we can capture > that in the comment here? > > > + > > + /* > > + * Set state to stopped and unlock close() or configureStreams() that > > + * might be waiting for flush to be completed. > > + */ > > MutexLocker cameraLock(cameraMutex_); > > + state_ = CameraStopped; > > + flushed_.notify_one(); > > +} > > + > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. 
*/ > > +void CameraDevice::stop() > > +{ > > + ASSERT(state_ != CameraFlushing); > > + > > if (state_ == CameraStopped) > > return; > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > > */ > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > { > > - /* Before any configuration attempt, stop the camera. */ > > - stop(); > > + { > > + /* > > + * If a flush is in progress, wait for it to complete and to > > + * stop the camera, otherwise before any new configuration > > + * attempt we have to stop the camera explictely. > > + */ > > + MutexLocker cameraLock(cameraMutex_); > > + if (state_ == CameraFlushing) > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > + else > > + stop(); > > + } > > > > if (stream_list->num_streams == 0) { > > LOG(HAL, Error) << "No streams in configuration"; > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > if (ret) > > return ret; > > > > + /* > > + * Just before queuing the request, make sure flush() has not > > + * been called after this function has been executed. In that > > + * case, immediately return the request with errors. > > + */ > > + MutexLocker cameraLock(cameraMutex_); > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > > + buffer.release_fence = buffer.acquire_fence; > > + } > > + > > + notifyError(descriptor.frameNumber_, > > + descriptor.buffers_[0].stream, > > + CAMERA3_MSG_ERROR_REQUEST); > > + > > + return 0; > > + } > > + > > worker_.queueRequest(descriptor.request_.get()); > > > > { > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > return; > > } > > > > + /* Release flush if all the pending requests have been completed. 
*/ > > + if (descriptors_.empty()) > > + flushing_.notify_one(); > > + > > node = descriptors_.extract(it); > > } > > Camera3RequestDescriptor &descriptor = node.mapped(); > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > --- a/src/android/camera_device.h > > +++ b/src/android/camera_device.h > > @@ -7,6 +7,7 @@ > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > +#include <condition_variable> > > #include <map> > > #include <memory> > > #include <mutex> > > @@ -42,6 +43,7 @@ public: > > > > int open(const hw_module_t *hardwareModule); > > void close(); > > + void flush(); > > > > unsigned int id() const { return id_; } > > camera3_device_t *camera3Device() { return &camera3Device_; } > > @@ -92,6 +94,7 @@ private: > > enum State { > > CameraStopped, > > CameraRunning, > > + CameraFlushing, > > }; > > > > void stop(); > > @@ -120,8 +123,9 @@ private: > > > > CameraWorker worker_; > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */ > > State state_; > > + std::condition_variable flushed_; > > > > std::shared_ptr<libcamera::Camera> camera_; > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > @@ -134,8 +138,9 @@ private: > > std::map<int, libcamera::PixelFormat> formatsMap_; > > std::vector<CameraStream> streams_; > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. 
*/ > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > + std::condition_variable flushing_; > > > > std::string maker_; > > std::string model_; > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > > index 696e80436821..8a3cfa175ff5 100644 > > --- a/src/android/camera_ops.cpp > > +++ b/src/android/camera_ops.cpp > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > > { > > } > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > > +static int hal_dev_flush(const struct camera3_device *dev) > > { > > + if (!dev) > > + return -EINVAL; > > + > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > > + camera->flush(); > > + > > return 0; > > } > > > > -- > > 2.31.1 > > > > -- > Regards, > Niklas Söderlund
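The two-phase locking argued for in the reply above can be sketched in isolation. This is a minimal illustration and not the libcamera code: MiniDevice, queueRequest(), completeRequest(), stateMutex_, pending_ and drained_ are hypothetical names, and request completion is triggered by hand instead of by Camera::stop(). It only shows that, with the state mutex released while flush() waits, a concurrent queueing attempt observes the flushing state and fails fast instead of blocking.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <thread>

/* Hypothetical stand-in for CameraDevice; not the libcamera implementation. */
class MiniDevice
{
public:
	enum class State { Stopped, Running, Flushing };

	void start()
	{
		std::lock_guard<std::mutex> lock(stateMutex_);
		state_ = State::Running;
	}

	State state()
	{
		std::lock_guard<std::mutex> lock(stateMutex_);
		return state_;
	}

	/* Returns false when the request is rejected (flushing or stopped). */
	bool queueRequest(int id)
	{
		std::lock_guard<std::mutex> stateLock(stateMutex_);
		if (state_ != State::Running)
			return false;

		std::lock_guard<std::mutex> requestsLock(requestsMutex_);
		pending_.insert(id);
		return true;
	}

	/* Completion handler: wakes flush() once the last request is gone. */
	void completeRequest(int id)
	{
		bool drained;
		{
			std::lock_guard<std::mutex> lock(requestsMutex_);
			pending_.erase(id);
			drained = pending_.empty();
		}
		if (drained)
			drained_.notify_all();
	}

	void flush()
	{
		{
			std::lock_guard<std::mutex> lock(stateMutex_);
			if (state_ != State::Running)
				return;
			state_ = State::Flushing;
		}

		/* stateMutex_ is released here so queueRequest() can fail fast. */
		{
			std::unique_lock<std::mutex> lock(requestsMutex_);
			drained_.wait(lock, [&] { return pending_.empty(); });
		}

		std::lock_guard<std::mutex> lock(stateMutex_);
		state_ = State::Stopped;
	}

private:
	std::mutex stateMutex_; /* Protects state_. */
	State state_ = State::Stopped;

	std::mutex requestsMutex_; /* Protects pending_. */
	std::set<int> pending_;
	std::condition_variable drained_;
};

/* Usage: a request arriving while flush() waits is rejected, not blocked. */
bool demoFlushRejectsNewRequests()
{
	MiniDevice dev;
	dev.start();
	bool queued = dev.queueRequest(1);
	assert(queued);

	std::thread flusher([&] { dev.flush(); });
	while (dev.state() != MiniDevice::State::Flushing)
		std::this_thread::yield();

	bool rejected = !dev.queueRequest(2);
	dev.completeRequest(1); /* Lets flush() finish. */
	flusher.join();

	return rejected && dev.state() == MiniDevice::State::Stopped;
}
```

In this sketch the intermediate Flushing state is what keeps the lock release safe: no other path moves the state out of Flushing, so flush() can retake the mutex later and set Stopped without re-checking.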
Hi Jacopo,

Thank you for the patch.

On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > Implement the flush() camera operation in the CameraDevice class > > > and make it available to the camera framework by implementing the > > > operation wrapper in camera_ops.cpp. > > > > > > The flush() implementation stops the Camera and the worker thread and > > > waits for all in-flight requests to be returned. Stopping the Camera > > > forces all Requests already queued to be returned immediately in error > > > state. As flush() has to wait until all of them have been returned, make it > > > wait on a newly introduced condition variable which is notified by the > > > request completion handler when the queue of pending requests has been > > > exhausted. > > > > > > As flush() can race with processCaptureRequest() protect the requests > > > queueing by introducing a new CameraState::CameraFlushing state that > > > processCaptureRequest() inspects before queuing the Request to the > > > Camera. If flush() has been called while processCaptureRequest() was > > > executing, return the current Request immediately in error state. > > > > > > Protect potentially concurrent calls to close() and configureStreams()

Can this happen? Quoting camera3.h,

* 12. Alternatively, the framework may call camera3_device_t->common->close() * to end the camera session. This may be called at any time when no other * calls from the framework are active, although the call may block until all * in-flight captures have completed (all results returned, all buffers * filled). After the close call returns, no more calls to the * camera3_callback_ops_t functions are allowed from the HAL. Once the * close() call is underway, the framework may not call any other HAL device * functions.

The important part is "when no other calls from the framework are active".
I don't think we need to handle close() racing with anything else than process_capture_request().

> > > by inspecting the CameraState, and force a wait for any flush() call > > > to complete before proceeding. > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > --- > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > > src/android/camera_device.h | 9 +++- > > > src/android/camera_ops.cpp | 8 +++- > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > index 3fce14035718..899afaa49439 100644 > > > --- a/src/android/camera_device.cpp > > > +++ b/src/android/camera_device.cpp > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > > > void CameraDevice::close() > > > { > > > - streams_.clear(); > > > + MutexLocker cameraLock(cameraMutex_);

I'd add a blank line here.

> > > + if (state_ == CameraFlushing) {

As mentioned above, I don't think you need to protect against close() and flush() racing each other.

> > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > + camera_->release(); > > > > > > + return; > > > + } > > > + > > > + streams_.clear(); > > > stop(); > > > > > > camera_->release(); > > > } > > > > > > -void CameraDevice::stop() > > > +/* > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait

s/wait/waits/

> > > + * until all the in-flight requests have been returned before setting the > > > + * camera state to stopped. > > > + * > > > + * Once flushing is done it unlocks concurrent calls to camera close() and > > > + * configureStreams(). > > > + */ > > > +void CameraDevice::flush() > > > { > > > + { > > > + MutexLocker cameraLock(cameraMutex_); > > > + > > > + if (state_ != CameraRunning) > > > + return; > > > + > > > + worker_.stop(); > > > + camera_->stop(); > > > + state_ = CameraFlushing; > > > + } > > > + > > > + /* > > > + * Now wait for all the in-flight requests to be completed before > > > + * continuing. Stopping the Camera guarantees that all in-flight > > > + * requests are completed in error state.

Do we need to wait? Camera::stop() guarantees that all requests complete synchronously with the stop() call.

Partly answering myself here, we'll have to wait for post-processing tasks to complete once we process them in a separate thread, but that will likely be handled by Thread::wait(). I don't think you need a condition variable here. If I'm not mistaken, this should simplify the implementation.

> > > + */ > > > + { > > > + MutexLocker requestsLock(requestsMutex_); > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > + } > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In > > patch 6/8 you add it to protect the state_ variable but here it's > > I'm not changing state_ without the mutex acquired, am I ? > > > ignored. I see the ASSERT() added to stop() but the patter of taking the > > lock checking state_, releasing the lock and do some work, retake the > > lock and update state_ feels like a bad idea. Maybe I'm missing > > How so, apart from the fact it feels a bit unusual, I concur ? > > If I keep the held the mutex for the whole duration of flush no other > concurrent method can proceed until all the queued requests have not > been completed. While flush waits for the flushing_ condition to be > signaled, processCaptureRequest() can proceed and immediately return > the newly queued requests in error state by detecting state_ == > CameraFlushing which signals that flush in is progress. > Otherwise it would have had to wait for flush to end. But then we're back > to a situation where we could serialize all calls and that's it, we > would be done with a single mutex to be held for the whole duration of > all operations. > > If it only was for close() or configureStreams() we could have locked > for the whole duration of flush(), as they anyway wait for flush to > complete before proceeding (by waiting on the flushed_ condition here > below signaled). > > > something and this is not a real problem, if so maybe we can capture > > that in the comment here? > > > > > + > > > + /* > > > + * Set state to stopped and unlock close() or configureStreams() that > > > + * might be waiting for flush to be completed. > > > + */ > > > MutexLocker cameraLock(cameraMutex_); > > > + state_ = CameraStopped; > > > + flushed_.notify_one();

You should drop the lock before calling notify_one(). Otherwise you'll wake up the task waiting on flushed_, which will try to lock cameraMutex_ and block immediately. The scheduler will then have to reschedule this task so that the function returns and the lock is released before the waiter can proceed. That works, but isn't very efficient.

	{
		MutexLocker cameraLock(cameraMutex_);
		state_ = CameraStopped;
	}

	flushed_.notify_one();

> > > +} > > > + > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */ > > > +void CameraDevice::stop() > > > +{ > > > + ASSERT(state_ != CameraFlushing); > > > + > > > if (state_ == CameraStopped) > > > return; > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > > > */ > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > > { > > > - /* Before any configuration attempt, stop the camera. 
*/ > > > - stop(); > > > + { > > > + /* > > > + * If a flush is in progress, wait for it to complete and to > > > + * stop the camera, otherwise before any new configuration > > > + * attempt we have to stop the camera explictely. > > > + */ Same here, I don't think flush() and configure_streams() can race each other. I believe the only possible race to be between flush() and process_capture_request(). > > > + MutexLocker cameraLock(cameraMutex_); > > > + if (state_ == CameraFlushing) > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > + else > > > + stop(); > > > + } > > > > > > if (stream_list->num_streams == 0) { > > > LOG(HAL, Error) << "No streams in configuration"; > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > if (ret) > > > return ret; > > > > > > + /* > > > + * Just before queuing the request, make sure flush() has not > > > + * been called after this function has been executed. In that > > > + * case, immediately return the request with errors. > > > + */ > > > + MutexLocker cameraLock(cameraMutex_); > > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > > > + buffer.release_fence = buffer.acquire_fence; > > > + } > > > + > > > + notifyError(descriptor.frameNumber_, > > > + descriptor.buffers_[0].stream, As commented on a previous patch, I think you should pass nullptr for the stream here. > > > + CAMERA3_MSG_ERROR_REQUEST); > > > + > > > + return 0; > > > + } > > > + > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > { > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > > return; > > > } > > > > > > + /* Release flush if all the pending requests have been completed. 
*/ > > > + if (descriptors_.empty()) > > > + flushing_.notify_one(); This will never happen, as you can only get here if descriptors_.find() has found the descriptor. Did you mean to do this after the extract() call below ? > > > + > > > node = descriptors_.extract(it); > > > } > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > --- a/src/android/camera_device.h > > > +++ b/src/android/camera_device.h > > > @@ -7,6 +7,7 @@ > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > +#include <condition_variable> > > > #include <map> > > > #include <memory> > > > #include <mutex> > > > @@ -42,6 +43,7 @@ public: > > > > > > int open(const hw_module_t *hardwareModule); > > > void close(); > > > + void flush(); > > > > > > unsigned int id() const { return id_; } > > > camera3_device_t *camera3Device() { return &camera3Device_; } > > > @@ -92,6 +94,7 @@ private: > > > enum State { > > > CameraStopped, > > > CameraRunning, > > > + CameraFlushing, > > > }; > > > > > > void stop(); > > > @@ -120,8 +123,9 @@ private: > > > > > > CameraWorker worker_; > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */ > > > State state_; > > > + std::condition_variable flushed_; > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > @@ -134,8 +138,9 @@ private: > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > std::vector<CameraStream> streams_; > > > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. 
*/ > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > > + std::condition_variable flushing_; > > > > > > std::string maker_; > > > std::string model_; > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > > > index 696e80436821..8a3cfa175ff5 100644 > > > --- a/src/android/camera_ops.cpp > > > +++ b/src/android/camera_ops.cpp > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > > > { > > > } > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > { > > > + if (!dev) > > > + return -EINVAL; > > > + > > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > > > + camera->flush(); > > > + > > > return 0; > > > } > > >
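The remark in the review above about the descriptors_.empty() check in requestComplete() can be reproduced with a plain std::map. The following is a reduced sketch under stated assumptions: Descriptors and both function names are hypothetical, the int payload stands in for Camera3RequestDescriptor, erase() is used where the patch calls extract() (both remove the entry, which is all that matters for emptiness), and the boolean return value stands in for the call to flushing_.notify_one().

```cpp
#include <cassert>
#include <cstdint>
#include <map>

/* Hypothetical reduction: the int payload replaces Camera3RequestDescriptor. */
using Descriptors = std::map<uint64_t, int>;

/*
 * Order used in the patch: test emptiness first, then remove the entry.
 * Returns whether the waiter would have been notified.
 */
bool completeCheckingBeforeRemoval(Descriptors &descriptors, uint64_t id)
{
	auto it = descriptors.find(id);
	if (it == descriptors.end())
		return false;

	/* Always false: find() just succeeded, so the map holds an entry. */
	bool notify = descriptors.empty();
	descriptors.erase(it);
	return notify;
}

/* Order suggested in the review: remove the entry, then test emptiness. */
bool completeCheckingAfterRemoval(Descriptors &descriptors, uint64_t id)
{
	auto it = descriptors.find(id);
	if (it == descriptors.end())
		return false;

	descriptors.erase(it);
	return descriptors.empty();
}

/* Usage: with a single in-flight request, only the second order notifies. */
void demoNotifyOrder()
{
	Descriptors a{ { 1, 0 } };
	assert(!completeCheckingBeforeRemoval(a, 1));
	assert(a.empty());

	Descriptors b{ { 1, 0 } };
	assert(completeCheckingAfterRemoval(b, 1));
}
```

With the first order, a flush() waiting on the condition variable would never be woken by the completion handler, even after the last request has been removed.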
Hi Laurent, On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > Hi Jacopo, > > Thank you for the patch. > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > Implement the flush() camera operation in the CameraDevice class > > > > and make it available to the camera framework by implementing the > > > > operation wrapper in camera_ops.cpp. > > > > > > > > The flush() implementation stops the Camera and the worker thread and > > > > waits for all in-flight requests to be returned. Stopping the Camera > > > > forces all Requests already queued to be returned immediately in error > > > > state. As flush() has to wait until all of them have been returned, make it > > > > wait on a newly introduced condition variable which is notified by the > > > > request completion handler when the queue of pending requests has been > > > > exhausted. > > > > > > > > As flush() can race with processCaptureRequest() protect the requests > > > > queueing by introducing a new CameraState::CameraFlushing state that > > > > processCaptureRequest() inspects before queuing the Request to the > > > > Camera. If flush() has been called while processCaptureRequest() was > > > > executing, return the current Request immediately in error state. > > > > > > > > Protect potentially concurrent calls to close() and configureStreams() > > Can this happen ? Quoting camera3.h, > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > * to end the camera session. This may be called at any time when no other > * calls from the framework are active, although the call may block until all > * in-flight captures have completed (all results returned, all buffers > * filled). After the close call returns, no more calls to the > * camera3_callback_ops_t functions are allowed from the HAL. 
Once the > * close() call is underway, the framework may not call any other HAL device > * functions. > > The important part is "when no other calss from the framework are > active". I don't think we need to handle close() racing with anything > else than process_capture_request(). I've been discussing this with Hiro during v1, as initially I didn't consider close() and configureStreams(). https://patchwork.libcamera.org/patch/12248/#16884 I initially only considered processCaptureRequest() as a potential race, but got suggested differently by the cros camera team. > > > > > by inspecting the CameraState, and force a wait for any flush() call > > > > to complete before proceeding. > > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > > --- > > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > > > src/android/camera_device.h | 9 +++- > > > > src/android/camera_ops.cpp | 8 +++- > > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > > index 3fce14035718..899afaa49439 100644 > > > > --- a/src/android/camera_device.cpp > > > > +++ b/src/android/camera_device.cpp > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > > > > > void CameraDevice::close() > > > > { > > > > - streams_.clear(); > > > > + MutexLocker cameraLock(cameraMutex_); > > I'd add a blank line here. > > > > > + if (state_ == CameraFlushing) { > > As mentioned above, I don't think you need to protect against close() > and flush() racing each other. 
> > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > + camera_->release(); > > > > > > > > + return; > > > > + } > > > > + > > > > + streams_.clear(); > > > > stop(); > > > > > > > > camera_->release(); > > > > } > > > > > > > > -void CameraDevice::stop() > > > > +/* > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait > > s/wait/waits/ > > > > > + * until all the in-flight requests have been returned before setting the > > > > + * camera state to stopped. > > > > + * > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and > > > > + * configureStreams(). > > > > + */ > > > > +void CameraDevice::flush() > > > > { > > > > + { > > > > + MutexLocker cameraLock(cameraMutex_); > > > > + > > > > + if (state_ != CameraRunning) > > > > + return; > > > > + > > > > + worker_.stop(); > > > > + camera_->stop(); > > > > + state_ = CameraFlushing; > > > > + } > > > > + > > > > + /* > > > > + * Now wait for all the in-flight requests to be completed before > > > > + * continuing. Stopping the Camera guarantees that all in-flight > > > > + * requests are completed in error state. > > Do we need to wait ? Camera::stop() guarantees that all requests > complete synchronously with the stop() call. I didn't get the API that way... I thought after stop we would receive a sequence of failed requests... Actually I don't see anything that suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion in Camera::stop() > > Partly answering myself here, we'll have to wait for post-processing > tasks to complete once we'll process them in a separate thread, but that > will likely be handled by Thread::wait(). I don't think you need a > condition variable here. I'm I'm not mistaken, this should simplify the > implementation. 
If Camera::stop() is synchronous we don't need to wait indeed > > > > > + */ > > > > + { > > > > + MutexLocker requestsLock(requestsMutex_); > > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > > + } > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the > > > lock checking state_, releasing the lock and do some work, retake the > > > lock and update state_ feels like a bad idea. Maybe I'm missing > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > If I keep the held the mutex for the whole duration of flush no other > > concurrent method can proceed until all the queued requests have not > > been completed. While flush waits for the flushing_ condition to be > > signaled, processCaptureRequest() can proceed and immediately return > > the newly queued requests in error state by detecting state_ == > > CameraFlushing which signals that flush in is progress. > > Otherwise it would have had to wait for flush to end. But then we're back > > to a situation where we could serialize all calls and that's it, we > > would be done with a single mutex to be held for the whole duration of > > all operations. > > > > If it only was for close() or configureStreams() we could have locked > > for the whole duration of flush(), as they anyway wait for flush to > > complete before proceeding (by waiting on the flushed_ condition here > > below signaled). > > > > > something and this is not a real problem, if so maybe we can capture > > > that in the comment here? > > > > > > > + > > > > + /* > > > > + * Set state to stopped and unlock close() or configureStreams() that > > > > + * might be waiting for flush to be completed. 
> > > > + */ > > > > MutexLocker cameraLock(cameraMutex_); > > > > + state_ = CameraStopped; > > > > + flushed_.notify_one(); > > You should drop the lock before calling notify_one(). Otherwise you'll > wake up the task waiting on flushed_, which will try to lock > cameraMutex_, which will block immediately. The scheduler will have to > reschedule this task for the function to return and the lock to be > released before the waiter can proceed. That works, but isn't very > efficient. Weird, the cpp reference shows example about notify_one where the caller always has the mutex held locked, but I see your point and seems correct.. > > { > MutexLocker cameraLock(cameraMutex_); > state_ = CameraStopped; > } > > flushed_.notify_one(); > So I could change to this one, if I don't have to drop this part completely if we consider close() and configureStreams() not as possible races... > > > > +} > > > > + > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */ > > > > +void CameraDevice::stop() > > > > +{ > > > > + ASSERT(state_ != CameraFlushing); > > > > + > > > > if (state_ == CameraStopped) > > > > return; > > > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > > > > */ > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > > > { > > > > - /* Before any configuration attempt, stop the camera. */ > > > > - stop(); > > > > + { > > > > + /* > > > > + * If a flush is in progress, wait for it to complete and to > > > > + * stop the camera, otherwise before any new configuration > > > > + * attempt we have to stop the camera explictely. > > > > + */ > > Same here, I don't think flush() and configure_streams() can race each > other. I believe the only possible race to be between flush() and > process_capture_request(). > Ditto. 
> > > > + MutexLocker cameraLock(cameraMutex_); > > > > + if (state_ == CameraFlushing) > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > + else > > > > + stop(); > > > > + } > > > > > > > > if (stream_list->num_streams == 0) { > > > > LOG(HAL, Error) << "No streams in configuration"; > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > if (ret) > > > > return ret; > > > > > > > > + /* > > > > + * Just before queuing the request, make sure flush() has not > > > > + * been called after this function has been executed. In that > > > > + * case, immediately return the request with errors. > > > > + */ > > > > + MutexLocker cameraLock(cameraMutex_); > > > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > > > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > > > > + buffer.release_fence = buffer.acquire_fence; > > > > + } > > > > + > > > > + notifyError(descriptor.frameNumber_, > > > > + descriptor.buffers_[0].stream, > > As commented on a previous patch, I think you should pass nullptr for > the stream here. > The "S6. Error management:" section of the camera3.h header does not mention that, not the ? where does you suggestion come from ? I don't find any reference in the review of [1/8] > > > > + CAMERA3_MSG_ERROR_REQUEST); > > > > + > > > > + return 0; > > > > + } > > > > + > > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > > > { > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > > > return; > > > > } > > > > > > > > + /* Release flush if all the pending requests have been completed. */ > > > > + if (descriptors_.empty()) > > > > + flushing_.notify_one(); > > This will never happen, as you can only get here if descriptors_.find() > has found the descriptor. Did you mean to do this after the extract() > call below ? Ugh. 
This works only because Camera::stop() is synchronous then ? > > > > > + > > > > node = descriptors_.extract(it); > > > > } > > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > > --- a/src/android/camera_device.h > > > > +++ b/src/android/camera_device.h > > > > @@ -7,6 +7,7 @@ > > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > > > +#include <condition_variable> > > > > #include <map> > > > > #include <memory> > > > > #include <mutex> > > > > @@ -42,6 +43,7 @@ public: > > > > > > > > int open(const hw_module_t *hardwareModule); > > > > void close(); > > > > + void flush(); > > > > > > > > unsigned int id() const { return id_; } > > > > camera3_device_t *camera3Device() { return &camera3Device_; } > > > > @@ -92,6 +94,7 @@ private: > > > > enum State { > > > > CameraStopped, > > > > CameraRunning, > > > > + CameraFlushing, > > > > }; > > > > > > > > void stop(); > > > > @@ -120,8 +123,9 @@ private: > > > > > > > > CameraWorker worker_; > > > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > > > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */ > > > > State state_; > > > > + std::condition_variable flushed_; > > > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > > @@ -134,8 +138,9 @@ private: > > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > > std::vector<CameraStream> streams_; > > > > > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. 
*/ > > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > > > + std::condition_variable flushing_; > > > > > > > > std::string maker_; > > > > std::string model_; > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > > > > index 696e80436821..8a3cfa175ff5 100644 > > > > --- a/src/android/camera_ops.cpp > > > > +++ b/src/android/camera_ops.cpp > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > > > > { > > > > } > > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > > { > > > > + if (!dev) > > > > + return -EINVAL; > > > > + > > > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > > > > + camera->flush(); > > > > + > > > > return 0; > > > > } > > > > > > -- > Regards, > > Laurent Pinchart
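For reference, the "drop the lock before notify_one()" pattern suggested in the review can be sketched in isolation. This is only an illustration under stated assumptions: it uses plain std::mutex and std::condition_variable rather than libcamera's Mutex and MutexLocker, and FlushGate, finishFlush(), waitForFlush() and runFlushGate() are hypothetical names that do not exist in the patch. The waiter's predicate (wait until the state is no longer "flushing") is an assumption about the intended behaviour.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the CameraDevice state handling in the patch.
enum class State { Stopped, Running, Flushing };

struct FlushGate {
	std::mutex mutex;                 // plays the role of cameraMutex_
	std::condition_variable flushed;  // plays the role of flushed_
	State state = State::Flushing;

	// End of flush(): publish the new state under the lock, then notify
	// only after the lock has been released, so that the woken thread
	// does not immediately block on the mutex again.
	void finishFlush()
	{
		{
			std::lock_guard<std::mutex> lock(mutex);
			state = State::Stopped;
		}
		flushed.notify_one();
	}

	// Caller that must not proceed while a flush is in progress. The
	// predicate is evaluated with the lock held, so a notification sent
	// before the wait starts is not lost.
	State waitForFlush()
	{
		std::unique_lock<std::mutex> lock(mutex);
		flushed.wait(lock, [&] { return state != State::Flushing; });
		return state;
	}
};

// Runs one waiter against one flusher and returns what the waiter observed.
State runFlushGate()
{
	FlushGate gate;
	State seen = State::Flushing;

	std::thread waiter([&] { seen = gate.waitForFlush(); });
	gate.finishFlush();
	waiter.join();

	assert(seen != State::Flushing);
	return seen;
}
```

Whichever thread runs first, the waiter returns State::Stopped: either it observes the updated state before sleeping, or it is woken by the notification.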
Hi Jacopo, thank you for the patch. On Mon, May 24, 2021 at 4:47 PM Jacopo Mondi <jacopo@jmondi.org> wrote: > Hi Laurent, > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > > Hi Jacopo, > > > > Thank you for the patch. > > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > > Implement the flush() camera operation in the CameraDevice class > > > > > and make it available to the camera framework by implementing the > > > > > operation wrapper in camera_ops.cpp. > > > > > > > > > > The flush() implementation stops the Camera and the worker thread > and > > > > > waits for all in-flight requests to be returned. Stopping the > Camera > > > > > forces all Requests already queued to be returned immediately in > error > > > > > state. As flush() has to wait until all of them have been > returned, make it > > > > > wait on a newly introduced condition variable which is notified by > the > > > > > request completion handler when the queue of pending requests has > been > > > > > exhausted. > > > > > > > > > > As flush() can race with processCaptureRequest() protect the > requests > > > > > queueing by introducing a new CameraState::CameraFlushing state > that > > > > > processCaptureRequest() inspects before queuing the Request to the > > > > > Camera. If flush() has been called while processCaptureRequest() > was > > > > > executing, return the current Request immediately in error state. > > > > > > > > > > Protect potentially concurrent calls to close() and > configureStreams() > > > > Can this happen ? Quoting camera3.h, > > > > * 12. Alternatively, the framework may call > camera3_device_t->common->close() > > * to end the camera session. 
This may be called at any time when no > other > > * calls from the framework are active, although the call may block > until all > > * in-flight captures have completed (all results returned, all > buffers > > * filled). After the close call returns, no more calls to the > > * camera3_callback_ops_t functions are allowed from the HAL. Once the > > * close() call is underway, the framework may not call any other HAL > device > > * functions. > > > > The important part is "when no other calss from the framework are > > active". I don't think we need to handle close() racing with anything > > else than process_capture_request(). > > I've been discussing this with Hiro during v1, as initially I didn't > consider close() and configureStreams(). > > https://patchwork.libcamera.org/patch/12248/#16884 > > I initially only considered processCaptureRequest() as a potential > race, but got suggested differently by the cros camera team. > > > > > > > > > by inspecting the CameraState, and force a wait for any flush() > call > > > > > to complete before proceeding. > > > > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > > > --- > > > > > src/android/camera_device.cpp | 90 > +++++++++++++++++++++++++++++++++-- > > > > > src/android/camera_device.h | 9 +++- > > > > > src/android/camera_ops.cpp | 8 +++- > > > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > > > > > diff --git a/src/android/camera_device.cpp > b/src/android/camera_device.cpp > > > > > index 3fce14035718..899afaa49439 100644 > > > > > --- a/src/android/camera_device.cpp > > > > > +++ b/src/android/camera_device.cpp > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t > *hardwareModule) > > > > > > > > > > void CameraDevice::close() > > > > > { > > > > > - streams_.clear(); > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > I'd add a blank line here. 
> > > > > > > + if (state_ == CameraFlushing) { > > > > As mentioned above, I don't think you need to protect against close() > > and flush() racing each other. > > > > > > > + flushed_.wait(cameraLock, [&] { return state_ != > CameraStopped; }); > > > > > + camera_->release(); > > > > > > > > > > + return; > > > > > + } > > > > > + > > > > > + streams_.clear(); > > > > > stop(); > > > > > > > > > > camera_->release(); > > > > > } > > > > > > > > > > -void CameraDevice::stop() > > > > > +/* > > > > > + * Flush is similar to stop() but sets the camera state to > 'flushing' and wait > > > > s/wait/waits/ > > > > > > > + * until all the in-flight requests have been returned before > setting the > > > > > + * camera state to stopped. > > > > > + * > > > > > + * Once flushing is done it unlocks concurrent calls to camera > close() and > > > > > + * configureStreams(). > > > > > + */ > > > > > +void CameraDevice::flush() > > > > > { > > > > > + { > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > + > > > > > + if (state_ != CameraRunning) > > > > > + return; > > > > > + > > > > > + worker_.stop(); > > > > > + camera_->stop(); > > > > > + state_ = CameraFlushing; > > > > > + } > > > > > + > > > > > + /* > > > > > + * Now wait for all the in-flight requests to be completed > before > > > > > + * continuing. Stopping the Camera guarantees that all > in-flight > > > > > + * requests are completed in error state. > > > > Do we need to wait ? Camera::stop() guarantees that all requests > > complete synchronously with the stop() call. > > I didn't get the API that way... I thought after stop we would receive > a sequence of failed requests... 
Actually I don't see anything that > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion > in Camera::stop() > > > > > Partly answering myself here, we'll have to wait for post-processing > > tasks to complete once we'll process them in a separate thread, but that > > will likely be handled by Thread::wait(). I don't think you need a > > condition variable here. I'm I'm not mistaken, this should simplify the > > implementation. > > If Camera::stop() is synchronous we don't need to wait indeed > > > > > > > > + */ > > > > > + { > > > > > + MutexLocker requestsLock(requestsMutex_); > > > > > + flushing_.wait(requestsLock, [&] { return > descriptors_.empty(); }); > > > > > + } > > > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. > In > > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > > > ignored. I see the ASSERT() added to stop() but the patter of taking > the > > > > lock checking state_, releasing the lock and do some work, retake the > > > > lock and update state_ feels like a bad idea. Maybe I'm missing > > > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > > > If I keep the held the mutex for the whole duration of flush no other > > > concurrent method can proceed until all the queued requests have not > > > been completed. While flush waits for the flushing_ condition to be > > > signaled, processCaptureRequest() can proceed and immediately return > > > the newly queued requests in error state by detecting state_ == > > > CameraFlushing which signals that flush in is progress. > > > Otherwise it would have had to wait for flush to end. But then we're > back > > > to a situation where we could serialize all calls and that's it, we > > > would be done with a single mutex to be held for the whole duration of > > > all operations. 
> > > > > > If it only was for close() or configureStreams() we could have locked > > > for the whole duration of flush(), as they anyway wait for flush to > > > complete before proceeding (by waiting on the flushed_ condition here > > > below signaled). > > > > > > > something and this is not a real problem, if so maybe we can capture > > > > that in the comment here? > > > > > > > > > + > > > > > + /* > > > > > + * Set state to stopped and unlock close() or > configureStreams() that > > > > > + * might be waiting for flush to be completed. > > > > > + */ > > > > > MutexLocker cameraLock(cameraMutex_); > > > > > + state_ = CameraStopped; > > > > > + flushed_.notify_one(); > > > > You should drop the lock before calling notify_one(). Otherwise you'll > > wake up the task waiting on flushed_, which will try to lock > > cameraMutex_, which will block immediately. The scheduler will have to > > reschedule this task for the function to return and the lock to be > > released before the waiter can proceed. That works, but isn't very > > efficient. > > Weird, the cpp reference shows example about notify_one where the > caller always has the mutex held locked, but I see your point and > seems correct.. > >
This is correct. Quoting https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one: "The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s)". I knew that, but I didn't point it out because notifying with the lock held is not wrong either. Given Laurent's explanation, yes, we should definitely avoid it. TIL.
> > > > { > > MutexLocker cameraLock(cameraMutex_); > > state_ = CameraStopped; > > } > > > > flushed_.notify_one(); > > > > So I could change to this one, if I don't have to drop this part > completely if we consider close() and configureStreams() not as > possible races... > > > > > > +} > > > > > + > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held > by the caller.
*/ > > > > > +void CameraDevice::stop() > > > > > +{ > > > > > + ASSERT(state_ != CameraFlushing); > > > > > + > > > > > if (state_ == CameraStopped) > > > > > return; > > > > > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int > format) const > > > > > */ > > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t > *stream_list) > > > > > { > > > > > - /* Before any configuration attempt, stop the camera. */ > > > > > - stop(); > > > > > + { > > > > > + /* > > > > > + * If a flush is in progress, wait for it to > complete and to > > > > > + * stop the camera, otherwise before any new > configuration > > > > > + * attempt we have to stop the camera explictely. > > > > > + */ > > > > Same here, I don't think flush() and configure_streams() can race each > > other. I believe the only possible race to be between flush() and > > process_capture_request(). > > > > Ditto. > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > + if (state_ == CameraFlushing) > > > > > + flushed_.wait(cameraLock, [&] { return > state_ != CameraStopped; }); > > > > > + else > > > > > + stop(); > > > > > + } > > > > > > > > > > if (stream_list->num_streams == 0) { > > > > > LOG(HAL, Error) << "No streams in configuration"; > > > > > @@ -1950,6 +2009,25 @@ int > CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > > if (ret) > > > > > return ret; > > > > > > > > > > + /* > > > > > + * Just before queuing the request, make sure flush() has > not > > > > > + * been called after this function has been executed. In > that > > > > > + * case, immediately return the request with errors. 
> > > > > + */ > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > > > > + for (camera3_stream_buffer_t &buffer : > descriptor.buffers_) { > > > > > + buffer.status = > CAMERA3_BUFFER_STATUS_ERROR; > > > > > + buffer.release_fence = > buffer.acquire_fence; > > > > > + } > > > > > + > > > > > + notifyError(descriptor.frameNumber_, > > > > > + descriptor.buffers_[0].stream, > > > > As commented on a previous patch, I think you should pass nullptr for > > the stream here. > > > > The "S6. Error management:" section of the camera3.h header does not > mention that, not the ? where does you suggestion come from ? I don't find > any reference in the review of [1/8] > > > > > > > + CAMERA3_MSG_ERROR_REQUEST); > > > > > + > > > > > + return 0; > > > > > + } > > > > > + > > > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > > > > > { > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request > *request) > > > > > return; > > > > > } > > > > > > > > > > + /* Release flush if all the pending requests have > been completed. */ > > > > > + if (descriptors_.empty()) > > > > > + flushing_.notify_one(); > > > > This will never happen, as you can only get here if descriptors_.find() > > has found the descriptor. Did you mean to do this after the extract() > > call below ? > > Ugh. This works only because Camera::stop() is synchronous then ? > >
Ah, good catch! With this fix, the code looks good to me. I am happy to keep taking part in the discussion.
Reviewed-by: Hirokazu Honda <hiroh@chromium.org> > > > > > > > + > > > > > node = descriptors_.extract(it); > > > > > } > > > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > > > diff --git a/src/android/camera_device.h > b/src/android/camera_device.h > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > > > --- a/src/android/camera_device.h > > > > > +++ b/src/android/camera_device.h > > > > > @@ -7,6 +7,7 @@ > > > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > > > > > +#include <condition_variable> > > > > > #include <map> > > > > > #include <memory> > > > > > #include <mutex> > > > > > @@ -42,6 +43,7 @@ public: > > > > > > > > > > int open(const hw_module_t *hardwareModule); > > > > > void close(); > > > > > + void flush(); > > > > > > > > > > unsigned int id() const { return id_; } > > > > > camera3_device_t *camera3Device() { return > &camera3Device_; } > > > > > @@ -92,6 +94,7 @@ private: > > > > > enum State { > > > > > CameraStopped, > > > > > CameraRunning, > > > > > + CameraFlushing, > > > > > }; > > > > > > > > > > void stop(); > > > > > @@ -120,8 +123,9 @@ private: > > > > > > > > > > CameraWorker worker_; > > > > > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the > camera state. */ > > > > > + libcamera::Mutex cameraMutex_; /* Protects the camera > state and flushed_. */ > > > > > State state_; > > > > > + std::condition_variable flushed_; > > > > > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > > > @@ -134,8 +138,9 @@ private: > > > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > > > std::vector<CameraStream> streams_; > > > > > > > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. > */ > > > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ > and flushing_. 
*/ > > > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > > > > + std::condition_variable flushing_; > > > > > > > > > > std::string maker_; > > > > > std::string model_; > > > > > diff --git a/src/android/camera_ops.cpp > b/src/android/camera_ops.cpp > > > > > index 696e80436821..8a3cfa175ff5 100644 > > > > > --- a/src/android/camera_ops.cpp > > > > > +++ b/src/android/camera_ops.cpp > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const > struct camera3_device *dev, > > > > > { > > > > > } > > > > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct > camera3_device *dev) > > > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > > > { > > > > > + if (!dev) > > > > > + return -EINVAL; > > > > > + > > > > > + CameraDevice *camera = reinterpret_cast<CameraDevice > *>(dev->priv); > > > > > + camera->flush(); > > > > > + > > > > > return 0; > > > > > } > > > > > > > > > -- > > Regards, > > > > Laurent Pinchart >
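The requestComplete() problem raised in the review (an empty() check placed before extract() can never be true, because find() has just succeeded) can be illustrated with a small stand-alone model. This is a hedged sketch, not the HAL code: PendingRequests, add(), complete() and waitDrained() are hypothetical names, std::string stands in for Camera3RequestDescriptor, and std::mutex replaces libcamera's Mutex. It requires C++17 for std::map::extract().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified model of the descriptors_ bookkeeping.
class PendingRequests {
public:
	void add(uint64_t frame, std::string name)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		descriptors_.emplace(frame, std::move(name));
	}

	// Mirrors the shape of requestComplete(). Returns true when this
	// completion emptied the map and waiters were notified.
	bool complete(uint64_t frame)
	{
		bool drained = false;
		{
			std::lock_guard<std::mutex> lock(mutex_);
			auto it = descriptors_.find(frame);
			if (it == descriptors_.end())
				return false;

			// Testing empty() at this point would always be false:
			// find() succeeded, so the map holds at least one entry.
			auto node = descriptors_.extract(it);
			assert(!node.empty());

			// Only after the extraction can the map be empty.
			drained = descriptors_.empty();
		}

		// Notify with the lock released, as suggested in the review.
		if (drained)
			drained_.notify_all();
		return drained;
	}

	// What a flush()-like caller would block on.
	void waitDrained()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		drained_.wait(lock, [&] { return descriptors_.empty(); });
	}

private:
	std::mutex mutex_;                 // plays the role of requestsMutex_
	std::condition_variable drained_;  // plays the role of flushing_
	std::map<uint64_t, std::string> descriptors_;
};
```

With two pending entries, completing the first returns false and completing the second returns true, after which waitDrained() returns immediately.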
Hi Jacopo, (expanding the CC list to finalize the race conditions discussion) On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote: > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > > Implement the flush() camera operation in the CameraDevice class > > > > > and make it available to the camera framework by implementing the > > > > > operation wrapper in camera_ops.cpp. > > > > > > > > > > The flush() implementation stops the Camera and the worker thread and > > > > > waits for all in-flight requests to be returned. Stopping the Camera > > > > > forces all Requests already queued to be returned immediately in error > > > > > state. As flush() has to wait until all of them have been returned, make it > > > > > wait on a newly introduced condition variable which is notified by the > > > > > request completion handler when the queue of pending requests has been > > > > > exhausted. > > > > > > > > > > As flush() can race with processCaptureRequest() protect the requests > > > > > queueing by introducing a new CameraState::CameraFlushing state that > > > > > processCaptureRequest() inspects before queuing the Request to the > > > > > Camera. If flush() has been called while processCaptureRequest() was > > > > > executing, return the current Request immediately in error state. > > > > > > > > > > Protect potentially concurrent calls to close() and configureStreams() > > > > Can this happen ? Quoting camera3.h, > > > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > > * to end the camera session. This may be called at any time when no other > > * calls from the framework are active, although the call may block until all > > * in-flight captures have completed (all results returned, all buffers > > * filled). 
After the close call returns, no more calls to the
> >  * camera3_callback_ops_t functions are allowed from the HAL. Once the
> >  * close() call is underway, the framework may not call any other HAL device
> >  * functions.
> >
> > The important part is "when no other calls from the framework are
> > active". I don't think we need to handle close() racing with anything
> > else than process_capture_request().
>
> I've been discussing this with Hiro during v1, as initially I didn't
> consider close() and configureStreams().
>
> https://patchwork.libcamera.org/patch/12248/#16884
>
> I initially only considered processCaptureRequest() as a potential
> race, but got suggested differently by the cros camera team.

Let's try to get to the bottom of this.

Section S2 ("Startup and general expected operation sequence") states:

 * 12. Alternatively, the framework may call camera3_device_t->common->close()
 * to end the camera session. This may be called at any time when no other
 * calls from the framework are active, although the call may block until all
 * in-flight captures have completed (all results returned, all buffers
 * filled). After the close call returns, no more calls to the
 * camera3_callback_ops_t functions are allowed from the HAL. Once the
 * close() call is underway, the framework may not call any other HAL device
 * functions.

There can be in-flight requests when .close() is called, but it can't be
called concurrently with any other call. There's thus no race condition
to protect against.

The .configure_streams() documentation states:

 * Preconditions:
 *
 * The framework will only call this method when no captures are being
 * processed. That is, all results have been returned to the framework, and
 * all in-flight input and output buffers have been returned and their
 * release sync fences have been signaled by the HAL. The framework will not
 * submit new requests for capture while the configure_streams() call is
 * underway.

This clearly forbids calling .configure_streams() and
.process_capture_request() concurrently.

The .flush() documentation states:

 * Flush all currently in-process captures and all buffers in the pipeline
 * on the given device. The framework will use this to dump all state as
 * quickly as possible in order to prepare for a configure_streams() call.

I interpret this as at least a very strong hint that .flush() and
.configure_streams() can't be called concurrently :-)

If anyone disagrees, I'd like compelling evidence that those races can
occur.

> > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > to complete before proceeding.
> > > > >
> > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > > > > ---
> > > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > > src/android/camera_device.h | 9 +++-
> > > > > src/android/camera_ops.cpp | 8 +++-
> > > > > 3 files changed, 100 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > index 3fce14035718..899afaa49439 100644
> > > > > --- a/src/android/camera_device.cpp
> > > > > +++ b/src/android/camera_device.cpp
> > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > >
> > > > > void CameraDevice::close()
> > > > > {
> > > > > - streams_.clear();
> > > > > + MutexLocker cameraLock(cameraMutex_);
> >
> > I'd add a blank line here.
> >
> > > > > + if (state_ == CameraFlushing) {
> >
> > As mentioned above, I don't think you need to protect against close()
> > and flush() racing each other.
> >
> > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > + camera_->release();
> > > > >
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + streams_.clear();
> > > > > stop();
> > > > >
> > > > > camera_->release();
> > > > > }
> > > > >
> > > > > -void CameraDevice::stop()
> > > > > +/*
> > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> >
> > s/wait/waits/
> >
> > > > > + * until all the in-flight requests have been returned before setting the
> > > > > + * camera state to stopped.
> > > > > + *
> > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > + * configureStreams().
> > > > > + */
> > > > > +void CameraDevice::flush()
> > > > > {
> > > > > + {
> > > > > + MutexLocker cameraLock(cameraMutex_);
> > > > > +
> > > > > + if (state_ != CameraRunning)
> > > > > + return;
> > > > > +
> > > > > + worker_.stop();
> > > > > + camera_->stop();
> > > > > + state_ = CameraFlushing;
> > > > > + }
> > > > > +
> > > > > + /*
> > > > > + * Now wait for all the in-flight requests to be completed before
> > > > > + * continuing. Stopping the Camera guarantees that all in-flight
> > > > > + * requests are completed in error state.
> >
> > Do we need to wait ? Camera::stop() guarantees that all requests
> > complete synchronously with the stop() call.
>
> I didn't get the API that way... I thought after stop we would receive
> a sequence of failed requests... Actually I don't see anything that
> suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> in Camera::stop()

The Camera::stop() documentation states

 * This method stops capturing and processing requests immediately. All pending
 * requests are cancelled and complete synchronously in an error state.

Is this ambiguous ?

> > Partly answering myself here, we'll have to wait for post-processing > > tasks to complete once we'll process them in a separate thread, but that > > will likely be handled by Thread::wait(). I don't think you need a > > condition variable here. I'm I'm not mistaken, this should simplify the > > implementation. > > If Camera::stop() is synchronous we don't need to wait indeed > > > > > > + */ > > > > > + { > > > > > + MutexLocker requestsLock(requestsMutex_); > > > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > > > + } > > > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In > > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the > > > > lock checking state_, releasing the lock and do some work, retake the > > > > lock and update state_ feels like a bad idea. Maybe I'm missing > > > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > > > If I keep the held the mutex for the whole duration of flush no other > > > concurrent method can proceed until all the queued requests have not > > > been completed. While flush waits for the flushing_ condition to be > > > signaled, processCaptureRequest() can proceed and immediately return > > > the newly queued requests in error state by detecting state_ == > > > CameraFlushing which signals that flush in is progress. > > > Otherwise it would have had to wait for flush to end. But then we're back > > > to a situation where we could serialize all calls and that's it, we > > > would be done with a single mutex to be held for the whole duration of > > > all operations. 
> > > > > > If it only was for close() or configureStreams() we could have locked > > > for the whole duration of flush(), as they anyway wait for flush to > > > complete before proceeding (by waiting on the flushed_ condition here > > > below signaled). > > > > > > > something and this is not a real problem, if so maybe we can capture > > > > that in the comment here? > > > > > > > > > + > > > > > + /* > > > > > + * Set state to stopped and unlock close() or configureStreams() that > > > > > + * might be waiting for flush to be completed. > > > > > + */ > > > > > MutexLocker cameraLock(cameraMutex_); > > > > > + state_ = CameraStopped; > > > > > + flushed_.notify_one(); > > > > You should drop the lock before calling notify_one(). Otherwise you'll > > wake up the task waiting on flushed_, which will try to lock > > cameraMutex_, which will block immediately. The scheduler will have to > > reschedule this task for the function to return and the lock to be > > released before the waiter can proceed. That works, but isn't very > > efficient. > > Weird, the cpp reference shows example about notify_one where the > caller always has the mutex held locked, but I see your point and > seems correct.. I'm looking at https://en.cppreference.com/w/cpp/thread/condition_variable and https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one and both calls to notify_one() in the example are made without the lock held, aren't they ? > > > > { > > MutexLocker cameraLock(cameraMutex_); > > state_ = CameraStopped; > > } > > > > flushed_.notify_one(); > > > > So I could change to this one, if I don't have to drop this part > completely if we consider close() and configureStreams() not as > possible races... > > > > > > +} > > > > > + > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. 
*/ > > > > > +void CameraDevice::stop() > > > > > +{ > > > > > + ASSERT(state_ != CameraFlushing); > > > > > + > > > > > if (state_ == CameraStopped) > > > > > return; > > > > > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > > > > > */ > > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > > > > { > > > > > - /* Before any configuration attempt, stop the camera. */ > > > > > - stop(); > > > > > + { > > > > > + /* > > > > > + * If a flush is in progress, wait for it to complete and to > > > > > + * stop the camera, otherwise before any new configuration > > > > > + * attempt we have to stop the camera explictely. > > > > > + */ > > > > Same here, I don't think flush() and configure_streams() can race each > > other. I believe the only possible race to be between flush() and > > process_capture_request(). > > Ditto. > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > + if (state_ == CameraFlushing) > > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > > + else > > > > > + stop(); > > > > > + } > > > > > > > > > > if (stream_list->num_streams == 0) { > > > > > LOG(HAL, Error) << "No streams in configuration"; > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > > if (ret) > > > > > return ret; > > > > > > > > > > + /* > > > > > + * Just before queuing the request, make sure flush() has not > > > > > + * been called after this function has been executed. In that > > > > > + * case, immediately return the request with errors. 
> > > > > + */ > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > > > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > > > > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > > > > > + buffer.release_fence = buffer.acquire_fence; > > > > > + } > > > > > + > > > > > + notifyError(descriptor.frameNumber_, > > > > > + descriptor.buffers_[0].stream, > > > > As commented on a previous patch, I think you should pass nullptr for > > the stream here. > > The "S6. Error management:" section of the camera3.h header does not > mention that, not the ? Indeed, that section doesn't mention the camera3_error_msg::error_stream field at all. The field is documented in the structure as /** * Pointer to the stream that had a failure. NULL if the stream isn't * applicable to the error. */ The question is thus when the stream is applicable to the error. The documentation of enum camera3_error_msg_code mentions error_stream in the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to the device, the request or the result metadata, which are not specific to a stream. > where does you suggestion come from ? I don't find any reference in > the review of [1/8] ([PATCH v3 1/8] android: Rework request completion notification' (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com) > > > > > + CAMERA3_MSG_ERROR_REQUEST); > > > > > + > > > > > + return 0; > > > > > + } > > > > > + > > > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > > > > > { > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > > > > return; > > > > > } > > > > > > > > > > + /* Release flush if all the pending requests have been completed. */ > > > > > + if (descriptors_.empty()) > > > > > + flushing_.notify_one(); > > > > This will never happen, as you can only get here if descriptors_.find() > > has found the descriptor. 
Did you mean to do this after the extract() > > call below ? > > Ugh. This works only because Camera::stop() is synchronous then ? I believe so. > > > > > + > > > > > node = descriptors_.extract(it); > > > > > } > > > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > > > --- a/src/android/camera_device.h > > > > > +++ b/src/android/camera_device.h > > > > > @@ -7,6 +7,7 @@ > > > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > > > > > +#include <condition_variable> > > > > > #include <map> > > > > > #include <memory> > > > > > #include <mutex> > > > > > @@ -42,6 +43,7 @@ public: > > > > > > > > > > int open(const hw_module_t *hardwareModule); > > > > > void close(); > > > > > + void flush(); > > > > > > > > > > unsigned int id() const { return id_; } > > > > > camera3_device_t *camera3Device() { return &camera3Device_; } > > > > > @@ -92,6 +94,7 @@ private: > > > > > enum State { > > > > > CameraStopped, > > > > > CameraRunning, > > > > > + CameraFlushing, > > > > > }; > > > > > > > > > > void stop(); > > > > > @@ -120,8 +123,9 @@ private: > > > > > > > > > > CameraWorker worker_; > > > > > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > > > > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */ > > > > > State state_; > > > > > + std::condition_variable flushed_; > > > > > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > > > @@ -134,8 +138,9 @@ private: > > > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > > > std::vector<CameraStream> streams_; > > > > > > > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > > > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. 
*/ > > > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > > > > + std::condition_variable flushing_; > > > > > > > > > > std::string maker_; > > > > > std::string model_; > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > > > > > index 696e80436821..8a3cfa175ff5 100644 > > > > > --- a/src/android/camera_ops.cpp > > > > > +++ b/src/android/camera_ops.cpp > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > > > > > { > > > > > } > > > > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > > > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > > > { > > > > > + if (!dev) > > > > > + return -EINVAL; > > > > > + > > > > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > > > > > + camera->flush(); > > > > > + > > > > > return 0; > > > > > } > > > > >
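For readers following the notify_one() discussion above, here is a minimal, self-contained sketch of the suggested ordering: update the state with the mutex held, release the mutex, then notify. FlushGate, finishFlush() and waitFlushed() are hypothetical names introduced only for illustration; they are not part of the patch or of libcamera, and the plain std::mutex stands in for cameraMutex_.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the CameraDevice state discussed in the review.
enum class State { Running, Flushing, Stopped };

struct FlushGate {
	std::mutex mutex;                // plays the role of cameraMutex_
	std::condition_variable flushed; // plays the role of flushed_
	State state = State::Flushing;

	// Update the state with the lock held, then notify after releasing it,
	// so the woken waiter does not immediately block on the mutex.
	void finishFlush()
	{
		{
			std::lock_guard<std::mutex> lock(mutex);
			state = State::Stopped;
		}

		flushed.notify_one();
	}

	// Block until the flush has completed. wait() returns once the
	// predicate evaluates to true.
	void waitFlushed()
	{
		std::unique_lock<std::mutex> lock(mutex);
		flushed.wait(lock, [this] { return state == State::Stopped; });
	}
};
```

Because the waiter re-checks the predicate under the mutex, notifying after the unlock does not lose the wakeup: either the waiter sees State::Stopped before sleeping, or it is already sleeping when notify_one() runs.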
Hi Laurent, On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Jacopo, > > (expanding the CC list to finalize the race conditions discussion) > > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote: > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > > > Implement the flush() camera operation in the CameraDevice class > > > > > > and make it available to the camera framework by implementing the > > > > > > operation wrapper in camera_ops.cpp. > > > > > > > > > > > > The flush() implementation stops the Camera and the worker thread and > > > > > > waits for all in-flight requests to be returned. Stopping the Camera > > > > > > forces all Requests already queued to be returned immediately in error > > > > > > state. As flush() has to wait until all of them have been returned, make it > > > > > > wait on a newly introduced condition variable which is notified by the > > > > > > request completion handler when the queue of pending requests has been > > > > > > exhausted. > > > > > > > > > > > > As flush() can race with processCaptureRequest() protect the requests > > > > > > queueing by introducing a new CameraState::CameraFlushing state that > > > > > > processCaptureRequest() inspects before queuing the Request to the > > > > > > Camera. If flush() has been called while processCaptureRequest() was > > > > > > executing, return the current Request immediately in error state. > > > > > > > > > > > > Protect potentially concurrent calls to close() and configureStreams() > > > > > > Can this happen ? Quoting camera3.h, > > > > > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > > > * to end the camera session. 
This may be called at any time when no other > > > * calls from the framework are active, although the call may block until all > > > * in-flight captures have completed (all results returned, all buffers > > > * filled). After the close call returns, no more calls to the > > > * camera3_callback_ops_t functions are allowed from the HAL. Once the > > > * close() call is underway, the framework may not call any other HAL device > > > * functions. > > > > > > The important part is "when no other calls from the framework are > > > active". I don't think we need to handle close() racing with anything > > > else than process_capture_request(). > > > > I've been discussing this with Hiro during v1, as initially I didn't > > consider close() and configureStreams(). > > > > https://patchwork.libcamera.org/patch/12248/#16884 > > > > I initially only considered processCaptureRequest() as a potential > > race, but got suggested differently by the cros camera team. > > Let's try to get to the bottom of this. > > Section S2 ("Startup and general expected operation sequence") states: > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > * to end the camera session. This may be called at any time when no other > * calls from the framework are active, although the call may block until all > * in-flight captures have completed (all results returned, all buffers > * filled). After the close call returns, no more calls to the > * camera3_callback_ops_t functions are allowed from the HAL. Once the > * close() call is underway, the framework may not call any other HAL device > * functions. > > There can be in-flight requests when .close() is called, but it can't be > called concurrently with any other call. There's thus no race condition > to protect against. Note that camera3.h is considered outdated, as the interface updates have been reflected only in the HIDL version [1]. However, I don't see a HIDL counterpart of the section you mentioned.
[1] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/ > > The .configure_streams() documentation states: > > * Preconditions: > * > * The framework will only call this method when no captures are being > * processed. That is, all results have been returned to the framework, and > * all in-flight input and output buffers have been returned and their > * release sync fences have been signaled by the HAL. The framework will not > * submit new requests for capture while the configure_streams() call is > * underway. > > This clearly forbids calling .configure_streams() and > .process_capture_request() concurrently. > > The .flush() documentation states: > > * Flush all currently in-process captures and all buffers in the pipeline > * on the given device. The framework will use this to dump all state as > * quickly as possible in order to prepare for a configure_streams() call. > > I interpret this as at least a very strong hint that .flush() and > .configure_streams() can't be called concurrently :-) > > If anyone disagrees, I'd like compelling evidence that those races can > occur. Indeed, the newest documentation [2] seems to be stating the same. [2] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107 My reading of the documentation is the same as Laurent's. Hiro, did you hear something contradictory from the Android framework team? Best regards, Tomasz > > > > > > > by inspecting the CameraState, and force a wait for any flush() call > > > > > > to complete before proceeding. 
> > > > > > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > > > > --- > > > > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > > > > > src/android/camera_device.h | 9 +++- > > > > > > src/android/camera_ops.cpp | 8 +++- > > > > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > > > > index 3fce14035718..899afaa49439 100644 > > > > > > --- a/src/android/camera_device.cpp > > > > > > +++ b/src/android/camera_device.cpp > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > > > > > > > > > void CameraDevice::close() > > > > > > { > > > > > > - streams_.clear(); > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > I'd add a blank line here. > > > > > > > > > + if (state_ == CameraFlushing) { > > > > > > As mentioned above, I don't think you need to protect against close() > > > and flush() racing each other. > > > > > > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > > > + camera_->release(); > > > > > > > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + streams_.clear(); > > > > > > stop(); > > > > > > > > > > > > camera_->release(); > > > > > > } > > > > > > > > > > > > -void CameraDevice::stop() > > > > > > +/* > > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait > > > > > > s/wait/waits/ > > > > > > > > > + * until all the in-flight requests have been returned before setting the > > > > > > + * camera state to stopped. > > > > > > + * > > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and > > > > > > + * configureStreams(). 
> > > > > > + */ > > > > > > +void CameraDevice::flush() > > > > > > { > > > > > > + { > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > + > > > > > > + if (state_ != CameraRunning) > > > > > > + return; > > > > > > + > > > > > > + worker_.stop(); > > > > > > + camera_->stop(); > > > > > > + state_ = CameraFlushing; > > > > > > + } > > > > > > + > > > > > > + /* > > > > > > + * Now wait for all the in-flight requests to be completed before > > > > > > + * continuing. Stopping the Camera guarantees that all in-flight > > > > > > + * requests are completed in error state. > > > > > > Do we need to wait ? Camera::stop() guarantees that all requests > > > complete synchronously with the stop() call. > > > > I didn't get the API that way... I thought after stop we would receive > > a sequence of failed requests... Actually I don't see anything that > > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion > > in Camera::stop() > > The camera::stop() documentation states > > * This method stops capturing and processing requests immediately. All pending > * requests are cancelled and complete synchronously in an error state. > > Is this ambiguous ? > > > > Partly answering myself here, we'll have to wait for post-processing > > > tasks to complete once we'll process them in a separate thread, but that > > > will likely be handled by Thread::wait(). I don't think you need a > > > condition variable here. I'm I'm not mistaken, this should simplify the > > > implementation. > > > > If Camera::stop() is synchronous we don't need to wait indeed > > > > > > > > + */ > > > > > > + { > > > > > > + MutexLocker requestsLock(requestsMutex_); > > > > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > > > > + } > > > > > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. 
In > > > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the > > > > > lock checking state_, releasing the lock and do some work, retake the > > > > > lock and update state_ feels like a bad idea. Maybe I'm missing > > > > > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > > > > > If I keep the held the mutex for the whole duration of flush no other > > > > concurrent method can proceed until all the queued requests have not > > > > been completed. While flush waits for the flushing_ condition to be > > > > signaled, processCaptureRequest() can proceed and immediately return > > > > the newly queued requests in error state by detecting state_ == > > > > CameraFlushing which signals that flush in is progress. > > > > Otherwise it would have had to wait for flush to end. But then we're back > > > > to a situation where we could serialize all calls and that's it, we > > > > would be done with a single mutex to be held for the whole duration of > > > > all operations. > > > > > > > > If it only was for close() or configureStreams() we could have locked > > > > for the whole duration of flush(), as they anyway wait for flush to > > > > complete before proceeding (by waiting on the flushed_ condition here > > > > below signaled). > > > > > > > > > something and this is not a real problem, if so maybe we can capture > > > > > that in the comment here? > > > > > > > > > > > + > > > > > > + /* > > > > > > + * Set state to stopped and unlock close() or configureStreams() that > > > > > > + * might be waiting for flush to be completed. > > > > > > + */ > > > > > > MutexLocker cameraLock(cameraMutex_); > > > > > > + state_ = CameraStopped; > > > > > > + flushed_.notify_one(); > > > > > > You should drop the lock before calling notify_one(). 
Otherwise you'll > > > wake up the task waiting on flushed_, which will try to lock > > > cameraMutex_, which will block immediately. The scheduler will have to > > > reschedule this task for the function to return and the lock to be > > > released before the waiter can proceed. That works, but isn't very > > > efficient. > > > > Weird, the cpp reference shows example about notify_one where the > > caller always has the mutex held locked, but I see your point and > > seems correct.. > > I'm looking at > https://en.cppreference.com/w/cpp/thread/condition_variable and > https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one > and both calls to notify_one() in the example are made without the lock > held, aren't they ? > > > > > > > { > > > MutexLocker cameraLock(cameraMutex_); > > > state_ = CameraStopped; > > > } > > > > > > flushed_.notify_one(); > > > > > > > So I could change to this one, if I don't have to drop this part > > completely if we consider close() and configureStreams() not as > > possible races... > > > > > > > > +} > > > > > > + > > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */ > > > > > > +void CameraDevice::stop() > > > > > > +{ > > > > > > + ASSERT(state_ != CameraFlushing); > > > > > > + > > > > > > if (state_ == CameraStopped) > > > > > > return; > > > > > > > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const > > > > > > */ > > > > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > > > > > { > > > > > > - /* Before any configuration attempt, stop the camera. */ > > > > > > - stop(); > > > > > > + { > > > > > > + /* > > > > > > + * If a flush is in progress, wait for it to complete and to > > > > > > + * stop the camera, otherwise before any new configuration > > > > > > + * attempt we have to stop the camera explictely. 
> > > > > > + */ > > > > > > Same here, I don't think flush() and configure_streams() can race each > > > other. I believe the only possible race to be between flush() and > > > process_capture_request(). > > > > Ditto. > > > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > + if (state_ == CameraFlushing) > > > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > > > + else > > > > > > + stop(); > > > > > > + } > > > > > > > > > > > > if (stream_list->num_streams == 0) { > > > > > > LOG(HAL, Error) << "No streams in configuration"; > > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > > > if (ret) > > > > > > return ret; > > > > > > > > > > > > + /* > > > > > > + * Just before queuing the request, make sure flush() has not > > > > > > + * been called after this function has been executed. In that > > > > > > + * case, immediately return the request with errors. > > > > > > + */ > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > + if (state_ == CameraFlushing || state_ == CameraStopped) { > > > > > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) { > > > > > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR; > > > > > > + buffer.release_fence = buffer.acquire_fence; > > > > > > + } > > > > > > + > > > > > > + notifyError(descriptor.frameNumber_, > > > > > > + descriptor.buffers_[0].stream, > > > > > > As commented on a previous patch, I think you should pass nullptr for > > > the stream here. > > > > The "S6. Error management:" section of the camera3.h header does not > > mention that, not the ? > > Indeed, that section doesn't mention the camera3_error_msg::error_stream > field at all. The field is documented in the structure as > > /** > * Pointer to the stream that had a failure. NULL if the stream isn't > * applicable to the error. > */ > > The question is thus when the stream is applicable to the error. 
The > documentation of enum camera3_error_msg_code mentions error_stream in > the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to > the device, the request or the result metadata, which are not specific > to a stream. > > > where does you suggestion come from ? I don't find any reference in > > the review of [1/8] > > ([PATCH v3 1/8] android: Rework request completion notification' > (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com) > > > > > > > + CAMERA3_MSG_ERROR_REQUEST); > > > > > > + > > > > > > + return 0; > > > > > > + } > > > > > > + > > > > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > > > > > > > { > > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > > > > > return; > > > > > > } > > > > > > > > > > > > + /* Release flush if all the pending requests have been completed. */ > > > > > > + if (descriptors_.empty()) > > > > > > + flushing_.notify_one(); > > > > > > This will never happen, as you can only get here if descriptors_.find() > > > has found the descriptor. Did you mean to do this after the extract() > > > call below ? > > > > Ugh. This works only because Camera::stop() is synchronous then ? > > I believe so. 
> > > > > > > + > > > > > > node = descriptors_.extract(it); > > > > > > } > > > > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h > > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > > > > --- a/src/android/camera_device.h > > > > > > +++ b/src/android/camera_device.h > > > > > > @@ -7,6 +7,7 @@ > > > > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > > > > > > > +#include <condition_variable> > > > > > > #include <map> > > > > > > #include <memory> > > > > > > #include <mutex> > > > > > > @@ -42,6 +43,7 @@ public: > > > > > > > > > > > > int open(const hw_module_t *hardwareModule); > > > > > > void close(); > > > > > > + void flush(); > > > > > > > > > > > > unsigned int id() const { return id_; } > > > > > > camera3_device_t *camera3Device() { return &camera3Device_; } > > > > > > @@ -92,6 +94,7 @@ private: > > > > > > enum State { > > > > > > CameraStopped, > > > > > > CameraRunning, > > > > > > + CameraFlushing, > > > > > > }; > > > > > > > > > > > > void stop(); > > > > > > @@ -120,8 +123,9 @@ private: > > > > > > > > > > > > CameraWorker worker_; > > > > > > > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */ > > > > > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */ > > > > > > State state_; > > > > > > + std::condition_variable flushed_; > > > > > > > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > > > > @@ -134,8 +138,9 @@ private: > > > > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > > > > std::vector<CameraStream> streams_; > > > > > > > > > > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */ > > > > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. 
*/ > > > > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > > > > > + std::condition_variable flushing_; > > > > > > > > > > > > std::string maker_; > > > > > > std::string model_; > > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp > > > > > > index 696e80436821..8a3cfa175ff5 100644 > > > > > > --- a/src/android/camera_ops.cpp > > > > > > +++ b/src/android/camera_ops.cpp > > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev, > > > > > > { > > > > > > } > > > > > > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev) > > > > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > > > > { > > > > > > + if (!dev) > > > > > > + return -EINVAL; > > > > > > + > > > > > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv); > > > > > > + camera->flush(); > > > > > > + > > > > > > return 0; > > > > > > } > > > > > > > > -- > Regards, > > Laurent Pinchart
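The review comment on requestComplete() quoted above (an empty() check placed before extract() can never be true) can be illustrated with a small, self-contained sketch. PendingRequests, Descriptor and complete() are hypothetical names made up for this example; they only mimic the shape of descriptors_ and requestsMutex_ and are not the actual CameraDevice code.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical, simplified stand-in for Camera3RequestDescriptor.
struct Descriptor {
	uint32_t frameNumber;
};

struct PendingRequests {
	std::mutex mutex; // plays the role of requestsMutex_
	std::map<uint64_t, Descriptor> descriptors;
	uint32_t lastCompletedFrame = 0;

	// Remove the descriptor of a completed request. Returns true when it
	// was the last pending one, which is the point where a thread waiting
	// for the queue to drain could be notified.
	bool complete(uint64_t cookie)
	{
		std::lock_guard<std::mutex> lock(mutex);

		auto it = descriptors.find(cookie);
		if (it == descriptors.end())
			return false;

		// Checking empty() here would always yield false: find() has
		// just succeeded, so the map holds at least one element.
		auto node = descriptors.extract(it);
		lastCompletedFrame = node.mapped().frameNumber;

		// Only after extract() can the map actually be empty.
		return descriptors.empty();
	}
};
```

std::map::extract() requires C++17. If the condition variable is dropped altogether, as suggested in the thread, this check is not needed at all; the sketch only shows why the ordering matters when it is kept.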
Hi Laurent and Tomasz. On Thu, May 27, 2021 at 11:49 AM Tomasz Figa <tfiga@chromium.org> wrote: > Hi Laurent, > > On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart > <laurent.pinchart@ideasonboard.com> wrote: > > > > Hi Jacopo, > > > > (expanding the CC list to finalize the race conditions discussion) > > > > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote: > > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > > > > Implement the flush() camera operation in the CameraDevice > class > > > > > > > and make it available to the camera framework by implementing > the > > > > > > > operation wrapper in camera_ops.cpp. > > > > > > > > > > > > > > The flush() implementation stops the Camera and the worker > thread and > > > > > > > waits for all in-flight requests to be returned. Stopping the > Camera > > > > > > > forces all Requests already queued to be returned immediately > in error > > > > > > > state. As flush() has to wait until all of them have been > returned, make it > > > > > > > wait on a newly introduced condition variable which is > notified by the > > > > > > > request completion handler when the queue of pending requests > has been > > > > > > > exhausted. > > > > > > > > > > > > > > As flush() can race with processCaptureRequest() protect the > requests > > > > > > > queueing by introducing a new CameraState::CameraFlushing > state that > > > > > > > processCaptureRequest() inspects before queuing the Request to > the > > > > > > > Camera. If flush() has been called while > processCaptureRequest() was > > > > > > > executing, return the current Request immediately in error > state. > > > > > > > > > > > > > > Protect potentially concurrent calls to close() and > configureStreams() > > > > > > > > Can this happen ? 
Quoting camera3.h, > > > > > > > > * 12. Alternatively, the framework may call > camera3_device_t->common->close() > > > > * to end the camera session. This may be called at any time when > no other > > > > * calls from the framework are active, although the call may > block until all > > > > * in-flight captures have completed (all results returned, all > buffers > > > > * filled). After the close call returns, no more calls to the > > > > * camera3_callback_ops_t functions are allowed from the HAL. > Once the > > > > * close() call is underway, the framework may not call any other > HAL device > > > > * functions. > > > > > > > > The important part is "when no other calls from the framework are > > > > active". I don't think we need to handle close() racing with anything > > > > else than process_capture_request(). > > > > > > I've been discussing this with Hiro during v1, as initially I didn't > > > consider close() and configureStreams(). > > > > > > https://patchwork.libcamera.org/patch/12248/#16884 > > > > > > I initially only considered processCaptureRequest() as a potential > > > race, but got suggested differently by the cros camera team. > > > > Let's try to get to the bottom of this. > > > > Section S2 ("Startup and general expected operation sequence") states: > > > > * 12. Alternatively, the framework may call > camera3_device_t->common->close() > > * to end the camera session. This may be called at any time when no > other > > * calls from the framework are active, although the call may block > until all > > * in-flight captures have completed (all results returned, all > buffers > > * filled). After the close call returns, no more calls to the > > * camera3_callback_ops_t functions are allowed from the HAL. Once the > > * close() call is underway, the framework may not call any other > HAL device > > * functions. > > > > There can be in-flight requests when .close() is called, but it can't be > > called concurrently with any other call.
There's thus no race condition > > to protect against. > > Note that camera3.h is considered outdated, as the interface updates > have been reflected only in the HIDL version [1]. However, I don't see > a HIDL counterpart of the section you mentioned. > > [1] > https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/ > > > > > The .configure_streams() documentation states: > > > > * Preconditions: > > * > > * The framework will only call this method when no captures are > being > > * processed. That is, all results have been returned to the > framework, and > > * all in-flight input and output buffers have been returned and > their > > * release sync fences have been signaled by the HAL. The framework > will not > > * submit new requests for capture while the configure_streams() > call is > > * underway. > > > > This clearly forbids calling .configure_streams() and > > .process_capture_request() concurrently. > > > > The .flush() documentation states: > > > > * Flush all currently in-process captures and all buffers in the > pipeline > > * on the given device. The framework will use this to dump all > state as > > * quickly as possible in order to prepare for a configure_streams() > call. > > > > I interpret this as at least a very strong hint that .flush() and > > .configure_streams() can't be called concurrently :-) > > > > If anyone disagrees, I'd like compelling evidence that those races can > > occur. > > Indeed, the newest documentation [2] seems to be stating the same. > > [2] > https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107 > > My reading of the documentation is the same as Laurent's. Hiro, did > you hear something contradictory from the Android framework team? > > I got from Ricky that flush() and close() can be called anytime. 
Ricky, have we asked the Android framework team to specify this in the documentation? Additionally, in the mail discussion with the Android framework team, I found that the Android team told us: > We also guarantee we won't call configureStreams while processCaptureRequest is being called (or if there are any capture requests that haven't yet been completed, for that matter). So some of the protection in Jacopo's code can probably be dropped. Sorry for the rework, Jacopo. -Hiro > Best regards, > Tomasz > > > > > > > > > > by inspecting the CameraState, and force a wait for any flush() call > > > > > > > to complete before proceeding. > > > > > > > > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > > > > > --- > > > > > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > > > > > > src/android/camera_device.h | 9 +++- > > > > > > > src/android/camera_ops.cpp | 8 +++- > > > > > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > > > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > > > > > index 3fce14035718..899afaa49439 100644 > > > > > > > --- a/src/android/camera_device.cpp > > > > > > > +++ b/src/android/camera_device.cpp > > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > > > > > > > > > > > void CameraDevice::close() > > > > > > > { > > > > > > > - streams_.clear(); > > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > > > I'd add a blank line here. > > > > > > > > > > > + if (state_ == CameraFlushing) { > > > > > > > > As mentioned above, I don't think you need to protect against close() > > > > and flush() racing each other.
> > > > > > > > > > > + flushed_.wait(cameraLock, [&] { return state_ != > CameraStopped; }); > > > > > > > + camera_->release(); > > > > > > > > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > + streams_.clear(); > > > > > > > stop(); > > > > > > > > > > > > > > camera_->release(); > > > > > > > } > > > > > > > > > > > > > > -void CameraDevice::stop() > > > > > > > +/* > > > > > > > + * Flush is similar to stop() but sets the camera state to > 'flushing' and wait > > > > > > > > s/wait/waits/ > > > > > > > > > > > + * until all the in-flight requests have been returned before > setting the > > > > > > > + * camera state to stopped. > > > > > > > + * > > > > > > > + * Once flushing is done it unlocks concurrent calls to > camera close() and > > > > > > > + * configureStreams(). > > > > > > > + */ > > > > > > > +void CameraDevice::flush() > > > > > > > { > > > > > > > + { > > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > > + > > > > > > > + if (state_ != CameraRunning) > > > > > > > + return; > > > > > > > + > > > > > > > + worker_.stop(); > > > > > > > + camera_->stop(); > > > > > > > + state_ = CameraFlushing; > > > > > > > + } > > > > > > > + > > > > > > > + /* > > > > > > > + * Now wait for all the in-flight requests to be > completed before > > > > > > > + * continuing. Stopping the Camera guarantees that all > in-flight > > > > > > > + * requests are completed in error state. > > > > > > > > Do we need to wait ? Camera::stop() guarantees that all requests > > > > complete synchronously with the stop() call. > > > > > > I didn't get the API that way... I thought after stop we would receive > > > a sequence of failed requests... Actually I don't see anything that > > > suggests that in camera.cpp or pipeline_handler.cpp apart from an > assertion > > > in Camera::stop() > > > > The camera::stop() documentation states > > > > * This method stops capturing and processing requests immediately. 
All pending > > * requests are cancelled and complete synchronously in an error state. > > > > Is this ambiguous ? > Perhaps "and complete synchronously" should be rephrased as "and Camera::requestComplete() is executed for all of them before this returns"? > > > > > Partly answering myself here, we'll have to wait for post-processing > > > > tasks to complete once we'll process them in a separate thread, but that > > > > will likely be handled by Thread::wait(). I don't think you need a > > > > condition variable here. If I'm not mistaken, this should simplify the > > > > implementation. > > > > > > If Camera::stop() is synchronous we don't need to wait indeed > > > > > > > > > > + */ > > > > > > > + { > > > > > > > + MutexLocker requestsLock(requestsMutex_); > > > > > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > > > > > + } > > > > > > > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In > > > > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > > > > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > > > > > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the > > > > > > lock, checking state_, releasing the lock to do some work, then retaking the > > > > > > lock and updating state_ feels like a bad idea. Maybe I'm missing > > > > > > > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > > > > > > > If I keep the mutex held for the whole duration of flush no other > > > > > concurrent method can proceed until all the queued requests have > > > > > been completed. While flush waits for the flushing_ condition to be > > > > > signaled, processCaptureRequest() can proceed and immediately return > > > > > the newly queued requests in error state by detecting state_ == > > > > > CameraFlushing which signals that flush is in progress.
> > > > > Otherwise it would have had to wait for flush to end. But then > we're back > > > > > to a situation where we could serialize all calls and that's it, we > > > > > would be done with a single mutex to be held for the whole > duration of > > > > > all operations. > > > > > > > > > > If it only was for close() or configureStreams() we could have > locked > > > > > for the whole duration of flush(), as they anyway wait for flush to > > > > > complete before proceeding (by waiting on the flushed_ condition > here > > > > > below signaled). > > > > > > > > > > > something and this is not a real problem, if so maybe we can > capture > > > > > > that in the comment here? > > > > > > > > > > > > > + > > > > > > > + /* > > > > > > > + * Set state to stopped and unlock close() or > configureStreams() that > > > > > > > + * might be waiting for flush to be completed. > > > > > > > + */ > > > > > > > MutexLocker cameraLock(cameraMutex_); > > > > > > > + state_ = CameraStopped; > > > > > > > + flushed_.notify_one(); > > > > > > > > You should drop the lock before calling notify_one(). Otherwise > you'll > > > > wake up the task waiting on flushed_, which will try to lock > > > > cameraMutex_, which will block immediately. The scheduler will have > to > > > > reschedule this task for the function to return and the lock to be > > > > released before the waiter can proceed. That works, but isn't very > > > > efficient. > > > > > > Weird, the cpp reference shows example about notify_one where the > > > caller always has the mutex held locked, but I see your point and > > > seems correct.. > > > > I'm looking at > > https://en.cppreference.com/w/cpp/thread/condition_variable and > > https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one > > and both calls to notify_one() in the example are made without the lock > > held, aren't they ? 
> > > > > > > > > > { > > > > MutexLocker cameraLock(cameraMutex_); > > > > state_ = CameraStopped; > > > > } > > > > > > > > flushed_.notify_one(); > > > > > > > > > > So I could change to this one, if I don't have to drop this part > > > completely if we consider close() and configureStreams() not as > > > possible races... > > > > > > > > > > +} > > > > > > > + > > > > > > > +/* Calls to stop() must be protected by cameraMutex_ being > held by the caller. */ > > > > > > > +void CameraDevice::stop() > > > > > > > +{ > > > > > > > + ASSERT(state_ != CameraFlushing); > > > > > > > + > > > > > > > if (state_ == CameraStopped) > > > > > > > return; > > > > > > > > > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat > CameraDevice::toPixelFormat(int format) const > > > > > > > */ > > > > > > > int > CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list) > > > > > > > { > > > > > > > - /* Before any configuration attempt, stop the camera. */ > > > > > > > - stop(); > > > > > > > + { > > > > > > > + /* > > > > > > > + * If a flush is in progress, wait for it to > complete and to > > > > > > > + * stop the camera, otherwise before any new > configuration > > > > > > > + * attempt we have to stop the camera explictely. > > > > > > > + */ > > > > > > > > Same here, I don't think flush() and configure_streams() can race > each > > > > other. I believe the only possible race to be between flush() and > > > > process_capture_request(). > > > > > > Ditto. 
> > > > > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > > + if (state_ == CameraFlushing) > > > > > > > + flushed_.wait(cameraLock, [&] { return > state_ != CameraStopped; }); > > > > > > > + else > > > > > > > + stop(); > > > > > > > + } > > > > > > > > > > > > > > if (stream_list->num_streams == 0) { > > > > > > > LOG(HAL, Error) << "No streams in configuration"; > > > > > > > @@ -1950,6 +2009,25 @@ int > CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > > > > if (ret) > > > > > > > return ret; > > > > > > > > > > > > > > + /* > > > > > > > + * Just before queuing the request, make sure flush() > has not > > > > > > > + * been called after this function has been executed. In > that > > > > > > > + * case, immediately return the request with errors. > > > > > > > + */ > > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > > + if (state_ == CameraFlushing || state_ == CameraStopped) > { > > > > > > > + for (camera3_stream_buffer_t &buffer : > descriptor.buffers_) { > > > > > > > + buffer.status = > CAMERA3_BUFFER_STATUS_ERROR; > > > > > > > + buffer.release_fence = > buffer.acquire_fence; > > > > > > > + } > > > > > > > + > > > > > > > + notifyError(descriptor.frameNumber_, > > > > > > > + descriptor.buffers_[0].stream, > > > > > > > > As commented on a previous patch, I think you should pass nullptr for > > > > the stream here. > > > > > > The "S6. Error management:" section of the camera3.h header does not > > > mention that, not the ? > > > > Indeed, that section doesn't mention the camera3_error_msg::error_stream > > field at all. The field is documented in the structure as > > > > /** > > * Pointer to the stream that had a failure. NULL if the stream isn't > > * applicable to the error. > > */ > > > > The question is thus when the stream is applicable to the error. The > > documentation of enum camera3_error_msg_code mentions error_stream in > > the CAMERA3_MSG_ERROR_BUFFER case only. 
The other errors are related to > > the device, the request or the result metadata, which are not specific > > to a stream. > > > > > where does your suggestion come from ? I don't find any reference in > > > the review of [1/8] > > > > ([PATCH v3 1/8] android: Rework request completion notification' > > (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com) > > > > > > > > > + CAMERA3_MSG_ERROR_REQUEST); > > > > > > > + > > > > > > > + return 0; > > > > > > > + } > > > > > > > + > > > > > > > worker_.queueRequest(descriptor.request_.get()); > > > > > > > > > > > > > > { > > > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request) > > > > > > > return; > > > > > > > } > > > > > > > > > > > > > > + /* Release flush if all the pending requests have been completed. */ > > > > > > > + if (descriptors_.empty()) > > > > > > > + flushing_.notify_one(); > > > > > > > > This will never happen, as you can only get here if descriptors_.find() > > > > has found the descriptor. Did you mean to do this after the extract() > > > > call below ? > > > > > > Ugh. This works only because Camera::stop() is synchronous then ? > > > > I believe so.
> > > > > > > > > + > > > > > > > node = descriptors_.extract(it); > > > > > > > } > > > > > > > Camera3RequestDescriptor &descriptor = node.mapped(); > > > > > > > diff --git a/src/android/camera_device.h > b/src/android/camera_device.h > > > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644 > > > > > > > --- a/src/android/camera_device.h > > > > > > > +++ b/src/android/camera_device.h > > > > > > > @@ -7,6 +7,7 @@ > > > > > > > #ifndef __ANDROID_CAMERA_DEVICE_H__ > > > > > > > #define __ANDROID_CAMERA_DEVICE_H__ > > > > > > > > > > > > > > +#include <condition_variable> > > > > > > > #include <map> > > > > > > > #include <memory> > > > > > > > #include <mutex> > > > > > > > @@ -42,6 +43,7 @@ public: > > > > > > > > > > > > > > int open(const hw_module_t *hardwareModule); > > > > > > > void close(); > > > > > > > + void flush(); > > > > > > > > > > > > > > unsigned int id() const { return id_; } > > > > > > > camera3_device_t *camera3Device() { return > &camera3Device_; } > > > > > > > @@ -92,6 +94,7 @@ private: > > > > > > > enum State { > > > > > > > CameraStopped, > > > > > > > CameraRunning, > > > > > > > + CameraFlushing, > > > > > > > }; > > > > > > > > > > > > > > void stop(); > > > > > > > @@ -120,8 +123,9 @@ private: > > > > > > > > > > > > > > CameraWorker worker_; > > > > > > > > > > > > > > - libcamera::Mutex cameraMutex_; /* Protects access to the > camera state. */ > > > > > > > + libcamera::Mutex cameraMutex_; /* Protects the camera > state and flushed_. */ > > > > > > > State state_; > > > > > > > + std::condition_variable flushed_; > > > > > > > > > > > > > > std::shared_ptr<libcamera::Camera> camera_; > > > > > > > std::unique_ptr<libcamera::CameraConfiguration> config_; > > > > > > > @@ -134,8 +138,9 @@ private: > > > > > > > std::map<int, libcamera::PixelFormat> formatsMap_; > > > > > > > std::vector<CameraStream> streams_; > > > > > > > > > > > > > > - libcamera::Mutex requestsMutex_; /* Protects > descriptors_. 
*/ > > > > > > > + libcamera::Mutex requestsMutex_; /* Protects > descriptors_ and flushing_. */ > > > > > > > std::map<uint64_t, Camera3RequestDescriptor> > descriptors_; > > > > > > > + std::condition_variable flushing_; > > > > > > > > > > > > > > std::string maker_; > > > > > > > std::string model_; > > > > > > > diff --git a/src/android/camera_ops.cpp > b/src/android/camera_ops.cpp > > > > > > > index 696e80436821..8a3cfa175ff5 100644 > > > > > > > --- a/src/android/camera_ops.cpp > > > > > > > +++ b/src/android/camera_ops.cpp > > > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] > const struct camera3_device *dev, > > > > > > > { > > > > > > > } > > > > > > > > > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct > camera3_device *dev) > > > > > > > +static int hal_dev_flush(const struct camera3_device *dev) > > > > > > > { > > > > > > > + if (!dev) > > > > > > > + return -EINVAL; > > > > > > > + > > > > > > > + CameraDevice *camera = reinterpret_cast<CameraDevice > *>(dev->priv); > > > > > > > + camera->flush(); > > > > > > > + > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > -- > > Regards, > > > > Laurent Pinchart >
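To make the outcome of the discussion above concrete, here is a minimal sketch of the simplified scheme, assuming the only race to handle is flush() against processCaptureRequest() and that Camera::stop() completes all requests synchronously. FlushSketch, cancelPending(), waitForStopped() and runFlushSketch() are hypothetical names used for illustration only, they are not part of the patch or of libcamera. The sketch also applies Laurent's suggestion to call notify_one() after the lock has been dropped.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

/* Hypothetical stand-in for CameraDevice: only the state handling is modelled. */
class FlushSketch
{
public:
	enum State { Stopped, Running, Flushing };

	void start()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		state_ = Running;
	}

	/* Returns true if queued, false if the request would be returned in error. */
	bool processCaptureRequest()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != Running)
			return false;

		pending_++;
		return true;
	}

	void flush()
	{
		{
			std::lock_guard<std::mutex> lock(mutex_);
			if (state_ != Running)
				return;

			state_ = Flushing;
		}

		/* Stand-in for Camera::stop(), assumed to complete requests synchronously. */
		cancelPending();

		{
			std::lock_guard<std::mutex> lock(mutex_);
			state_ = Stopped;
		}

		/* Notify after dropping the lock so a waiter can take it right away. */
		flushed_.notify_one();
	}

	/* Hypothetical waiter, only needed if a call really has to wait for flush(). */
	void waitForStopped()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		flushed_.wait(lock, [&] { return state_ == Stopped; });
	}

	State state()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return state_;
	}

	unsigned int pending()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return pending_;
	}

private:
	void cancelPending()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		pending_ = 0;
	}

	std::mutex mutex_;
	std::condition_variable flushed_;
	State state_ = Stopped;
	unsigned int pending_ = 0;
};

/* Runs one flush() against a concurrent waiter and checks the resulting state. */
bool runFlushSketch()
{
	FlushSketch dev;
	dev.start();

	bool queued = dev.processCaptureRequest();

	std::thread waiter([&] { dev.waitForStopped(); });
	dev.flush();
	waiter.join();

	/* After flush() new requests must be rejected until the camera restarts. */
	bool rejected = !dev.processCaptureRequest();

	return queued && rejected && dev.state() == FlushSketch::Stopped &&
	       dev.pending() == 0;
}
```

The waiter uses a predicate that becomes true once the state is Stopped, so it returns correctly whether it starts before or after flush() has finished. If close() and configureStreams() can indeed never race with flush(), the condition variable and waitForStopped() could be dropped altogether and only the state check in processCaptureRequest() would remain.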
Hi Laurent, thanks for the detailed answer On Thu, May 27, 2021 at 05:26:51AM +0300, Laurent Pinchart wrote: > Hi Jacopo, > > (expanding the CC list to finalize the race conditions discussion) > > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote: > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote: > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote: > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote: > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote: > > > > > > Implement the flush() camera operation in the CameraDevice class > > > > > > and make it available to the camera framework by implementing the > > > > > > operation wrapper in camera_ops.cpp. > > > > > > > > > > > > The flush() implementation stops the Camera and the worker thread and > > > > > > waits for all in-flight requests to be returned. Stopping the Camera > > > > > > forces all Requests already queued to be returned immediately in error > > > > > > state. As flush() has to wait until all of them have been returned, make it > > > > > > wait on a newly introduced condition variable which is notified by the > > > > > > request completion handler when the queue of pending requests has been > > > > > > exhausted. > > > > > > > > > > > > As flush() can race with processCaptureRequest() protect the requests > > > > > > queueing by introducing a new CameraState::CameraFlushing state that > > > > > > processCaptureRequest() inspects before queuing the Request to the > > > > > > Camera. If flush() has been called while processCaptureRequest() was > > > > > > executing, return the current Request immediately in error state. > > > > > > > > > > > > Protect potentially concurrent calls to close() and configureStreams() > > > > > > Can this happen ? Quoting camera3.h, > > > > > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > > > * to end the camera session. 
This may be called at any time when no other > > > * calls from the framework are active, although the call may block until all > > > * in-flight captures have completed (all results returned, all buffers > > > * filled). After the close call returns, no more calls to the > > > * camera3_callback_ops_t functions are allowed from the HAL. Once the > > > * close() call is underway, the framework may not call any other HAL device > > > * functions. > > > > > > The important part is "when no other calls from the framework are > > > active". I don't think we need to handle close() racing with anything > > > else than process_capture_request(). > > > > I've been discussing this with Hiro during v1, as initially I didn't > > consider close() and configureStreams(). > > > > https://patchwork.libcamera.org/patch/12248/#16884 > > > > I initially only considered processCaptureRequest() as a potential > > race, but got suggested differently by the cros camera team. > > Let's try to get to the bottom of this. > > Section S2 ("Startup and general expected operation sequence") states: > > * 12. Alternatively, the framework may call camera3_device_t->common->close() > * to end the camera session. This may be called at any time when no other > * calls from the framework are active, although the call may block until all > * in-flight captures have completed (all results returned, all buffers > * filled). After the close call returns, no more calls to the > * camera3_callback_ops_t functions are allowed from the HAL. Once the > * close() call is underway, the framework may not call any other HAL device > * functions. > > There can be in-flight requests when .close() is called, but it can't be > called concurrently with any other call. There's thus no race condition > to protect against. > > The .configure_streams() documentation states: > > * Preconditions: > * > * The framework will only call this method when no captures are being > * processed.
That is, all results have been returned to the framework, and > * all in-flight input and output buffers have been returned and their > * release sync fences have been signaled by the HAL. The framework will not > * submit new requests for capture while the configure_streams() call is > * underway. > > This clearly forbids calling .configure_streams() and > .process_capture_request() concurrently. > > The .flush() documentation states: > > * Flush all currently in-process captures and all buffers in the pipeline > * on the given device. The framework will use this to dump all state as > * quickly as possible in order to prepare for a configure_streams() call. > > I interpret this as at least a very strong hint that .flush() and > .configure_streams() can't be called concurrently :-) > > If anyone disagrees, I'd like compelling evidence that those races can > occur. > > > > > > > by inspecting the CameraState, and force a wait for any flush() call > > > > > > to complete before proceeding. > > > > > > > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org> > > > > > > --- > > > > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++-- > > > > > > src/android/camera_device.h | 9 +++- > > > > > > src/android/camera_ops.cpp | 8 +++- > > > > > > 3 files changed, 100 insertions(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > > > > index 3fce14035718..899afaa49439 100644 > > > > > > --- a/src/android/camera_device.cpp > > > > > > +++ b/src/android/camera_device.cpp > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule) > > > > > > > > > > > > void CameraDevice::close() > > > > > > { > > > > > > - streams_.clear(); > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > I'd add a blank line here. 
> > > > > > > > > + if (state_ == CameraFlushing) { > > > > > > As mentioned above, I don't think you need to protect against close() > > > and flush() racing each other. > > > > > > > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; }); > > > > > > + camera_->release(); > > > > > > > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + streams_.clear(); > > > > > > stop(); > > > > > > > > > > > > camera_->release(); > > > > > > } > > > > > > > > > > > > -void CameraDevice::stop() > > > > > > +/* > > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait > > > > > > s/wait/waits/ > > > > > > > > > + * until all the in-flight requests have been returned before setting the > > > > > > + * camera state to stopped. > > > > > > + * > > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and > > > > > > + * configureStreams(). > > > > > > + */ > > > > > > +void CameraDevice::flush() > > > > > > { > > > > > > + { > > > > > > + MutexLocker cameraLock(cameraMutex_); > > > > > > + > > > > > > + if (state_ != CameraRunning) > > > > > > + return; > > > > > > + > > > > > > + worker_.stop(); > > > > > > + camera_->stop(); > > > > > > + state_ = CameraFlushing; > > > > > > + } > > > > > > + > > > > > > + /* > > > > > > + * Now wait for all the in-flight requests to be completed before > > > > > > + * continuing. Stopping the Camera guarantees that all in-flight > > > > > > + * requests are completed in error state. > > > > > > Do we need to wait ? Camera::stop() guarantees that all requests > > > complete synchronously with the stop() call. > > > > I didn't get the API that way... I thought after stop we would receive > > a sequence of failed requests... 
Actually I don't see anything that > > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion > > in Camera::stop() > > The camera::stop() documentation states > > * This method stops capturing and processing requests immediately. All pending > * requests are cancelled and complete synchronously in an error state. > > Is this ambiguous ? > I admit I haven't looked at the documentation, only at the code paths.. > > > Partly answering myself here, we'll have to wait for post-processing > > > tasks to complete once we'll process them in a separate thread, but that > > > will likely be handled by Thread::wait(). I don't think you need a > > > condition variable here. If I'm not mistaken, this should simplify the > > > implementation. > > > > If Camera::stop() is synchronous we don't need to wait indeed > > > > > > > > + */ > > > > > > + { > > > > > > + MutexLocker requestsLock(requestsMutex_); > > > > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); }); > > > > > > + } > > > > > > > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In > > > > > patch 6/8 you add it to protect the state_ variable but here it's > > > > > > > > I'm not changing state_ without the mutex acquired, am I ? > > > > > > > > > ignored. I see the ASSERT() added to stop() but the pattern of taking the > > > > > lock, checking state_, releasing the lock to do some work, then retaking the > > > > > lock and updating state_ feels like a bad idea. Maybe I'm missing > > > > > > > > How so, apart from the fact it feels a bit unusual, I concur ? > > > > > > > > If I keep the mutex held for the whole duration of flush no other > > > > concurrent method can proceed until all the queued requests have > > > > been completed.
While flush waits for the flushing_ condition to be > > > > signaled, processCaptureRequest() can proceed and immediately return > > > > the newly queued requests in error state by detecting state_ == > > > > CameraFlushing which signals that flush is in progress. > > > > Otherwise it would have had to wait for flush to end. But then we're back > > > > to a situation where we could serialize all calls and that's it, we > > > > would be done with a single mutex to be held for the whole duration of > > > > all operations. > > > > > > > > If it only was for close() or configureStreams() we could have locked > > > > for the whole duration of flush(), as they anyway wait for flush to > > > > complete before proceeding (by waiting on the flushed_ condition here > > > > below signaled). > > > > > > > > > something and this is not a real problem, if so maybe we can capture > > > > > that in the comment here? > > > > > > > > > > > + > > > > > > + /* > > > > > > + * Set state to stopped and unlock close() or configureStreams() that > > > > > > + * might be waiting for flush to be completed. > > > > > > + */ > > > > > > MutexLocker cameraLock(cameraMutex_); > > > > > > + state_ = CameraStopped; > > > > > > + flushed_.notify_one(); > > > > > > You should drop the lock before calling notify_one(). Otherwise you'll > > > wake up the task waiting on flushed_, which will try to lock > > > cameraMutex_, which will block immediately. The scheduler will have to > > > reschedule this task for the function to return and the lock to be > > > released before the waiter can proceed. That works, but isn't very > > > efficient. > > > > Weird, the cpp reference shows an example of notify_one where the > > caller always holds the mutex, but I see your point and > > it seems correct..
>
> I'm looking at
> https://en.cppreference.com/w/cpp/thread/condition_variable and
> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> and both calls to notify_one() in the example are made without the lock
> held, aren't they ?

o_0 I would swear I've seen different producer/consumer examples in the
documentation. As I assume they haven't changed overnight, it's clearly
my imagination..

>
> > >
> > > 	{
> > > 		MutexLocker cameraLock(cameraMutex_);
> > > 		state_ = CameraStopped;
> > > 	}
> > >
> > > 	flushed_.notify_one();
> > >
> >
> > So I could change to this one, if I don't have to drop this part
> > completely if we consider close() and configureStreams() not as
> > possible races...
> >
> > > > > > +}
> > > > > > +
> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > > +void CameraDevice::stop()
> > > > > > +{
> > > > > > +	ASSERT(state_ != CameraFlushing);
> > > > > > +
> > > > > >  	if (state_ == CameraStopped)
> > > > > >  		return;
> > > > > >
> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > > >   */
> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > >  {
> > > > > > -	/* Before any configuration attempt, stop the camera. */
> > > > > > -	stop();
> > > > > > +	{
> > > > > > +		/*
> > > > > > +		 * If a flush is in progress, wait for it to complete and to
> > > > > > +		 * stop the camera, otherwise before any new configuration
> > > > > > +		 * attempt we have to stop the camera explictely.
> > > > > > +		 */
> > >
> > > Same here, I don't think flush() and configure_streams() can race each
> > > other. I believe the only possible race to be between flush() and
> > > process_capture_request().
> >
> > Ditto.
> >
> > > > > > +		MutexLocker cameraLock(cameraMutex_);
> > > > > > +		if (state_ == CameraFlushing)
> > > > > > +			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +		else
> > > > > > +			stop();
> > > > > > +	}
> > > > > >
> > > > > >  	if (stream_list->num_streams == 0) {
> > > > > >  		LOG(HAL, Error) << "No streams in configuration";
> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > >  	if (ret)
> > > > > >  		return ret;
> > > > > >
> > > > > > +	/*
> > > > > > +	 * Just before queuing the request, make sure flush() has not
> > > > > > +	 * been called after this function has been executed. In that
> > > > > > +	 * case, immediately return the request with errors.
> > > > > > +	 */
> > > > > > +	MutexLocker cameraLock(cameraMutex_);
> > > > > > +	if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > > +		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > > +			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > +			buffer.release_fence = buffer.acquire_fence;
> > > > > > +		}
> > > > > > +
> > > > > > +		notifyError(descriptor.frameNumber_,
> > > > > > +			    descriptor.buffers_[0].stream,
> > >
> > > As commented on a previous patch, I think you should pass nullptr for
> > > the stream here.
> >
> > The "S6. Error management:" section of the camera3.h header does not
> > mention that, not the ?
>
> Indeed, that section doesn't mention the camera3_error_msg::error_stream
> field at all. The field is documented in the structure as
>
> 	/**
> 	 * Pointer to the stream that had a failure. NULL if the stream isn't
> 	 * applicable to the error.
> 	 */
>
> The question is thus when the stream is applicable to the error. The
> documentation of enum camera3_error_msg_code mentions error_stream in
> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> the device, the request or the result metadata, which are not specific
> to a stream.
>

Ack, thanks for clarifying

> > where does your suggestion come from ? I don't find any reference in
> > the review of [1/8]
>
> ([PATCH v3 1/8] android: Rework request completion notification'
> (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)
>
> > > > > > +			    CAMERA3_MSG_ERROR_REQUEST);
> > > > > > +
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +
> > > > > >  	worker_.queueRequest(descriptor.request_.get());
> > > > > >
> > > > > >  	{
> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > > >  			return;
> > > > > >  		}
> > > > > >
> > > > > > +		/* Release flush if all the pending requests have been completed. */
> > > > > > +		if (descriptors_.empty())
> > > > > > +			flushing_.notify_one();
> > >
> > > This will never happen, as you can only get here if descriptors_.find()
> > > has found the descriptor. Did you mean to do this after the extract()
> > > call below ?
> >
> > Ugh. This works only because Camera::stop() is synchronous then ?
>
> I believe so.
>
> > > > > > +
> > > > > >  		node = descriptors_.extract(it);
> > > > > >  	}
> > > > > >  	Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > --- a/src/android/camera_device.h
> > > > > > +++ b/src/android/camera_device.h
> > > > > > @@ -7,6 +7,7 @@
> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > >
> > > > > > +#include <condition_variable>
> > > > > >  #include <map>
> > > > > >  #include <memory>
> > > > > >  #include <mutex>
> > > > > > @@ -42,6 +43,7 @@ public:
> > > > > >
> > > > > >  	int open(const hw_module_t *hardwareModule);
> > > > > >  	void close();
> > > > > > +	void flush();
> > > > > >
> > > > > >  	unsigned int id() const { return id_; }
> > > > > >  	camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > > @@ -92,6 +94,7 @@ private:
> > > > > >  	enum State {
> > > > > >  		CameraStopped,
> > > > > >  		CameraRunning,
> > > > > > +		CameraFlushing,
> > > > > >  	};
> > > > > >
> > > > > >  	void stop();
> > > > > > @@ -120,8 +123,9 @@ private:
> > > > > >
> > > > > >  	CameraWorker worker_;
> > > > > >
> > > > > > -	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > > +	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > > >  	State state_;
> > > > > > +	std::condition_variable flushed_;
> > > > > >
> > > > > >  	std::shared_ptr<libcamera::Camera> camera_;
> > > > > >  	std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > @@ -134,8 +138,9 @@ private:
> > > > > >  	std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > >  	std::vector<CameraStream> streams_;
> > > > > >
> > > > > > -	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > > +	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > > >  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > > +	std::condition_variable flushing_;
> > > > > >
> > > > > >  	std::string maker_;
> > > > > >  	std::string model_;
> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > --- a/src/android/camera_ops.cpp
> > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > > >  {
> > > > > >  }
> > > > > >
> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > >  {
> > > > > > +	if (!dev)
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > > +	camera->flush();
> > > > > > +
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
>
> --
> Regards,
>
> Laurent Pinchart
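For reference, the pattern discussed above (update the state with the mutex held, call notify after releasing it) can be sketched in isolation as below. This is only an illustration under assumptions, not the HAL code: FlushGate, finishFlush(), waitForFlush() and runFlushGateDemo() are hypothetical names, and plain std::mutex stands in for libcamera::Mutex. Note that the waiter's predicate here is "no longer flushing". With std::condition_variable::wait() the call returns as soon as the predicate is true, so a predicate of the form state_ != CameraStopped, evaluated while the state is CameraFlushing, would appear to return without waiting at all.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

enum class State { Stopped, Running, Flushing };

/* Hypothetical stand-in for state_, cameraMutex_ and flushed_. */
struct FlushGate {
	std::mutex mutex;
	std::condition_variable flushed;
	State state = State::Flushing;

	/* Called by the flushing thread once all requests are back. */
	void finishFlush()
	{
		{
			/* The state change itself must happen under the lock. */
			std::lock_guard<std::mutex> lock(mutex);
			state = State::Stopped;
		}

		/*
		 * Notify with the lock released. notify_all() is used here as
		 * an assumption that more than one caller may be waiting.
		 */
		flushed.notify_all();
	}

	/* Called by a close()- or configureStreams()-like path. */
	State waitForFlush()
	{
		std::unique_lock<std::mutex> lock(mutex);

		/* wait() blocks while the predicate is false. */
		flushed.wait(lock, [&] { return state != State::Flushing; });
		return state;
	}
};

/* Hypothetical helper: one flusher thread, one waiter. */
inline State runFlushGateDemo()
{
	FlushGate gate;
	std::thread flusher([&] { gate.finishFlush(); });
	State seen = gate.waitForFlush();
	flusher.join();
	return seen;
}
```

Because the waiter re-checks the predicate with the mutex held, it does not matter whether finishFlush() runs before or after waitForFlush() starts waiting.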
diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 3fce14035718..899afaa49439 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
 
 void CameraDevice::close()
 {
-	streams_.clear();
+	MutexLocker cameraLock(cameraMutex_);
+	if (state_ == CameraFlushing) {
+		flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
+		camera_->release();
 
+		return;
+	}
+
+	streams_.clear();
 	stop();
 
 	camera_->release();
 }
 
-void CameraDevice::stop()
+/*
+ * Flush is similar to stop() but sets the camera state to 'flushing' and wait
+ * until all the in-flight requests have been returned before setting the
+ * camera state to stopped.
+ *
+ * Once flushing is done it unlocks concurrent calls to camera close() and
+ * configureStreams().
+ */
+void CameraDevice::flush()
 {
+	{
+		MutexLocker cameraLock(cameraMutex_);
+
+		if (state_ != CameraRunning)
+			return;
+
+		worker_.stop();
+		camera_->stop();
+		state_ = CameraFlushing;
+	}
+
+	/*
+	 * Now wait for all the in-flight requests to be completed before
+	 * continuing. Stopping the Camera guarantees that all in-flight
+	 * requests are completed in error state.
+	 */
+	{
+		MutexLocker requestsLock(requestsMutex_);
+		flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
+	}
+
+	/*
+	 * Set state to stopped and unlock close() or configureStreams() that
+	 * might be waiting for flush to be completed.
+	 */
 	MutexLocker cameraLock(cameraMutex_);
+	state_ = CameraStopped;
+	flushed_.notify_one();
+}
+
+/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
+void CameraDevice::stop()
+{
+	ASSERT(state_ != CameraFlushing);
+
 	if (state_ == CameraStopped)
 		return;
 
@@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
  */
 int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
 {
-	/* Before any configuration attempt, stop the camera. */
-	stop();
+	{
+		/*
+		 * If a flush is in progress, wait for it to complete and to
+		 * stop the camera, otherwise before any new configuration
+		 * attempt we have to stop the camera explictely.
+		 */
+		MutexLocker cameraLock(cameraMutex_);
+		if (state_ == CameraFlushing)
+			flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
+		else
+			stop();
+	}
 
 	if (stream_list->num_streams == 0) {
 		LOG(HAL, Error) << "No streams in configuration";
@@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 	if (ret)
 		return ret;
 
+	/*
+	 * Just before queuing the request, make sure flush() has not
+	 * been called after this function has been executed. In that
+	 * case, immediately return the request with errors.
+	 */
+	MutexLocker cameraLock(cameraMutex_);
+	if (state_ == CameraFlushing || state_ == CameraStopped) {
+		for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
+			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
+			buffer.release_fence = buffer.acquire_fence;
+		}
+
+		notifyError(descriptor.frameNumber_,
+			    descriptor.buffers_[0].stream,
+			    CAMERA3_MSG_ERROR_REQUEST);
+
+		return 0;
+	}
+
 	worker_.queueRequest(descriptor.request_.get());
 
 	{
@@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
 			return;
 		}
 
+		/* Release flush if all the pending requests have been completed. */
+		if (descriptors_.empty())
+			flushing_.notify_one();
+
 		node = descriptors_.extract(it);
 	}
 	Camera3RequestDescriptor &descriptor = node.mapped();
diff --git a/src/android/camera_device.h b/src/android/camera_device.h
index 7cf8e8370387..e1b3bf7d30f2 100644
--- a/src/android/camera_device.h
+++ b/src/android/camera_device.h
@@ -7,6 +7,7 @@
 #ifndef __ANDROID_CAMERA_DEVICE_H__
 #define __ANDROID_CAMERA_DEVICE_H__
 
+#include <condition_variable>
 #include <map>
 #include <memory>
 #include <mutex>
@@ -42,6 +43,7 @@ public:
 
 	int open(const hw_module_t *hardwareModule);
 	void close();
+	void flush();
 
 	unsigned int id() const { return id_; }
 	camera3_device_t *camera3Device() { return &camera3Device_; }
@@ -92,6 +94,7 @@ private:
 	enum State {
 		CameraStopped,
 		CameraRunning,
+		CameraFlushing,
 	};
 
 	void stop();
@@ -120,8 +123,9 @@ private:
 
 	CameraWorker worker_;
 
-	libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
+	libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
 	State state_;
+	std::condition_variable flushed_;
 
 	std::shared_ptr<libcamera::Camera> camera_;
 	std::unique_ptr<libcamera::CameraConfiguration> config_;
@@ -134,8 +138,9 @@ private:
 	std::map<int, libcamera::PixelFormat> formatsMap_;
 	std::vector<CameraStream> streams_;
 
-	libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
+	libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
 	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
+	std::condition_variable flushing_;
 
 	std::string maker_;
 	std::string model_;
diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
index 696e80436821..8a3cfa175ff5 100644
--- a/src/android/camera_ops.cpp
+++ b/src/android/camera_ops.cpp
@@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
 {
 }
 
-static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
+static int hal_dev_flush(const struct camera3_device *dev)
 {
+	if (!dev)
+		return -EINVAL;
+
+	CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
+	camera->flush();
+
 	return 0;
 }
 
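On the requestComplete() hunk: the review points out that testing descriptors_.empty() before the extract() call can never succeed, since the descriptor that was just found is still in the map. A minimal sketch of the ordering suggested in review (extract first, then test for emptiness) might look like the following. It is an illustration only: PendingRequests, add(), complete(), waitDrained(), wakeups and lastCompleted are hypothetical names, and a std::string replaces Camera3RequestDescriptor.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

/* Hypothetical stand-in for the descriptors_ bookkeeping; not libcamera code. */
struct PendingRequests {
	std::mutex mutex;
	std::condition_variable drained;
	std::map<uint64_t, std::string> descriptors;
	std::string lastCompleted;
	unsigned int wakeups = 0; /* Counts notifications, for the demo only. */

	void add(uint64_t cookie, std::string name)
	{
		std::lock_guard<std::mutex> lock(mutex);
		descriptors.emplace(cookie, std::move(name));
	}

	/* Returns false for an unknown cookie, as requestComplete() bails out. */
	bool complete(uint64_t cookie)
	{
		bool empty = false;
		{
			std::lock_guard<std::mutex> lock(mutex);
			auto it = descriptors.find(cookie);
			if (it == descriptors.end())
				return false;

			/* Remove the entry first, then test for emptiness. */
			auto node = descriptors.extract(it);
			lastCompleted = std::move(node.mapped());
			empty = descriptors.empty();
			if (empty)
				++wakeups;
		}

		/* Wake the flusher only once the last request has been removed. */
		if (empty)
			drained.notify_one();

		return true;
	}

	/* What a flush()-like caller would block on. */
	void waitDrained()
	{
		std::unique_lock<std::mutex> lock(mutex);
		drained.wait(lock, [&] { return descriptors.empty(); });
	}
};
```

With two requests pending, completing the first one leaves the map non-empty and sends no notification; completing the second one empties the map and notifies the waiter.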
Implement the flush() camera operation in the CameraDevice class
and make it available to the camera framework by implementing the
operation wrapper in camera_ops.cpp.

The flush() implementation stops the Camera and the worker thread and
waits for all in-flight requests to be returned. Stopping the Camera
forces all Requests already queued to be returned immediately in error
state. As flush() has to wait until all of them have been returned, make it
wait on a newly introduced condition variable which is notified by the
request completion handler when the queue of pending requests has been
exhausted.

As flush() can race with processCaptureRequest() protect the requests
queueing by introducing a new CameraState::CameraFlushing state that
processCaptureRequest() inspects before queuing the Request to the
Camera. If flush() has been called while processCaptureRequest() was
executing, return the current Request immediately in error state.

Protect potentially concurrent calls to close() and configureStreams()
by inspecting the CameraState, and force a wait for any flush() call
to complete before proceeding.

Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
---
 src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
 src/android/camera_device.h   |  9 +++-
 src/android/camera_ops.cpp    |  8 +++-
 3 files changed, 100 insertions(+), 7 deletions(-)
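
The guard described in the commit message for the flush() versus processCaptureRequest() race (inspect the state with the mutex held just before queuing, and return the request in error otherwise) can be modelled roughly as follows. This is a sketch under assumptions rather than the HAL implementation: RequestGate, submit(), start(), beginFlush() and SubmitResult are hypothetical names, and a vector of frame numbers replaces the real worker queue.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

enum class CameraState { Stopped, Running, Flushing };
enum class SubmitResult { Queued, ReturnedInError };

/* Hypothetical model of the state check; the names are not libcamera API. */
class RequestGate
{
public:
	void start() { setState(CameraState::Running); }
	void beginFlush() { setState(CameraState::Flushing); }

	/*
	 * The state test and the queuing happen under the same lock, so a
	 * concurrent beginFlush() is ordered either before or after the
	 * whole submission, never in between.
	 */
	SubmitResult submit(unsigned int frameNumber)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		if (state_ != CameraState::Running)
			return SubmitResult::ReturnedInError;

		queued_.push_back(frameNumber);
		return SubmitResult::Queued;
	}

	std::size_t queuedCount()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return queued_.size();
	}

private:
	void setState(CameraState state)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		state_ = state;
	}

	std::mutex mutex_;
	CameraState state_ = CameraState::Stopped;
	std::vector<unsigned int> queued_;
};
```

A request submitted while running is queued; one submitted after beginFlush() is rejected and never reaches the queue, which is the behaviour the commit message describes for a request caught by a concurrent flush().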