[{"id":17113,"web_url":"https://patchwork.libcamera.org/comment/17113/","msgid":"<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>","date":"2021-05-22T09:55:36","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":5,"url":"https://patchwork.libcamera.org/api/people/5/","name":"Niklas Söderlund","email":"niklas.soderlund@ragnatech.se"},"content":"Hi Jacopo,\n\nThanks for your work.\n\nOn 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> Implement the flush() camera operation in the CameraDevice class\n> and make it available to the camera framework by implementing the\n> operation wrapper in camera_ops.cpp.\n> \n> The flush() implementation stops the Camera and the worker thread and\n> waits for all in-flight requests to be returned. Stopping the Camera\n> forces all Requests already queued to be returned immediately in error\n> state. As flush() has to wait until all of them have been returned, make it\n> wait on a newly introduced condition variable which is notified by the\n> request completion handler when the queue of pending requests has been\n> exhausted.\n> \n> As flush() can race with processCaptureRequest() protect the requests\n> queueing by introducing a new CameraState::CameraFlushing state that\n> processCaptureRequest() inspects before queuing the Request to the\n> Camera. If flush() has been called while processCaptureRequest() was\n> executing, return the current Request immediately in error state.\n> \n> Protect potentially concurrent calls to close() and configureStreams()\n> by inspecting the CameraState, and force a wait for any flush() call\n> to complete before proceeding.\n> \n> Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> ---\n>  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n>  src/android/camera_device.h   |  9 +++-\n>  src/android/camera_ops.cpp    |  8 +++-\n>  3 files changed, 100 insertions(+), 7 deletions(-)\n> \n> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> index 3fce14035718..899afaa49439 100644\n> --- a/src/android/camera_device.cpp\n> +++ b/src/android/camera_device.cpp\n> @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n>  \n>  void CameraDevice::close()\n>  {\n> -\tstreams_.clear();\n> +\tMutexLocker cameraLock(cameraMutex_);\n> +\tif (state_ == CameraFlushing) {\n> +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> +\t\tcamera_->release();\n>  \n> +\t\treturn;\n> +\t}\n> +\n> +\tstreams_.clear();\n>  \tstop();\n>  \n>  \tcamera_->release();\n>  }\n>  \n> -void CameraDevice::stop()\n> +/*\n> + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n> + * until all the in-flight requests have been returned before setting the\n> + * camera state to stopped.\n> + *\n> + * Once flushing is done it unlocks concurrent calls to camera close() and\n> + * configureStreams().\n> + */\n> +void CameraDevice::flush()\n>  {\n> +\t{\n> +\t\tMutexLocker cameraLock(cameraMutex_);\n> +\n> +\t\tif (state_ != CameraRunning)\n> +\t\t\treturn;\n> +\n> +\t\tworker_.stop();\n> +\t\tcamera_->stop();\n> +\t\tstate_ = CameraFlushing;\n> +\t}\n> +\n> +\t/*\n> +\t * Now wait for all the in-flight requests to be completed before\n> +\t * continuing. Stopping the Camera guarantees that all in-flight\n> +\t * requests are completed in error state.\n> +\t */\n> +\t{\n> +\t\tMutexLocker requestsLock(requestsMutex_);\n> +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> +\t}\n\nI'm still uneasy about releasing the cameraMutex_ for this section. In \npatch 6/8 you add it to protect the state_ variable but here it's \nignored. I see the ASSERT() added to stop() but the patter of taking the \nlock checking state_, releasing the lock and do some work, retake the \nlock and update state_ feels like a bad idea. Maybe I'm missing \nsomething and this is not a real problem, if so maybe we can capture \nthat in the comment here?\n\n> +\n> +\t/*\n> +\t * Set state to stopped and unlock close() or configureStreams() that\n> +\t * might be waiting for flush to be completed.\n> +\t */\n>  \tMutexLocker cameraLock(cameraMutex_);\n> +\tstate_ = CameraStopped;\n> +\tflushed_.notify_one();\n> +}\n> +\n> +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> +void CameraDevice::stop()\n> +{\n> +\tASSERT(state_ != CameraFlushing);\n> +\n>  \tif (state_ == CameraStopped)\n>  \t\treturn;\n>  \n> @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n>   */\n>  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n>  {\n> -\t/* Before any configuration attempt, stop the camera. */\n> -\tstop();\n> +\t{\n> +\t\t/*\n> +\t\t * If a flush is in progress, wait for it to complete and to\n> +\t\t * stop the camera, otherwise before any new configuration\n> +\t\t * attempt we have to stop the camera explictely.\n> +\t\t */\n> +\t\tMutexLocker cameraLock(cameraMutex_);\n> +\t\tif (state_ == CameraFlushing)\n> +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> +\t\telse\n> +\t\t\tstop();\n> +\t}\n>  \n>  \tif (stream_list->num_streams == 0) {\n>  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n>  \tif (ret)\n>  \t\treturn ret;\n>  \n> +\t/*\n> +\t * Just before queuing the request, make sure flush() has not\n> +\t * been called after this function has been executed. In that\n> +\t * case, immediately return the request with errors.\n> +\t */\n> +\tMutexLocker cameraLock(cameraMutex_);\n> +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> +\t\t}\n> +\n> +\t\tnotifyError(descriptor.frameNumber_,\n> +\t\t\t    descriptor.buffers_[0].stream,\n> +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> +\n> +\t\treturn 0;\n> +\t}\n> +\n>  \tworker_.queueRequest(descriptor.request_.get());\n>  \n>  \t{\n> @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n>  \t\t\treturn;\n>  \t\t}\n>  \n> +\t\t/* Release flush if all the pending requests have been completed. */\n> +\t\tif (descriptors_.empty())\n> +\t\t\tflushing_.notify_one();\n> +\n>  \t\tnode = descriptors_.extract(it);\n>  \t}\n>  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> index 7cf8e8370387..e1b3bf7d30f2 100644\n> --- a/src/android/camera_device.h\n> +++ b/src/android/camera_device.h\n> @@ -7,6 +7,7 @@\n>  #ifndef __ANDROID_CAMERA_DEVICE_H__\n>  #define __ANDROID_CAMERA_DEVICE_H__\n>  \n> +#include <condition_variable>\n>  #include <map>\n>  #include <memory>\n>  #include <mutex>\n> @@ -42,6 +43,7 @@ public:\n>  \n>  \tint open(const hw_module_t *hardwareModule);\n>  \tvoid close();\n> +\tvoid flush();\n>  \n>  \tunsigned int id() const { return id_; }\n>  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> @@ -92,6 +94,7 @@ private:\n>  \tenum State {\n>  \t\tCameraStopped,\n>  \t\tCameraRunning,\n> +\t\tCameraFlushing,\n>  \t};\n>  \n>  \tvoid stop();\n> @@ -120,8 +123,9 @@ private:\n>  \n>  \tCameraWorker worker_;\n>  \n> -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n>  \tState state_;\n> +\tstd::condition_variable flushed_;\n>  \n>  \tstd::shared_ptr<libcamera::Camera> camera_;\n>  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> @@ -134,8 +138,9 @@ private:\n>  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n>  \tstd::vector<CameraStream> streams_;\n>  \n> -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n>  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> +\tstd::condition_variable flushing_;\n>  \n>  \tstd::string maker_;\n>  \tstd::string model_;\n> diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> index 696e80436821..8a3cfa175ff5 100644\n> --- a/src/android/camera_ops.cpp\n> +++ b/src/android/camera_ops.cpp\n> @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n>  {\n>  }\n>  \n> -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> +static int hal_dev_flush(const struct camera3_device *dev)\n>  {\n> +\tif (!dev)\n> +\t\treturn -EINVAL;\n> +\n> +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> +\tcamera->flush();\n> +\n>  \treturn 0;\n>  }\n>  \n> -- \n> 2.31.1\n>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id AB86EC3201\n\tfor <parsemail@patchwork.libcamera.org>;\n\tSat, 22 May 2021 09:55:40 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 1D8A06891F;\n\tSat, 22 May 2021 11:55:40 +0200 (CEST)","from mail-lf1-x131.google.com (mail-lf1-x131.google.com\n\t[IPv6:2a00:1450:4864:20::131])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 703DD602B0\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tSat, 22 May 2021 11:55:38 +0200 (CEST)","by mail-lf1-x131.google.com with SMTP id r5so33292053lfr.5\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tSat, 22 May 2021 02:55:38 -0700 (PDT)","from localhost (h-155-4-209-203.A463.priv.bahnhof.se.\n\t[155.4.209.203]) by smtp.gmail.com with ESMTPSA id\n\ts1sm944041lfd.270.2021.05.22.02.55.37\n\t(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n\tSat, 22 May 2021 02:55:37 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com; dkim=pass (2048-bit key;\n\tunprotected) header.d=ragnatech-se.20150623.gappssmtp.com\n\theader.i=@ragnatech-se.20150623.gappssmtp.com\n\theader.b=\"qcZ6k17z\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=ragnatech-se.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:content-transfer-encoding:in-reply-to;\n\tbh=2a2iDOGgWgTi4Q/3TYMjYLfwLA+ElPOKH/TbF8C3hPM=;\n\tb=qcZ6k17z13ntQ9TKbz7GkgmPAh1LkiCo45IkTlJ9slKBfu/T5IIRCX05BpZ553BQ3d\n\t4sD8RYQRc3vd4ued88MmJ0IqmYHAqFfGiTp/fG/5ZLo3ReRRA7Ck2k57sr1Xlcw0c1SC\n\tTIRRzXdkSAKRMQ+R7dyyn7VYikfBE4z3lNyVFZjZHbqiA1kHAygAMBVrWfKBHG8BXV2/\n\t6RwiMpbM8hc91uDCsGOi4Qz2l3Pzu4EsYSUvF1T8csT9FClT5kbKyxxIEI0q/C43UDNQ\n\t8ts40Gmxdyqe6Td8v9Y1shuutgDQ9vNwwN6ROr788A7I31CgUP1/g3hOPWtMOm3yT0SA\n\tRNbA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:content-transfer-encoding\n\t:in-reply-to;\n\tbh=2a2iDOGgWgTi4Q/3TYMjYLfwLA+ElPOKH/TbF8C3hPM=;\n\tb=URqg6TRReTsbAnYJFP/RnormOnyYjbD3QUHd6nCY7AMYj8x1WbC4Qg84CMf+Vi2VTy\n\tsZHCvjcXr8bv8tTA/C8/VBRWC0QA0U/wXdW6/J1205hs7N6ae3xGA6wXZPfzhJxWtHa2\n\tP4sgWLtKRQseZHCtd95dVPUWvS6tyB1m8raYKfzrRS85s/wr3s5EdECSL9FPMfXOGRbi\n\tS3qaz2yJFY3EE7kBaTprgUlfKis4uX/PwQuyE0iJOmMCZSwTOK1zicotsdV+DPomj6wV\n\tzKPRlQPaGlq+WKiyPQB4k62ld0lU4lb4A3BPjcjujjkPMz8CBN3PaUE3RhY/lnfrxSui\n\ttA1g==","X-Gm-Message-State":"AOAM531mrgl7s4+iHC3VsN+WlrBa979f1YwIhFOlPMw+VL+WXYhk9FdZ\n\tEZY9aF5zJuDAxWkywTc0jJBqFg==","X-Google-Smtp-Source":"ABdhPJyw+oChfdvoDgv8aRGYKe5V8r3b8SkvApLswWBvys8HM3BH9FrClPzgdmYBz40mUH9aZmLb+A==","X-Received":"by 2002:ac2:5c03:: with SMTP id r3mr4886588lfp.504.1621677337778;\n\tSat, 22 May 2021 02:55:37 -0700 (PDT)","Date":"Sat, 22 May 2021 11:55:36 +0200","From":"Niklas =?iso-8859-1?q?S=F6derlund?= <niklas.soderlund@ragnatech.se>","To":"Jacopo Mondi <jacopo@jmondi.org>","Message-ID":"<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=iso-8859-1","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<20210521154227.60186-9-jacopo@jmondi.org>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17131,"web_url":"https://patchwork.libcamera.org/comment/17131/","msgid":"<20210523142251.nnxgwho3quitpjfb@uno.localdomain>","date":"2021-05-23T14:22:51","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":3,"url":"https://patchwork.libcamera.org/api/people/3/","name":"Jacopo Mondi","email":"jacopo@jmondi.org"},"content":"Hi Niklas,\n\nOn Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> Hi Jacopo,\n>\n> Thanks for your work.\n>\n> On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > Implement the flush() camera operation in the CameraDevice class\n> > and make it available to the camera framework by implementing the\n> > operation wrapper in camera_ops.cpp.\n> >\n> > The flush() implementation stops the Camera and the worker thread and\n> > waits for all in-flight requests to be returned. Stopping the Camera\n> > forces all Requests already queued to be returned immediately in error\n> > state. As flush() has to wait until all of them have been returned, make it\n> > wait on a newly introduced condition variable which is notified by the\n> > request completion handler when the queue of pending requests has been\n> > exhausted.\n> >\n> > As flush() can race with processCaptureRequest() protect the requests\n> > queueing by introducing a new CameraState::CameraFlushing state that\n> > processCaptureRequest() inspects before queuing the Request to the\n> > Camera. If flush() has been called while processCaptureRequest() was\n> > executing, return the current Request immediately in error state.\n> >\n> > Protect potentially concurrent calls to close() and configureStreams()\n> > by inspecting the CameraState, and force a wait for any flush() call\n> > to complete before proceeding.\n> >\n> > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > ---\n> >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> >  src/android/camera_device.h   |  9 +++-\n> >  src/android/camera_ops.cpp    |  8 +++-\n> >  3 files changed, 100 insertions(+), 7 deletions(-)\n> >\n> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > index 3fce14035718..899afaa49439 100644\n> > --- a/src/android/camera_device.cpp\n> > +++ b/src/android/camera_device.cpp\n> > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> >\n> >  void CameraDevice::close()\n> >  {\n> > -\tstreams_.clear();\n> > +\tMutexLocker cameraLock(cameraMutex_);\n> > +\tif (state_ == CameraFlushing) {\n> > +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > +\t\tcamera_->release();\n> >\n> > +\t\treturn;\n> > +\t}\n> > +\n> > +\tstreams_.clear();\n> >  \tstop();\n> >\n> >  \tcamera_->release();\n> >  }\n> >\n> > -void CameraDevice::stop()\n> > +/*\n> > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n> > + * until all the in-flight requests have been returned before setting the\n> > + * camera state to stopped.\n> > + *\n> > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > + * configureStreams().\n> > + */\n> > +void CameraDevice::flush()\n> >  {\n> > +\t{\n> > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > +\n> > +\t\tif (state_ != CameraRunning)\n> > +\t\t\treturn;\n> > +\n> > +\t\tworker_.stop();\n> > +\t\tcamera_->stop();\n> > +\t\tstate_ = CameraFlushing;\n> > +\t}\n> > +\n> > +\t/*\n> > +\t * Now wait for all the in-flight requests to be completed before\n> > +\t * continuing. Stopping the Camera guarantees that all in-flight\n> > +\t * requests are completed in error state.\n> > +\t */\n> > +\t{\n> > +\t\tMutexLocker requestsLock(requestsMutex_);\n> > +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > +\t}\n>\n> I'm still uneasy about releasing the cameraMutex_ for this section. In\n> patch 6/8 you add it to protect the state_ variable but here it's\n\nI'm not changing state_ without the mutex acquired, am I ?\n\n> ignored. I see the ASSERT() added to stop() but the patter of taking the\n> lock checking state_, releasing the lock and do some work, retake the\n> lock and update state_ feels like a bad idea. Maybe I'm missing\n\nHow so, apart from the fact it feels a bit unusual, I concur ?\n\nIf I keep the held the mutex for the whole duration of flush no other\nconcurrent method can proceed until all the queued requests have not\nbeen completed. While flush waits for the flushing_ condition to be\nsignaled, processCaptureRequest() can proceed and immediately return\nthe newly queued requests in error state by detecting state_ ==\nCameraFlushing which signals that flush in is progress.\nOtherwise it would have had to wait for flush to end. But then we're back\nto a situation where we could serialize all calls and that's it, we\nwould be done with a single mutex to be held for the whole duration of\nall operations.\n\nIf it only was for close() or configureStreams() we could have locked\nfor the whole duration of flush(), as they anyway wait for flush to\ncomplete before proceeding (by waiting on the flushed_ condition here\nbelow signaled).\n\n> something and this is not a real problem, if so maybe we can capture\n> that in the comment here?\n>\n> > +\n> > +\t/*\n> > +\t * Set state to stopped and unlock close() or configureStreams() that\n> > +\t * might be waiting for flush to be completed.\n> > +\t */\n> >  \tMutexLocker cameraLock(cameraMutex_);\n> > +\tstate_ = CameraStopped;\n> > +\tflushed_.notify_one();\n> > +}\n> > +\n> > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > +void CameraDevice::stop()\n> > +{\n> > +\tASSERT(state_ != CameraFlushing);\n> > +\n> >  \tif (state_ == CameraStopped)\n> >  \t\treturn;\n> >\n> > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> >   */\n> >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> >  {\n> > -\t/* Before any configuration attempt, stop the camera. */\n> > -\tstop();\n> > +\t{\n> > +\t\t/*\n> > +\t\t * If a flush is in progress, wait for it to complete and to\n> > +\t\t * stop the camera, otherwise before any new configuration\n> > +\t\t * attempt we have to stop the camera explictely.\n> > +\t\t */\n> > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > +\t\tif (state_ == CameraFlushing)\n> > +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > +\t\telse\n> > +\t\t\tstop();\n> > +\t}\n> >\n> >  \tif (stream_list->num_streams == 0) {\n> >  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> >  \tif (ret)\n> >  \t\treturn ret;\n> >\n> > +\t/*\n> > +\t * Just before queuing the request, make sure flush() has not\n> > +\t * been called after this function has been executed. In that\n> > +\t * case, immediately return the request with errors.\n> > +\t */\n> > +\tMutexLocker cameraLock(cameraMutex_);\n> > +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> > +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> > +\t\t}\n> > +\n> > +\t\tnotifyError(descriptor.frameNumber_,\n> > +\t\t\t    descriptor.buffers_[0].stream,\n> > +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> > +\n> > +\t\treturn 0;\n> > +\t}\n> > +\n> >  \tworker_.queueRequest(descriptor.request_.get());\n> >\n> >  \t{\n> > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> >  \t\t\treturn;\n> >  \t\t}\n> >\n> > +\t\t/* Release flush if all the pending requests have been completed. */\n> > +\t\tif (descriptors_.empty())\n> > +\t\t\tflushing_.notify_one();\n> > +\n> >  \t\tnode = descriptors_.extract(it);\n> >  \t}\n> >  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > --- a/src/android/camera_device.h\n> > +++ b/src/android/camera_device.h\n> > @@ -7,6 +7,7 @@\n> >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> >  #define __ANDROID_CAMERA_DEVICE_H__\n> >\n> > +#include <condition_variable>\n> >  #include <map>\n> >  #include <memory>\n> >  #include <mutex>\n> > @@ -42,6 +43,7 @@ public:\n> >\n> >  \tint open(const hw_module_t *hardwareModule);\n> >  \tvoid close();\n> > +\tvoid flush();\n> >\n> >  \tunsigned int id() const { return id_; }\n> >  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> > @@ -92,6 +94,7 @@ private:\n> >  \tenum State {\n> >  \t\tCameraStopped,\n> >  \t\tCameraRunning,\n> > +\t\tCameraFlushing,\n> >  \t};\n> >\n> >  \tvoid stop();\n> > @@ -120,8 +123,9 @@ private:\n> >\n> >  \tCameraWorker worker_;\n> >\n> > -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> >  \tState state_;\n> > +\tstd::condition_variable flushed_;\n> >\n> >  \tstd::shared_ptr<libcamera::Camera> camera_;\n> >  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> > @@ -134,8 +138,9 @@ private:\n> >  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n> >  \tstd::vector<CameraStream> streams_;\n> >\n> > -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> >  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > +\tstd::condition_variable flushing_;\n> >\n> >  \tstd::string maker_;\n> >  \tstd::string model_;\n> > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > index 696e80436821..8a3cfa175ff5 100644\n> > --- a/src/android/camera_ops.cpp\n> > +++ b/src/android/camera_ops.cpp\n> > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> >  {\n> >  }\n> >\n> > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > +static int hal_dev_flush(const struct camera3_device *dev)\n> >  {\n> > +\tif (!dev)\n> > +\t\treturn -EINVAL;\n> > +\n> > +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > +\tcamera->flush();\n> > +\n> >  \treturn 0;\n> >  }\n> >\n> > --\n> > 2.31.1\n> >\n>\n> --\n> Regards,\n> Niklas Söderlund","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 4D98FC3201\n\tfor <parsemail@patchwork.libcamera.org>;\n\tSun, 23 May 2021 14:22:09 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 8F15F6891B;\n\tSun, 23 May 2021 16:22:08 +0200 (CEST)","from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net\n\t[217.70.183.194])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 224FF601A9\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tSun, 23 May 2021 16:22:07 +0200 (CEST)","(Authenticated sender: jacopo@jmondi.org)\n\tby relay2-d.mail.gandi.net (Postfix) with ESMTPSA id 639D640003;\n\tSun, 23 May 2021 14:22:05 +0000 (UTC)"],"Date":"Sun, 23 May 2021 16:22:51 +0200","From":"Jacopo Mondi <jacopo@jmondi.org>","To":"Niklas =?utf-8?q?S=C3=B6derlund?= <niklas.soderlund@ragnatech.se>","Message-ID":"<20210523142251.nnxgwho3quitpjfb@uno.localdomain>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17141,"web_url":"https://patchwork.libcamera.org/comment/17141/","msgid":"<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>","date":"2021-05-23T18:50:46","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"Hi Jacopo,\n\nThank you for the patch.\n\nOn Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > Implement the flush() camera operation in the CameraDevice class\n> > > and make it available to the camera framework by implementing the\n> > > operation wrapper in camera_ops.cpp.\n> > >\n> > > The flush() implementation stops the Camera and the worker thread and\n> > > waits for all in-flight requests to be returned. Stopping the Camera\n> > > forces all Requests already queued to be returned immediately in error\n> > > state. As flush() has to wait until all of them have been returned, make it\n> > > wait on a newly introduced condition variable which is notified by the\n> > > request completion handler when the queue of pending requests has been\n> > > exhausted.\n> > >\n> > > As flush() can race with processCaptureRequest() protect the requests\n> > > queueing by introducing a new CameraState::CameraFlushing state that\n> > > processCaptureRequest() inspects before queuing the Request to the\n> > > Camera. If flush() has been called while processCaptureRequest() was\n> > > executing, return the current Request immediately in error state.\n> > >\n> > > Protect potentially concurrent calls to close() and configureStreams()\n\nCan this happen ? Quoting camera3.h,\n\n * 12. Alternatively, the framework may call camera3_device_t->common->close()\n *    to end the camera session. This may be called at any time when no other\n *    calls from the framework are active, although the call may block until all\n *    in-flight captures have completed (all results returned, all buffers\n *    filled). After the close call returns, no more calls to the\n *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n *    close() call is underway, the framework may not call any other HAL device\n *    functions.\n\nThe important part is \"when no other calss from the framework are\nactive\". I don't think we need to handle close() racing with anything\nelse than process_capture_request().\n\n> > > by inspecting the CameraState, and force a wait for any flush() call\n> > > to complete before proceeding.\n> > >\n> > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > ---\n> > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> > >  src/android/camera_device.h   |  9 +++-\n> > >  src/android/camera_ops.cpp    |  8 +++-\n> > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > >\n> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > > index 3fce14035718..899afaa49439 100644\n> > > --- a/src/android/camera_device.cpp\n> > > +++ b/src/android/camera_device.cpp\n> > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> > >\n> > >  void CameraDevice::close()\n> > >  {\n> > > -\tstreams_.clear();\n> > > +\tMutexLocker cameraLock(cameraMutex_);\n\nI'd add a blank line here.\n\n> > > +\tif (state_ == CameraFlushing) {\n\nAs mentioned above, I don't think you need to protect against close()\nand flush() racing each other.\n\n> > > +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > +\t\tcamera_->release();\n> > >\n> > > +\t\treturn;\n> > > +\t}\n> > > +\n> > > +\tstreams_.clear();\n> > >  \tstop();\n> > >\n> > >  \tcamera_->release();\n> > >  }\n> > >\n> > > -void CameraDevice::stop()\n> > > +/*\n> > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n\ns/wait/waits/\n\n> > > + * until all the in-flight requests have been returned before setting the\n> > > + * camera state to stopped.\n> > > + *\n> > > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > > + * configureStreams().\n> > > + */\n> > > +void CameraDevice::flush()\n> > >  {\n> > > +\t{\n> > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > +\n> > > +\t\tif (state_ != CameraRunning)\n> > > +\t\t\treturn;\n> > > +\n> > > +\t\tworker_.stop();\n> > > +\t\tcamera_->stop();\n> > > +\t\tstate_ = CameraFlushing;\n> > > +\t}\n> > > +\n> > > +\t/*\n> > > +\t * Now wait for all the in-flight requests to be completed before\n> > > +\t * continuing. Stopping the Camera guarantees that all in-flight\n> > > +\t * requests are completed in error state.\n\nDo we need to wait ? Camera::stop() guarantees that all requests\ncomplete synchronously with the stop() call.\n\nPartly answering myself here, we'll have to wait for post-processing\ntasks to complete once we'll process them in a separate thread, but that\nwill likely be handled by Thread::wait(). I don't think you need a\ncondition variable here. I'm I'm not mistaken, this should simplify the\nimplementation.\n\n> > > +\t */\n> > > +\t{\n> > > +\t\tMutexLocker requestsLock(requestsMutex_);\n> > > +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > > +\t}\n> >\n> > I'm still uneasy about releasing the cameraMutex_ for this section. In\n> > patch 6/8 you add it to protect the state_ variable but here it's\n> \n> I'm not changing state_ without the mutex acquired, am I ?\n> \n> > ignored. I see the ASSERT() added to stop() but the patter of taking the\n> > lock checking state_, releasing the lock and do some work, retake the\n> > lock and update state_ feels like a bad idea. Maybe I'm missing\n> \n> How so, apart from the fact it feels a bit unusual, I concur ?\n> \n> If I keep the held the mutex for the whole duration of flush no other\n> concurrent method can proceed until all the queued requests have not\n> been completed. While flush waits for the flushing_ condition to be\n> signaled, processCaptureRequest() can proceed and immediately return\n> the newly queued requests in error state by detecting state_ ==\n> CameraFlushing which signals that flush in is progress.\n> Otherwise it would have had to wait for flush to end. But then we're back\n> to a situation where we could serialize all calls and that's it, we\n> would be done with a single mutex to be held for the whole duration of\n> all operations.\n> \n> If it only was for close() or configureStreams() we could have locked\n> for the whole duration of flush(), as they anyway wait for flush to\n> complete before proceeding (by waiting on the flushed_ condition here\n> below signaled).\n> \n> > something and this is not a real problem, if so maybe we can capture\n> > that in the comment here?\n> >\n> > > +\n> > > +\t/*\n> > > +\t * Set state to stopped and unlock close() or configureStreams() that\n> > > +\t * might be waiting for flush to be completed.\n> > > +\t */\n> > >  \tMutexLocker cameraLock(cameraMutex_);\n> > > +\tstate_ = CameraStopped;\n> > > +\tflushed_.notify_one();\n\nYou should drop the lock before calling notify_one(). Otherwise you'll\nwake up the task waiting on flushed_, which will try to lock\ncameraMutex_, which will block immediately. The scheduler will have to\nreschedule this task for the function to return and the lock to be\nreleased before the waiter can proceed. That works, but isn't very\nefficient.\n\n\t{\n\t\tMutexLocker cameraLock(cameraMutex_);\n\t\tstate_ = CameraStopped;\n\t}\n\n\tflushed_.notify_one();\n\n> > > +}\n> > > +\n> > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > > +void CameraDevice::stop()\n> > > +{\n> > > +\tASSERT(state_ != CameraFlushing);\n> > > +\n> > >  \tif (state_ == CameraStopped)\n> > >  \t\treturn;\n> > >\n> > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> > >   */\n> > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > >  {\n> > > -\t/* Before any configuration attempt, stop the camera. */\n> > > -\tstop();\n> > > +\t{\n> > > +\t\t/*\n> > > +\t\t * If a flush is in progress, wait for it to complete and to\n> > > +\t\t * stop the camera, otherwise before any new configuration\n> > > +\t\t * attempt we have to stop the camera explictely.\n> > > +\t\t */\n\nSame here, I don't think flush() and configure_streams() can race each\nother. I believe the only possible race to be between flush() and\nprocess_capture_request().\n\n> > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > +\t\tif (state_ == CameraFlushing)\n> > > +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > +\t\telse\n> > > +\t\t\tstop();\n> > > +\t}\n> > >\n> > >  \tif (stream_list->num_streams == 0) {\n> > >  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > >  \tif (ret)\n> > >  \t\treturn ret;\n> > >\n> > > +\t/*\n> > > +\t * Just before queuing the request, make sure flush() has not\n> > > +\t * been called after this function has been executed. In that\n> > > +\t * case, immediately return the request with errors.\n> > > +\t */\n> > > +\tMutexLocker cameraLock(cameraMutex_);\n> > > +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > > +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > > +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> > > +\t\t}\n> > > +\n> > > +\t\tnotifyError(descriptor.frameNumber_,\n> > > +\t\t\t    descriptor.buffers_[0].stream,\n\nAs commented on a previous patch, I think you should pass nullptr for\nthe stream here.\n\n> > > +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> > > +\n> > > +\t\treturn 0;\n> > > +\t}\n> > > +\n> > >  \tworker_.queueRequest(descriptor.request_.get());\n> > >\n> > >  \t{\n> > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> > >  \t\t\treturn;\n> > >  \t\t}\n> > >\n> > > +\t\t/* Release flush if all the pending requests have been completed. */\n> > > +\t\tif (descriptors_.empty())\n> > > +\t\t\tflushing_.notify_one();\n\nThis will never happen, as you can only get here if descriptors_.find()\nhas found the descriptor. Did you mean to do this after the extract()\ncall below ?\n\n> > > +\n> > >  \t\tnode = descriptors_.extract(it);\n> > >  \t}\n> > >  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > --- a/src/android/camera_device.h\n> > > +++ b/src/android/camera_device.h\n> > > @@ -7,6 +7,7 @@\n> > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > >\n> > > +#include <condition_variable>\n> > >  #include <map>\n> > >  #include <memory>\n> > >  #include <mutex>\n> > > @@ -42,6 +43,7 @@ public:\n> > >\n> > >  \tint open(const hw_module_t *hardwareModule);\n> > >  \tvoid close();\n> > > +\tvoid flush();\n> > >\n> > >  \tunsigned int id() const { return id_; }\n> > >  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> > > @@ -92,6 +94,7 @@ private:\n> > >  \tenum State {\n> > >  \t\tCameraStopped,\n> > >  \t\tCameraRunning,\n> > > +\t\tCameraFlushing,\n> > >  \t};\n> > >\n> > >  \tvoid stop();\n> > > @@ -120,8 +123,9 @@ private:\n> > >\n> > >  \tCameraWorker worker_;\n> > >\n> > > -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > > +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> > >  \tState state_;\n> > > +\tstd::condition_variable flushed_;\n> > >\n> > >  \tstd::shared_ptr<libcamera::Camera> camera_;\n> > >  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > @@ -134,8 +138,9 @@ private:\n> > >  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n> > >  \tstd::vector<CameraStream> streams_;\n> > >\n> > > -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > > +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> > >  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > +\tstd::condition_variable flushing_;\n> > >\n> > >  \tstd::string maker_;\n> > >  \tstd::string model_;\n> > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > > index 696e80436821..8a3cfa175ff5 100644\n> > > --- a/src/android/camera_ops.cpp\n> > > +++ b/src/android/camera_ops.cpp\n> > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> > >  {\n> > >  }\n> > >\n> > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > >  {\n> > > +\tif (!dev)\n> > > +\t\treturn -EINVAL;\n> > > +\n> > > +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > > +\tcamera->flush();\n> > > +\n> > >  \treturn 0;\n> > >  }\n> > >","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 76668C3200\n\tfor <parsemail@patchwork.libcamera.org>;\n\tSun, 23 May 2021 18:50:52 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id CB9666891B;\n\tSun, 23 May 2021 20:50:51 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[213.167.242.64])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id C79ED601A9\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tSun, 23 May 2021 20:50:49 +0200 (CEST)","from pendragon.ideasonboard.com (62-78-145-57.bb.dnainternet.fi\n\t[62.78.145.57])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 48D912A8;\n\tSun, 23 May 2021 20:50:49 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"l/g5xg6Y\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1621795849;\n\tbh=Xk/XR+BStyX3qUDCjbplhjVTilmlAN9hLh8ogAMJ2v8=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=l/g5xg6Y2hWDiJ/U+t7dY7gQn/a48Z1lIYIiSI/UTFXPqdQxcvHqcDsaJ5iWRqToj\n\tIVecTOyujI7EKlinlru7j+qHBYnT+XjGmmqmhsVOyfw+mtLyF3reu19skD1mSLJnU4\n\tTjf/m7rG4IhgY/g/0EthnmgnTKN/5DfQjuPvfdw4=","Date":"Sun, 23 May 2021 21:50:46 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Jacopo Mondi <jacopo@jmondi.org>","Message-ID":"<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<20210523142251.nnxgwho3quitpjfb@uno.localdomain>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17174,"web_url":"https://patchwork.libcamera.org/comment/17174/","msgid":"<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","date":"2021-05-24T07:47:55","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":3,"url":"https://patchwork.libcamera.org/api/people/3/","name":"Jacopo Mondi","email":"jacopo@jmondi.org"},"content":"Hi Laurent,\n\nOn Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> Hi Jacopo,\n>\n> Thank you for the patch.\n>\n> On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > Implement the flush() camera operation in the CameraDevice class\n> > > > and make it available to the camera framework by implementing the\n> > > > operation wrapper in camera_ops.cpp.\n> > > >\n> > > > The flush() implementation stops the Camera and the worker thread and\n> > > > waits for all in-flight requests to be returned. Stopping the Camera\n> > > > forces all Requests already queued to be returned immediately in error\n> > > > state. As flush() has to wait until all of them have been returned, make it\n> > > > wait on a newly introduced condition variable which is notified by the\n> > > > request completion handler when the queue of pending requests has been\n> > > > exhausted.\n> > > >\n> > > > As flush() can race with processCaptureRequest() protect the requests\n> > > > queueing by introducing a new CameraState::CameraFlushing state that\n> > > > processCaptureRequest() inspects before queuing the Request to the\n> > > > Camera. If flush() has been called while processCaptureRequest() was\n> > > > executing, return the current Request immediately in error state.\n> > > >\n> > > > Protect potentially concurrent calls to close() and configureStreams()\n>\n> Can this happen ? Quoting camera3.h,\n>\n>  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n>  *    to end the camera session. This may be called at any time when no other\n>  *    calls from the framework are active, although the call may block until all\n>  *    in-flight captures have completed (all results returned, all buffers\n>  *    filled). After the close call returns, no more calls to the\n>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n>  *    close() call is underway, the framework may not call any other HAL device\n>  *    functions.\n>\n> The important part is \"when no other calss from the framework are\n> active\". I don't think we need to handle close() racing with anything\n> else than process_capture_request().\n\nI've been discussing this with Hiro during v1, as initially I didn't\nconsider close() and configureStreams().\n\nhttps://patchwork.libcamera.org/patch/12248/#16884\n\nI initially only considered processCaptureRequest() as a potential\nrace, but got suggested differently by the cros camera team.\n\n\n>\n> > > > by inspecting the CameraState, and force a wait for any flush() call\n> > > > to complete before proceeding.\n> > > >\n> > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > ---\n> > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> > > >  src/android/camera_device.h   |  9 +++-\n> > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > >\n> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > > > index 3fce14035718..899afaa49439 100644\n> > > > --- a/src/android/camera_device.cpp\n> > > > +++ b/src/android/camera_device.cpp\n> > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> > > >\n> > > >  void CameraDevice::close()\n> > > >  {\n> > > > -\tstreams_.clear();\n> > > > +\tMutexLocker cameraLock(cameraMutex_);\n>\n> I'd add a blank line here.\n>\n> > > > +\tif (state_ == CameraFlushing) {\n>\n> As mentioned above, I don't think you need to protect against close()\n> and flush() racing each other.\n>\n> > > > +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > +\t\tcamera_->release();\n> > > >\n> > > > +\t\treturn;\n> > > > +\t}\n> > > > +\n> > > > +\tstreams_.clear();\n> > > >  \tstop();\n> > > >\n> > > >  \tcamera_->release();\n> > > >  }\n> > > >\n> > > > -void CameraDevice::stop()\n> > > > +/*\n> > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n>\n> s/wait/waits/\n>\n> > > > + * until all the in-flight requests have been returned before setting the\n> > > > + * camera state to stopped.\n> > > > + *\n> > > > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > > > + * configureStreams().\n> > > > + */\n> > > > +void CameraDevice::flush()\n> > > >  {\n> > > > +\t{\n> > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > +\n> > > > +\t\tif (state_ != CameraRunning)\n> > > > +\t\t\treturn;\n> > > > +\n> > > > +\t\tworker_.stop();\n> > > > +\t\tcamera_->stop();\n> > > > +\t\tstate_ = CameraFlushing;\n> > > > +\t}\n> > > > +\n> > > > +\t/*\n> > > > +\t * Now wait for all the in-flight requests to be completed before\n> > > > +\t * continuing. Stopping the Camera guarantees that all in-flight\n> > > > +\t * requests are completed in error state.\n>\n> Do we need to wait ? Camera::stop() guarantees that all requests\n> complete synchronously with the stop() call.\n\nI didn't get the API that way... I thought after stop we would receive\na sequence of failed requests... Actually I don't see anything that\nsuggests that in camera.cpp or pipeline_handler.cpp apart from an assertion\nin Camera::stop()\n\n>\n> Partly answering myself here, we'll have to wait for post-processing\n> tasks to complete once we'll process them in a separate thread, but that\n> will likely be handled by Thread::wait(). I don't think you need a\n> condition variable here. I'm I'm not mistaken, this should simplify the\n> implementation.\n\nIf Camera::stop() is synchronous we don't need to wait indeed\n\n>\n> > > > +\t */\n> > > > +\t{\n> > > > +\t\tMutexLocker requestsLock(requestsMutex_);\n> > > > +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > > > +\t}\n> > >\n> > > I'm still uneasy about releasing the cameraMutex_ for this section. In\n> > > patch 6/8 you add it to protect the state_ variable but here it's\n> >\n> > I'm not changing state_ without the mutex acquired, am I ?\n> >\n> > > ignored. I see the ASSERT() added to stop() but the patter of taking the\n> > > lock checking state_, releasing the lock and do some work, retake the\n> > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> >\n> > How so, apart from the fact it feels a bit unusual, I concur ?\n> >\n> > If I keep the held the mutex for the whole duration of flush no other\n> > concurrent method can proceed until all the queued requests have not\n> > been completed. While flush waits for the flushing_ condition to be\n> > signaled, processCaptureRequest() can proceed and immediately return\n> > the newly queued requests in error state by detecting state_ ==\n> > CameraFlushing which signals that flush in is progress.\n> > Otherwise it would have had to wait for flush to end. But then we're back\n> > to a situation where we could serialize all calls and that's it, we\n> > would be done with a single mutex to be held for the whole duration of\n> > all operations.\n> >\n> > If it only was for close() or configureStreams() we could have locked\n> > for the whole duration of flush(), as they anyway wait for flush to\n> > complete before proceeding (by waiting on the flushed_ condition here\n> > below signaled).\n> >\n> > > something and this is not a real problem, if so maybe we can capture\n> > > that in the comment here?\n> > >\n> > > > +\n> > > > +\t/*\n> > > > +\t * Set state to stopped and unlock close() or configureStreams() that\n> > > > +\t * might be waiting for flush to be completed.\n> > > > +\t */\n> > > >  \tMutexLocker cameraLock(cameraMutex_);\n> > > > +\tstate_ = CameraStopped;\n> > > > +\tflushed_.notify_one();\n>\n> You should drop the lock before calling notify_one(). Otherwise you'll\n> wake up the task waiting on flushed_, which will try to lock\n> cameraMutex_, which will block immediately. The scheduler will have to\n> reschedule this task for the function to return and the lock to be\n> released before the waiter can proceed. That works, but isn't very\n> efficient.\n\nWeird, the cpp reference shows example about notify_one where the\ncaller always has the mutex held locked, but I see your point and\nseems correct..\n\n>\n> \t{\n> \t\tMutexLocker cameraLock(cameraMutex_);\n> \t\tstate_ = CameraStopped;\n> \t}\n>\n> \tflushed_.notify_one();\n>\n\nSo I could change to this one, if I don't have to drop this part\ncompletely if we consider close() and configureStreams() not as\npossible races...\n\n> > > > +}\n> > > > +\n> > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > > > +void CameraDevice::stop()\n> > > > +{\n> > > > +\tASSERT(state_ != CameraFlushing);\n> > > > +\n> > > >  \tif (state_ == CameraStopped)\n> > > >  \t\treturn;\n> > > >\n> > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> > > >   */\n> > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > > >  {\n> > > > -\t/* Before any configuration attempt, stop the camera. */\n> > > > -\tstop();\n> > > > +\t{\n> > > > +\t\t/*\n> > > > +\t\t * If a flush is in progress, wait for it to complete and to\n> > > > +\t\t * stop the camera, otherwise before any new configuration\n> > > > +\t\t * attempt we have to stop the camera explictely.\n> > > > +\t\t */\n>\n> Same here, I don't think flush() and configure_streams() can race each\n> other. I believe the only possible race to be between flush() and\n> process_capture_request().\n>\n\nDitto.\n\n> > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > +\t\tif (state_ == CameraFlushing)\n> > > > +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > +\t\telse\n> > > > +\t\t\tstop();\n> > > > +\t}\n> > > >\n> > > >  \tif (stream_list->num_streams == 0) {\n> > > >  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > >  \tif (ret)\n> > > >  \t\treturn ret;\n> > > >\n> > > > +\t/*\n> > > > +\t * Just before queuing the request, make sure flush() has not\n> > > > +\t * been called after this function has been executed. In that\n> > > > +\t * case, immediately return the request with errors.\n> > > > +\t */\n> > > > +\tMutexLocker cameraLock(cameraMutex_);\n> > > > +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > > +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > > > +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > > > +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> > > > +\t\t}\n> > > > +\n> > > > +\t\tnotifyError(descriptor.frameNumber_,\n> > > > +\t\t\t    descriptor.buffers_[0].stream,\n>\n> As commented on a previous patch, I think you should pass nullptr for\n> the stream here.\n>\n\nThe \"S6. Error management:\" section of the camera3.h header does not\nmention that, not the ? where does you suggestion come from ? I don't find\nany reference in the review of [1/8]\n\n\n> > > > +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> > > > +\n> > > > +\t\treturn 0;\n> > > > +\t}\n> > > > +\n> > > >  \tworker_.queueRequest(descriptor.request_.get());\n> > > >\n> > > >  \t{\n> > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> > > >  \t\t\treturn;\n> > > >  \t\t}\n> > > >\n> > > > +\t\t/* Release flush if all the pending requests have been completed. */\n> > > > +\t\tif (descriptors_.empty())\n> > > > +\t\t\tflushing_.notify_one();\n>\n> This will never happen, as you can only get here if descriptors_.find()\n> has found the descriptor. Did you mean to do this after the extract()\n> call below ?\n\nUgh. This works only because Camera::stop() is synchronous then ?\n\n>\n> > > > +\n> > > >  \t\tnode = descriptors_.extract(it);\n> > > >  \t}\n> > > >  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > --- a/src/android/camera_device.h\n> > > > +++ b/src/android/camera_device.h\n> > > > @@ -7,6 +7,7 @@\n> > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > >\n> > > > +#include <condition_variable>\n> > > >  #include <map>\n> > > >  #include <memory>\n> > > >  #include <mutex>\n> > > > @@ -42,6 +43,7 @@ public:\n> > > >\n> > > >  \tint open(const hw_module_t *hardwareModule);\n> > > >  \tvoid close();\n> > > > +\tvoid flush();\n> > > >\n> > > >  \tunsigned int id() const { return id_; }\n> > > >  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> > > > @@ -92,6 +94,7 @@ private:\n> > > >  \tenum State {\n> > > >  \t\tCameraStopped,\n> > > >  \t\tCameraRunning,\n> > > > +\t\tCameraFlushing,\n> > > >  \t};\n> > > >\n> > > >  \tvoid stop();\n> > > > @@ -120,8 +123,9 @@ private:\n> > > >\n> > > >  \tCameraWorker worker_;\n> > > >\n> > > > -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > > > +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> > > >  \tState state_;\n> > > > +\tstd::condition_variable flushed_;\n> > > >\n> > > >  \tstd::shared_ptr<libcamera::Camera> camera_;\n> > > >  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > @@ -134,8 +138,9 @@ private:\n> > > >  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n> > > >  \tstd::vector<CameraStream> streams_;\n> > > >\n> > > > -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > > > +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> > > >  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > > +\tstd::condition_variable flushing_;\n> > > >\n> > > >  \tstd::string maker_;\n> > > >  \tstd::string model_;\n> > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > --- a/src/android/camera_ops.cpp\n> > > > +++ b/src/android/camera_ops.cpp\n> > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> > > >  {\n> > > >  }\n> > > >\n> > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > >  {\n> > > > +\tif (!dev)\n> > > > +\t\treturn -EINVAL;\n> > > > +\n> > > > +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > > > +\tcamera->flush();\n> > > > +\n> > > >  \treturn 0;\n> > > >  }\n> > > >\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 5097BC3201\n\tfor <parsemail@patchwork.libcamera.org>;\n\tMon, 24 May 2021 07:47:12 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 9065C6891C;\n\tMon, 24 May 2021 09:47:11 +0200 (CEST)","from relay12.mail.gandi.net (relay12.mail.gandi.net\n\t[217.70.178.232])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id F0BCF68918\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tMon, 24 May 2021 09:47:09 +0200 (CEST)","(Authenticated sender: jacopo@jmondi.org)\n\tby relay12.mail.gandi.net (Postfix) with ESMTPSA id 2650C200003;\n\tMon, 24 May 2021 07:47:08 +0000 (UTC)"],"Date":"Mon, 24 May 2021 09:47:55 +0200","From":"Jacopo Mondi <jacopo@jmondi.org>","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Message-ID":"<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17223,"web_url":"https://patchwork.libcamera.org/comment/17223/","msgid":"<CAO5uPHPOyg4W6ZRmoXeamiDQYpURUCKp7wrr1Prq7midMumYyA@mail.gmail.com>","date":"2021-05-25T02:49:17","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":63,"url":"https://patchwork.libcamera.org/api/people/63/","name":"Hirokazu Honda","email":"hiroh@chromium.org"},"content":"Hi Jacopo, thank you for the patch.\n\nOn Mon, May 24, 2021 at 4:47 PM Jacopo Mondi <jacopo@jmondi.org> wrote:\n\n> Hi Laurent,\n>\n> On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> > Hi Jacopo,\n> >\n> > Thank you for the patch.\n> >\n> > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > > Implement the flush() camera operation in the CameraDevice class\n> > > > > and make it available to the camera framework by implementing the\n> > > > > operation wrapper in camera_ops.cpp.\n> > > > >\n> > > > > The flush() implementation stops the Camera and the worker thread\n> and\n> > > > > waits for all in-flight requests to be returned. Stopping the\n> Camera\n> > > > > forces all Requests already queued to be returned immediately in\n> error\n> > > > > state. As flush() has to wait until all of them have been\n> returned, make it\n> > > > > wait on a newly introduced condition variable which is notified by\n> the\n> > > > > request completion handler when the queue of pending requests has\n> been\n> > > > > exhausted.\n> > > > >\n> > > > > As flush() can race with processCaptureRequest() protect the\n> requests\n> > > > > queueing by introducing a new CameraState::CameraFlushing state\n> that\n> > > > > processCaptureRequest() inspects before queuing the Request to the\n> > > > > Camera. If flush() has been called while processCaptureRequest()\n> was\n> > > > > executing, return the current Request immediately in error state.\n> > > > >\n> > > > > Protect potentially concurrent calls to close() and\n> configureStreams()\n> >\n> > Can this happen ? Quoting camera3.h,\n> >\n> >  * 12. Alternatively, the framework may call\n> camera3_device_t->common->close()\n> >  *    to end the camera session. This may be called at any time when no\n> other\n> >  *    calls from the framework are active, although the call may block\n> until all\n> >  *    in-flight captures have completed (all results returned, all\n> buffers\n> >  *    filled). After the close call returns, no more calls to the\n> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n> >  *    close() call is underway, the framework may not call any other HAL\n> device\n> >  *    functions.\n> >\n> > The important part is \"when no other calss from the framework are\n> > active\". I don't think we need to handle close() racing with anything\n> > else than process_capture_request().\n>\n> I've been discussing this with Hiro during v1, as initially I didn't\n> consider close() and configureStreams().\n>\n> https://patchwork.libcamera.org/patch/12248/#16884\n>\n> I initially only considered processCaptureRequest() as a potential\n> race, but got suggested differently by the cros camera team.\n>\n>\n> >\n> > > > > by inspecting the CameraState, and force a wait for any flush()\n> call\n> > > > > to complete before proceeding.\n> > > > >\n> > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > > ---\n> > > > >  src/android/camera_device.cpp | 90\n> +++++++++++++++++++++++++++++++++--\n> > > > >  src/android/camera_device.h   |  9 +++-\n> > > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > > >\n> > > > > diff --git a/src/android/camera_device.cpp\n> b/src/android/camera_device.cpp\n> > > > > index 3fce14035718..899afaa49439 100644\n> > > > > --- a/src/android/camera_device.cpp\n> > > > > +++ b/src/android/camera_device.cpp\n> > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t\n> *hardwareModule)\n> > > > >\n> > > > >  void CameraDevice::close()\n> > > > >  {\n> > > > > -       streams_.clear();\n> > > > > +       MutexLocker cameraLock(cameraMutex_);\n> >\n> > I'd add a blank line here.\n> >\n> > > > > +       if (state_ == CameraFlushing) {\n> >\n> > As mentioned above, I don't think you need to protect against close()\n> > and flush() racing each other.\n> >\n> > > > > +               flushed_.wait(cameraLock, [&] { return state_ !=\n> CameraStopped; });\n> > > > > +               camera_->release();\n> > > > >\n> > > > > +               return;\n> > > > > +       }\n> > > > > +\n> > > > > +       streams_.clear();\n> > > > >         stop();\n> > > > >\n> > > > >         camera_->release();\n> > > > >  }\n> > > > >\n> > > > > -void CameraDevice::stop()\n> > > > > +/*\n> > > > > + * Flush is similar to stop() but sets the camera state to\n> 'flushing' and wait\n> >\n> > s/wait/waits/\n> >\n> > > > > + * until all the in-flight requests have been returned before\n> setting the\n> > > > > + * camera state to stopped.\n> > > > > + *\n> > > > > + * Once flushing is done it unlocks concurrent calls to camera\n> close() and\n> > > > > + * configureStreams().\n> > > > > + */\n> > > > > +void CameraDevice::flush()\n> > > > >  {\n> > > > > +       {\n> > > > > +               MutexLocker cameraLock(cameraMutex_);\n> > > > > +\n> > > > > +               if (state_ != CameraRunning)\n> > > > > +                       return;\n> > > > > +\n> > > > > +               worker_.stop();\n> > > > > +               camera_->stop();\n> > > > > +               state_ = CameraFlushing;\n> > > > > +       }\n> > > > > +\n> > > > > +       /*\n> > > > > +        * Now wait for all the in-flight requests to be completed\n> before\n> > > > > +        * continuing. Stopping the Camera guarantees that all\n> in-flight\n> > > > > +        * requests are completed in error state.\n> >\n> > Do we need to wait ? Camera::stop() guarantees that all requests\n> > complete synchronously with the stop() call.\n>\n> I didn't get the API that way... I thought after stop we would receive\n> a sequence of failed requests... Actually I don't see anything that\n> suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion\n> in Camera::stop()\n>\n> >\n> > Partly answering myself here, we'll have to wait for post-processing\n> > tasks to complete once we'll process them in a separate thread, but that\n> > will likely be handled by Thread::wait(). I don't think you need a\n> > condition variable here. I'm I'm not mistaken, this should simplify the\n> > implementation.\n>\n> If Camera::stop() is synchronous we don't need to wait indeed\n>\n> >\n> > > > > +        */\n> > > > > +       {\n> > > > > +               MutexLocker requestsLock(requestsMutex_);\n> > > > > +               flushing_.wait(requestsLock, [&] { return\n> descriptors_.empty(); });\n> > > > > +       }\n> > > >\n> > > > I'm still uneasy about releasing the cameraMutex_ for this section.\n> In\n> > > > patch 6/8 you add it to protect the state_ variable but here it's\n> > >\n> > > I'm not changing state_ without the mutex acquired, am I ?\n> > >\n> > > > ignored. I see the ASSERT() added to stop() but the patter of taking\n> the\n> > > > lock checking state_, releasing the lock and do some work, retake the\n> > > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> > >\n> > > How so, apart from the fact it feels a bit unusual, I concur ?\n> > >\n> > > If I keep the held the mutex for the whole duration of flush no other\n> > > concurrent method can proceed until all the queued requests have not\n> > > been completed. While flush waits for the flushing_ condition to be\n> > > signaled, processCaptureRequest() can proceed and immediately return\n> > > the newly queued requests in error state by detecting state_ ==\n> > > CameraFlushing which signals that flush in is progress.\n> > > Otherwise it would have had to wait for flush to end. But then we're\n> back\n> > > to a situation where we could serialize all calls and that's it, we\n> > > would be done with a single mutex to be held for the whole duration of\n> > > all operations.\n> > >\n> > > If it only was for close() or configureStreams() we could have locked\n> > > for the whole duration of flush(), as they anyway wait for flush to\n> > > complete before proceeding (by waiting on the flushed_ condition here\n> > > below signaled).\n> > >\n> > > > something and this is not a real problem, if so maybe we can capture\n> > > > that in the comment here?\n> > > >\n> > > > > +\n> > > > > +       /*\n> > > > > +        * Set state to stopped and unlock close() or\n> configureStreams() that\n> > > > > +        * might be waiting for flush to be completed.\n> > > > > +        */\n> > > > >         MutexLocker cameraLock(cameraMutex_);\n> > > > > +       state_ = CameraStopped;\n> > > > > +       flushed_.notify_one();\n> >\n> > You should drop the lock before calling notify_one(). Otherwise you'll\n> > wake up the task waiting on flushed_, which will try to lock\n> > cameraMutex_, which will block immediately. The scheduler will have to\n> > reschedule this task for the function to return and the lock to be\n> > released before the waiter can proceed. That works, but isn't very\n> > efficient.\n>\n> Weird, the cpp reference shows example about notify_one where the\n> caller always has the mutex held locked, but I see your point and\n> seems correct..\n>\n>\nThis is correct.\nhttps://en.cppreference.com/w/cpp/thread/condition_variable/notify_one\n> The notifying thread does not need to hold the lock on the same mutex as\nthe one held by the waiting thread(s);\n\nI know that, but I haven't pointed it out because it is not false.\nFrom the Laurent explanation, yeah, we should definitely avoid that. TIL.\n\n\n> >\n> >       {\n> >               MutexLocker cameraLock(cameraMutex_);\n> >               state_ = CameraStopped;\n> >       }\n> >\n> >       flushed_.notify_one();\n> >\n>\n> So I could change to this one, if I don't have to drop this part\n> completely if we consider close() and configureStreams() not as\n> possible races...\n>\n> > > > > +}\n> > > > > +\n> > > > > +/* Calls to stop() must be protected by cameraMutex_ being held\n> by the caller. */\n> > > > > +void CameraDevice::stop()\n> > > > > +{\n> > > > > +       ASSERT(state_ != CameraFlushing);\n> > > > > +\n> > > > >         if (state_ == CameraStopped)\n> > > > >                 return;\n> > > > >\n> > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int\n> format) const\n> > > > >   */\n> > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t\n> *stream_list)\n> > > > >  {\n> > > > > -       /* Before any configuration attempt, stop the camera. */\n> > > > > -       stop();\n> > > > > +       {\n> > > > > +               /*\n> > > > > +                * If a flush is in progress, wait for it to\n> complete and to\n> > > > > +                * stop the camera, otherwise before any new\n> configuration\n> > > > > +                * attempt we have to stop the camera explictely.\n> > > > > +                */\n> >\n> > Same here, I don't think flush() and configure_streams() can race each\n> > other. I believe the only possible race to be between flush() and\n> > process_capture_request().\n> >\n>\n> Ditto.\n>\n> > > > > +               MutexLocker cameraLock(cameraMutex_);\n> > > > > +               if (state_ == CameraFlushing)\n> > > > > +                       flushed_.wait(cameraLock, [&] { return\n> state_ != CameraStopped; });\n> > > > > +               else\n> > > > > +                       stop();\n> > > > > +       }\n> > > > >\n> > > > >         if (stream_list->num_streams == 0) {\n> > > > >                 LOG(HAL, Error) << \"No streams in configuration\";\n> > > > > @@ -1950,6 +2009,25 @@ int\n> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > > >         if (ret)\n> > > > >                 return ret;\n> > > > >\n> > > > > +       /*\n> > > > > +        * Just before queuing the request, make sure flush() has\n> not\n> > > > > +        * been called after this function has been executed. In\n> that\n> > > > > +        * case, immediately return the request with errors.\n> > > > > +        */\n> > > > > +       MutexLocker cameraLock(cameraMutex_);\n> > > > > +       if (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > > > +               for (camera3_stream_buffer_t &buffer :\n> descriptor.buffers_) {\n> > > > > +                       buffer.status =\n> CAMERA3_BUFFER_STATUS_ERROR;\n> > > > > +                       buffer.release_fence =\n> buffer.acquire_fence;\n> > > > > +               }\n> > > > > +\n> > > > > +               notifyError(descriptor.frameNumber_,\n> > > > > +                           descriptor.buffers_[0].stream,\n> >\n> > As commented on a previous patch, I think you should pass nullptr for\n> > the stream here.\n> >\n>\n> The \"S6. Error management:\" section of the camera3.h header does not\n> mention that, not the ? where does you suggestion come from ? I don't find\n> any reference in the review of [1/8]\n>\n>\n> > > > > +                           CAMERA3_MSG_ERROR_REQUEST);\n> > > > > +\n> > > > > +               return 0;\n> > > > > +       }\n> > > > > +\n> > > > >         worker_.queueRequest(descriptor.request_.get());\n> > > > >\n> > > > >         {\n> > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request\n> *request)\n> > > > >                         return;\n> > > > >                 }\n> > > > >\n> > > > > +               /* Release flush if all the pending requests have\n> been completed. */\n> > > > > +               if (descriptors_.empty())\n> > > > > +                       flushing_.notify_one();\n> >\n> > This will never happen, as you can only get here if descriptors_.find()\n> > has found the descriptor. Did you mean to do this after the extract()\n> > call below ?\n>\n> Ugh. This works only because Camera::stop() is synchronous then ?\n>\n>\nAh, good catch!\n\nWith this fix, the code looks good to me.\nI am happy to ongoingly join the discussion.\n\nReviewed-by: Hirokazu Honda <hiroh@chromium.org>\n\n\n> >\n> > > > > +\n> > > > >                 node = descriptors_.extract(it);\n> > > > >         }\n> > > > >         Camera3RequestDescriptor &descriptor = node.mapped();\n> > > > > diff --git a/src/android/camera_device.h\n> b/src/android/camera_device.h\n> > > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > > --- a/src/android/camera_device.h\n> > > > > +++ b/src/android/camera_device.h\n> > > > > @@ -7,6 +7,7 @@\n> > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > > >\n> > > > > +#include <condition_variable>\n> > > > >  #include <map>\n> > > > >  #include <memory>\n> > > > >  #include <mutex>\n> > > > > @@ -42,6 +43,7 @@ public:\n> > > > >\n> > > > >         int open(const hw_module_t *hardwareModule);\n> > > > >         void close();\n> > > > > +       void flush();\n> > > > >\n> > > > >         unsigned int id() const { return id_; }\n> > > > >         camera3_device_t *camera3Device() { return\n> &camera3Device_; }\n> > > > > @@ -92,6 +94,7 @@ private:\n> > > > >         enum State {\n> > > > >                 CameraStopped,\n> > > > >                 CameraRunning,\n> > > > > +               CameraFlushing,\n> > > > >         };\n> > > > >\n> > > > >         void stop();\n> > > > > @@ -120,8 +123,9 @@ private:\n> > > > >\n> > > > >         CameraWorker worker_;\n> > > > >\n> > > > > -       libcamera::Mutex cameraMutex_; /* Protects access to the\n> camera state. */\n> > > > > +       libcamera::Mutex cameraMutex_; /* Protects the camera\n> state and flushed_. */\n> > > > >         State state_;\n> > > > > +       std::condition_variable flushed_;\n> > > > >\n> > > > >         std::shared_ptr<libcamera::Camera> camera_;\n> > > > >         std::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > > @@ -134,8 +138,9 @@ private:\n> > > > >         std::map<int, libcamera::PixelFormat> formatsMap_;\n> > > > >         std::vector<CameraStream> streams_;\n> > > > >\n> > > > > -       libcamera::Mutex requestsMutex_; /* Protects descriptors_.\n> */\n> > > > > +       libcamera::Mutex requestsMutex_; /* Protects descriptors_\n> and flushing_. */\n> > > > >         std::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > > > +       std::condition_variable flushing_;\n> > > > >\n> > > > >         std::string maker_;\n> > > > >         std::string model_;\n> > > > > diff --git a/src/android/camera_ops.cpp\n> b/src/android/camera_ops.cpp\n> > > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > > --- a/src/android/camera_ops.cpp\n> > > > > +++ b/src/android/camera_ops.cpp\n> > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const\n> struct camera3_device *dev,\n> > > > >  {\n> > > > >  }\n> > > > >\n> > > > > -static int hal_dev_flush([[maybe_unused]] const struct\n> camera3_device *dev)\n> > > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > > >  {\n> > > > > +       if (!dev)\n> > > > > +               return -EINVAL;\n> > > > > +\n> > > > > +       CameraDevice *camera = reinterpret_cast<CameraDevice\n> *>(dev->priv);\n> > > > > +       camera->flush();\n> > > > > +\n> > > > >         return 0;\n> > > > >  }\n> > > > >\n> >\n> > --\n> > Regards,\n> >\n> > Laurent Pinchart\n>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id A7BA1C3200\n\tfor <parsemail@patchwork.libcamera.org>;\n\tTue, 25 May 2021 02:49:30 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 165C668921;\n\tTue, 25 May 2021 04:49:30 +0200 (CEST)","from mail-ej1-x636.google.com (mail-ej1-x636.google.com\n\t[IPv6:2a00:1450:4864:20::636])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 5BCCF602AC\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tTue, 25 May 2021 04:49:28 +0200 (CEST)","by mail-ej1-x636.google.com with SMTP id b9so9649746ejc.13\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tMon, 24 May 2021 19:49:28 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"N1yfrHZI\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org;\n\ts=google; \n\th=mime-version:references:in-reply-to:from:date:message-id:subject:to\n\t:cc; bh=Zi1BXb3KiJfC4hlaRPN237CdojHm1M3p60v2cOt4IIY=;\n\tb=N1yfrHZI8tRM6w6r3qUi0uwYcdyGHB0CxvNRBQMsdT5QkXs+S58n5V2NJHesrKzuG2\n\tXe+/raVOzYadBFBs8x1GUgQF41nb2k2h7Nox4ElvEULxsb2+jotTpJ0Nt1p82TK3W53n\n\t/ADmx+BLU5sLryqhnI1WecHbHnbPyyw9SxWPQ=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:references:in-reply-to:from:date\n\t:message-id:subject:to:cc;\n\tbh=Zi1BXb3KiJfC4hlaRPN237CdojHm1M3p60v2cOt4IIY=;\n\tb=VB6fONBjvm6RYmucPTjXbwltFyfd2qifZIw+dBQR77N9yfETQMdZeIz567bC1azuvY\n\t7+54Fs0Z5kspInjKse9UJVSTTD7mxpUim1W0cl/LxzkFhRv/Pbjo2eLkZYEDXD3Co3zy\n\tfHUckkoVq4nK3iNCG48kjWfXZywh2vrCIzO24EQlHwqGjxOiYKtfJKL5XzdCaZRFYVJI\n\trzIALwqsy/CYVYNFqqHEP0cGYnGSKHViKQkfwkaTK2Pqe899SZhg/A5bvGwx8Ad8w2d2\n\tF0yhkaHme2elBthOhZljtdTsCcvZtBud46krQzGttTEpwWF06QC1C6sHLiidZRN6pjew\n\t84og==","X-Gm-Message-State":"AOAM531E9cFnMp/ZWFUqs8mswyXemDspdR1QtMHw2sxEBi2nckCgWskU\n\tPSA1elBU3E9e1hnZBYtXZoDw7+PEvXtVzISF5IWMRQ==","X-Google-Smtp-Source":"ABdhPJw1ClKpez154ZGSOqz/Wh2hL4ROomjQIiKg8MDKiZbdVRe+0CtaUoYE2QPVayBWegmHhS+rFCKOL0Z4QznU+R0=","X-Received":"by 2002:a17:907:3ea0:: with SMTP id\n\ths32mr26379683ejc.475.1621910967949; \n\tMon, 24 May 2021 19:49:27 -0700 (PDT)","MIME-Version":"1.0","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>\n\t<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","In-Reply-To":"<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","From":"Hirokazu Honda <hiroh@chromium.org>","Date":"Tue, 25 May 2021 11:49:17 +0900","Message-ID":"<CAO5uPHPOyg4W6ZRmoXeamiDQYpURUCKp7wrr1Prq7midMumYyA@mail.gmail.com>","To":"Jacopo Mondi <jacopo@jmondi.org>","Content-Type":"multipart/alternative; boundary=\"0000000000001a2ea005c31e9320\"","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera devel <libcamera-devel@lists.libcamera.org>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17298,"web_url":"https://patchwork.libcamera.org/comment/17298/","msgid":"<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","date":"2021-05-27T02:26:51","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"Hi Jacopo,\n\n(expanding the CC list to finalize the race conditions discussion)\n\nOn Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:\n> On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > > Implement the flush() camera operation in the CameraDevice class\n> > > > > and make it available to the camera framework by implementing the\n> > > > > operation wrapper in camera_ops.cpp.\n> > > > >\n> > > > > The flush() implementation stops the Camera and the worker thread and\n> > > > > waits for all in-flight requests to be returned. Stopping the Camera\n> > > > > forces all Requests already queued to be returned immediately in error\n> > > > > state. As flush() has to wait until all of them have been returned, make it\n> > > > > wait on a newly introduced condition variable which is notified by the\n> > > > > request completion handler when the queue of pending requests has been\n> > > > > exhausted.\n> > > > >\n> > > > > As flush() can race with processCaptureRequest() protect the requests\n> > > > > queueing by introducing a new CameraState::CameraFlushing state that\n> > > > > processCaptureRequest() inspects before queuing the Request to the\n> > > > > Camera. If flush() has been called while processCaptureRequest() was\n> > > > > executing, return the current Request immediately in error state.\n> > > > >\n> > > > > Protect potentially concurrent calls to close() and configureStreams()\n> >\n> > Can this happen ? Quoting camera3.h,\n> >\n> >  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n> >  *    to end the camera session. This may be called at any time when no other\n> >  *    calls from the framework are active, although the call may block until all\n> >  *    in-flight captures have completed (all results returned, all buffers\n> >  *    filled). After the close call returns, no more calls to the\n> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n> >  *    close() call is underway, the framework may not call any other HAL device\n> >  *    functions.\n> >\n> > The important part is \"when no other calss from the framework are\n> > active\". I don't think we need to handle close() racing with anything\n> > else than process_capture_request().\n> \n> I've been discussing this with Hiro during v1, as initially I didn't\n> consider close() and configureStreams().\n> \n> https://patchwork.libcamera.org/patch/12248/#16884\n> \n> I initially only considered processCaptureRequest() as a potential\n> race, but got suggested differently by the cros camera team.\n\nLet's try to get to the bottom of this.\n\nSection S2 (\"Startup and general expected operation sequence\") states:\n\n * 12. Alternatively, the framework may call camera3_device_t->common->close()\n *    to end the camera session. This may be called at any time when no other\n *    calls from the framework are active, although the call may block until all\n *    in-flight captures have completed (all results returned, all buffers\n *    filled). After the close call returns, no more calls to the\n *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n *    close() call is underway, the framework may not call any other HAL device\n *    functions.\n\nThere can be in-flight requests when .close() is called, but it can't be\ncalled concurrently with any other call. There's thus no race condition\nto protect against.\n\nThe .configure_streams() documentation states:\n\n     * Preconditions:\n     *\n     * The framework will only call this method when no captures are being\n     * processed. That is, all results have been returned to the framework, and\n     * all in-flight input and output buffers have been returned and their\n     * release sync fences have been signaled by the HAL. The framework will not\n     * submit new requests for capture while the configure_streams() call is\n     * underway.\n\nThis clearly forbids calling .configure_streams() and\n.process_capture_request() concurrently.\n\nThe .flush() documentation states:\n\n     * Flush all currently in-process captures and all buffers in the pipeline\n     * on the given device. The framework will use this to dump all state as\n     * quickly as possible in order to prepare for a configure_streams() call.\n\nI interpret this as at least a very strong hint that .flush() and\n.configure_streams() can't be called concurrently :-)\n\nIf anyone disagrees, I'd like compelling evidence that those races can\noccur.\n\n> > > > > by inspecting the CameraState, and force a wait for any flush() call\n> > > > > to complete before proceeding.\n> > > > >\n> > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > > ---\n> > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> > > > >  src/android/camera_device.h   |  9 +++-\n> > > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > > >\n> > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > > > > index 3fce14035718..899afaa49439 100644\n> > > > > --- a/src/android/camera_device.cpp\n> > > > > +++ b/src/android/camera_device.cpp\n> > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> > > > >\n> > > > >  void CameraDevice::close()\n> > > > >  {\n> > > > > -\tstreams_.clear();\n> > > > > +\tMutexLocker cameraLock(cameraMutex_);\n> >\n> > I'd add a blank line here.\n> >\n> > > > > +\tif (state_ == CameraFlushing) {\n> >\n> > As mentioned above, I don't think you need to protect against close()\n> > and flush() racing each other.\n> >\n> > > > > +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > +\t\tcamera_->release();\n> > > > >\n> > > > > +\t\treturn;\n> > > > > +\t}\n> > > > > +\n> > > > > +\tstreams_.clear();\n> > > > >  \tstop();\n> > > > >\n> > > > >  \tcamera_->release();\n> > > > >  }\n> > > > >\n> > > > > -void CameraDevice::stop()\n> > > > > +/*\n> > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n> >\n> > s/wait/waits/\n> >\n> > > > > + * until all the in-flight requests have been returned before setting the\n> > > > > + * camera state to stopped.\n> > > > > + *\n> > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > > > > + * configureStreams().\n> > > > > + */\n> > > > > +void CameraDevice::flush()\n> > > > >  {\n> > > > > +\t{\n> > > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > > +\n> > > > > +\t\tif (state_ != CameraRunning)\n> > > > > +\t\t\treturn;\n> > > > > +\n> > > > > +\t\tworker_.stop();\n> > > > > +\t\tcamera_->stop();\n> > > > > +\t\tstate_ = CameraFlushing;\n> > > > > +\t}\n> > > > > +\n> > > > > +\t/*\n> > > > > +\t * Now wait for all the in-flight requests to be completed before\n> > > > > +\t * continuing. Stopping the Camera guarantees that all in-flight\n> > > > > +\t * requests are completed in error state.\n> >\n> > Do we need to wait ? Camera::stop() guarantees that all requests\n> > complete synchronously with the stop() call.\n> \n> I didn't get the API that way... I thought after stop we would receive\n> a sequence of failed requests... Actually I don't see anything that\n> suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion\n> in Camera::stop()\n\nThe camera::stop() documentation states\n\n * This method stops capturing and processing requests immediately. All pending\n * requests are cancelled and complete synchronously in an error state.\n\nIs this ambiguous ?\n\n> > Partly answering myself here, we'll have to wait for post-processing\n> > tasks to complete once we'll process them in a separate thread, but that\n> > will likely be handled by Thread::wait(). I don't think you need a\n> > condition variable here. I'm I'm not mistaken, this should simplify the\n> > implementation.\n> \n> If Camera::stop() is synchronous we don't need to wait indeed\n> \n> > > > > +\t */\n> > > > > +\t{\n> > > > > +\t\tMutexLocker requestsLock(requestsMutex_);\n> > > > > +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > > > > +\t}\n> > > >\n> > > > I'm still uneasy about releasing the cameraMutex_ for this section. In\n> > > > patch 6/8 you add it to protect the state_ variable but here it's\n> > >\n> > > I'm not changing state_ without the mutex acquired, am I ?\n> > >\n> > > > ignored. I see the ASSERT() added to stop() but the patter of taking the\n> > > > lock checking state_, releasing the lock and do some work, retake the\n> > > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> > >\n> > > How so, apart from the fact it feels a bit unusual, I concur ?\n> > >\n> > > If I keep the held the mutex for the whole duration of flush no other\n> > > concurrent method can proceed until all the queued requests have not\n> > > been completed. While flush waits for the flushing_ condition to be\n> > > signaled, processCaptureRequest() can proceed and immediately return\n> > > the newly queued requests in error state by detecting state_ ==\n> > > CameraFlushing which signals that flush in is progress.\n> > > Otherwise it would have had to wait for flush to end. But then we're back\n> > > to a situation where we could serialize all calls and that's it, we\n> > > would be done with a single mutex to be held for the whole duration of\n> > > all operations.\n> > >\n> > > If it only was for close() or configureStreams() we could have locked\n> > > for the whole duration of flush(), as they anyway wait for flush to\n> > > complete before proceeding (by waiting on the flushed_ condition here\n> > > below signaled).\n> > >\n> > > > something and this is not a real problem, if so maybe we can capture\n> > > > that in the comment here?\n> > > >\n> > > > > +\n> > > > > +\t/*\n> > > > > +\t * Set state to stopped and unlock close() or configureStreams() that\n> > > > > +\t * might be waiting for flush to be completed.\n> > > > > +\t */\n> > > > >  \tMutexLocker cameraLock(cameraMutex_);\n> > > > > +\tstate_ = CameraStopped;\n> > > > > +\tflushed_.notify_one();\n> >\n> > You should drop the lock before calling notify_one(). Otherwise you'll\n> > wake up the task waiting on flushed_, which will try to lock\n> > cameraMutex_, which will block immediately. The scheduler will have to\n> > reschedule this task for the function to return and the lock to be\n> > released before the waiter can proceed. That works, but isn't very\n> > efficient.\n> \n> Weird, the cpp reference shows example about notify_one where the\n> caller always has the mutex held locked, but I see your point and\n> seems correct..\n\nI'm looking at\nhttps://en.cppreference.com/w/cpp/thread/condition_variable and\nhttps://en.cppreference.com/w/cpp/thread/condition_variable/notify_one\nand both calls to notify_one() in the example are made without the lock\nheld, aren't they ?\n\n> >\n> > \t{\n> > \t\tMutexLocker cameraLock(cameraMutex_);\n> > \t\tstate_ = CameraStopped;\n> > \t}\n> >\n> > \tflushed_.notify_one();\n> >\n> \n> So I could change to this one, if I don't have to drop this part\n> completely if we consider close() and configureStreams() not as\n> possible races...\n> \n> > > > > +}\n> > > > > +\n> > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > > > > +void CameraDevice::stop()\n> > > > > +{\n> > > > > +\tASSERT(state_ != CameraFlushing);\n> > > > > +\n> > > > >  \tif (state_ == CameraStopped)\n> > > > >  \t\treturn;\n> > > > >\n> > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> > > > >   */\n> > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > > > >  {\n> > > > > -\t/* Before any configuration attempt, stop the camera. */\n> > > > > -\tstop();\n> > > > > +\t{\n> > > > > +\t\t/*\n> > > > > +\t\t * If a flush is in progress, wait for it to complete and to\n> > > > > +\t\t * stop the camera, otherwise before any new configuration\n> > > > > +\t\t * attempt we have to stop the camera explictely.\n> > > > > +\t\t */\n> >\n> > Same here, I don't think flush() and configure_streams() can race each\n> > other. I believe the only possible race to be between flush() and\n> > process_capture_request().\n> \n> Ditto.\n> \n> > > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > > +\t\tif (state_ == CameraFlushing)\n> > > > > +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > +\t\telse\n> > > > > +\t\t\tstop();\n> > > > > +\t}\n> > > > >\n> > > > >  \tif (stream_list->num_streams == 0) {\n> > > > >  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > > >  \tif (ret)\n> > > > >  \t\treturn ret;\n> > > > >\n> > > > > +\t/*\n> > > > > +\t * Just before queuing the request, make sure flush() has not\n> > > > > +\t * been called after this function has been executed. In that\n> > > > > +\t * case, immediately return the request with errors.\n> > > > > +\t */\n> > > > > +\tMutexLocker cameraLock(cameraMutex_);\n> > > > > +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > > > +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > > > > +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > > > > +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> > > > > +\t\t}\n> > > > > +\n> > > > > +\t\tnotifyError(descriptor.frameNumber_,\n> > > > > +\t\t\t    descriptor.buffers_[0].stream,\n> >\n> > As commented on a previous patch, I think you should pass nullptr for\n> > the stream here.\n> \n> The \"S6. Error management:\" section of the camera3.h header does not\n> mention that, not the ?\n\nIndeed, that section doesn't mention the camera3_error_msg::error_stream\nfield at all. The field is documented in the structure as\n\n    /**\n     * Pointer to the stream that had a failure. NULL if the stream isn't\n     * applicable to the error.\n     */\n\nThe question is thus when the stream is applicable to the error. The\ndocumentation of enum camera3_error_msg_code mentions error_stream in\nthe CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to\nthe device, the request or the result metadata, which are not specific\nto a stream.\n\n> where does you suggestion come from ? I don't find any reference in\n> the review of [1/8]\n\n([PATCH v3 1/8] android: Rework request completion notification'\n(YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)\n\n> > > > > +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> > > > > +\n> > > > > +\t\treturn 0;\n> > > > > +\t}\n> > > > > +\n> > > > >  \tworker_.queueRequest(descriptor.request_.get());\n> > > > >\n> > > > >  \t{\n> > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> > > > >  \t\t\treturn;\n> > > > >  \t\t}\n> > > > >\n> > > > > +\t\t/* Release flush if all the pending requests have been completed. */\n> > > > > +\t\tif (descriptors_.empty())\n> > > > > +\t\t\tflushing_.notify_one();\n> >\n> > This will never happen, as you can only get here if descriptors_.find()\n> > has found the descriptor. Did you mean to do this after the extract()\n> > call below ?\n> \n> Ugh. This works only because Camera::stop() is synchronous then ?\n\nI believe so.\n\n> > > > > +\n> > > > >  \t\tnode = descriptors_.extract(it);\n> > > > >  \t}\n> > > > >  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > > --- a/src/android/camera_device.h\n> > > > > +++ b/src/android/camera_device.h\n> > > > > @@ -7,6 +7,7 @@\n> > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > > >\n> > > > > +#include <condition_variable>\n> > > > >  #include <map>\n> > > > >  #include <memory>\n> > > > >  #include <mutex>\n> > > > > @@ -42,6 +43,7 @@ public:\n> > > > >\n> > > > >  \tint open(const hw_module_t *hardwareModule);\n> > > > >  \tvoid close();\n> > > > > +\tvoid flush();\n> > > > >\n> > > > >  \tunsigned int id() const { return id_; }\n> > > > >  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> > > > > @@ -92,6 +94,7 @@ private:\n> > > > >  \tenum State {\n> > > > >  \t\tCameraStopped,\n> > > > >  \t\tCameraRunning,\n> > > > > +\t\tCameraFlushing,\n> > > > >  \t};\n> > > > >\n> > > > >  \tvoid stop();\n> > > > > @@ -120,8 +123,9 @@ private:\n> > > > >\n> > > > >  \tCameraWorker worker_;\n> > > > >\n> > > > > -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > > > > +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> > > > >  \tState state_;\n> > > > > +\tstd::condition_variable flushed_;\n> > > > >\n> > > > >  \tstd::shared_ptr<libcamera::Camera> camera_;\n> > > > >  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > > @@ -134,8 +138,9 @@ private:\n> > > > >  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n> > > > >  \tstd::vector<CameraStream> streams_;\n> > > > >\n> > > > > -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > > > > +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> > > > >  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > > > +\tstd::condition_variable flushing_;\n> > > > >\n> > > > >  \tstd::string maker_;\n> > > > >  \tstd::string model_;\n> > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > > --- a/src/android/camera_ops.cpp\n> > > > > +++ b/src/android/camera_ops.cpp\n> > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> > > > >  {\n> > > > >  }\n> > > > >\n> > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > > >  {\n> > > > > +\tif (!dev)\n> > > > > +\t\treturn -EINVAL;\n> > > > > +\n> > > > > +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > > > > +\tcamera->flush();\n> > > > > +\n> > > > >  \treturn 0;\n> > > > >  }\n> > > > >","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 95A8DC3203\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 27 May 2021 02:27:00 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 084486891F;\n\tThu, 27 May 2021 04:27:00 +0200 (CEST)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 63A24602AC\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 27 May 2021 04:26:58 +0200 (CEST)","from pendragon.ideasonboard.com (62-78-145-57.bb.dnainternet.fi\n\t[62.78.145.57])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id BA343549;\n\tThu, 27 May 2021 04:26:57 +0200 (CEST)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"PnO9PCLG\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1622082417;\n\tbh=I4fvgIOXxFIrWQ3W0Mf0XbwzUq9uhkd2oMPNdjA7H9k=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=PnO9PCLGan9yvV0cOABefxrdIQE1CYc7NZ7EKQ2oPoR8u9qR511qF7nfHXrFbZo5M\n\tpOUj3lyyiV86LOH9SR6GP7WtdXD/n0eZ7nPPHyM8BdfqSAsJQxvsZAsE/ArfA+NWU0\n\tQLnDm8O6sPermZ5KNY1VqcynYDmib91/1V5lMcig=","Date":"Thu, 27 May 2021 05:26:51 +0300","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Jacopo Mondi <jacopo@jmondi.org>","Message-ID":"<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>\n\t<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<20210524074755.krl5ykzukwvx76ca@uno.localdomain>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17300,"web_url":"https://patchwork.libcamera.org/comment/17300/","msgid":"<CAAFQd5Ax8eoESYS327ayeFxOsd3zy_Dun8cEf9cm9wEE2goBkw@mail.gmail.com>","date":"2021-05-27T02:49:35","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":9,"url":"https://patchwork.libcamera.org/api/people/9/","name":"Tomasz Figa","email":"tfiga@chromium.org"},"content":"Hi Laurent,\n\nOn Thu, May 27, 2021 at 11:27 AM Laurent Pinchart\n<laurent.pinchart@ideasonboard.com> wrote:\n>\n> Hi Jacopo,\n>\n> (expanding the CC list to finalize the race conditions discussion)\n>\n> On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:\n> > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > > > Implement the flush() camera operation in the CameraDevice class\n> > > > > > and make it available to the camera framework by implementing the\n> > > > > > operation wrapper in camera_ops.cpp.\n> > > > > >\n> > > > > > The flush() implementation stops the Camera and the worker thread and\n> > > > > > waits for all in-flight requests to be returned. Stopping the Camera\n> > > > > > forces all Requests already queued to be returned immediately in error\n> > > > > > state. As flush() has to wait until all of them have been returned, make it\n> > > > > > wait on a newly introduced condition variable which is notified by the\n> > > > > > request completion handler when the queue of pending requests has been\n> > > > > > exhausted.\n> > > > > >\n> > > > > > As flush() can race with processCaptureRequest() protect the requests\n> > > > > > queueing by introducing a new CameraState::CameraFlushing state that\n> > > > > > processCaptureRequest() inspects before queuing the Request to the\n> > > > > > Camera. If flush() has been called while processCaptureRequest() was\n> > > > > > executing, return the current Request immediately in error state.\n> > > > > >\n> > > > > > Protect potentially concurrent calls to close() and configureStreams()\n> > >\n> > > Can this happen ? Quoting camera3.h,\n> > >\n> > >  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n> > >  *    to end the camera session. This may be called at any time when no other\n> > >  *    calls from the framework are active, although the call may block until all\n> > >  *    in-flight captures have completed (all results returned, all buffers\n> > >  *    filled). After the close call returns, no more calls to the\n> > >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n> > >  *    close() call is underway, the framework may not call any other HAL device\n> > >  *    functions.\n> > >\n> > > The important part is \"when no other calss from the framework are\n> > > active\". I don't think we need to handle close() racing with anything\n> > > else than process_capture_request().\n> >\n> > I've been discussing this with Hiro during v1, as initially I didn't\n> > consider close() and configureStreams().\n> >\n> > https://patchwork.libcamera.org/patch/12248/#16884\n> >\n> > I initially only considered processCaptureRequest() as a potential\n> > race, but got suggested differently by the cros camera team.\n>\n> Let's try to get to the bottom of this.\n>\n> Section S2 (\"Startup and general expected operation sequence\") states:\n>\n>  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n>  *    to end the camera session. This may be called at any time when no other\n>  *    calls from the framework are active, although the call may block until all\n>  *    in-flight captures have completed (all results returned, all buffers\n>  *    filled). After the close call returns, no more calls to the\n>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n>  *    close() call is underway, the framework may not call any other HAL device\n>  *    functions.\n>\n> There can be in-flight requests when .close() is called, but it can't be\n> called concurrently with any other call. There's thus no race condition\n> to protect against.\n\nNote that camera3.h is considered outdated, as the interface updates\nhave been reflected only in the HIDL version [1]. However, I don't see\na HIDL counterpart of the section you mentioned.\n\n[1] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/\n\n>\n> The .configure_streams() documentation states:\n>\n>      * Preconditions:\n>      *\n>      * The framework will only call this method when no captures are being\n>      * processed. That is, all results have been returned to the framework, and\n>      * all in-flight input and output buffers have been returned and their\n>      * release sync fences have been signaled by the HAL. The framework will not\n>      * submit new requests for capture while the configure_streams() call is\n>      * underway.\n>\n> This clearly forbids calling .configure_streams() and\n> .process_capture_request() concurrently.\n>\n> The .flush() documentation states:\n>\n>      * Flush all currently in-process captures and all buffers in the pipeline\n>      * on the given device. The framework will use this to dump all state as\n>      * quickly as possible in order to prepare for a configure_streams() call.\n>\n> I interpret this as at least a very strong hint that .flush() and\n> .configure_streams() can't be called concurrently :-)\n>\n> If anyone disagrees, I'd like compelling evidence that those races can\n> occur.\n\nIndeed, the newest documentation [2] seems to be stating the same.\n\n[2] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107\n\nMy reading of the documentation is the same as Laurent's. Hiro, did\nyou hear something contradictory from the Android framework team?\n\nBest regards,\nTomasz\n\n>\n> > > > > > by inspecting the CameraState, and force a wait for any flush() call\n> > > > > > to complete before proceeding.\n> > > > > >\n> > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > > > ---\n> > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> > > > > >  src/android/camera_device.h   |  9 +++-\n> > > > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > > > >\n> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > > > > > index 3fce14035718..899afaa49439 100644\n> > > > > > --- a/src/android/camera_device.cpp\n> > > > > > +++ b/src/android/camera_device.cpp\n> > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> > > > > >\n> > > > > >  void CameraDevice::close()\n> > > > > >  {\n> > > > > > -     streams_.clear();\n> > > > > > +     MutexLocker cameraLock(cameraMutex_);\n> > >\n> > > I'd add a blank line here.\n> > >\n> > > > > > +     if (state_ == CameraFlushing) {\n> > >\n> > > As mentioned above, I don't think you need to protect against close()\n> > > and flush() racing each other.\n> > >\n> > > > > > +             flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > > +             camera_->release();\n> > > > > >\n> > > > > > +             return;\n> > > > > > +     }\n> > > > > > +\n> > > > > > +     streams_.clear();\n> > > > > >       stop();\n> > > > > >\n> > > > > >       camera_->release();\n> > > > > >  }\n> > > > > >\n> > > > > > -void CameraDevice::stop()\n> > > > > > +/*\n> > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n> > >\n> > > s/wait/waits/\n> > >\n> > > > > > + * until all the in-flight requests have been returned before setting the\n> > > > > > + * camera state to stopped.\n> > > > > > + *\n> > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > > > > > + * configureStreams().\n> > > > > > + */\n> > > > > > +void CameraDevice::flush()\n> > > > > >  {\n> > > > > > +     {\n> > > > > > +             MutexLocker cameraLock(cameraMutex_);\n> > > > > > +\n> > > > > > +             if (state_ != CameraRunning)\n> > > > > > +                     return;\n> > > > > > +\n> > > > > > +             worker_.stop();\n> > > > > > +             camera_->stop();\n> > > > > > +             state_ = CameraFlushing;\n> > > > > > +     }\n> > > > > > +\n> > > > > > +     /*\n> > > > > > +      * Now wait for all the in-flight requests to be completed before\n> > > > > > +      * continuing. Stopping the Camera guarantees that all in-flight\n> > > > > > +      * requests are completed in error state.\n> > >\n> > > Do we need to wait ? Camera::stop() guarantees that all requests\n> > > complete synchronously with the stop() call.\n> >\n> > I didn't get the API that way... I thought after stop we would receive\n> > a sequence of failed requests... Actually I don't see anything that\n> > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion\n> > in Camera::stop()\n>\n> The camera::stop() documentation states\n>\n>  * This method stops capturing and processing requests immediately. All pending\n>  * requests are cancelled and complete synchronously in an error state.\n>\n> Is this ambiguous ?\n>\n> > > Partly answering myself here, we'll have to wait for post-processing\n> > > tasks to complete once we'll process them in a separate thread, but that\n> > > will likely be handled by Thread::wait(). I don't think you need a\n> > > condition variable here. I'm I'm not mistaken, this should simplify the\n> > > implementation.\n> >\n> > If Camera::stop() is synchronous we don't need to wait indeed\n> >\n> > > > > > +      */\n> > > > > > +     {\n> > > > > > +             MutexLocker requestsLock(requestsMutex_);\n> > > > > > +             flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > > > > > +     }\n> > > > >\n> > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In\n> > > > > patch 6/8 you add it to protect the state_ variable but here it's\n> > > >\n> > > > I'm not changing state_ without the mutex acquired, am I ?\n> > > >\n> > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the\n> > > > > lock checking state_, releasing the lock and do some work, retake the\n> > > > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> > > >\n> > > > How so, apart from the fact it feels a bit unusual, I concur ?\n> > > >\n> > > > If I keep the held the mutex for the whole duration of flush no other\n> > > > concurrent method can proceed until all the queued requests have not\n> > > > been completed. While flush waits for the flushing_ condition to be\n> > > > signaled, processCaptureRequest() can proceed and immediately return\n> > > > the newly queued requests in error state by detecting state_ ==\n> > > > CameraFlushing which signals that flush in is progress.\n> > > > Otherwise it would have had to wait for flush to end. But then we're back\n> > > > to a situation where we could serialize all calls and that's it, we\n> > > > would be done with a single mutex to be held for the whole duration of\n> > > > all operations.\n> > > >\n> > > > If it only was for close() or configureStreams() we could have locked\n> > > > for the whole duration of flush(), as they anyway wait for flush to\n> > > > complete before proceeding (by waiting on the flushed_ condition here\n> > > > below signaled).\n> > > >\n> > > > > something and this is not a real problem, if so maybe we can capture\n> > > > > that in the comment here?\n> > > > >\n> > > > > > +\n> > > > > > +     /*\n> > > > > > +      * Set state to stopped and unlock close() or configureStreams() that\n> > > > > > +      * might be waiting for flush to be completed.\n> > > > > > +      */\n> > > > > >       MutexLocker cameraLock(cameraMutex_);\n> > > > > > +     state_ = CameraStopped;\n> > > > > > +     flushed_.notify_one();\n> > >\n> > > You should drop the lock before calling notify_one(). Otherwise you'll\n> > > wake up the task waiting on flushed_, which will try to lock\n> > > cameraMutex_, which will block immediately. The scheduler will have to\n> > > reschedule this task for the function to return and the lock to be\n> > > released before the waiter can proceed. That works, but isn't very\n> > > efficient.\n> >\n> > Weird, the cpp reference shows example about notify_one where the\n> > caller always has the mutex held locked, but I see your point and\n> > seems correct..\n>\n> I'm looking at\n> https://en.cppreference.com/w/cpp/thread/condition_variable and\n> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one\n> and both calls to notify_one() in the example are made without the lock\n> held, aren't they ?\n>\n> > >\n> > >     {\n> > >             MutexLocker cameraLock(cameraMutex_);\n> > >             state_ = CameraStopped;\n> > >     }\n> > >\n> > >     flushed_.notify_one();\n> > >\n> >\n> > So I could change to this one, if I don't have to drop this part\n> > completely if we consider close() and configureStreams() not as\n> > possible races...\n> >\n> > > > > > +}\n> > > > > > +\n> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > > > > > +void CameraDevice::stop()\n> > > > > > +{\n> > > > > > +     ASSERT(state_ != CameraFlushing);\n> > > > > > +\n> > > > > >       if (state_ == CameraStopped)\n> > > > > >               return;\n> > > > > >\n> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> > > > > >   */\n> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > > > > >  {\n> > > > > > -     /* Before any configuration attempt, stop the camera. */\n> > > > > > -     stop();\n> > > > > > +     {\n> > > > > > +             /*\n> > > > > > +              * If a flush is in progress, wait for it to complete and to\n> > > > > > +              * stop the camera, otherwise before any new configuration\n> > > > > > +              * attempt we have to stop the camera explictely.\n> > > > > > +              */\n> > >\n> > > Same here, I don't think flush() and configure_streams() can race each\n> > > other. I believe the only possible race to be between flush() and\n> > > process_capture_request().\n> >\n> > Ditto.\n> >\n> > > > > > +             MutexLocker cameraLock(cameraMutex_);\n> > > > > > +             if (state_ == CameraFlushing)\n> > > > > > +                     flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > > +             else\n> > > > > > +                     stop();\n> > > > > > +     }\n> > > > > >\n> > > > > >       if (stream_list->num_streams == 0) {\n> > > > > >               LOG(HAL, Error) << \"No streams in configuration\";\n> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > > > >       if (ret)\n> > > > > >               return ret;\n> > > > > >\n> > > > > > +     /*\n> > > > > > +      * Just before queuing the request, make sure flush() has not\n> > > > > > +      * been called after this function has been executed. In that\n> > > > > > +      * case, immediately return the request with errors.\n> > > > > > +      */\n> > > > > > +     MutexLocker cameraLock(cameraMutex_);\n> > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > > > > +             for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > > > > > +                     buffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > > > > > +                     buffer.release_fence = buffer.acquire_fence;\n> > > > > > +             }\n> > > > > > +\n> > > > > > +             notifyError(descriptor.frameNumber_,\n> > > > > > +                         descriptor.buffers_[0].stream,\n> > >\n> > > As commented on a previous patch, I think you should pass nullptr for\n> > > the stream here.\n> >\n> > The \"S6. Error management:\" section of the camera3.h header does not\n> > mention that, not the ?\n>\n> Indeed, that section doesn't mention the camera3_error_msg::error_stream\n> field at all. The field is documented in the structure as\n>\n>     /**\n>      * Pointer to the stream that had a failure. NULL if the stream isn't\n>      * applicable to the error.\n>      */\n>\n> The question is thus when the stream is applicable to the error. The\n> documentation of enum camera3_error_msg_code mentions error_stream in\n> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to\n> the device, the request or the result metadata, which are not specific\n> to a stream.\n>\n> > where does you suggestion come from ? I don't find any reference in\n> > the review of [1/8]\n>\n> ([PATCH v3 1/8] android: Rework request completion notification'\n> (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)\n>\n> > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);\n> > > > > > +\n> > > > > > +             return 0;\n> > > > > > +     }\n> > > > > > +\n> > > > > >       worker_.queueRequest(descriptor.request_.get());\n> > > > > >\n> > > > > >       {\n> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> > > > > >                       return;\n> > > > > >               }\n> > > > > >\n> > > > > > +             /* Release flush if all the pending requests have been completed. */\n> > > > > > +             if (descriptors_.empty())\n> > > > > > +                     flushing_.notify_one();\n> > >\n> > > This will never happen, as you can only get here if descriptors_.find()\n> > > has found the descriptor. Did you mean to do this after the extract()\n> > > call below ?\n> >\n> > Ugh. This works only because Camera::stop() is synchronous then ?\n>\n> I believe so.\n>\n> > > > > > +\n> > > > > >               node = descriptors_.extract(it);\n> > > > > >       }\n> > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();\n> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > > > --- a/src/android/camera_device.h\n> > > > > > +++ b/src/android/camera_device.h\n> > > > > > @@ -7,6 +7,7 @@\n> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > > > >\n> > > > > > +#include <condition_variable>\n> > > > > >  #include <map>\n> > > > > >  #include <memory>\n> > > > > >  #include <mutex>\n> > > > > > @@ -42,6 +43,7 @@ public:\n> > > > > >\n> > > > > >       int open(const hw_module_t *hardwareModule);\n> > > > > >       void close();\n> > > > > > +     void flush();\n> > > > > >\n> > > > > >       unsigned int id() const { return id_; }\n> > > > > >       camera3_device_t *camera3Device() { return &camera3Device_; }\n> > > > > > @@ -92,6 +94,7 @@ private:\n> > > > > >       enum State {\n> > > > > >               CameraStopped,\n> > > > > >               CameraRunning,\n> > > > > > +             CameraFlushing,\n> > > > > >       };\n> > > > > >\n> > > > > >       void stop();\n> > > > > > @@ -120,8 +123,9 @@ private:\n> > > > > >\n> > > > > >       CameraWorker worker_;\n> > > > > >\n> > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> > > > > >       State state_;\n> > > > > > +     std::condition_variable flushed_;\n> > > > > >\n> > > > > >       std::shared_ptr<libcamera::Camera> camera_;\n> > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > > > @@ -134,8 +138,9 @@ private:\n> > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;\n> > > > > >       std::vector<CameraStream> streams_;\n> > > > > >\n> > > > > > -     libcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > > > > > +     libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> > > > > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > > > > +     std::condition_variable flushing_;\n> > > > > >\n> > > > > >       std::string maker_;\n> > > > > >       std::string model_;\n> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > > > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > > > --- a/src/android/camera_ops.cpp\n> > > > > > +++ b/src/android/camera_ops.cpp\n> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> > > > > >  {\n> > > > > >  }\n> > > > > >\n> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > > > >  {\n> > > > > > +     if (!dev)\n> > > > > > +             return -EINVAL;\n> > > > > > +\n> > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > > > > > +     camera->flush();\n> > > > > > +\n> > > > > >       return 0;\n> > > > > >  }\n> > > > > >\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 6268ABDB80\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 27 May 2021 02:49:50 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 25DDC68923;\n\tThu, 27 May 2021 04:49:50 +0200 (CEST)","from mail-ej1-x634.google.com (mail-ej1-x634.google.com\n\t[IPv6:2a00:1450:4864:20::634])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 3EC79602AA\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 27 May 2021 04:49:49 +0200 (CEST)","by mail-ej1-x634.google.com with SMTP id b9so5591484ejc.13\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 26 May 2021 19:49:49 -0700 (PDT)","from mail-wm1-f53.google.com (mail-wm1-f53.google.com.\n\t[209.85.128.53]) by smtp.gmail.com with ESMTPSA id\n\ts2sm308697edu.89.2021.05.26.19.49.47\n\tfor <libcamera-devel@lists.libcamera.org>\n\t(version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);\n\tWed, 26 May 2021 19:49:47 -0700 (PDT)","by mail-wm1-f53.google.com with SMTP id\n\tz19-20020a7bc7d30000b029017521c1fb75so1688699wmk.0\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 26 May 2021 19:49:47 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"JJv8xZSV\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org;\n\ts=google; \n\th=mime-version:references:in-reply-to:from:date:message-id:subject:to\n\t:cc:content-transfer-encoding;\n\tbh=goDOTcMvAxE/pN1GOJ3aYN/2YDAk9eu/iyjZiOB5R7U=;\n\tb=JJv8xZSVYKk2jATvURoOAfPl+riALgmV77637Xt1P7FsB01fUV6Vsr6RzMS5D248O0\n\tYAd6TShUeOqqaqoKdD4xzSH+M5+qWMi3mgK1i1TxM879GiUb54P6eyEkw+o9YNQTeuOn\n\tf99X5uuv98AKqxY2I5z59hiTTyvSe82K0xymw=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:references:in-reply-to:from:date\n\t:message-id:subject:to:cc:content-transfer-encoding;\n\tbh=goDOTcMvAxE/pN1GOJ3aYN/2YDAk9eu/iyjZiOB5R7U=;\n\tb=cwhavpvcIFgiquIILrdWYwzgcleTJFS7e1KwyR9aX/Y7oC009nu7utj9f86l1aPvbL\n\tPa+0dbrcVpMSoWkCqqQxDiTrMeQHI5ZiTCSOZfkJXdIm/asoibNPoVcBYW1CYBAYu0BI\n\tKgDzmIhjgR0BgnNF89RvmfQeL/t8k54Xk6ziJH61WhTByUg4XX562a4K39JXdRIoJMx3\n\tUtbYgTgZbAW+AIzmt2a45oVQWs2BQsl8spxlvqfPHDlMoyz4hL94k4hFvpCQsqBIRaSP\n\t4OAAf85v98lCMIAPeHn5BOUt3D8TVgU1M9Bqvdvc4dRIVHYT5XIrOTiU+FyB+d/mVfUx\n\tSryg==","X-Gm-Message-State":"AOAM5305zKIoXG0CUUR4pi2/C673PbKu5r67W26ad/VNyd1fVVlMbTN3\n\tx7JGb5+4xFh7mRcfFBmDSRP8IyddXR3IaVXK","X-Google-Smtp-Source":"ABdhPJwdllfgqG+AElJNw/mh5ZJeK7MdJxA7F9gfAPodBnAMJkeKXjJy3+pLkQKXfxT394FNrF3BVw==","X-Received":["by 2002:a17:906:e98:: with SMTP id\n\tp24mr1421710ejf.478.1622083788498; \n\tWed, 26 May 2021 19:49:48 -0700 (PDT)","by 2002:a7b:cbc2:: with SMTP id n2mr6104161wmi.116.1622083787324;\n\tWed, 26 May 2021 19:49:47 -0700 (PDT)"],"MIME-Version":"1.0","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>\n\t<20210524074755.krl5ykzukwvx76ca@uno.localdomain>\n\t<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","In-Reply-To":"<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","From":"Tomasz Figa <tfiga@chromium.org>","Date":"Thu, 27 May 2021 11:49:35 +0900","X-Gmail-Original-Message-ID":"<CAAFQd5Ax8eoESYS327ayeFxOsd3zy_Dun8cEf9cm9wEE2goBkw@mail.gmail.com>","Message-ID":"<CAAFQd5Ax8eoESYS327ayeFxOsd3zy_Dun8cEf9cm9wEE2goBkw@mail.gmail.com>","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>,\n\tHirokazu Honda <hiroh@chromium.org>","Content-Type":"text/plain; charset=\"UTF-8\"","Content-Transfer-Encoding":"quoted-printable","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera devel <libcamera-devel@lists.libcamera.org>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17305,"web_url":"https://patchwork.libcamera.org/comment/17305/","msgid":"<CAO5uPHN51EpcFuO7nRQy4gja19k_d+L5u3Rmfc58yfpe-J_EbA@mail.gmail.com>","date":"2021-05-27T03:59:42","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":63,"url":"https://patchwork.libcamera.org/api/people/63/","name":"Hirokazu Honda","email":"hiroh@chromium.org"},"content":"Hi Laurent and Tomasz.\n\nOn Thu, May 27, 2021 at 11:49 AM Tomasz Figa <tfiga@chromium.org> wrote:\n\n> Hi Laurent,\n>\n> On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart\n> <laurent.pinchart@ideasonboard.com> wrote:\n> >\n> > Hi Jacopo,\n> >\n> > (expanding the CC list to finalize the race conditions discussion)\n> >\n> > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:\n> > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > > > > Implement the flush() camera operation in the CameraDevice\n> class\n> > > > > > > and make it available to the camera framework by implementing\n> the\n> > > > > > > operation wrapper in camera_ops.cpp.\n> > > > > > >\n> > > > > > > The flush() implementation stops the Camera and the worker\n> thread and\n> > > > > > > waits for all in-flight requests to be returned. Stopping the\n> Camera\n> > > > > > > forces all Requests already queued to be returned immediately\n> in error\n> > > > > > > state. As flush() has to wait until all of them have been\n> returned, make it\n> > > > > > > wait on a newly introduced condition variable which is\n> notified by the\n> > > > > > > request completion handler when the queue of pending requests\n> has been\n> > > > > > > exhausted.\n> > > > > > >\n> > > > > > > As flush() can race with processCaptureRequest() protect the\n> requests\n> > > > > > > queueing by introducing a new CameraState::CameraFlushing\n> state that\n> > > > > > > processCaptureRequest() inspects before queuing the Request to\n> the\n> > > > > > > Camera. If flush() has been called while\n> processCaptureRequest() was\n> > > > > > > executing, return the current Request immediately in error\n> state.\n> > > > > > >\n> > > > > > > Protect potentially concurrent calls to close() and\n> configureStreams()\n> > > >\n> > > > Can this happen ? Quoting camera3.h,\n> > > >\n> > > >  * 12. Alternatively, the framework may call\n> camera3_device_t->common->close()\n> > > >  *    to end the camera session. This may be called at any time when\n> no other\n> > > >  *    calls from the framework are active, although the call may\n> block until all\n> > > >  *    in-flight captures have completed (all results returned, all\n> buffers\n> > > >  *    filled). After the close call returns, no more calls to the\n> > > >  *    camera3_callback_ops_t functions are allowed from the HAL.\n> Once the\n> > > >  *    close() call is underway, the framework may not call any other\n> HAL device\n> > > >  *    functions.\n> > > >\n> > > > The important part is \"when no other calss from the framework are\n> > > > active\". I don't think we need to handle close() racing with anything\n> > > > else than process_capture_request().\n> > >\n> > > I've been discussing this with Hiro during v1, as initially I didn't\n> > > consider close() and configureStreams().\n> > >\n> > > https://patchwork.libcamera.org/patch/12248/#16884\n> > >\n> > > I initially only considered processCaptureRequest() as a potential\n> > > race, but got suggested differently by the cros camera team.\n> >\n> > Let's try to get to the bottom of this.\n> >\n> > Section S2 (\"Startup and general expected operation sequence\") states:\n> >\n> >  * 12. Alternatively, the framework may call\n> camera3_device_t->common->close()\n> >  *    to end the camera session. This may be called at any time when no\n> other\n> >  *    calls from the framework are active, although the call may block\n> until all\n> >  *    in-flight captures have completed (all results returned, all\n> buffers\n> >  *    filled). After the close call returns, no more calls to the\n> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n> >  *    close() call is underway, the framework may not call any other HAL\n> device\n> >  *    functions.\n> >\n> > There can be in-flight requests when .close() is called, but it can't be\n> > called concurrently with any other call. There's thus no race condition\n> > to protect against.\n>\n> Note that camera3.h is considered outdated, as the interface updates\n> have been reflected only in the HIDL version [1]. However, I don't see\n> a HIDL counterpart of the section you mentioned.\n>\n> [1]\n> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/\n>\n> >\n> > The .configure_streams() documentation states:\n> >\n> >      * Preconditions:\n> >      *\n> >      * The framework will only call this method when no captures are\n> being\n> >      * processed. That is, all results have been returned to the\n> framework, and\n> >      * all in-flight input and output buffers have been returned and\n> their\n> >      * release sync fences have been signaled by the HAL. The framework\n> will not\n> >      * submit new requests for capture while the configure_streams()\n> call is\n> >      * underway.\n> >\n> > This clearly forbids calling .configure_streams() and\n> > .process_capture_request() concurrently.\n> >\n> > The .flush() documentation states:\n> >\n> >      * Flush all currently in-process captures and all buffers in the\n> pipeline\n> >      * on the given device. The framework will use this to dump all\n> state as\n> >      * quickly as possible in order to prepare for a configure_streams()\n> call.\n> >\n> > I interpret this as at least a very strong hint that .flush() and\n> > .configure_streams() can't be called concurrently :-)\n> >\n> > If anyone disagrees, I'd like compelling evidence that those races can\n> > occur.\n>\n> Indeed, the newest documentation [2] seems to be stating the same.\n>\n> [2]\n> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107\n>\n> My reading of the documentation is the same as Laurent's. Hiro, did\n> you hear something contradictory from the Android framework team?\n>\n>\nI got from Ricky that flush() and close() can be called anytime.\nRicky, have we requested the Android framework team to specify this in the\ndocument?\n\nAdditionally, in the discussion mail with Android framework team, I found\nthe Android team told\n> We also guarantee we won't call configureStreams while\nprocessCaptureRequest is being called (or if there are any capture requests\nthat haven't yet been completed, for that matter).\nSo probably some protection in Jacopo's code can be saved. Sorry Jacopo for\nreworking.\n\n-Hiro\n\n\n> Best regards,\n> Tomasz\n>\n> >\n> > > > > > > by inspecting the CameraState, and force a wait for any\n> flush() call\n> > > > > > > to complete before proceeding.\n> > > > > > >\n> > > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > > > > ---\n> > > > > > >  src/android/camera_device.cpp | 90\n> +++++++++++++++++++++++++++++++++--\n> > > > > > >  src/android/camera_device.h   |  9 +++-\n> > > > > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > > > > >\n> > > > > > > diff --git a/src/android/camera_device.cpp\n> b/src/android/camera_device.cpp\n> > > > > > > index 3fce14035718..899afaa49439 100644\n> > > > > > > --- a/src/android/camera_device.cpp\n> > > > > > > +++ b/src/android/camera_device.cpp\n> > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t\n> *hardwareModule)\n> > > > > > >\n> > > > > > >  void CameraDevice::close()\n> > > > > > >  {\n> > > > > > > -     streams_.clear();\n> > > > > > > +     MutexLocker cameraLock(cameraMutex_);\n> > > >\n> > > > I'd add a blank line here.\n> > > >\n> > > > > > > +     if (state_ == CameraFlushing) {\n> > > >\n> > > > As mentioned above, I don't think you need to protect against close()\n> > > > and flush() racing each other.\n> > > >\n> > > > > > > +             flushed_.wait(cameraLock, [&] { return state_ !=\n> CameraStopped; });\n> > > > > > > +             camera_->release();\n> > > > > > >\n> > > > > > > +             return;\n> > > > > > > +     }\n> > > > > > > +\n> > > > > > > +     streams_.clear();\n> > > > > > >       stop();\n> > > > > > >\n> > > > > > >       camera_->release();\n> > > > > > >  }\n> > > > > > >\n> > > > > > > -void CameraDevice::stop()\n> > > > > > > +/*\n> > > > > > > + * Flush is similar to stop() but sets the camera state to\n> 'flushing' and wait\n> > > >\n> > > > s/wait/waits/\n> > > >\n> > > > > > > + * until all the in-flight requests have been returned before\n> setting the\n> > > > > > > + * camera state to stopped.\n> > > > > > > + *\n> > > > > > > + * Once flushing is done it unlocks concurrent calls to\n> camera close() and\n> > > > > > > + * configureStreams().\n> > > > > > > + */\n> > > > > > > +void CameraDevice::flush()\n> > > > > > >  {\n> > > > > > > +     {\n> > > > > > > +             MutexLocker cameraLock(cameraMutex_);\n> > > > > > > +\n> > > > > > > +             if (state_ != CameraRunning)\n> > > > > > > +                     return;\n> > > > > > > +\n> > > > > > > +             worker_.stop();\n> > > > > > > +             camera_->stop();\n> > > > > > > +             state_ = CameraFlushing;\n> > > > > > > +     }\n> > > > > > > +\n> > > > > > > +     /*\n> > > > > > > +      * Now wait for all the in-flight requests to be\n> completed before\n> > > > > > > +      * continuing. Stopping the Camera guarantees that all\n> in-flight\n> > > > > > > +      * requests are completed in error state.\n> > > >\n> > > > Do we need to wait ? Camera::stop() guarantees that all requests\n> > > > complete synchronously with the stop() call.\n> > >\n> > > I didn't get the API that way... I thought after stop we would receive\n> > > a sequence of failed requests... Actually I don't see anything that\n> > > suggests that in camera.cpp or pipeline_handler.cpp apart from an\n> assertion\n> > > in Camera::stop()\n> >\n> > The camera::stop() documentation states\n> >\n> >  * This method stops capturing and processing requests immediately. All\n> pending\n> >  * requests are cancelled and complete synchronously in an error state.\n> >\n> > Is this ambiguous ?\n>\n\nPerhaps, should \"and complete synchronously\" be rephrased \"and\nCamera::requestComplete() is executed for all of them before this returns\"?\n\n>\n> > > > Partly answering myself here, we'll have to wait for post-processing\n> > > > tasks to complete once we'll process them in a separate thread, but\n> that\n> > > > will likely be handled by Thread::wait(). I don't think you need a\n> > > > condition variable here. I'm I'm not mistaken, this should simplify\n> the\n> > > > implementation.\n> > >\n> > > If Camera::stop() is synchronous we don't need to wait indeed\n> > >\n> > > > > > > +      */\n> > > > > > > +     {\n> > > > > > > +             MutexLocker requestsLock(requestsMutex_);\n> > > > > > > +             flushing_.wait(requestsLock, [&] { return\n> descriptors_.empty(); });\n> > > > > > > +     }\n> > > > > >\n> > > > > > I'm still uneasy about releasing the cameraMutex_ for this\n> section. In\n> > > > > > patch 6/8 you add it to protect the state_ variable but here it's\n> > > > >\n> > > > > I'm not changing state_ without the mutex acquired, am I ?\n> > > > >\n> > > > > > ignored. I see the ASSERT() added to stop() but the patter of\n> taking the\n> > > > > > lock checking state_, releasing the lock and do some work,\n> retake the\n> > > > > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> > > > >\n> > > > > How so, apart from the fact it feels a bit unusual, I concur ?\n> > > > >\n> > > > > If I keep the held the mutex for the whole duration of flush no\n> other\n> > > > > concurrent method can proceed until all the queued requests have\n> not\n> > > > > been completed. While flush waits for the flushing_ condition to be\n> > > > > signaled, processCaptureRequest() can proceed and immediately\n> return\n> > > > > the newly queued requests in error state by detecting state_ ==\n> > > > > CameraFlushing which signals that flush in is progress.\n> > > > > Otherwise it would have had to wait for flush to end. But then\n> we're back\n> > > > > to a situation where we could serialize all calls and that's it, we\n> > > > > would be done with a single mutex to be held for the whole\n> duration of\n> > > > > all operations.\n> > > > >\n> > > > > If it only was for close() or configureStreams() we could have\n> locked\n> > > > > for the whole duration of flush(), as they anyway wait for flush to\n> > > > > complete before proceeding (by waiting on the flushed_ condition\n> here\n> > > > > below signaled).\n> > > > >\n> > > > > > something and this is not a real problem, if so maybe we can\n> capture\n> > > > > > that in the comment here?\n> > > > > >\n> > > > > > > +\n> > > > > > > +     /*\n> > > > > > > +      * Set state to stopped and unlock close() or\n> configureStreams() that\n> > > > > > > +      * might be waiting for flush to be completed.\n> > > > > > > +      */\n> > > > > > >       MutexLocker cameraLock(cameraMutex_);\n> > > > > > > +     state_ = CameraStopped;\n> > > > > > > +     flushed_.notify_one();\n> > > >\n> > > > You should drop the lock before calling notify_one(). Otherwise\n> you'll\n> > > > wake up the task waiting on flushed_, which will try to lock\n> > > > cameraMutex_, which will block immediately. The scheduler will have\n> to\n> > > > reschedule this task for the function to return and the lock to be\n> > > > released before the waiter can proceed. That works, but isn't very\n> > > > efficient.\n> > >\n> > > Weird, the cpp reference shows example about notify_one where the\n> > > caller always has the mutex held locked, but I see your point and\n> > > seems correct..\n> >\n> > I'm looking at\n> > https://en.cppreference.com/w/cpp/thread/condition_variable and\n> > https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one\n> > and both calls to notify_one() in the example are made without the lock\n> > held, aren't they ?\n> >\n> > > >\n> > > >     {\n> > > >             MutexLocker cameraLock(cameraMutex_);\n> > > >             state_ = CameraStopped;\n> > > >     }\n> > > >\n> > > >     flushed_.notify_one();\n> > > >\n> > >\n> > > So I could change to this one, if I don't have to drop this part\n> > > completely if we consider close() and configureStreams() not as\n> > > possible races...\n> > >\n> > > > > > > +}\n> > > > > > > +\n> > > > > > > +/* Calls to stop() must be protected by cameraMutex_ being\n> held by the caller. */\n> > > > > > > +void CameraDevice::stop()\n> > > > > > > +{\n> > > > > > > +     ASSERT(state_ != CameraFlushing);\n> > > > > > > +\n> > > > > > >       if (state_ == CameraStopped)\n> > > > > > >               return;\n> > > > > > >\n> > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat\n> CameraDevice::toPixelFormat(int format) const\n> > > > > > >   */\n> > > > > > >  int\n> CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > > > > > >  {\n> > > > > > > -     /* Before any configuration attempt, stop the camera. */\n> > > > > > > -     stop();\n> > > > > > > +     {\n> > > > > > > +             /*\n> > > > > > > +              * If a flush is in progress, wait for it to\n> complete and to\n> > > > > > > +              * stop the camera, otherwise before any new\n> configuration\n> > > > > > > +              * attempt we have to stop the camera explictely.\n> > > > > > > +              */\n> > > >\n> > > > Same here, I don't think flush() and configure_streams() can race\n> each\n> > > > other. I believe the only possible race to be between flush() and\n> > > > process_capture_request().\n> > >\n> > > Ditto.\n> > >\n> > > > > > > +             MutexLocker cameraLock(cameraMutex_);\n> > > > > > > +             if (state_ == CameraFlushing)\n> > > > > > > +                     flushed_.wait(cameraLock, [&] { return\n> state_ != CameraStopped; });\n> > > > > > > +             else\n> > > > > > > +                     stop();\n> > > > > > > +     }\n> > > > > > >\n> > > > > > >       if (stream_list->num_streams == 0) {\n> > > > > > >               LOG(HAL, Error) << \"No streams in configuration\";\n> > > > > > > @@ -1950,6 +2009,25 @@ int\n> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > > > > >       if (ret)\n> > > > > > >               return ret;\n> > > > > > >\n> > > > > > > +     /*\n> > > > > > > +      * Just before queuing the request, make sure flush()\n> has not\n> > > > > > > +      * been called after this function has been executed. In\n> that\n> > > > > > > +      * case, immediately return the request with errors.\n> > > > > > > +      */\n> > > > > > > +     MutexLocker cameraLock(cameraMutex_);\n> > > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped)\n> {\n> > > > > > > +             for (camera3_stream_buffer_t &buffer :\n> descriptor.buffers_) {\n> > > > > > > +                     buffer.status =\n> CAMERA3_BUFFER_STATUS_ERROR;\n> > > > > > > +                     buffer.release_fence =\n> buffer.acquire_fence;\n> > > > > > > +             }\n> > > > > > > +\n> > > > > > > +             notifyError(descriptor.frameNumber_,\n> > > > > > > +                         descriptor.buffers_[0].stream,\n> > > >\n> > > > As commented on a previous patch, I think you should pass nullptr for\n> > > > the stream here.\n> > >\n> > > The \"S6. Error management:\" section of the camera3.h header does not\n> > > mention that, not the ?\n> >\n> > Indeed, that section doesn't mention the camera3_error_msg::error_stream\n> > field at all. The field is documented in the structure as\n> >\n> >     /**\n> >      * Pointer to the stream that had a failure. NULL if the stream isn't\n> >      * applicable to the error.\n> >      */\n> >\n> > The question is thus when the stream is applicable to the error. The\n> > documentation of enum camera3_error_msg_code mentions error_stream in\n> > the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to\n> > the device, the request or the result metadata, which are not specific\n> > to a stream.\n> >\n> > > where does you suggestion come from ? I don't find any reference in\n> > > the review of [1/8]\n> >\n> > ([PATCH v3 1/8] android: Rework request completion notification'\n> > (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)\n> >\n> > > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);\n> > > > > > > +\n> > > > > > > +             return 0;\n> > > > > > > +     }\n> > > > > > > +\n> > > > > > >       worker_.queueRequest(descriptor.request_.get());\n> > > > > > >\n> > > > > > >       {\n> > > > > > > @@ -1979,6 +2057,10 @@ void\n> CameraDevice::requestComplete(Request *request)\n> > > > > > >                       return;\n> > > > > > >               }\n> > > > > > >\n> > > > > > > +             /* Release flush if all the pending requests\n> have been completed. */\n> > > > > > > +             if (descriptors_.empty())\n> > > > > > > +                     flushing_.notify_one();\n> > > >\n> > > > This will never happen, as you can only get here if\n> descriptors_.find()\n> > > > has found the descriptor. Did you mean to do this after the extract()\n> > > > call below ?\n> > >\n> > > Ugh. This works only because Camera::stop() is synchronous then ?\n> >\n> > I believe so.\n> >\n> > > > > > > +\n> > > > > > >               node = descriptors_.extract(it);\n> > > > > > >       }\n> > > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();\n> > > > > > > diff --git a/src/android/camera_device.h\n> b/src/android/camera_device.h\n> > > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > > > > --- a/src/android/camera_device.h\n> > > > > > > +++ b/src/android/camera_device.h\n> > > > > > > @@ -7,6 +7,7 @@\n> > > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > > > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > > > > >\n> > > > > > > +#include <condition_variable>\n> > > > > > >  #include <map>\n> > > > > > >  #include <memory>\n> > > > > > >  #include <mutex>\n> > > > > > > @@ -42,6 +43,7 @@ public:\n> > > > > > >\n> > > > > > >       int open(const hw_module_t *hardwareModule);\n> > > > > > >       void close();\n> > > > > > > +     void flush();\n> > > > > > >\n> > > > > > >       unsigned int id() const { return id_; }\n> > > > > > >       camera3_device_t *camera3Device() { return\n> &camera3Device_; }\n> > > > > > > @@ -92,6 +94,7 @@ private:\n> > > > > > >       enum State {\n> > > > > > >               CameraStopped,\n> > > > > > >               CameraRunning,\n> > > > > > > +             CameraFlushing,\n> > > > > > >       };\n> > > > > > >\n> > > > > > >       void stop();\n> > > > > > > @@ -120,8 +123,9 @@ private:\n> > > > > > >\n> > > > > > >       CameraWorker worker_;\n> > > > > > >\n> > > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the\n> camera state. */\n> > > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera\n> state and flushed_. */\n> > > > > > >       State state_;\n> > > > > > > +     std::condition_variable flushed_;\n> > > > > > >\n> > > > > > >       std::shared_ptr<libcamera::Camera> camera_;\n> > > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > > > > @@ -134,8 +138,9 @@ private:\n> > > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;\n> > > > > > >       std::vector<CameraStream> streams_;\n> > > > > > >\n> > > > > > > -     libcamera::Mutex requestsMutex_; /* Protects\n> descriptors_. */\n> > > > > > > +     libcamera::Mutex requestsMutex_; /* Protects\n> descriptors_ and flushing_. */\n> > > > > > >       std::map<uint64_t, Camera3RequestDescriptor>\n> descriptors_;\n> > > > > > > +     std::condition_variable flushing_;\n> > > > > > >\n> > > > > > >       std::string maker_;\n> > > > > > >       std::string model_;\n> > > > > > > diff --git a/src/android/camera_ops.cpp\n> b/src/android/camera_ops.cpp\n> > > > > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > > > > --- a/src/android/camera_ops.cpp\n> > > > > > > +++ b/src/android/camera_ops.cpp\n> > > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]]\n> const struct camera3_device *dev,\n> > > > > > >  {\n> > > > > > >  }\n> > > > > > >\n> > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct\n> camera3_device *dev)\n> > > > > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > > > > >  {\n> > > > > > > +     if (!dev)\n> > > > > > > +             return -EINVAL;\n> > > > > > > +\n> > > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice\n> *>(dev->priv);\n> > > > > > > +     camera->flush();\n> > > > > > > +\n> > > > > > >       return 0;\n> > > > > > >  }\n> > > > > > >\n> >\n> > --\n> > Regards,\n> >\n> > Laurent Pinchart\n>","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 2B5BFBDB80\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 27 May 2021 03:59:54 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id DC1C468926;\n\tThu, 27 May 2021 05:59:53 +0200 (CEST)","from mail-ej1-x62c.google.com (mail-ej1-x62c.google.com\n\t[IPv6:2a00:1450:4864:20::62c])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 088116891F\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 27 May 2021 05:59:53 +0200 (CEST)","by mail-ej1-x62c.google.com with SMTP id lz27so5783881ejb.11\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tWed, 26 May 2021 20:59:53 -0700 (PDT)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=chromium.org header.i=@chromium.org\n\theader.b=\"eNE0s2BV\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org;\n\ts=google; \n\th=mime-version:references:in-reply-to:from:date:message-id:subject:to\n\t:cc; bh=tTnqwZMJek0vvzMguEmxsoS/VDx2v3uXDFqSuHWdVns=;\n\tb=eNE0s2BVlzhut3SGf1ufxJV2KL3NW3hf4wSGanPYZyvALTKOYiD1qrQX2gKHZE+16Q\n\tLyaUAMyw54Fj1iWOy39QqCG6E0AZ1bLa67woIyTy6GktcsMl0bacSgmgTbdfu4kQWw2l\n\tmmeBg3zP07kxMPwV9RxPvVFRhN42TrW4GP0uE=","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:references:in-reply-to:from:date\n\t:message-id:subject:to:cc;\n\tbh=tTnqwZMJek0vvzMguEmxsoS/VDx2v3uXDFqSuHWdVns=;\n\tb=G3KZjHmfUAdEuWwX9U4g6vzbf5UbM2iRUuLPfivRAlnbMEzUnYH3Lr/kPJJdZ5WCjX\n\tdYGG7YjIcm7v7Rn/SHY0IPG2wG0qzM5oRqnjGquT4uxhe4wct90/DFB/IyiFcF29cpBw\n\tSM5FBEg8gyAsfRFY6K1KV+/rkl3qUPLVgxfWAolsgrHBl/MMwzHs13YaMY6YLDAOKYLS\n\tVNZc3ZHkIq6072uNGA0DDrNeYNzYzZAH4OoqDvuS2umeEUmtOG7QqKfuw+iA55AnzFPY\n\tl3OEEtcUzlEjuQYnYh2CWTPyFiT8Sfq9VElOsn2d60cb+sVn7mYKKezfLZpWkklNVbxR\n\tvozA==","X-Gm-Message-State":"AOAM531RDwDzjOVvVH652o/W+WYkyfillb4Ou604xgzM23YRkElWPVmE\n\tBKvgsqPM0wgftbDjDrVM6a/G51zvTQRcegE4W8Moaw==","X-Google-Smtp-Source":"ABdhPJxCM1nr2uzVfeVOmjNqMgR/CS+hw53dNwL00riZwX6R4EwsuE6ca1lZTS89Q60hgAy3Sk/kV4AiKxgViTd7vu4=","X-Received":"by 2002:a17:906:3a04:: with SMTP id\n\tz4mr1630553eje.221.1622087992574; \n\tWed, 26 May 2021 20:59:52 -0700 (PDT)","MIME-Version":"1.0","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>\n\t<20210524074755.krl5ykzukwvx76ca@uno.localdomain>\n\t<YK8DawB2BxidonoT@pendragon.ideasonboard.com>\n\t<CAAFQd5Ax8eoESYS327ayeFxOsd3zy_Dun8cEf9cm9wEE2goBkw@mail.gmail.com>","In-Reply-To":"<CAAFQd5Ax8eoESYS327ayeFxOsd3zy_Dun8cEf9cm9wEE2goBkw@mail.gmail.com>","From":"Hirokazu Honda <hiroh@chromium.org>","Date":"Thu, 27 May 2021 12:59:42 +0900","Message-ID":"<CAO5uPHN51EpcFuO7nRQy4gja19k_d+L5u3Rmfc58yfpe-J_EbA@mail.gmail.com>","To":"Tomasz Figa <tfiga@chromium.org>","Content-Type":"multipart/alternative; boundary=\"00000000000097972c05c347cafd\"","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera devel <libcamera-devel@lists.libcamera.org>","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}},{"id":17314,"web_url":"https://patchwork.libcamera.org/comment/17314/","msgid":"<20210527074632.f5urzibjq2uknzte@uno.localdomain>","date":"2021-05-27T07:46:32","subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","submitter":{"id":3,"url":"https://patchwork.libcamera.org/api/people/3/","name":"Jacopo Mondi","email":"jacopo@jmondi.org"},"content":"Hi Laurent,\n   thanks for the detailed answer\n\nOn Thu, May 27, 2021 at 05:26:51AM +0300, Laurent Pinchart wrote:\n> Hi Jacopo,\n>\n> (expanding the CC list to finalize the race conditions discussion)\n>\n> On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:\n> > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:\n> > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:\n> > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:\n> > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:\n> > > > > > Implement the flush() camera operation in the CameraDevice class\n> > > > > > and make it available to the camera framework by implementing the\n> > > > > > operation wrapper in camera_ops.cpp.\n> > > > > >\n> > > > > > The flush() implementation stops the Camera and the worker thread and\n> > > > > > waits for all in-flight requests to be returned. Stopping the Camera\n> > > > > > forces all Requests already queued to be returned immediately in error\n> > > > > > state. As flush() has to wait until all of them have been returned, make it\n> > > > > > wait on a newly introduced condition variable which is notified by the\n> > > > > > request completion handler when the queue of pending requests has been\n> > > > > > exhausted.\n> > > > > >\n> > > > > > As flush() can race with processCaptureRequest() protect the requests\n> > > > > > queueing by introducing a new CameraState::CameraFlushing state that\n> > > > > > processCaptureRequest() inspects before queuing the Request to the\n> > > > > > Camera. If flush() has been called while processCaptureRequest() was\n> > > > > > executing, return the current Request immediately in error state.\n> > > > > >\n> > > > > > Protect potentially concurrent calls to close() and configureStreams()\n> > >\n> > > Can this happen ? Quoting camera3.h,\n> > >\n> > >  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n> > >  *    to end the camera session. This may be called at any time when no other\n> > >  *    calls from the framework are active, although the call may block until all\n> > >  *    in-flight captures have completed (all results returned, all buffers\n> > >  *    filled). After the close call returns, no more calls to the\n> > >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n> > >  *    close() call is underway, the framework may not call any other HAL device\n> > >  *    functions.\n> > >\n> > > The important part is \"when no other calss from the framework are\n> > > active\". I don't think we need to handle close() racing with anything\n> > > else than process_capture_request().\n> >\n> > I've been discussing this with Hiro during v1, as initially I didn't\n> > consider close() and configureStreams().\n> >\n> > https://patchwork.libcamera.org/patch/12248/#16884\n> >\n> > I initially only considered processCaptureRequest() as a potential\n> > race, but got suggested differently by the cros camera team.\n>\n> Let's try to get to the bottom of this.\n>\n> Section S2 (\"Startup and general expected operation sequence\") states:\n>\n>  * 12. Alternatively, the framework may call camera3_device_t->common->close()\n>  *    to end the camera session. This may be called at any time when no other\n>  *    calls from the framework are active, although the call may block until all\n>  *    in-flight captures have completed (all results returned, all buffers\n>  *    filled). After the close call returns, no more calls to the\n>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the\n>  *    close() call is underway, the framework may not call any other HAL device\n>  *    functions.\n>\n> There can be in-flight requests when .close() is called, but it can't be\n> called concurrently with any other call. There's thus no race condition\n> to protect against.\n>\n> The .configure_streams() documentation states:\n>\n>      * Preconditions:\n>      *\n>      * The framework will only call this method when no captures are being\n>      * processed. That is, all results have been returned to the framework, and\n>      * all in-flight input and output buffers have been returned and their\n>      * release sync fences have been signaled by the HAL. The framework will not\n>      * submit new requests for capture while the configure_streams() call is\n>      * underway.\n>\n> This clearly forbids calling .configure_streams() and\n> .process_capture_request() concurrently.\n>\n> The .flush() documentation states:\n>\n>      * Flush all currently in-process captures and all buffers in the pipeline\n>      * on the given device. The framework will use this to dump all state as\n>      * quickly as possible in order to prepare for a configure_streams() call.\n>\n> I interpret this as at least a very strong hint that .flush() and\n> .configure_streams() can't be called concurrently :-)\n>\n> If anyone disagrees, I'd like compelling evidence that those races can\n> occur.\n>\n> > > > > > by inspecting the CameraState, and force a wait for any flush() call\n> > > > > > to complete before proceeding.\n> > > > > >\n> > > > > > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>\n> > > > > > ---\n> > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--\n> > > > > >  src/android/camera_device.h   |  9 +++-\n> > > > > >  src/android/camera_ops.cpp    |  8 +++-\n> > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)\n> > > > > >\n> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp\n> > > > > > index 3fce14035718..899afaa49439 100644\n> > > > > > --- a/src/android/camera_device.cpp\n> > > > > > +++ b/src/android/camera_device.cpp\n> > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)\n> > > > > >\n> > > > > >  void CameraDevice::close()\n> > > > > >  {\n> > > > > > -\tstreams_.clear();\n> > > > > > +\tMutexLocker cameraLock(cameraMutex_);\n> > >\n> > > I'd add a blank line here.\n> > >\n> > > > > > +\tif (state_ == CameraFlushing) {\n> > >\n> > > As mentioned above, I don't think you need to protect against close()\n> > > and flush() racing each other.\n> > >\n> > > > > > +\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > > +\t\tcamera_->release();\n> > > > > >\n> > > > > > +\t\treturn;\n> > > > > > +\t}\n> > > > > > +\n> > > > > > +\tstreams_.clear();\n> > > > > >  \tstop();\n> > > > > >\n> > > > > >  \tcamera_->release();\n> > > > > >  }\n> > > > > >\n> > > > > > -void CameraDevice::stop()\n> > > > > > +/*\n> > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait\n> > >\n> > > s/wait/waits/\n> > >\n> > > > > > + * until all the in-flight requests have been returned before setting the\n> > > > > > + * camera state to stopped.\n> > > > > > + *\n> > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and\n> > > > > > + * configureStreams().\n> > > > > > + */\n> > > > > > +void CameraDevice::flush()\n> > > > > >  {\n> > > > > > +\t{\n> > > > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > > > +\n> > > > > > +\t\tif (state_ != CameraRunning)\n> > > > > > +\t\t\treturn;\n> > > > > > +\n> > > > > > +\t\tworker_.stop();\n> > > > > > +\t\tcamera_->stop();\n> > > > > > +\t\tstate_ = CameraFlushing;\n> > > > > > +\t}\n> > > > > > +\n> > > > > > +\t/*\n> > > > > > +\t * Now wait for all the in-flight requests to be completed before\n> > > > > > +\t * continuing. Stopping the Camera guarantees that all in-flight\n> > > > > > +\t * requests are completed in error state.\n> > >\n> > > Do we need to wait ? Camera::stop() guarantees that all requests\n> > > complete synchronously with the stop() call.\n> >\n> > I didn't get the API that way... I thought after stop we would receive\n> > a sequence of failed requests... Actually I don't see anything that\n> > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion\n> > in Camera::stop()\n>\n> The camera::stop() documentation states\n>\n>  * This method stops capturing and processing requests immediately. All pending\n>  * requests are cancelled and complete synchronously in an error state.\n>\n> Is this ambiguous ?\n>\n\nI admit I haven't looked at documentation but only the code paths..\n\n> > > Partly answering myself here, we'll have to wait for post-processing\n> > > tasks to complete once we'll process them in a separate thread, but that\n> > > will likely be handled by Thread::wait(). I don't think you need a\n> > > condition variable here. I'm I'm not mistaken, this should simplify the\n> > > implementation.\n> >\n> > If Camera::stop() is synchronous we don't need to wait indeed\n> >\n> > > > > > +\t */\n> > > > > > +\t{\n> > > > > > +\t\tMutexLocker requestsLock(requestsMutex_);\n> > > > > > +\t\tflushing_.wait(requestsLock, [&] { return descriptors_.empty(); });\n> > > > > > +\t}\n> > > > >\n> > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In\n> > > > > patch 6/8 you add it to protect the state_ variable but here it's\n> > > >\n> > > > I'm not changing state_ without the mutex acquired, am I ?\n> > > >\n> > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the\n> > > > > lock checking state_, releasing the lock and do some work, retake the\n> > > > > lock and update state_ feels like a bad idea. Maybe I'm missing\n> > > >\n> > > > How so, apart from the fact it feels a bit unusual, I concur ?\n> > > >\n> > > > If I keep the held the mutex for the whole duration of flush no other\n> > > > concurrent method can proceed until all the queued requests have not\n> > > > been completed. While flush waits for the flushing_ condition to be\n> > > > signaled, processCaptureRequest() can proceed and immediately return\n> > > > the newly queued requests in error state by detecting state_ ==\n> > > > CameraFlushing which signals that flush in is progress.\n> > > > Otherwise it would have had to wait for flush to end. But then we're back\n> > > > to a situation where we could serialize all calls and that's it, we\n> > > > would be done with a single mutex to be held for the whole duration of\n> > > > all operations.\n> > > >\n> > > > If it only was for close() or configureStreams() we could have locked\n> > > > for the whole duration of flush(), as they anyway wait for flush to\n> > > > complete before proceeding (by waiting on the flushed_ condition here\n> > > > below signaled).\n> > > >\n> > > > > something and this is not a real problem, if so maybe we can capture\n> > > > > that in the comment here?\n> > > > >\n> > > > > > +\n> > > > > > +\t/*\n> > > > > > +\t * Set state to stopped and unlock close() or configureStreams() that\n> > > > > > +\t * might be waiting for flush to be completed.\n> > > > > > +\t */\n> > > > > >  \tMutexLocker cameraLock(cameraMutex_);\n> > > > > > +\tstate_ = CameraStopped;\n> > > > > > +\tflushed_.notify_one();\n> > >\n> > > You should drop the lock before calling notify_one(). Otherwise you'll\n> > > wake up the task waiting on flushed_, which will try to lock\n> > > cameraMutex_, which will block immediately. The scheduler will have to\n> > > reschedule this task for the function to return and the lock to be\n> > > released before the waiter can proceed. That works, but isn't very\n> > > efficient.\n> >\n> > Weird, the cpp reference shows example about notify_one where the\n> > caller always has the mutex held locked, but I see your point and\n> > seems correct..\n>\n> I'm looking at\n> https://en.cppreference.com/w/cpp/thread/condition_variable and\n> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one\n> and both calls to notify_one() in the example are made without the lock\n> held, aren't they ?\n\no_0 I would swear I've seen different producer/consumer examples in the\ndocumentation. As I assume they haven't changed overnight, it's\nclearly my immagination..\n\n>\n> > >\n> > > \t{\n> > > \t\tMutexLocker cameraLock(cameraMutex_);\n> > > \t\tstate_ = CameraStopped;\n> > > \t}\n> > >\n> > > \tflushed_.notify_one();\n> > >\n> >\n> > So I could change to this one, if I don't have to drop this part\n> > completely if we consider close() and configureStreams() not as\n> > possible races...\n> >\n> > > > > > +}\n> > > > > > +\n> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */\n> > > > > > +void CameraDevice::stop()\n> > > > > > +{\n> > > > > > +\tASSERT(state_ != CameraFlushing);\n> > > > > > +\n> > > > > >  \tif (state_ == CameraStopped)\n> > > > > >  \t\treturn;\n> > > > > >\n> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const\n> > > > > >   */\n> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)\n> > > > > >  {\n> > > > > > -\t/* Before any configuration attempt, stop the camera. */\n> > > > > > -\tstop();\n> > > > > > +\t{\n> > > > > > +\t\t/*\n> > > > > > +\t\t * If a flush is in progress, wait for it to complete and to\n> > > > > > +\t\t * stop the camera, otherwise before any new configuration\n> > > > > > +\t\t * attempt we have to stop the camera explictely.\n> > > > > > +\t\t */\n> > >\n> > > Same here, I don't think flush() and configure_streams() can race each\n> > > other. I believe the only possible race to be between flush() and\n> > > process_capture_request().\n> >\n> > Ditto.\n> >\n> > > > > > +\t\tMutexLocker cameraLock(cameraMutex_);\n> > > > > > +\t\tif (state_ == CameraFlushing)\n> > > > > > +\t\t\tflushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });\n> > > > > > +\t\telse\n> > > > > > +\t\t\tstop();\n> > > > > > +\t}\n> > > > > >\n> > > > > >  \tif (stream_list->num_streams == 0) {\n> > > > > >  \t\tLOG(HAL, Error) << \"No streams in configuration\";\n> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques\n> > > > > >  \tif (ret)\n> > > > > >  \t\treturn ret;\n> > > > > >\n> > > > > > +\t/*\n> > > > > > +\t * Just before queuing the request, make sure flush() has not\n> > > > > > +\t * been called after this function has been executed. In that\n> > > > > > +\t * case, immediately return the request with errors.\n> > > > > > +\t */\n> > > > > > +\tMutexLocker cameraLock(cameraMutex_);\n> > > > > > +\tif (state_ == CameraFlushing || state_ == CameraStopped) {\n> > > > > > +\t\tfor (camera3_stream_buffer_t &buffer : descriptor.buffers_) {\n> > > > > > +\t\t\tbuffer.status = CAMERA3_BUFFER_STATUS_ERROR;\n> > > > > > +\t\t\tbuffer.release_fence = buffer.acquire_fence;\n> > > > > > +\t\t}\n> > > > > > +\n> > > > > > +\t\tnotifyError(descriptor.frameNumber_,\n> > > > > > +\t\t\t    descriptor.buffers_[0].stream,\n> > >\n> > > As commented on a previous patch, I think you should pass nullptr for\n> > > the stream here.\n> >\n> > The \"S6. Error management:\" section of the camera3.h header does not\n> > mention that, not the ?\n>\n> Indeed, that section doesn't mention the camera3_error_msg::error_stream\n> field at all. The field is documented in the structure as\n>\n>     /**\n>      * Pointer to the stream that had a failure. NULL if the stream isn't\n>      * applicable to the error.\n>      */\n>\n> The question is thus when the stream is applicable to the error. The\n> documentation of enum camera3_error_msg_code mentions error_stream in\n> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to\n> the device, the request or the result metadata, which are not specific\n> to a stream.\n>\n\nAck, thanks for clarifying\n\n> > where does you suggestion come from ? I don't find any reference in\n> > the review of [1/8]\n>\n> ([PATCH v3 1/8] android: Rework request completion notification'\n> (YKqV6Iik2sN3XUEf@pendragon.ideasonboard.com)\n>\n> > > > > > +\t\t\t    CAMERA3_MSG_ERROR_REQUEST);\n> > > > > > +\n> > > > > > +\t\treturn 0;\n> > > > > > +\t}\n> > > > > > +\n> > > > > >  \tworker_.queueRequest(descriptor.request_.get());\n> > > > > >\n> > > > > >  \t{\n> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)\n> > > > > >  \t\t\treturn;\n> > > > > >  \t\t}\n> > > > > >\n> > > > > > +\t\t/* Release flush if all the pending requests have been completed. */\n> > > > > > +\t\tif (descriptors_.empty())\n> > > > > > +\t\t\tflushing_.notify_one();\n> > >\n> > > This will never happen, as you can only get here if descriptors_.find()\n> > > has found the descriptor. Did you mean to do this after the extract()\n> > > call below ?\n> >\n> > Ugh. This works only because Camera::stop() is synchronous then ?\n>\n> I believe so.\n>\n> > > > > > +\n> > > > > >  \t\tnode = descriptors_.extract(it);\n> > > > > >  \t}\n> > > > > >  \tCamera3RequestDescriptor &descriptor = node.mapped();\n> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h\n> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644\n> > > > > > --- a/src/android/camera_device.h\n> > > > > > +++ b/src/android/camera_device.h\n> > > > > > @@ -7,6 +7,7 @@\n> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__\n> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__\n> > > > > >\n> > > > > > +#include <condition_variable>\n> > > > > >  #include <map>\n> > > > > >  #include <memory>\n> > > > > >  #include <mutex>\n> > > > > > @@ -42,6 +43,7 @@ public:\n> > > > > >\n> > > > > >  \tint open(const hw_module_t *hardwareModule);\n> > > > > >  \tvoid close();\n> > > > > > +\tvoid flush();\n> > > > > >\n> > > > > >  \tunsigned int id() const { return id_; }\n> > > > > >  \tcamera3_device_t *camera3Device() { return &camera3Device_; }\n> > > > > > @@ -92,6 +94,7 @@ private:\n> > > > > >  \tenum State {\n> > > > > >  \t\tCameraStopped,\n> > > > > >  \t\tCameraRunning,\n> > > > > > +\t\tCameraFlushing,\n> > > > > >  \t};\n> > > > > >\n> > > > > >  \tvoid stop();\n> > > > > > @@ -120,8 +123,9 @@ private:\n> > > > > >\n> > > > > >  \tCameraWorker worker_;\n> > > > > >\n> > > > > > -\tlibcamera::Mutex cameraMutex_; /* Protects access to the camera state. */\n> > > > > > +\tlibcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */\n> > > > > >  \tState state_;\n> > > > > > +\tstd::condition_variable flushed_;\n> > > > > >\n> > > > > >  \tstd::shared_ptr<libcamera::Camera> camera_;\n> > > > > >  \tstd::unique_ptr<libcamera::CameraConfiguration> config_;\n> > > > > > @@ -134,8 +138,9 @@ private:\n> > > > > >  \tstd::map<int, libcamera::PixelFormat> formatsMap_;\n> > > > > >  \tstd::vector<CameraStream> streams_;\n> > > > > >\n> > > > > > -\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_. */\n> > > > > > +\tlibcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */\n> > > > > >  \tstd::map<uint64_t, Camera3RequestDescriptor> descriptors_;\n> > > > > > +\tstd::condition_variable flushing_;\n> > > > > >\n> > > > > >  \tstd::string maker_;\n> > > > > >  \tstd::string model_;\n> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp\n> > > > > > index 696e80436821..8a3cfa175ff5 100644\n> > > > > > --- a/src/android/camera_ops.cpp\n> > > > > > +++ b/src/android/camera_ops.cpp\n> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,\n> > > > > >  {\n> > > > > >  }\n> > > > > >\n> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)\n> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)\n> > > > > >  {\n> > > > > > +\tif (!dev)\n> > > > > > +\t\treturn -EINVAL;\n> > > > > > +\n> > > > > > +\tCameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);\n> > > > > > +\tcamera->flush();\n> > > > > > +\n> > > > > >  \treturn 0;\n> > > > > >  }\n> > > > > >\n>\n> --\n> Regards,\n>\n> Laurent Pinchart","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 63877BDB80\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 27 May 2021 07:45:52 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id C2CBE68923;\n\tThu, 27 May 2021 09:45:51 +0200 (CEST)","from relay4-d.mail.gandi.net (relay4-d.mail.gandi.net\n\t[217.70.183.196])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id 9771F602AA\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 27 May 2021 09:45:49 +0200 (CEST)","(Authenticated sender: jacopo@jmondi.org)\n\tby relay4-d.mail.gandi.net (Postfix) with ESMTPSA id 4D1D2E0011;\n\tThu, 27 May 2021 07:45:45 +0000 (UTC)"],"Date":"Thu, 27 May 2021 09:46:32 +0200","From":"Jacopo Mondi <jacopo@jmondi.org>","To":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","Message-ID":"<20210527074632.f5urzibjq2uknzte@uno.localdomain>","References":"<20210521154227.60186-1-jacopo@jmondi.org>\n\t<20210521154227.60186-9-jacopo@jmondi.org>\n\t<YKjVGLPRDGyrLI8V@oden.dyn.berto.se>\n\t<20210523142251.nnxgwho3quitpjfb@uno.localdomain>\n\t<YKqkBuz9ldyNqkaS@pendragon.ideasonboard.com>\n\t<20210524074755.krl5ykzukwvx76ca@uno.localdomain>\n\t<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<YK8DawB2BxidonoT@pendragon.ideasonboard.com>","Subject":"Re: [libcamera-devel] [PATCH v3 8/8] android: Implement flush()\n\tcamera operation","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]