Message ID | 20210924064409.317140-1-umang.jain@ideasonboard.com |
---|---|
State | Superseded |
Delegated to: | Umang Jain |
Headers | show |
Series |
|
Related | show |
Hi Umang, thank you for the patch. On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > The Camera3RequestDescriptor containing the capture request > is adding to the descriptors_ map after a call to > CameraWorker::queueRequest(). This is a race condition since > CameraWorker::queueRequest() queues request to libcamera::Camera > asynchronously and the addition of the descriptor to the map > occurs with std::move(). Hence, it might happen that the async > queueRequest() hasn't finished but the descriptor gets std::move()ed. > > Fix it by adding the descriptor to map first, before > CameraWorker::queueRequest(). I don't understand the problem well. I think the pointer of descriptors.request_.get() is not invalidated by std::move(). > > Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > --- > src/android/camera_device.cpp | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index 21844e51..c4c03d86 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > state_ = State::Running; > } > > - worker_.queueRequest(descriptor.request_.get()); > - > + unsigned long requestCookie = descriptor.request_->cookie(); > { > MutexLocker descriptorsLock(descriptorsMutex_); > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > + descriptors_[requestCookie] = std::move(descriptor); > } > > + worker_.queueRequest(descriptors_[requestCookie].request_.get()); Accessing descriptors_ must be while holding descriptorsMutex_. -Hiro > + > return 0; > } > > -- > 2.31.1 >
Hi Hiro, On 9/24/21 1:10 PM, Hirokazu Honda wrote: > Hi Umang, thank you for the patch. > > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: >> The Camera3RequestDescriptor containing the capture request >> is adding to the descriptors_ map after a call to >> CameraWorker::queueRequest(). This is a race condition since >> CameraWorker::queueRequest() queues request to libcamera::Camera >> asynchronously and the addition of the descriptor to the map >> occurs with std::move(). Hence, it might happen that the async >> queueRequest() hasn't finished but the descriptor gets std::move()ed. >> >> Fix it by adding the descriptor to map first, before >> CameraWorker::queueRequest(). > > I don't understand the problem well. It's a race between queueRequest() and adding to the descriptor_ std::map. queueRequest() is called asynchronously(separate thread) so it might happen that queueRequest is still processing/queuing the request while we std::move()ed the descriptor (Which wraps the request queueRequest is accessing) If we fix the order with correct data-access it fixes the issue. > I think the pointer of descriptors.request_.get() is not invalidated > by std::move(). Why do you think so? :) diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp index a693dcbe..0de7050d 100644 --- a/src/android/camera_device.cpp +++ b/src/android/camera_device.cpp @@ -1063,13 +1063,13 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques state_ = State::Running; } - worker_.queueRequest(descriptor.request_.get()); - { MutexLocker descriptorsLock(descriptorsMutex_); descriptors_[descriptor.request_->cookie()] = std::move(descriptor); } + worker_.queueRequest(descriptor.request_.get()); + is a segfaulted here (and expected no?). > >> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> >> --- >> src/android/camera_device.cpp | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp >> index 21844e51..c4c03d86 100644 >> --- a/src/android/camera_device.cpp >> +++ b/src/android/camera_device.cpp >> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques >> state_ = State::Running; >> } >> >> - worker_.queueRequest(descriptor.request_.get()); >> - >> + unsigned long requestCookie = descriptor.request_->cookie(); >> { >> MutexLocker descriptorsLock(descriptorsMutex_); >> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); >> + descriptors_[requestCookie] = std::move(descriptor); >> } >> >> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > Accessing descriptors_ must be while holding descriptorsMutex_. I am in two minds here, We can lock it yes, but we are just reading from the map so, should be fine? > > -Hiro >> + >> return 0; >> } >> >> -- >> 2.31.1 >>
Hi Umang, On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > Hi Hiro, > > On 9/24/21 1:10 PM, Hirokazu Honda wrote: > > Hi Umang, thank you for the patch. > > > > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > >> The Camera3RequestDescriptor containing the capture request > >> is adding to the descriptors_ map after a call to > >> CameraWorker::queueRequest(). This is a race condition since > >> CameraWorker::queueRequest() queues request to libcamera::Camera > >> asynchronously and the addition of the descriptor to the map > >> occurs with std::move(). Hence, it might happen that the async > >> queueRequest() hasn't finished but the descriptor gets std::move()ed. > >> > >> Fix it by adding the descriptor to map first, before > >> CameraWorker::queueRequest(). > > > > I don't understand the problem well. > > > It's a race between queueRequest() and adding to the descriptor_ std::map. > > queueRequest() is called asynchronously(separate thread) so it might > happen that queueRequest is still processing/queuing the request while > we std::move()ed the descriptor (Which wraps the request queueRequest is > accessing) > > If we fix the order with correct data-access it fixes the issue. > > > I think the pointer of descriptors.request_.get() is not invalidated > > by std::move(). > > > Why do you think so? :) > std::move() invalidates descriptor. Yes, descriptor.request_ becomes invalidated after std::move(). descriptor must not be accessed after std::move(). However, queueRequest holds a pointer value, descriptor.request_.get(). descriptor is move()d to descriptors_ and thus descriptor.request_ is alive until descriptor is removed in descriptors_. So the passed pointer should still be valid after executing std::move(). > diff --git a/src/android/camera_device.cpp > b/src/android/camera_device.cpp > index a693dcbe..0de7050d 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -1063,13 +1063,13 @@ int > CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > state_ = State::Running; > } > > - worker_.queueRequest(descriptor.request_.get()); > - > { > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > } > > + worker_.queueRequest(descriptor.request_.get()); > + > > is a segfaulted here (and expected no?). > > > > >> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > >> --- > >> src/android/camera_device.cpp | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > >> index 21844e51..c4c03d86 100644 > >> --- a/src/android/camera_device.cpp > >> +++ b/src/android/camera_device.cpp > >> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > >> state_ = State::Running; > >> } > >> > >> - worker_.queueRequest(descriptor.request_.get()); > >> - > >> + unsigned long requestCookie = descriptor.request_->cookie(); > >> { > >> MutexLocker descriptorsLock(descriptorsMutex_); > >> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > >> + descriptors_[requestCookie] = std::move(descriptor); > >> } > >> > >> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > > Accessing descriptors_ must be while holding descriptorsMutex_. > > > I am in two minds here, > > We can lock it yes, but we are just reading from the map so, should be > fine? Regardless of reading/writing, descriptorMutex_ must be acquired whenever accessing descriptors_. Some other thread can clear descriptors in parallel. Parallel executing operations against std::map corrupts it. By the way, std::map::operator[] is not a constant function. In fact, it can create a new node if a key does not exist. -Hiro > > > > > -Hiro > >> + > >> return 0; > >> } > >> > >> -- > >> 2.31.1 > >>
Hi Hiro, On 9/24/21 5:07 PM, Hirokazu Honda wrote: > Hi Umang, > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: >> Hi Hiro, >> >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: >>> Hi Umang, thank you for the patch. >>> >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: >>>> The Camera3RequestDescriptor containing the capture request >>>> is adding to the descriptors_ map after a call to >>>> CameraWorker::queueRequest(). This is a race condition since >>>> CameraWorker::queueRequest() queues request to libcamera::Camera >>>> asynchronously and the addition of the descriptor to the map >>>> occurs with std::move(). Hence, it might happen that the async >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. >>>> >>>> Fix it by adding the descriptor to map first, before >>>> CameraWorker::queueRequest(). >>> I don't understand the problem well. >> >> It's a race between queueRequest() and adding to the descriptor_ std::map. >> >> queueRequest() is called asynchronously(separate thread) so it might >> happen that queueRequest is still processing/queuing the request while >> we std::move()ed the descriptor (Which wraps the request queueRequest is >> accessing) >> >> If we fix the order with correct data-access it fixes the issue. >> >>> I think the pointer of descriptors.request_.get() is not invalidated >>> by std::move(). >> >> Why do you think so? :) >> > std::move() invalidates descriptor. Yes, descriptor.request_ becomes > invalidated after std::move(). descriptor must not be accessed after > std::move(). > However, queueRequest holds a pointer value, > descriptor.request_.get(). descriptor is move()d to descriptors_ and > thus descriptor.request_ is alive until descriptor is removed in > descriptors_. > So the passed pointer should still be valid after executing std::move(). Generally, in other programming models, I have often seen data that has been move()d should be now/further accessed(if necessary) from that container that they have been move()d into. It clearly differentiates between the ownership model (before and after) and I find it as a good mental model to have (helps especially when there is a long function implementation ) Yes, we can store a pointer before the move and call queueRequest() with it after the move. Either of the methods, I am not against any one. > >> diff --git a/src/android/camera_device.cpp >> b/src/android/camera_device.cpp >> index a693dcbe..0de7050d 100644 >> --- a/src/android/camera_device.cpp >> +++ b/src/android/camera_device.cpp >> @@ -1063,13 +1063,13 @@ int >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques >> state_ = State::Running; >> } >> >> - worker_.queueRequest(descriptor.request_.get()); >> - >> { >> MutexLocker descriptorsLock(descriptorsMutex_); >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); >> } >> >> + worker_.queueRequest(descriptor.request_.get()); >> + >> >> is a segfaulted here (and expected no?). >> >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> >>>> --- >>>> src/android/camera_device.cpp | 7 ++++--- >>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp >>>> index 21844e51..c4c03d86 100644 >>>> --- a/src/android/camera_device.cpp >>>> +++ b/src/android/camera_device.cpp >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques >>>> state_ = State::Running; >>>> } >>>> >>>> - worker_.queueRequest(descriptor.request_.get()); >>>> - >>>> + unsigned long requestCookie = descriptor.request_->cookie(); >>>> { >>>> MutexLocker descriptorsLock(descriptorsMutex_); >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); >>>> + descriptors_[requestCookie] = std::move(descriptor); >>>> } >>>> >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); >>> Accessing descriptors_ must be while holding descriptorsMutex_. >> >> I am in two minds here, >> >> We can lock it yes, but we are just reading from the map so, should be >> fine? > Regardless of reading/writing, descriptorMutex_ must be acquired > whenever accessing descriptors_. > Some other thread can clear descriptors in parallel. Parallel Not sure, if this a possible yet, but sure! Thanks for the inputs. > executing operations against std::map corrupts it. > By the way, std::map::operator[] is not a constant function. In fact, > it can create a new node if a key does not exist. > > -Hiro >>> -Hiro >>>> + >>>> return 0; >>>> } >>>> >>>> -- >>>> 2.31.1 >>>>
Hi Umang, On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > Hi Hiro, > > On 9/24/21 5:07 PM, Hirokazu Honda wrote: > > Hi Umang, > > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > >> Hi Hiro, > >> > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: > >>> Hi Umang, thank you for the patch. > >>> > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > >>>> The Camera3RequestDescriptor containing the capture request > >>>> is adding to the descriptors_ map after a call to > >>>> CameraWorker::queueRequest(). This is a race condition since > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera > >>>> asynchronously and the addition of the descriptor to the map > >>>> occurs with std::move(). Hence, it might happen that the async > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. > >>>> > >>>> Fix it by adding the descriptor to map first, before > >>>> CameraWorker::queueRequest(). > >>> I don't understand the problem well. > >> > >> It's a race between queueRequest() and adding to the descriptor_ std::map. > >> > >> queueRequest() is called asynchronously(separate thread) so it might > >> happen that queueRequest is still processing/queuing the request while > >> we std::move()ed the descriptor (Which wraps the request queueRequest is > >> accessing) > >> > >> If we fix the order with correct data-access it fixes the issue. > >> > >>> I think the pointer of descriptors.request_.get() is not invalidated > >>> by std::move(). > >> > >> Why do you think so? :) > >> > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes > > invalidated after std::move(). descriptor must not be accessed after > > std::move(). > > However, queueRequest holds a pointer value, > > descriptor.request_.get(). descriptor is move()d to descriptors_ and > > thus descriptor.request_ is alive until descriptor is removed in > > descriptors_. > > So the passed pointer should still be valid after executing std::move(). > > > Generally, in other programming models, I have often seen data that has > been move()d should be now/further accessed(if necessary) from that > container that they have been move()d into. It clearly differentiates > between the ownership model (before and after) and I find it as a good > mental model to have (helps especially when there is a long function > implementation ) > > Yes, we can store a pointer before the move and call queueRequest() with > it after the move. Either of the methods, I am not against any one. > Since copying a pointer value is before std::move(), it seems to be natural for me to use descriptor.request_.get(), which is the original code. Hmm, I don't understand why the original code doesn't work. I guess the seg fault is due to other reason. -Hiro > > > >> diff --git a/src/android/camera_device.cpp > >> b/src/android/camera_device.cpp > >> index a693dcbe..0de7050d 100644 > >> --- a/src/android/camera_device.cpp > >> +++ b/src/android/camera_device.cpp > >> @@ -1063,13 +1063,13 @@ int > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > >> state_ = State::Running; > >> } > >> > >> - worker_.queueRequest(descriptor.request_.get()); > >> - > >> { > >> MutexLocker descriptorsLock(descriptorsMutex_); > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > >> } > >> > >> + worker_.queueRequest(descriptor.request_.get()); > >> + > >> > >> is a segfaulted here (and expected no?). > >> > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > >>>> --- > >>>> src/android/camera_device.cpp | 7 ++++--- > >>>> 1 file changed, 4 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > >>>> index 21844e51..c4c03d86 100644 > >>>> --- a/src/android/camera_device.cpp > >>>> +++ b/src/android/camera_device.cpp > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > >>>> state_ = State::Running; > >>>> } > >>>> > >>>> - worker_.queueRequest(descriptor.request_.get()); > >>>> - > >>>> + unsigned long requestCookie = descriptor.request_->cookie(); > >>>> { > >>>> MutexLocker descriptorsLock(descriptorsMutex_); > >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > >>>> + descriptors_[requestCookie] = std::move(descriptor); > >>>> } > >>>> > >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > >>> Accessing descriptors_ must be while holding descriptorsMutex_. > >> > >> I am in two minds here, > >> > >> We can lock it yes, but we are just reading from the map so, should be > >> fine? > > Regardless of reading/writing, descriptorMutex_ must be acquired > > whenever accessing descriptors_. > > Some other thread can clear descriptors in parallel. Parallel > > > Not sure, if this a possible yet, but sure! > > Thanks for the inputs. > > > executing operations against std::map corrupts it. > > By the way, std::map::operator[] is not a constant function. In fact, > > it can create a new node if a key does not exist. > > > > -Hiro > >>> -Hiro > >>>> + > >>>> return 0; > >>>> } > >>>> > >>>> -- > >>>> 2.31.1 > >>>>
Hi Umang, On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote: > > Hi Umang, > > On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > > Hi Hiro, > > > > On 9/24/21 5:07 PM, Hirokazu Honda wrote: > > > Hi Umang, > > > > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > >> Hi Hiro, > > >> > > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: > > >>> Hi Umang, thank you for the patch. > > >>> > > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > >>>> The Camera3RequestDescriptor containing the capture request > > >>>> is adding to the descriptors_ map after a call to > > >>>> CameraWorker::queueRequest(). This is a race condition since > > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera > > >>>> asynchronously and the addition of the descriptor to the map > > >>>> occurs with std::move(). Hence, it might happen that the async > > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. > > >>>> > > >>>> Fix it by adding the descriptor to map first, before > > >>>> CameraWorker::queueRequest(). > > >>> I don't understand the problem well. > > >> > > >> It's a race between queueRequest() and adding to the descriptor_ std::map. > > >> > > >> queueRequest() is called asynchronously(separate thread) so it might > > >> happen that queueRequest is still processing/queuing the request while > > >> we std::move()ed the descriptor (Which wraps the request queueRequest is > > >> accessing) > > >> > > >> If we fix the order with correct data-access it fixes the issue. > > >> > > >>> I think the pointer of descriptors.request_.get() is not invalidated > > >>> by std::move(). > > >> > > >> Why do you think so? :) > > >> > > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes > > > invalidated after std::move(). descriptor must not be accessed after > > > std::move(). > > > However, queueRequest holds a pointer value, > > > descriptor.request_.get(). descriptor is move()d to descriptors_ and > > > thus descriptor.request_ is alive until descriptor is removed in > > > descriptors_. > > > So the passed pointer should still be valid after executing std::move(). > > > > > > Generally, in other programming models, I have often seen data that has > > been move()d should be now/further accessed(if necessary) from that > > container that they have been move()d into. It clearly differentiates > > between the ownership model (before and after) and I find it as a good > > mental model to have (helps especially when there is a long function > > implementation ) > > > > Yes, we can store a pointer before the move and call queueRequest() with > > it after the move. Either of the methods, I am not against any one. > > > > Since copying a pointer value is before std::move(), it seems to be > natural for me to use descriptor.request_.get(), which is the original > code. > Hmm, I don't understand why the original code doesn't work. I guess > the seg fault is due to other reason. > Would you mind investigating more? Best Regards, -Hiro > -Hiro > > > > > >> diff --git a/src/android/camera_device.cpp > > >> b/src/android/camera_device.cpp > > >> index a693dcbe..0de7050d 100644 > > >> --- a/src/android/camera_device.cpp > > >> +++ b/src/android/camera_device.cpp > > >> @@ -1063,13 +1063,13 @@ int > > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > >> state_ = State::Running; > > >> } > > >> > > >> - worker_.queueRequest(descriptor.request_.get()); > > >> - > > >> { > > >> MutexLocker descriptorsLock(descriptorsMutex_); > > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > >> } > > >> > > >> + worker_.queueRequest(descriptor.request_.get()); > > >> + > > >> > > >> is a segfaulted here (and expected no?). > > >> > > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > > >>>> --- > > >>>> src/android/camera_device.cpp | 7 ++++--- > > >>>> 1 file changed, 4 insertions(+), 3 deletions(-) > > >>>> > > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > >>>> index 21844e51..c4c03d86 100644 > > >>>> --- a/src/android/camera_device.cpp > > >>>> +++ b/src/android/camera_device.cpp > > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > >>>> state_ = State::Running; > > >>>> } > > >>>> > > >>>> - worker_.queueRequest(descriptor.request_.get()); > > >>>> - > > >>>> + unsigned long requestCookie = descriptor.request_->cookie(); > > >>>> { > > >>>> MutexLocker descriptorsLock(descriptorsMutex_); > > >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > >>>> + descriptors_[requestCookie] = std::move(descriptor); > > >>>> } > > >>>> > > >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > > >>> Accessing descriptors_ must be while holding descriptorsMutex_. > > >> > > >> I am in two minds here, > > >> > > >> We can lock it yes, but we are just reading from the map so, should be > > >> fine? > > > Regardless of reading/writing, descriptorMutex_ must be acquired > > > whenever accessing descriptors_. > > > Some other thread can clear descriptors in parallel. Parallel > > > > > > Not sure, if this a possible yet, but sure! > > > > Thanks for the inputs. > > > > > executing operations against std::map corrupts it. > > > By the way, std::map::operator[] is not a constant function. In fact, > > > it can create a new node if a key does not exist. > > > > > > -Hiro > > >>> -Hiro > > >>>> + > > >>>> return 0; > > >>>> } > > >>>> > > >>>> -- > > >>>> 2.31.1 > > >>>>
+Jacopo Mondi I saw Jacopo pointed this in https://patchwork.libcamera.org/patch/13866/. I consider there is no race condition here. Jacopo, am I wrong? -Hiro On Mon, Sep 27, 2021 at 2:25 PM Hirokazu Honda <hiroh@chromium.org> wrote: > > Hi Umang, > > On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote: > > > > Hi Umang, > > > > On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > > > > Hi Hiro, > > > > > > On 9/24/21 5:07 PM, Hirokazu Honda wrote: > > > > Hi Umang, > > > > > > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > >> Hi Hiro, > > > >> > > > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: > > > >>> Hi Umang, thank you for the patch. > > > >>> > > > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > >>>> The Camera3RequestDescriptor containing the capture request > > > >>>> is adding to the descriptors_ map after a call to > > > >>>> CameraWorker::queueRequest(). This is a race condition since > > > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera > > > >>>> asynchronously and the addition of the descriptor to the map > > > >>>> occurs with std::move(). Hence, it might happen that the async > > > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. > > > >>>> > > > >>>> Fix it by adding the descriptor to map first, before > > > >>>> CameraWorker::queueRequest(). > > > >>> I don't understand the problem well. > > > >> > > > >> It's a race between queueRequest() and adding to the descriptor_ std::map. > > > >> > > > >> queueRequest() is called asynchronously(separate thread) so it might > > > >> happen that queueRequest is still processing/queuing the request while > > > >> we std::move()ed the descriptor (Which wraps the request queueRequest is > > > >> accessing) > > > >> > > > >> If we fix the order with correct data-access it fixes the issue. > > > >> > > > >>> I think the pointer of descriptors.request_.get() is not invalidated > > > >>> by std::move(). > > > >> > > > >> Why do you think so? :) > > > >> > > > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes > > > > invalidated after std::move(). descriptor must not be accessed after > > > > std::move(). > > > > However, queueRequest holds a pointer value, > > > > descriptor.request_.get(). descriptor is move()d to descriptors_ and > > > > thus descriptor.request_ is alive until descriptor is removed in > > > > descriptors_. > > > > So the passed pointer should still be valid after executing std::move(). > > > > > > > > > Generally, in other programming models, I have often seen data that has > > > been move()d should be now/further accessed(if necessary) from that > > > container that they have been move()d into. It clearly differentiates > > > between the ownership model (before and after) and I find it as a good > > > mental model to have (helps especially when there is a long function > > > implementation ) > > > > > > Yes, we can store a pointer before the move and call queueRequest() with > > > it after the move. Either of the methods, I am not against any one. > > > > > > > Since copying a pointer value is before std::move(), it seems to be > > natural for me to use descriptor.request_.get(), which is the original > > code. > > Hmm, I don't understand why the original code doesn't work. I guess > > the seg fault is due to other reason. > > > > Would you mind investigating more? > > Best Regards, > -Hiro > > > -Hiro > > > > > > > >> diff --git a/src/android/camera_device.cpp > > > >> b/src/android/camera_device.cpp > > > >> index a693dcbe..0de7050d 100644 > > > >> --- a/src/android/camera_device.cpp > > > >> +++ b/src/android/camera_device.cpp > > > >> @@ -1063,13 +1063,13 @@ int > > > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > >> state_ = State::Running; > > > >> } > > > >> > > > >> - worker_.queueRequest(descriptor.request_.get()); > > > >> - > > > >> { > > > >> MutexLocker descriptorsLock(descriptorsMutex_); > > > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > > >> } > > > >> > > > >> + worker_.queueRequest(descriptor.request_.get()); > > > >> + > > > >> > > > >> is a segfaulted here (and expected no?). > > > >> > > > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > > > >>>> --- > > > >>>> src/android/camera_device.cpp | 7 ++++--- > > > >>>> 1 file changed, 4 insertions(+), 3 deletions(-) > > > >>>> > > > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > >>>> index 21844e51..c4c03d86 100644 > > > >>>> --- a/src/android/camera_device.cpp > > > >>>> +++ b/src/android/camera_device.cpp > > > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > >>>> state_ = State::Running; > > > >>>> } > > > >>>> > > > >>>> - worker_.queueRequest(descriptor.request_.get()); > > > >>>> - > > > >>>> + unsigned long requestCookie = descriptor.request_->cookie(); > > > >>>> { > > > >>>> MutexLocker descriptorsLock(descriptorsMutex_); > > > >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > > >>>> + descriptors_[requestCookie] = std::move(descriptor); > > > >>>> } > > > >>>> > > > >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > > > >>> Accessing descriptors_ must be while holding descriptorsMutex_. > > > >> > > > >> I am in two minds here, > > > >> > > > >> We can lock it yes, but we are just reading from the map so, should be > > > >> fine? > > > > Regardless of reading/writing, descriptorMutex_ must be acquired > > > > whenever accessing descriptors_. > > > > Some other thread can clear descriptors in parallel. Parallel > > > > > > > > > Not sure, if this a possible yet, but sure! > > > > > > Thanks for the inputs. > > > > > > > executing operations against std::map corrupts it. > > > > By the way, std::map::operator[] is not a constant function. In fact, > > > > it can create a new node if a key does not exist. > > > > > > > > -Hiro > > > >>> -Hiro > > > >>>> + > > > >>>> return 0; > > > >>>> } > > > >>>> > > > >>>> -- > > > >>>> 2.31.1 > > > >>>>
Hi Hiro, On Mon, Sep 27, 2021 at 02:45:04PM +0900, Hirokazu Honda wrote: > +Jacopo Mondi > > I saw Jacopo pointed this in https://patchwork.libcamera.org/patch/13866/. > I consider there is no race condition here. Jacopo, am I wrong? Not really sure to which part you are referring to, if the race Umang is investigating or the access to the descriptors_ map being protected or not. I'll try go through the discussion in previous emails as here some context have been removed during the conversation. > > -Hiro > > On Mon, Sep 27, 2021 at 2:25 PM Hirokazu Honda <hiroh@chromium.org> wrote: > > > > Hi Umang, > > > > On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote: > > > > > > Hi Umang, > > > > > > On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > > > > > > Hi Hiro, > > > > > > > > On 9/24/21 5:07 PM, Hirokazu Honda wrote: > > > > > Hi Umang, > > > > > > > > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > > >> Hi Hiro, > > > > >> > > > > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: > > > > >>> Hi Umang, thank you for the patch. > > > > >>> > > > > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > > >>>> The Camera3RequestDescriptor containing the capture request > > > > >>>> is adding to the descriptors_ map after a call to > > > > >>>> CameraWorker::queueRequest(). This is a race condition since > > > > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera > > > > >>>> asynchronously and the addition of the descriptor to the map > > > > >>>> occurs with std::move(). Hence, it might happen that the async > > > > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. > > > > >>>> > > > > >>>> Fix it by adding the descriptor to map first, before > > > > >>>> CameraWorker::queueRequest(). > > > > >>> I don't understand the problem well. > > > > >> > > > > >> It's a race between queueRequest() and adding to the descriptor_ std::map. > > > > >> > > > > >> queueRequest() is called asynchronously(separate thread) so it might > > > > >> happen that queueRequest is still processing/queuing the request while > > > > >> we std::move()ed the descriptor (Which wraps the request queueRequest is > > > > >> accessing) > > > > >> > > > > >> If we fix the order with correct data-access it fixes the issue. > > > > >> > > > > >>> I think the pointer of descriptors.request_.get() is not invalidated > > > > >>> by std::move(). > > > > >> > > > > >> Why do you think so? :) > > > > >> > > > > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes > > > > > invalidated after std::move(). descriptor must not be accessed after > > > > > std::move(). > > > > > However, queueRequest holds a pointer value, > > > > > descriptor.request_.get(). descriptor is move()d to descriptors_ and > > > > > thus descriptor.request_ is alive until descriptor is removed in > > > > > descriptors_. > > > > > So the passed pointer should still be valid after executing std::move(). > > > > > > > > > > > > Generally, in other programming models, I have often seen data that has > > > > been move()d should be now/further accessed(if necessary) from that > > > > container that they have been move()d into. It clearly differentiates > > > > between the ownership model (before and after) and I find it as a good > > > > mental model to have (helps especially when there is a long function > > > > implementation ) > > > > > > > > Yes, we can store a pointer before the move and call queueRequest() with > > > > it after the move. Either of the methods, I am not against any one. > > > > > > > > > > Since copying a pointer value is before std::move(), it seems to be > > > natural for me to use descriptor.request_.get(), which is the original > > > code. > > > Hmm, I don't understand why the original code doesn't work. I guess > > > the seg fault is due to other reason. > > > > > > > Would you mind investigating more? > > > > Best Regards, > > -Hiro > > > > > -Hiro > > > > > > > > > >> diff --git a/src/android/camera_device.cpp > > > > >> b/src/android/camera_device.cpp > > > > >> index a693dcbe..0de7050d 100644 > > > > >> --- a/src/android/camera_device.cpp > > > > >> +++ b/src/android/camera_device.cpp > > > > >> @@ -1063,13 +1063,13 @@ int > > > > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > >> state_ = State::Running; > > > > >> } > > > > >> > > > > >> - worker_.queueRequest(descriptor.request_.get()); > > > > >> - > > > > >> { > > > > >> MutexLocker descriptorsLock(descriptorsMutex_); > > > > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > > > >> } > > > > >> > > > > >> + worker_.queueRequest(descriptor.request_.get()); > > > > >> + > > > > >> > > > > >> is a segfaulted here (and expected no?). > > > > >> > > > > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > > > > >>>> --- > > > > >>>> src/android/camera_device.cpp | 7 ++++--- > > > > >>>> 1 file changed, 4 insertions(+), 3 deletions(-) > > > > >>>> > > > > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > > >>>> index 21844e51..c4c03d86 100644 > > > > >>>> --- a/src/android/camera_device.cpp > > > > >>>> +++ b/src/android/camera_device.cpp > > > > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > > >>>> state_ = State::Running; > > > > >>>> } > > > > >>>> > > > > >>>> - worker_.queueRequest(descriptor.request_.get()); > > > > >>>> - > > > > >>>> + unsigned long requestCookie = descriptor.request_->cookie(); > > > > >>>> { > > > > >>>> MutexLocker descriptorsLock(descriptorsMutex_); > > > > >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > > > >>>> + descriptors_[requestCookie] = std::move(descriptor); > > > > >>>> } > > > > >>>> > > > > >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > > > > >>> Accessing descriptors_ must be while holding descriptorsMutex_. > > > > >> > > > > >> I am in two minds here, > > > > >> > > > > >> We can lock it yes, but we are just reading from the map so, should be > > > > >> fine? > > > > > Regardless of reading/writing, descriptorMutex_ must be acquired > > > > > whenever accessing descriptors_. > > > > > Some other thread can clear descriptors in parallel. Parallel > > > > > > > > > > > > Not sure, if this a possible yet, but sure! > > > > > > > > Thanks for the inputs. > > > > > > > > > executing operations against std::map corrupts it. > > > > > By the way, std::map::operator[] is not a constant function. In fact, > > > > > it can create a new node if a key does not exist. > > > > > > > > > > -Hiro > > > > >>> -Hiro > > > > >>>> + > > > > >>>> return 0; > > > > >>>> } > > > > >>>> > > > > >>>> -- > > > > >>>> 2.31.1 > > > > >>>>
Hi Umang, Hiro, On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote: > Hi Hiro, > > On 9/24/21 1:10 PM, Hirokazu Honda wrote: > > Hi Umang, thank you for the patch. > > > > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: > > > The Camera3RequestDescriptor containing the capture request > > > is adding to the descriptors_ map after a call to > > > CameraWorker::queueRequest(). This is a race condition since > > > CameraWorker::queueRequest() queues request to libcamera::Camera > > > asynchronously and the addition of the descriptor to the map > > > occurs with std::move(). Hence, it might happen that the async > > > queueRequest() hasn't finished but the descriptor gets std::move()ed. > > > > > > Fix it by adding the descriptor to map first, before > > > CameraWorker::queueRequest(). > > > > I don't understand the problem well. > > > It's a race between queueRequest() and adding to the descriptor_ std::map. > > queueRequest() is called asynchronously(separate thread) so it might happen > that queueRequest is still processing/queuing the request while we > std::move()ed the descriptor (Which wraps the request queueRequest is > accessing) > > If we fix the order with correct data-access it fixes the issue. > > > I think the pointer of descriptors.request_.get() is not invalidated > > by std::move(). You know, I'm not really sure :) Let's try to find it out... This is the descriptor lifecycle inside this function int processRequest() { // Allocated on the stack, request_ is a unique_ptr<> embedded in // descriptor Camera3RequestDescriptor descriptor(camera_.get(), camera3Request); .... worker_.queueRequest(descriptor.request_.get()); descriptors_[descriptor.request_->cookie()] = std::move(descriptor); } Camera3RequestDescriptor::request_ is initialized as request_ = std::make_unique<CaptureRequest>(camera); CameraDevice::descriptors_ is a map of type: std::map<uint64_t, Camera3RequestDescriptor> descriptors_; And Camera3RequestDescriptor has a defaulted move assignment operator=() Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default; As Camera3RequestDescriptor member are: struct Camera3RequestDescriptor { uint32_t frameNumber_ = 0; std::vector<camera3_stream_buffer_t> buffers_; std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_; CameraMetadata settings_; std::unique_ptr<CaptureRequest> request_; }; And most of them have a non-implicitly defined move assignment operator (vecors and unique_ptr) I think we fall in the "Implicitly defined" section of [1] ------------------------------------------------------------------------------- Implicitly-defined move assignment operator If the implicitly-declared move assignment operator is neither deleted nor trivial, it is defined (that is, a function body is generated and compiled) by the compiler if odr-used or needed for constant evaluation (since C++14). For union types, the implicitly-defined move assignment operator copies the object representation (as by std::memmove). For non-union class types (class and struct), the move assignment operator performs full member-wise move assignment of the object's direct bases and immediate non-static members, in their declaration order, using built-in assignment for the scalars, memberwise move-assignment for arrays, and move assignment operator for class types (called non-virtually). ------------------------------------------------------------------------------- Which means we're effectively calling std::unique_ptr::operator[](&&) on request_. As std::unique_ptr::operator[](&&other) is equivalent to reset(other.release()); I guess what happens is that the location in memory of request_ doesn't change but it's only its ownership that gets moved to the newly created entry in the descriptors_ map. Hence I think the code is safe the way it is, even if it might look suspicious or fragile. I think it could be made 'safer' if we want to by: - worker_.queueRequest(descriptor.request_.get()); - + CaptureRequest *req; { MutexLocker descriptorsLock(descriptorsMutex_); + unsigned long requestCookie = descriptor.request_->cookie(); - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); + descriptors_[requestCookie] = std::move(descriptor); + req = descriptors_[requestCookie].request_.get(); } + worker_.queueRequest(req); + return 0; } But I didn't get if Umang hit any issue that lead him to send this patch. [1] https://en.cppreference.com/w/cpp/language/move_assignment > > > Why do you think so? :) > > diff --git a/src/android/camera_device.cpp > b/src/android/camera_device.cpp > index a693dcbe..0de7050d 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -1063,13 +1063,13 @@ int > CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > state_ = State::Running; > } > > - worker_.queueRequest(descriptor.request_.get()); > - > { > MutexLocker descriptorsLock(descriptorsMutex_); > descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > } > > + worker_.queueRequest(descriptor.request_.get()); > + > > is a segfaulted here (and expected no?). For sure it does, you're now using 'descriptor' after it has been moved and it's been left in unspecified state. As shown above std::unique_ptr::operator=(&&) has been called on request_ which calls request_.release(), hence accessing it with std::unique_ptr::get() is undefined behaviour. > > > > > > Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > > > --- > > > src/android/camera_device.cpp | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > > > index 21844e51..c4c03d86 100644 > > > --- a/src/android/camera_device.cpp > > > +++ b/src/android/camera_device.cpp > > > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > > > state_ = State::Running; > > > } > > > > > > - worker_.queueRequest(descriptor.request_.get()); > > > - > > > + unsigned long requestCookie = descriptor.request_->cookie(); > > > { > > > MutexLocker descriptorsLock(descriptorsMutex_); > > > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > > > + descriptors_[requestCookie] = std::move(descriptor); > > > } > > > > > > + worker_.queueRequest(descriptors_[requestCookie].request_.get()); > > Accessing descriptors_ must be while holding descriptorsMutex_. > > > I am in two minds here, > > We can lock it yes, but we are just reading from the map so, should be fine? I think accessing descriptors_ should always happen while holding the descriptorsMutex_ lock, as we extract nodes from the map in requestComplete() which might invalidates iterators and re-points the internal pointers in the map. I can't tell what happens if done concurrently with accessing the map, but doesn't seem like a good idea. Thanks j > > > > > -Hiro > > > + > > > return 0; > > > } > > > > > > -- > > > 2.31.1 > > >
Hi Jacopo, Hiro, On 9/27/21 7:20 PM, Jacopo Mondi wrote: > Hi Umang, Hiro, > > On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote: >> Hi Hiro, >> >> On 9/24/21 1:10 PM, Hirokazu Honda wrote: >>> Hi Umang, thank you for the patch. >>> >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote: >>>> The Camera3RequestDescriptor containing the capture request >>>> is adding to the descriptors_ map after a call to >>>> CameraWorker::queueRequest(). This is a race condition since >>>> CameraWorker::queueRequest() queues request to libcamera::Camera >>>> asynchronously and the addition of the descriptor to the map >>>> occurs with std::move(). Hence, it might happen that the async >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed. >>>> >>>> Fix it by adding the descriptor to map first, before >>>> CameraWorker::queueRequest(). >>> I don't understand the problem well. >> >> It's a race between queueRequest() and adding to the descriptor_ std::map. >> >> queueRequest() is called asynchronously(separate thread) so it might happen >> that queueRequest is still processing/queuing the request while we >> std::move()ed the descriptor (Which wraps the request queueRequest is >> accessing) >> >> If we fix the order with correct data-access it fixes the issue. >> >>> I think the pointer of descriptors.request_.get() is not invalidated >>> by std::move(). > You know, I'm not really sure :) > Let's try to find it out... > > This is the descriptor lifecycle inside this function > > int processRequest() > { > // Allocated on the stack, request_ is a unique_ptr<> embedded in > // descriptor > Camera3RequestDescriptor descriptor(camera_.get(), camera3Request); > > .... > > worker_.queueRequest(descriptor.request_.get()); > descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > } > > Camera3RequestDescriptor::request_ is initialized as > > request_ = std::make_unique<CaptureRequest>(camera); > > CameraDevice::descriptors_ is a map of type: > std::map<uint64_t, Camera3RequestDescriptor> descriptors_; > > And Camera3RequestDescriptor has a defaulted move assignment operator=() > Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default; > > As Camera3RequestDescriptor member are: > > struct Camera3RequestDescriptor { > > uint32_t frameNumber_ = 0; > std::vector<camera3_stream_buffer_t> buffers_; > std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_; > CameraMetadata settings_; > std::unique_ptr<CaptureRequest> request_; > }; > > And most of them have a non-implicitly defined move assignment operator (vecors > and unique_ptr) I think we fall in the "Implicitly defined" section of [1] > > ------------------------------------------------------------------------------- > Implicitly-defined move assignment operator > > If the implicitly-declared move assignment operator is neither deleted nor > trivial, it is defined (that is, a function body is generated and compiled) by > the compiler if odr-used or needed for constant evaluation (since C++14). > > For union types, the implicitly-defined move assignment operator copies the > object representation (as by std::memmove). > > For non-union class types (class and struct), the move assignment operator > performs full member-wise move assignment of the object's direct bases and > immediate non-static members, in their declaration order, using built-in > assignment for the scalars, memberwise move-assignment for arrays, and move > assignment operator for class types (called non-virtually). > ------------------------------------------------------------------------------- > > Which means we're effectively calling std::unique_ptr::operator[](&&) on > request_. > > As std::unique_ptr::operator[](&&other) is equivalent to > > reset(other.release()); > > I guess what happens is that the location in memory of request_ doesn't change > but it's only its ownership that gets moved to the newly created entry in the > descriptors_ map. > > Hence I think the code is safe the way it is, even if it might look suspicious > or fragile. On a lighter note, looks don't matter ? :P I think I'll slightly dis-agree that it is safe.. I understand that the ::move is not really copying to a different location of memory, rather transfer of ownership happens, so technically it might not cause any problems, since pointers are still pointing to memory location they are supposed to point to (possibly v2 version of this patch is based on this fact). But, ::move()ing while an async operation is in play on a memory location does concern me if we look at it. If we want to queueRequest() /before/ the descriptor is moved, I think that should also happen with a lock, since, descriptor.request_.get() is a descriptor access, no? It might happen that the queueRequest() might also set something on the CaptureRequest pointer, so essentially, we would still to lock the entire async operation before firing it off, until it returns. > > I think it could be made 'safer' if we want to by: > > - worker_.queueRequest(descriptor.request_.get()); > - > + CaptureRequest *req; > { > MutexLocker descriptorsLock(descriptorsMutex_); > + unsigned long requestCookie = descriptor.request_->cookie(); > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > + descriptors_[requestCookie] = std::move(descriptor); > + req = descriptors_[requestCookie].request_.get(); > } > > + worker_.queueRequest(req); > + > return 0; > } Precisely, that intended goal of the patch is to queueRequest() after the descriptor is moved to the map. > > But I didn't get if Umang hit any issue that lead him to send this patch. Please mind that v2 of this patch is already on the list last week, which might be of interest and further discussion. Also, I didn't get any issues with it, it's something Laurent initially spotted while code review [1] Excerpt below : >/> We have a race condition here, worker_.queueRequest() should go after />/> adding the request to the queue. Could you fix it in a patch on top ? />//>/Do you mean the race condition is existing already, with the />/descriptors_ map (that has been removed from this patch)? / Correct, it's already here. >/Yes, I can introduce a patch before this one, that fixes the race first />/in the map itself. Is my understanding correct? / Sounds good to me. It should be a small patch. [1]: https://lists.libcamera.org/pipermail/libcamera-devel/2021-September/025004.html > > [1] https://en.cppreference.com/w/cpp/language/move_assignment > >> >> Why do you think so? :) >> >> diff --git a/src/android/camera_device.cpp >> b/src/android/camera_device.cpp >> index a693dcbe..0de7050d 100644 >> --- a/src/android/camera_device.cpp >> +++ b/src/android/camera_device.cpp >> @@ -1063,13 +1063,13 @@ int >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques >> state_ = State::Running; >> } >> >> - worker_.queueRequest(descriptor.request_.get()); >> - >> { >> MutexLocker descriptorsLock(descriptorsMutex_); >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor); >> } >> >> + worker_.queueRequest(descriptor.request_.get()); >> + >> >> is a segfaulted here (and expected no?). > For sure it does, you're now using 'descriptor' after it has been moved and it's > been left in unspecified state. As shown above std::unique_ptr::operator=(&&) > has been called on request_ which calls request_.release(), hence accessing it > with std::unique_ptr::get() is undefined behaviour. > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> >>>> --- >>>> src/android/camera_device.cpp | 7 ++++--- >>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp >>>> index 21844e51..c4c03d86 100644 >>>> --- a/src/android/camera_device.cpp >>>> +++ b/src/android/camera_device.cpp >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques >>>> state_ = State::Running; >>>> } >>>> >>>> - worker_.queueRequest(descriptor.request_.get()); >>>> - >>>> + unsigned long requestCookie = descriptor.request_->cookie(); >>>> { >>>> MutexLocker descriptorsLock(descriptorsMutex_); >>>> - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); >>>> + descriptors_[requestCookie] = std::move(descriptor); >>>> } >>>> >>>> + worker_.queueRequest(descriptors_[requestCookie].request_.get()); >>> Accessing descriptors_ must be while holding descriptorsMutex_. >> >> I am in two minds here, >> >> We can lock it yes, but we are just reading from the map so, should be fine? > I think accessing descriptors_ should always happen while holding the > descriptorsMutex_ lock, as we extract nodes from the map in requestComplete() > which might invalidates iterators and re-points the internal pointers in the > map. I can't tell what happens if done concurrently with accessing the map, but > doesn't seem like a good idea. > > Thanks > j > >>> -Hiro >>>> + >>>> return 0; >>>> } >>>> >>>> -- >>>> 2.31.1 >>>>
Hi Umang, Thank you for the patch. On Fri, Sep 24, 2021 at 12:14:09PM +0530, Umang Jain wrote: > The Camera3RequestDescriptor containing the capture request > is adding to the descriptors_ map after a call to > CameraWorker::queueRequest(). This is a race condition since > CameraWorker::queueRequest() queues request to libcamera::Camera > asynchronously and the addition of the descriptor to the map > occurs with std::move(). Hence, it might happen that the async > queueRequest() hasn't finished but the descriptor gets std::move()ed. The issue isn't related to std::move(). The problem is that the request is queued to the worker, and at that point processing moves to another thread. It's thus possible (even if very unlikely, but still possible) that the processing will complete, and CameraDevice::requestComplete() gets called, before this function proceeds to add the request to the descriptors_ map. CameraDevice::requestComplete() will then not find it in the map, and bad things will happen. It's a classic race condition, std::move() doesn't play any role here. > Fix it by adding the descriptor to map first, before > CameraWorker::queueRequest(). > > Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> > --- > src/android/camera_device.cpp | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp > index 21844e51..c4c03d86 100644 > --- a/src/android/camera_device.cpp > +++ b/src/android/camera_device.cpp > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques > state_ = State::Running; > } > > - worker_.queueRequest(descriptor.request_.get()); > - > + unsigned long requestCookie = descriptor.request_->cookie(); > { > MutexLocker descriptorsLock(descriptorsMutex_); > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); > + descriptors_[requestCookie] = std::move(descriptor); > } > > + worker_.queueRequest(descriptors_[requestCookie].request_.get()); As Hiro mentioned, we shouldn't access descriptors_ without taking the lock. Here's how to fix it: CaptureRequest *request = descriptor.request_.get(); { MutexLocker descriptorsLock(descriptorsMutex_); descriptors_[request->cookie()] = std::move(descriptor); } worker_.queueRequest(request); > + > return 0; > } >
diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp index 21844e51..c4c03d86 100644 --- a/src/android/camera_device.cpp +++ b/src/android/camera_device.cpp @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques state_ = State::Running; } - worker_.queueRequest(descriptor.request_.get()); - + unsigned long requestCookie = descriptor.request_->cookie(); { MutexLocker descriptorsLock(descriptorsMutex_); - descriptors_[descriptor.request_->cookie()] = std::move(descriptor); + descriptors_[requestCookie] = std::move(descriptor); } + worker_.queueRequest(descriptors_[requestCookie].request_.get()); + return 0; }
The Camera3RequestDescriptor containing the capture request is adding to the descriptors_ map after a call to CameraWorker::queueRequest(). This is a race condition since CameraWorker::queueRequest() queues request to libcamera::Camera asynchronously and the addition of the descriptor to the map occurs with std::move(). Hence, it might happen that the async queueRequest() hasn't finished but the descriptor gets std::move()ed. Fix it by adding the descriptor to map first, before CameraWorker::queueRequest(). Signed-off-by: Umang Jain <umang.jain@ideasonboard.com> --- src/android/camera_device.cpp | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)