[libcamera-devel] android: camera_device: Fix race on queuing capture request
diff mbox series

Message ID 20210924064409.317140-1-umang.jain@ideasonboard.com
State Superseded
Delegated to: Umang Jain
Headers show
Series
  • [libcamera-devel] android: camera_device: Fix race on queuing capture request
Related show

Commit Message

Umang Jain Sept. 24, 2021, 6:44 a.m. UTC
The Camera3RequestDescriptor containing the capture request
is adding to the descriptors_ map after a call to
CameraWorker::queueRequest(). This is a race condition since
CameraWorker::queueRequest() queues request to libcamera::Camera
asynchronously and the addition of the descriptor to the map
occurs with std::move(). Hence, it might happen that the async
queueRequest() hasn't finished but the descriptor gets std::move()ed.

Fix it by adding the descriptor to map first, before
CameraWorker::queueRequest().

Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
---
 src/android/camera_device.cpp | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Hirokazu Honda Sept. 24, 2021, 7:40 a.m. UTC | #1
Hi Umang, thank you for the patch.

On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>
> The Camera3RequestDescriptor containing the capture request
> is adding to the descriptors_ map after a call to
> CameraWorker::queueRequest(). This is a race condition since
> CameraWorker::queueRequest() queues request to libcamera::Camera
> asynchronously and the addition of the descriptor to the map
> occurs with std::move(). Hence, it might happen that the async
> queueRequest() hasn't finished but the descriptor gets std::move()ed.
>
> Fix it by adding the descriptor to map first, before
> CameraWorker::queueRequest().


I don't understand the problem well.
I think the pointer of descriptors.request_.get() is not invalidated
by std::move().

>
> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> ---
>  src/android/camera_device.cpp | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 21844e51..c4c03d86 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>                 state_ = State::Running;
>         }
>
> -       worker_.queueRequest(descriptor.request_.get());
> -
> +       unsigned long requestCookie = descriptor.request_->cookie();
>         {
>                 MutexLocker descriptorsLock(descriptorsMutex_);
> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> +               descriptors_[requestCookie] = std::move(descriptor);
>         }
>
> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());

Accessing descriptors_ must be while holding descriptorsMutex_.

-Hiro
> +
>         return 0;
>  }
>
> --
> 2.31.1
>
Umang Jain Sept. 24, 2021, 8:19 a.m. UTC | #2
Hi Hiro,

On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> Hi Umang, thank you for the patch.
>
> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>> The Camera3RequestDescriptor containing the capture request
>> is adding to the descriptors_ map after a call to
>> CameraWorker::queueRequest(). This is a race condition since
>> CameraWorker::queueRequest() queues request to libcamera::Camera
>> asynchronously and the addition of the descriptor to the map
>> occurs with std::move(). Hence, it might happen that the async
>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
>>
>> Fix it by adding the descriptor to map first, before
>> CameraWorker::queueRequest().
>
> I don't understand the problem well.


It's a race between queueRequest() and adding to the descriptor_ std::map.

queueRequest() is called asynchronously(separate thread)  so it might 
happen that queueRequest is still processing/queuing the request while 
we std::move()ed the descriptor (Which wraps the request queueRequest is 
accessing)

If we fix the order with correct data-access it fixes the issue.

> I think the pointer of descriptors.request_.get() is not invalidated
> by std::move().


Why do you think so? :)

     diff --git a/src/android/camera_device.cpp 
b/src/android/camera_device.cpp
     index a693dcbe..0de7050d 100644
     --- a/src/android/camera_device.cpp
     +++ b/src/android/camera_device.cpp
     @@ -1063,13 +1063,13 @@ int 
CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
                     state_ = State::Running;
             }

     - worker_.queueRequest(descriptor.request_.get());
     -
             {
                     MutexLocker descriptorsLock(descriptorsMutex_);
descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
             }

     + worker_.queueRequest(descriptor.request_.get());
     +

is a segfaulted here (and expected no?).

>
>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>> ---
>>   src/android/camera_device.cpp | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>> index 21844e51..c4c03d86 100644
>> --- a/src/android/camera_device.cpp
>> +++ b/src/android/camera_device.cpp
>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>                  state_ = State::Running;
>>          }
>>
>> -       worker_.queueRequest(descriptor.request_.get());
>> -
>> +       unsigned long requestCookie = descriptor.request_->cookie();
>>          {
>>                  MutexLocker descriptorsLock(descriptorsMutex_);
>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>> +               descriptors_[requestCookie] = std::move(descriptor);
>>          }
>>
>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> Accessing descriptors_ must be while holding descriptorsMutex_.


I am in two minds here,

We can lock it yes, but we are just reading from the map so, should be 
fine?

>
> -Hiro
>> +
>>          return 0;
>>   }
>>
>> --
>> 2.31.1
>>
Hirokazu Honda Sept. 24, 2021, 11:37 a.m. UTC | #3
Hi Umang,

On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>
> Hi Hiro,
>
> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > Hi Umang, thank you for the patch.
> >
> > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> >> The Camera3RequestDescriptor containing the capture request
> >> is adding to the descriptors_ map after a call to
> >> CameraWorker::queueRequest(). This is a race condition since
> >> CameraWorker::queueRequest() queues request to libcamera::Camera
> >> asynchronously and the addition of the descriptor to the map
> >> occurs with std::move(). Hence, it might happen that the async
> >> queueRequest() hasn't finished but the descriptor gets std::move()ed.
> >>
> >> Fix it by adding the descriptor to map first, before
> >> CameraWorker::queueRequest().
> >
> > I don't understand the problem well.
>
>
> It's a race between queueRequest() and adding to the descriptor_ std::map.
>
> queueRequest() is called asynchronously(separate thread)  so it might
> happen that queueRequest is still processing/queuing the request while
> we std::move()ed the descriptor (Which wraps the request queueRequest is
> accessing)
>
> If we fix the order with correct data-access it fixes the issue.
>
> > I think the pointer of descriptors.request_.get() is not invalidated
> > by std::move().
>
>
> Why do you think so? :)
>

std::move() invalidates descriptor. Yes, descriptor.request_ becomes
invalidated after std::move(). descriptor must not be accessed after
std::move().
However, queueRequest holds a pointer value,
descriptor.request_.get(). descriptor is move()d to descriptors_ and
thus descriptor.request_ is alive until descriptor is removed in
descriptors_.
So the passed pointer should still be valid after executing std::move().

>      diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
>      index a693dcbe..0de7050d 100644
>      --- a/src/android/camera_device.cpp
>      +++ b/src/android/camera_device.cpp
>      @@ -1063,13 +1063,13 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>                      state_ = State::Running;
>              }
>
>      - worker_.queueRequest(descriptor.request_.get());
>      -
>              {
>                      MutexLocker descriptorsLock(descriptorsMutex_);
> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>              }
>
>      + worker_.queueRequest(descriptor.request_.get());
>      +
>
> is a segfaulted here (and expected no?).
>
> >
> >> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> >> ---
> >>   src/android/camera_device.cpp | 7 ++++---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> >> index 21844e51..c4c03d86 100644
> >> --- a/src/android/camera_device.cpp
> >> +++ b/src/android/camera_device.cpp
> >> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >>                  state_ = State::Running;
> >>          }
> >>
> >> -       worker_.queueRequest(descriptor.request_.get());
> >> -
> >> +       unsigned long requestCookie = descriptor.request_->cookie();
> >>          {
> >>                  MutexLocker descriptorsLock(descriptorsMutex_);
> >> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> >> +               descriptors_[requestCookie] = std::move(descriptor);
> >>          }
> >>
> >> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > Accessing descriptors_ must be while holding descriptorsMutex_.
>
>
> I am in two minds here,
>
> We can lock it yes, but we are just reading from the map so, should be
> fine?

Regardless of reading/writing, descriptorMutex_ must be acquired
whenever accessing descriptors_.
Some other thread can clear descriptors in parallel. Parallel
executing operations against std::map corrupts it.
By the way, std::map::operator[] is not a constant function. In fact,
it can create a new node if a key does not exist.

-Hiro
>
> >
> > -Hiro
> >> +
> >>          return 0;
> >>   }
> >>
> >> --
> >> 2.31.1
> >>
Umang Jain Sept. 24, 2021, 12:39 p.m. UTC | #4
Hi Hiro,

On 9/24/21 5:07 PM, Hirokazu Honda wrote:
> Hi Umang,
>
> On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>> Hi Hiro,
>>
>> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
>>> Hi Umang, thank you for the patch.
>>>
>>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>>>> The Camera3RequestDescriptor containing the capture request
>>>> is adding to the descriptors_ map after a call to
>>>> CameraWorker::queueRequest(). This is a race condition since
>>>> CameraWorker::queueRequest() queues request to libcamera::Camera
>>>> asynchronously and the addition of the descriptor to the map
>>>> occurs with std::move(). Hence, it might happen that the async
>>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
>>>>
>>>> Fix it by adding the descriptor to map first, before
>>>> CameraWorker::queueRequest().
>>> I don't understand the problem well.
>>
>> It's a race between queueRequest() and adding to the descriptor_ std::map.
>>
>> queueRequest() is called asynchronously(separate thread)  so it might
>> happen that queueRequest is still processing/queuing the request while
>> we std::move()ed the descriptor (Which wraps the request queueRequest is
>> accessing)
>>
>> If we fix the order with correct data-access it fixes the issue.
>>
>>> I think the pointer of descriptors.request_.get() is not invalidated
>>> by std::move().
>>
>> Why do you think so? :)
>>
> std::move() invalidates descriptor. Yes, descriptor.request_ becomes
> invalidated after std::move(). descriptor must not be accessed after
> std::move().
> However, queueRequest holds a pointer value,
> descriptor.request_.get(). descriptor is move()d to descriptors_ and
> thus descriptor.request_ is alive until descriptor is removed in
> descriptors_.
> So the passed pointer should still be valid after executing std::move().


Generally, in other programming models, I have often seen data that has 
been move()d should be now/further accessed(if necessary)  from that 
container that they have been move()d into. It clearly differentiates 
between the ownership model (before and after) and I find it as a good 
mental model to have (helps especially when there is a long function 
implementation )

Yes, we can store a pointer before the move and call queueRequest() with 
it after the move. Either of the methods, I am not against any one.

>
>>       diff --git a/src/android/camera_device.cpp
>> b/src/android/camera_device.cpp
>>       index a693dcbe..0de7050d 100644
>>       --- a/src/android/camera_device.cpp
>>       +++ b/src/android/camera_device.cpp
>>       @@ -1063,13 +1063,13 @@ int
>> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>                       state_ = State::Running;
>>               }
>>
>>       - worker_.queueRequest(descriptor.request_.get());
>>       -
>>               {
>>                       MutexLocker descriptorsLock(descriptorsMutex_);
>> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>               }
>>
>>       + worker_.queueRequest(descriptor.request_.get());
>>       +
>>
>> is a segfaulted here (and expected no?).
>>
>>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>>> ---
>>>>    src/android/camera_device.cpp | 7 ++++---
>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>>>> index 21844e51..c4c03d86 100644
>>>> --- a/src/android/camera_device.cpp
>>>> +++ b/src/android/camera_device.cpp
>>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>>>                   state_ = State::Running;
>>>>           }
>>>>
>>>> -       worker_.queueRequest(descriptor.request_.get());
>>>> -
>>>> +       unsigned long requestCookie = descriptor.request_->cookie();
>>>>           {
>>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
>>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>>> +               descriptors_[requestCookie] = std::move(descriptor);
>>>>           }
>>>>
>>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
>>> Accessing descriptors_ must be while holding descriptorsMutex_.
>>
>> I am in two minds here,
>>
>> We can lock it yes, but we are just reading from the map so, should be
>> fine?
> Regardless of reading/writing, descriptorMutex_ must be acquired
> whenever accessing descriptors_.
> Some other thread can clear descriptors in parallel. Parallel


Not sure, if this a possible yet, but sure!

Thanks for the inputs.

> executing operations against std::map corrupts it.
> By the way, std::map::operator[] is not a constant function. In fact,
> it can create a new node if a key does not exist.
>
> -Hiro
>>> -Hiro
>>>> +
>>>>           return 0;
>>>>    }
>>>>
>>>> --
>>>> 2.31.1
>>>>
Hirokazu Honda Sept. 24, 2021, 12:57 p.m. UTC | #5
Hi Umang,

On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>
> Hi Hiro,
>
> On 9/24/21 5:07 PM, Hirokazu Honda wrote:
> > Hi Umang,
> >
> > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> >> Hi Hiro,
> >>
> >> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> >>> Hi Umang, thank you for the patch.
> >>>
> >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> >>>> The Camera3RequestDescriptor containing the capture request
> >>>> is adding to the descriptors_ map after a call to
> >>>> CameraWorker::queueRequest(). This is a race condition since
> >>>> CameraWorker::queueRequest() queues request to libcamera::Camera
> >>>> asynchronously and the addition of the descriptor to the map
> >>>> occurs with std::move(). Hence, it might happen that the async
> >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
> >>>>
> >>>> Fix it by adding the descriptor to map first, before
> >>>> CameraWorker::queueRequest().
> >>> I don't understand the problem well.
> >>
> >> It's a race between queueRequest() and adding to the descriptor_ std::map.
> >>
> >> queueRequest() is called asynchronously(separate thread)  so it might
> >> happen that queueRequest is still processing/queuing the request while
> >> we std::move()ed the descriptor (Which wraps the request queueRequest is
> >> accessing)
> >>
> >> If we fix the order with correct data-access it fixes the issue.
> >>
> >>> I think the pointer of descriptors.request_.get() is not invalidated
> >>> by std::move().
> >>
> >> Why do you think so? :)
> >>
> > std::move() invalidates descriptor. Yes, descriptor.request_ becomes
> > invalidated after std::move(). descriptor must not be accessed after
> > std::move().
> > However, queueRequest holds a pointer value,
> > descriptor.request_.get(). descriptor is move()d to descriptors_ and
> > thus descriptor.request_ is alive until descriptor is removed in
> > descriptors_.
> > So the passed pointer should still be valid after executing std::move().
>
>
> Generally, in other programming models, I have often seen data that has
> been move()d should be now/further accessed(if necessary)  from that
> container that they have been move()d into. It clearly differentiates
> between the ownership model (before and after) and I find it as a good
> mental model to have (helps especially when there is a long function
> implementation )
>
> Yes, we can store a pointer before the move and call queueRequest() with
> it after the move. Either of the methods, I am not against any one.
>

Since copying a pointer value is before std::move(), it seems to be
natural for me to use descriptor.request_.get(), which is the original
code.
Hmm, I don't understand why the original code doesn't work. I guess
the seg fault is due to other reason.

-Hiro
> >
> >>       diff --git a/src/android/camera_device.cpp
> >> b/src/android/camera_device.cpp
> >>       index a693dcbe..0de7050d 100644
> >>       --- a/src/android/camera_device.cpp
> >>       +++ b/src/android/camera_device.cpp
> >>       @@ -1063,13 +1063,13 @@ int
> >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >>                       state_ = State::Running;
> >>               }
> >>
> >>       - worker_.queueRequest(descriptor.request_.get());
> >>       -
> >>               {
> >>                       MutexLocker descriptorsLock(descriptorsMutex_);
> >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> >>               }
> >>
> >>       + worker_.queueRequest(descriptor.request_.get());
> >>       +
> >>
> >> is a segfaulted here (and expected no?).
> >>
> >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> >>>> ---
> >>>>    src/android/camera_device.cpp | 7 ++++---
> >>>>    1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> >>>> index 21844e51..c4c03d86 100644
> >>>> --- a/src/android/camera_device.cpp
> >>>> +++ b/src/android/camera_device.cpp
> >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >>>>                   state_ = State::Running;
> >>>>           }
> >>>>
> >>>> -       worker_.queueRequest(descriptor.request_.get());
> >>>> -
> >>>> +       unsigned long requestCookie = descriptor.request_->cookie();
> >>>>           {
> >>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
> >>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> >>>> +               descriptors_[requestCookie] = std::move(descriptor);
> >>>>           }
> >>>>
> >>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> >>> Accessing descriptors_ must be while holding descriptorsMutex_.
> >>
> >> I am in two minds here,
> >>
> >> We can lock it yes, but we are just reading from the map so, should be
> >> fine?
> > Regardless of reading/writing, descriptorMutex_ must be acquired
> > whenever accessing descriptors_.
> > Some other thread can clear descriptors in parallel. Parallel
>
>
> Not sure, if this a possible yet, but sure!
>
> Thanks for the inputs.
>
> > executing operations against std::map corrupts it.
> > By the way, std::map::operator[] is not a constant function. In fact,
> > it can create a new node if a key does not exist.
> >
> > -Hiro
> >>> -Hiro
> >>>> +
> >>>>           return 0;
> >>>>    }
> >>>>
> >>>> --
> >>>> 2.31.1
> >>>>
Hirokazu Honda Sept. 27, 2021, 5:25 a.m. UTC | #6
Hi Umang,

On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote:
>
> Hi Umang,
>
> On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> >
> > Hi Hiro,
> >
> > On 9/24/21 5:07 PM, Hirokazu Honda wrote:
> > > Hi Umang,
> > >
> > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > >> Hi Hiro,
> > >>
> > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > >>> Hi Umang, thank you for the patch.
> > >>>
> > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > >>>> The Camera3RequestDescriptor containing the capture request
> > >>>> is adding to the descriptors_ map after a call to
> > >>>> CameraWorker::queueRequest(). This is a race condition since
> > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera
> > >>>> asynchronously and the addition of the descriptor to the map
> > >>>> occurs with std::move(). Hence, it might happen that the async
> > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > >>>>
> > >>>> Fix it by adding the descriptor to map first, before
> > >>>> CameraWorker::queueRequest().
> > >>> I don't understand the problem well.
> > >>
> > >> It's a race between queueRequest() and adding to the descriptor_ std::map.
> > >>
> > >> queueRequest() is called asynchronously(separate thread)  so it might
> > >> happen that queueRequest is still processing/queuing the request while
> > >> we std::move()ed the descriptor (Which wraps the request queueRequest is
> > >> accessing)
> > >>
> > >> If we fix the order with correct data-access it fixes the issue.
> > >>
> > >>> I think the pointer of descriptors.request_.get() is not invalidated
> > >>> by std::move().
> > >>
> > >> Why do you think so? :)
> > >>
> > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes
> > > invalidated after std::move(). descriptor must not be accessed after
> > > std::move().
> > > However, queueRequest holds a pointer value,
> > > descriptor.request_.get(). descriptor is move()d to descriptors_ and
> > > thus descriptor.request_ is alive until descriptor is removed in
> > > descriptors_.
> > > So the passed pointer should still be valid after executing std::move().
> >
> >
> > Generally, in other programming models, I have often seen data that has
> > been move()d should be now/further accessed(if necessary)  from that
> > container that they have been move()d into. It clearly differentiates
> > between the ownership model (before and after) and I find it as a good
> > mental model to have (helps especially when there is a long function
> > implementation )
> >
> > Yes, we can store a pointer before the move and call queueRequest() with
> > it after the move. Either of the methods, I am not against any one.
> >
>
> Since copying a pointer value is before std::move(), it seems to be
> natural for me to use descriptor.request_.get(), which is the original
> code.
> Hmm, I don't understand why the original code doesn't work. I guess
> the seg fault is due to other reason.
>

Would you mind investigating more?

Best Regards,
-Hiro

> -Hiro
> > >
> > >>       diff --git a/src/android/camera_device.cpp
> > >> b/src/android/camera_device.cpp
> > >>       index a693dcbe..0de7050d 100644
> > >>       --- a/src/android/camera_device.cpp
> > >>       +++ b/src/android/camera_device.cpp
> > >>       @@ -1063,13 +1063,13 @@ int
> > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >>                       state_ = State::Running;
> > >>               }
> > >>
> > >>       - worker_.queueRequest(descriptor.request_.get());
> > >>       -
> > >>               {
> > >>                       MutexLocker descriptorsLock(descriptorsMutex_);
> > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > >>               }
> > >>
> > >>       + worker_.queueRequest(descriptor.request_.get());
> > >>       +
> > >>
> > >> is a segfaulted here (and expected no?).
> > >>
> > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> > >>>> ---
> > >>>>    src/android/camera_device.cpp | 7 ++++---
> > >>>>    1 file changed, 4 insertions(+), 3 deletions(-)
> > >>>>
> > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > >>>> index 21844e51..c4c03d86 100644
> > >>>> --- a/src/android/camera_device.cpp
> > >>>> +++ b/src/android/camera_device.cpp
> > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >>>>                   state_ = State::Running;
> > >>>>           }
> > >>>>
> > >>>> -       worker_.queueRequest(descriptor.request_.get());
> > >>>> -
> > >>>> +       unsigned long requestCookie = descriptor.request_->cookie();
> > >>>>           {
> > >>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
> > >>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > >>>> +               descriptors_[requestCookie] = std::move(descriptor);
> > >>>>           }
> > >>>>
> > >>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > >>> Accessing descriptors_ must be while holding descriptorsMutex_.
> > >>
> > >> I am in two minds here,
> > >>
> > >> We can lock it yes, but we are just reading from the map so, should be
> > >> fine?
> > > Regardless of reading/writing, descriptorMutex_ must be acquired
> > > whenever accessing descriptors_.
> > > Some other thread can clear descriptors in parallel. Parallel
> >
> >
> > Not sure, if this a possible yet, but sure!
> >
> > Thanks for the inputs.
> >
> > > executing operations against std::map corrupts it.
> > > By the way, std::map::operator[] is not a constant function. In fact,
> > > it can create a new node if a key does not exist.
> > >
> > > -Hiro
> > >>> -Hiro
> > >>>> +
> > >>>>           return 0;
> > >>>>    }
> > >>>>
> > >>>> --
> > >>>> 2.31.1
> > >>>>
Hirokazu Honda Sept. 27, 2021, 5:45 a.m. UTC | #7
+Jacopo Mondi

I saw Jacopo pointed this in https://patchwork.libcamera.org/patch/13866/.
I consider there is no race condition here. Jacopo, am I wrong?

-Hiro

On Mon, Sep 27, 2021 at 2:25 PM Hirokazu Honda <hiroh@chromium.org> wrote:
>
> Hi Umang,
>
> On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote:
> >
> > Hi Umang,
> >
> > On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > >
> > > Hi Hiro,
> > >
> > > On 9/24/21 5:07 PM, Hirokazu Honda wrote:
> > > > Hi Umang,
> > > >
> > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > >> Hi Hiro,
> > > >>
> > > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > > >>> Hi Umang, thank you for the patch.
> > > >>>
> > > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > >>>> The Camera3RequestDescriptor containing the capture request
> > > >>>> is adding to the descriptors_ map after a call to
> > > >>>> CameraWorker::queueRequest(). This is a race condition since
> > > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera
> > > >>>> asynchronously and the addition of the descriptor to the map
> > > >>>> occurs with std::move(). Hence, it might happen that the async
> > > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > > >>>>
> > > >>>> Fix it by adding the descriptor to map first, before
> > > >>>> CameraWorker::queueRequest().
> > > >>> I don't understand the problem well.
> > > >>
> > > >> It's a race between queueRequest() and adding to the descriptor_ std::map.
> > > >>
> > > >> queueRequest() is called asynchronously(separate thread)  so it might
> > > >> happen that queueRequest is still processing/queuing the request while
> > > >> we std::move()ed the descriptor (Which wraps the request queueRequest is
> > > >> accessing)
> > > >>
> > > >> If we fix the order with correct data-access it fixes the issue.
> > > >>
> > > >>> I think the pointer of descriptors.request_.get() is not invalidated
> > > >>> by std::move().
> > > >>
> > > >> Why do you think so? :)
> > > >>
> > > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes
> > > > invalidated after std::move(). descriptor must not be accessed after
> > > > std::move().
> > > > However, queueRequest holds a pointer value,
> > > > descriptor.request_.get(). descriptor is move()d to descriptors_ and
> > > > thus descriptor.request_ is alive until descriptor is removed in
> > > > descriptors_.
> > > > So the passed pointer should still be valid after executing std::move().
> > >
> > >
> > > Generally, in other programming models, I have often seen data that has
> > > been move()d should be now/further accessed(if necessary)  from that
> > > container that they have been move()d into. It clearly differentiates
> > > between the ownership model (before and after) and I find it as a good
> > > mental model to have (helps especially when there is a long function
> > > implementation )
> > >
> > > Yes, we can store a pointer before the move and call queueRequest() with
> > > it after the move. Either of the methods, I am not against any one.
> > >
> >
> > Since copying a pointer value is before std::move(), it seems to be
> > natural for me to use descriptor.request_.get(), which is the original
> > code.
> > Hmm, I don't understand why the original code doesn't work. I guess
> > the seg fault is due to other reason.
> >
>
> Would you mind investigating more?
>
> Best Regards,
> -Hiro
>
> > -Hiro
> > > >
> > > >>       diff --git a/src/android/camera_device.cpp
> > > >> b/src/android/camera_device.cpp
> > > >>       index a693dcbe..0de7050d 100644
> > > >>       --- a/src/android/camera_device.cpp
> > > >>       +++ b/src/android/camera_device.cpp
> > > >>       @@ -1063,13 +1063,13 @@ int
> > > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >>                       state_ = State::Running;
> > > >>               }
> > > >>
> > > >>       - worker_.queueRequest(descriptor.request_.get());
> > > >>       -
> > > >>               {
> > > >>                       MutexLocker descriptorsLock(descriptorsMutex_);
> > > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > >>               }
> > > >>
> > > >>       + worker_.queueRequest(descriptor.request_.get());
> > > >>       +
> > > >>
> > > >> is a segfaulted here (and expected no?).
> > > >>
> > > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> > > >>>> ---
> > > >>>>    src/android/camera_device.cpp | 7 ++++---
> > > >>>>    1 file changed, 4 insertions(+), 3 deletions(-)
> > > >>>>
> > > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > >>>> index 21844e51..c4c03d86 100644
> > > >>>> --- a/src/android/camera_device.cpp
> > > >>>> +++ b/src/android/camera_device.cpp
> > > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > >>>>                   state_ = State::Running;
> > > >>>>           }
> > > >>>>
> > > >>>> -       worker_.queueRequest(descriptor.request_.get());
> > > >>>> -
> > > >>>> +       unsigned long requestCookie = descriptor.request_->cookie();
> > > >>>>           {
> > > >>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
> > > >>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > >>>> +               descriptors_[requestCookie] = std::move(descriptor);
> > > >>>>           }
> > > >>>>
> > > >>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > > >>> Accessing descriptors_ must be while holding descriptorsMutex_.
> > > >>
> > > >> I am in two minds here,
> > > >>
> > > >> We can lock it yes, but we are just reading from the map so, should be
> > > >> fine?
> > > > Regardless of reading/writing, descriptorMutex_ must be acquired
> > > > whenever accessing descriptors_.
> > > > Some other thread can clear descriptors in parallel. Parallel
> > >
> > >
> > > Not sure, if this a possible yet, but sure!
> > >
> > > Thanks for the inputs.
> > >
> > > > executing operations against std::map corrupts it.
> > > > By the way, std::map::operator[] is not a constant function. In fact,
> > > > it can create a new node if a key does not exist.
> > > >
> > > > -Hiro
> > > >>> -Hiro
> > > >>>> +
> > > >>>>           return 0;
> > > >>>>    }
> > > >>>>
> > > >>>> --
> > > >>>> 2.31.1
> > > >>>>
Jacopo Mondi Sept. 27, 2021, 12:55 p.m. UTC | #8
Hi Hiro,

On Mon, Sep 27, 2021 at 02:45:04PM +0900, Hirokazu Honda wrote:
> +Jacopo Mondi
>
> I saw Jacopo pointed this in https://patchwork.libcamera.org/patch/13866/.
> I consider there is no race condition here. Jacopo, am I wrong?

Not really sure to which part you are referring to, if the race Umang
is investigating or the access to the descriptors_ map being protected
or not.

I'll try go through the discussion in previous emails as here some
context have been removed during the conversation.


>
> -Hiro
>
> On Mon, Sep 27, 2021 at 2:25 PM Hirokazu Honda <hiroh@chromium.org> wrote:
> >
> > Hi Umang,
> >
> > On Fri, Sep 24, 2021 at 9:57 PM Hirokazu Honda <hiroh@chromium.org> wrote:
> > >
> > > Hi Umang,
> > >
> > > On Fri, Sep 24, 2021 at 9:39 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > >
> > > > Hi Hiro,
> > > >
> > > > On 9/24/21 5:07 PM, Hirokazu Honda wrote:
> > > > > Hi Umang,
> > > > >
> > > > > On Fri, Sep 24, 2021 at 5:19 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > > >> Hi Hiro,
> > > > >>
> > > > >> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > > > >>> Hi Umang, thank you for the patch.
> > > > >>>
> > > > >>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > > >>>> The Camera3RequestDescriptor containing the capture request
> > > > >>>> is adding to the descriptors_ map after a call to
> > > > >>>> CameraWorker::queueRequest(). This is a race condition since
> > > > >>>> CameraWorker::queueRequest() queues request to libcamera::Camera
> > > > >>>> asynchronously and the addition of the descriptor to the map
> > > > >>>> occurs with std::move(). Hence, it might happen that the async
> > > > >>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > > > >>>>
> > > > >>>> Fix it by adding the descriptor to map first, before
> > > > >>>> CameraWorker::queueRequest().
> > > > >>> I don't understand the problem well.
> > > > >>
> > > > >> It's a race between queueRequest() and adding to the descriptor_ std::map.
> > > > >>
> > > > >> queueRequest() is called asynchronously(separate thread)  so it might
> > > > >> happen that queueRequest is still processing/queuing the request while
> > > > >> we std::move()ed the descriptor (Which wraps the request queueRequest is
> > > > >> accessing)
> > > > >>
> > > > >> If we fix the order with correct data-access it fixes the issue.
> > > > >>
> > > > >>> I think the pointer of descriptors.request_.get() is not invalidated
> > > > >>> by std::move().
> > > > >>
> > > > >> Why do you think so? :)
> > > > >>
> > > > > std::move() invalidates descriptor. Yes, descriptor.request_ becomes
> > > > > invalidated after std::move(). descriptor must not be accessed after
> > > > > std::move().
> > > > > However, queueRequest holds a pointer value,
> > > > > descriptor.request_.get(). descriptor is move()d to descriptors_ and
> > > > > thus descriptor.request_ is alive until descriptor is removed in
> > > > > descriptors_.
> > > > > So the passed pointer should still be valid after executing std::move().
> > > >
> > > >
> > > > Generally, in other programming models, I have often seen data that has
> > > > been move()d should be now/further accessed(if necessary)  from that
> > > > container that they have been move()d into. It clearly differentiates
> > > > between the ownership model (before and after) and I find it as a good
> > > > mental model to have (helps especially when there is a long function
> > > > implementation )
> > > >
> > > > Yes, we can store a pointer before the move and call queueRequest() with
> > > > it after the move. Either of the methods, I am not against any one.
> > > >
> > >
> > > Since copying a pointer value is before std::move(), it seems to be
> > > natural for me to use descriptor.request_.get(), which is the original
> > > code.
> > > Hmm, I don't understand why the original code doesn't work. I guess
> > > the seg fault is due to other reason.
> > >
> >
> > Would you mind investigating more?
> >
> > Best Regards,
> > -Hiro
> >
> > > -Hiro
> > > > >
> > > > >>       diff --git a/src/android/camera_device.cpp
> > > > >> b/src/android/camera_device.cpp
> > > > >>       index a693dcbe..0de7050d 100644
> > > > >>       --- a/src/android/camera_device.cpp
> > > > >>       +++ b/src/android/camera_device.cpp
> > > > >>       @@ -1063,13 +1063,13 @@ int
> > > > >> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > >>                       state_ = State::Running;
> > > > >>               }
> > > > >>
> > > > >>       - worker_.queueRequest(descriptor.request_.get());
> > > > >>       -
> > > > >>               {
> > > > >>                       MutexLocker descriptorsLock(descriptorsMutex_);
> > > > >> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > > >>               }
> > > > >>
> > > > >>       + worker_.queueRequest(descriptor.request_.get());
> > > > >>       +
> > > > >>
> > > > >> is a segfaulted here (and expected no?).
> > > > >>
> > > > >>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> > > > >>>> ---
> > > > >>>>    src/android/camera_device.cpp | 7 ++++---
> > > > >>>>    1 file changed, 4 insertions(+), 3 deletions(-)
> > > > >>>>
> > > > >>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > >>>> index 21844e51..c4c03d86 100644
> > > > >>>> --- a/src/android/camera_device.cpp
> > > > >>>> +++ b/src/android/camera_device.cpp
> > > > >>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > >>>>                   state_ = State::Running;
> > > > >>>>           }
> > > > >>>>
> > > > >>>> -       worker_.queueRequest(descriptor.request_.get());
> > > > >>>> -
> > > > >>>> +       unsigned long requestCookie = descriptor.request_->cookie();
> > > > >>>>           {
> > > > >>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
> > > > >>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > > >>>> +               descriptors_[requestCookie] = std::move(descriptor);
> > > > >>>>           }
> > > > >>>>
> > > > >>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > > > >>> Accessing descriptors_ must be while holding descriptorsMutex_.
> > > > >>
> > > > >> I am in two minds here,
> > > > >>
> > > > >> We can lock it yes, but we are just reading from the map so, should be
> > > > >> fine?
> > > > > Regardless of reading/writing, descriptorMutex_ must be acquired
> > > > > whenever accessing descriptors_.
> > > > > Some other thread can clear descriptors in parallel. Parallel
> > > >
> > > >
> > > > Not sure, if this a possible yet, but sure!
> > > >
> > > > Thanks for the inputs.
> > > >
> > > > > executing operations against std::map corrupts it.
> > > > > By the way, std::map::operator[] is not a constant function. In fact,
> > > > > it can create a new node if a key does not exist.
> > > > >
> > > > > -Hiro
> > > > >>> -Hiro
> > > > >>>> +
> > > > >>>>           return 0;
> > > > >>>>    }
> > > > >>>>
> > > > >>>> --
> > > > >>>> 2.31.1
> > > > >>>>
Jacopo Mondi Sept. 27, 2021, 1:50 p.m. UTC | #9
Hi Umang, Hiro,

On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote:
> Hi Hiro,
>
> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > Hi Umang, thank you for the patch.
> >
> > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
> > > The Camera3RequestDescriptor containing the capture request
> > > is adding to the descriptors_ map after a call to
> > > CameraWorker::queueRequest(). This is a race condition since
> > > CameraWorker::queueRequest() queues request to libcamera::Camera
> > > asynchronously and the addition of the descriptor to the map
> > > occurs with std::move(). Hence, it might happen that the async
> > > queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > >
> > > Fix it by adding the descriptor to map first, before
> > > CameraWorker::queueRequest().
> >
> > I don't understand the problem well.
>
>
> It's a race between queueRequest() and adding to the descriptor_ std::map.
>
> queueRequest() is called asynchronously(separate thread)  so it might happen
> that queueRequest is still processing/queuing the request while we
> std::move()ed the descriptor (Which wraps the request queueRequest is
> accessing)
>
> If we fix the order with correct data-access it fixes the issue.
>
> > I think the pointer of descriptors.request_.get() is not invalidated
> > by std::move().

You know, I'm not really sure :)
Let's try to find it out...

This is the descriptor lifecycle inside this function

int processRequest()
{
        // Allocated on the stack, request_ is a unique_ptr<> embedded in
        // descriptor
        Camera3RequestDescriptor descriptor(camera_.get(), camera3Request);

        ....

        worker_.queueRequest(descriptor.request_.get());
        descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
}

Camera3RequestDescriptor::request_ is initialized as

	request_ = std::make_unique<CaptureRequest>(camera);

CameraDevice::descriptors_ is a map of type:
        std::map<uint64_t, Camera3RequestDescriptor> descriptors_;

And Camera3RequestDescriptor has a defaulted move assignment operator=()
       		Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default;

As Camera3RequestDescriptor member are:

	struct Camera3RequestDescriptor {

		uint32_t frameNumber_ = 0;
		std::vector<camera3_stream_buffer_t> buffers_;
		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
		CameraMetadata settings_;
		std::unique_ptr<CaptureRequest> request_;
	};

And most of them have a non-implicitly defined move assignment operator (vecors
and unique_ptr) I think we fall in the "Implicitly defined" section of [1]

-------------------------------------------------------------------------------
Implicitly-defined move assignment operator

If the implicitly-declared move assignment operator is neither deleted nor
trivial, it is defined (that is, a function body is generated and compiled) by
the compiler if odr-used or needed for constant evaluation (since C++14).

For union types, the implicitly-defined move assignment operator copies the
object representation (as by std::memmove).

For non-union class types (class and struct), the move assignment operator
performs full member-wise move assignment of the object's direct bases and
immediate non-static members, in their declaration order, using built-in
assignment for the scalars, memberwise move-assignment for arrays, and move
assignment operator for class types (called non-virtually).
-------------------------------------------------------------------------------

Which means we're effectively calling std::unique_ptr::operator[](&&) on
request_.

As std::unique_ptr::operator[](&&other) is equivalent to

        reset(other.release());

I guess what happens is that the location in memory of request_ doesn't change
but it's only its ownership that gets moved to the newly created entry in the
descriptors_ map.

Hence I think the code is safe the way it is, even if it might look suspicious
or fragile.

I think it could be made 'safer' if we want to by:

-       worker_.queueRequest(descriptor.request_.get());
-
+       CaptureRequest *req;
        {
                MutexLocker descriptorsLock(descriptorsMutex_);
+               unsigned long requestCookie = descriptor.request_->cookie();
-               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
+               descriptors_[requestCookie] = std::move(descriptor);
+               req = descriptors_[requestCookie].request_.get();
        }

+       worker_.queueRequest(req);
+
        return 0;
 }

But I didn't get if Umang hit any issue that lead him to send this patch.

[1] https://en.cppreference.com/w/cpp/language/move_assignment

>
>
> Why do you think so? :)
>
>     diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
>     index a693dcbe..0de7050d 100644
>     --- a/src/android/camera_device.cpp
>     +++ b/src/android/camera_device.cpp
>     @@ -1063,13 +1063,13 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>                     state_ = State::Running;
>             }
>
>     - worker_.queueRequest(descriptor.request_.get());
>     -
>             {
>                     MutexLocker descriptorsLock(descriptorsMutex_);
> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>             }
>
>     + worker_.queueRequest(descriptor.request_.get());
>     +
>
> is a segfaulted here (and expected no?).

For sure it does, you're now using 'descriptor' after it has been moved and it's
been left in unspecified state. As shown above std::unique_ptr::operator=(&&)
has been called on request_ which calls request_.release(), hence accessing it
with std::unique_ptr::get() is undefined behaviour.

>
> >
> > > Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> > > ---
> > >   src/android/camera_device.cpp | 7 ++++---
> > >   1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index 21844e51..c4c03d86 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >                  state_ = State::Running;
> > >          }
> > >
> > > -       worker_.queueRequest(descriptor.request_.get());
> > > -
> > > +       unsigned long requestCookie = descriptor.request_->cookie();
> > >          {
> > >                  MutexLocker descriptorsLock(descriptorsMutex_);
> > > -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > +               descriptors_[requestCookie] = std::move(descriptor);
> > >          }
> > >
> > > +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > Accessing descriptors_ must be while holding descriptorsMutex_.
>
>
> I am in two minds here,
>
> We can lock it yes, but we are just reading from the map so, should be fine?

I think accessing descriptors_ should always happen while holding the
descriptorsMutex_ lock, as we extract nodes from the map in requestComplete()
which might invalidates iterators and re-points the internal pointers in the
map. I can't tell what happens if done concurrently with accessing the map, but
doesn't seem like a good idea.

Thanks
   j

>
> >
> > -Hiro
> > > +
> > >          return 0;
> > >   }
> > >
> > > --
> > > 2.31.1
> > >
Umang Jain Sept. 27, 2021, 3:13 p.m. UTC | #10
Hi Jacopo, Hiro,

On 9/27/21 7:20 PM, Jacopo Mondi wrote:
> Hi Umang, Hiro,
>
> On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote:
>> Hi Hiro,
>>
>> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
>>> Hi Umang, thank you for the patch.
>>>
>>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain@ideasonboard.com> wrote:
>>>> The Camera3RequestDescriptor containing the capture request
>>>> is adding to the descriptors_ map after a call to
>>>> CameraWorker::queueRequest(). This is a race condition since
>>>> CameraWorker::queueRequest() queues request to libcamera::Camera
>>>> asynchronously and the addition of the descriptor to the map
>>>> occurs with std::move(). Hence, it might happen that the async
>>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
>>>>
>>>> Fix it by adding the descriptor to map first, before
>>>> CameraWorker::queueRequest().
>>> I don't understand the problem well.
>>
>> It's a race between queueRequest() and adding to the descriptor_ std::map.
>>
>> queueRequest() is called asynchronously(separate thread)  so it might happen
>> that queueRequest is still processing/queuing the request while we
>> std::move()ed the descriptor (Which wraps the request queueRequest is
>> accessing)
>>
>> If we fix the order with correct data-access it fixes the issue.
>>
>>> I think the pointer of descriptors.request_.get() is not invalidated
>>> by std::move().
> You know, I'm not really sure :)
> Let's try to find it out...
>
> This is the descriptor lifecycle inside this function
>
> int processRequest()
> {
>          // Allocated on the stack, request_ is a unique_ptr<> embedded in
>          // descriptor
>          Camera3RequestDescriptor descriptor(camera_.get(), camera3Request);
>
>          ....
>
>          worker_.queueRequest(descriptor.request_.get());
>          descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> }
>
> Camera3RequestDescriptor::request_ is initialized as
>
> 	request_ = std::make_unique<CaptureRequest>(camera);
>
> CameraDevice::descriptors_ is a map of type:
>          std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>
> And Camera3RequestDescriptor has a defaulted move assignment operator=()
>         		Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default;
>
> As Camera3RequestDescriptor member are:
>
> 	struct Camera3RequestDescriptor {
>
> 		uint32_t frameNumber_ = 0;
> 		std::vector<camera3_stream_buffer_t> buffers_;
> 		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
> 		CameraMetadata settings_;
> 		std::unique_ptr<CaptureRequest> request_;
> 	};
>
> And most of them have a non-implicitly defined move assignment operator (vecors
> and unique_ptr) I think we fall in the "Implicitly defined" section of [1]
>
> -------------------------------------------------------------------------------
> Implicitly-defined move assignment operator
>
> If the implicitly-declared move assignment operator is neither deleted nor
> trivial, it is defined (that is, a function body is generated and compiled) by
> the compiler if odr-used or needed for constant evaluation (since C++14).
>
> For union types, the implicitly-defined move assignment operator copies the
> object representation (as by std::memmove).
>
> For non-union class types (class and struct), the move assignment operator
> performs full member-wise move assignment of the object's direct bases and
> immediate non-static members, in their declaration order, using built-in
> assignment for the scalars, memberwise move-assignment for arrays, and move
> assignment operator for class types (called non-virtually).
> -------------------------------------------------------------------------------
>
> Which means we're effectively calling std::unique_ptr::operator[](&&) on
> request_.
>
> As std::unique_ptr::operator[](&&other) is equivalent to
>
>          reset(other.release());
>
> I guess what happens is that the location in memory of request_ doesn't change
> but it's only its ownership that gets moved to the newly created entry in the
> descriptors_ map.
>
> Hence I think the code is safe the way it is, even if it might look suspicious
> or fragile.


On a lighter note, looks don't matter ? :P

I think I'll slightly dis-agree that it is safe..

I understand that the ::move is not really copying to a different 
location of memory, rather transfer of ownership happens, so technically 
it might not cause any problems, since pointers are still pointing to 
memory location they are supposed to point to (possibly v2 version of 
this patch is based on this fact). But, ::move()ing  while an async 
operation is in play on a memory location does concern me if we look at it.

If we want to queueRequest() /before/ the descriptor is moved, I think 
that should also happen with a lock, since, descriptor.request_.get() is 
a descriptor access, no? It might happen that the queueRequest() might 
also set something on the CaptureRequest pointer, so essentially, we 
would still to lock the entire async operation before firing it off, 
until it returns.


>
> I think it could be made 'safer' if we want to by:
>
> -       worker_.queueRequest(descriptor.request_.get());
> -
> +       CaptureRequest *req;
>          {
>                  MutexLocker descriptorsLock(descriptorsMutex_);
> +               unsigned long requestCookie = descriptor.request_->cookie();
> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> +               descriptors_[requestCookie] = std::move(descriptor);
> +               req = descriptors_[requestCookie].request_.get();
>          }
>
> +       worker_.queueRequest(req);
> +
>          return 0;
>   }


Precisely, that intended goal of the patch is to queueRequest() after 
the descriptor is moved to the map.

>
> But I didn't get if Umang hit any issue that lead him to send this patch.


Please mind that v2 of this patch is already on the list last week, 
which might be of interest and further discussion. Also, I didn't get 
any issues with it, it's something Laurent initially spotted while code 
review [1]

Excerpt below :

>/> We have a race condition here, worker_.queueRequest() should go after />/> adding the request to the queue. Could you fix it in a patch on top ? />//>/Do you mean the race condition is existing already, with the />/descriptors_ map (that has been removed from this patch)? /
Correct, it's already here.

>/Yes, I can introduce a patch before this one, that fixes the race first />/in the map itself. Is my understanding correct? /
Sounds good to me. It should be a small patch.

[1]: 
https://lists.libcamera.org/pipermail/libcamera-devel/2021-September/025004.html

>
> [1] https://en.cppreference.com/w/cpp/language/move_assignment
>
>>
>> Why do you think so? :)
>>
>>      diff --git a/src/android/camera_device.cpp
>> b/src/android/camera_device.cpp
>>      index a693dcbe..0de7050d 100644
>>      --- a/src/android/camera_device.cpp
>>      +++ b/src/android/camera_device.cpp
>>      @@ -1063,13 +1063,13 @@ int
>> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>                      state_ = State::Running;
>>              }
>>
>>      - worker_.queueRequest(descriptor.request_.get());
>>      -
>>              {
>>                      MutexLocker descriptorsLock(descriptorsMutex_);
>> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>              }
>>
>>      + worker_.queueRequest(descriptor.request_.get());
>>      +
>>
>> is a segfaulted here (and expected no?).
> For sure it does, you're now using 'descriptor' after it has been moved and it's
> been left in unspecified state. As shown above std::unique_ptr::operator=(&&)
> has been called on request_ which calls request_.release(), hence accessing it
> with std::unique_ptr::get() is undefined behaviour.
>
>>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>>> ---
>>>>    src/android/camera_device.cpp | 7 ++++---
>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>>>> index 21844e51..c4c03d86 100644
>>>> --- a/src/android/camera_device.cpp
>>>> +++ b/src/android/camera_device.cpp
>>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>>>                   state_ = State::Running;
>>>>           }
>>>>
>>>> -       worker_.queueRequest(descriptor.request_.get());
>>>> -
>>>> +       unsigned long requestCookie = descriptor.request_->cookie();
>>>>           {
>>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
>>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>>> +               descriptors_[requestCookie] = std::move(descriptor);
>>>>           }
>>>>
>>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
>>> Accessing descriptors_ must be while holding descriptorsMutex_.
>>
>> I am in two minds here,
>>
>> We can lock it yes, but we are just reading from the map so, should be fine?
> I think accessing descriptors_ should always happen while holding the
> descriptorsMutex_ lock, as we extract nodes from the map in requestComplete()
> which might invalidates iterators and re-points the internal pointers in the
> map. I can't tell what happens if done concurrently with accessing the map, but
> doesn't seem like a good idea.
>
> Thanks
>     j
>
>>> -Hiro
>>>> +
>>>>           return 0;
>>>>    }
>>>>
>>>> --
>>>> 2.31.1
>>>>
Laurent Pinchart Sept. 28, 2021, 1:20 a.m. UTC | #11
Hi Umang,

Thank you for the patch.

On Fri, Sep 24, 2021 at 12:14:09PM +0530, Umang Jain wrote:
> The Camera3RequestDescriptor containing the capture request
> is adding to the descriptors_ map after a call to
> CameraWorker::queueRequest(). This is a race condition since
> CameraWorker::queueRequest() queues request to libcamera::Camera
> asynchronously and the addition of the descriptor to the map
> occurs with std::move(). Hence, it might happen that the async
> queueRequest() hasn't finished but the descriptor gets std::move()ed.

The issue isn't related to std::move(). The problem is that the request
is queued to the worker, and at that point processing moves to another
thread. It's thus possible (even if very unlikely, but still possible)
that the processing will complete, and CameraDevice::requestComplete()
gets called, before this function proceeds to add the request to the
descriptors_ map. CameraDevice::requestComplete() will then not find it
in the map, and bad things will happen.

It's a classic race condition, std::move() doesn't play any role here.

> Fix it by adding the descriptor to map first, before
> CameraWorker::queueRequest().
> 
> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> ---
>  src/android/camera_device.cpp | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index 21844e51..c4c03d86 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  		state_ = State::Running;
>  	}
>  
> -	worker_.queueRequest(descriptor.request_.get());
> -
> +	unsigned long requestCookie = descriptor.request_->cookie();
>  	{
>  		MutexLocker descriptorsLock(descriptorsMutex_);
> -		descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> +		descriptors_[requestCookie] = std::move(descriptor);
>  	}
>  
> +	worker_.queueRequest(descriptors_[requestCookie].request_.get());

As Hiro mentioned, we shouldn't access descriptors_ without taking the
lock. Here's how to fix it:

	CaptureRequest *request = descriptor.request_.get();

	{
		MutexLocker descriptorsLock(descriptorsMutex_);
		descriptors_[request->cookie()] = std::move(descriptor);
	}

	worker_.queueRequest(request);

> +
>  	return 0;
>  }
>

Patch
diff mbox series

diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
index 21844e51..c4c03d86 100644
--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -1065,13 +1065,14 @@  int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
 		state_ = State::Running;
 	}
 
-	worker_.queueRequest(descriptor.request_.get());
-
+	unsigned long requestCookie = descriptor.request_->cookie();
 	{
 		MutexLocker descriptorsLock(descriptorsMutex_);
-		descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
+		descriptors_[requestCookie] = std::move(descriptor);
 	}
 
+	worker_.queueRequest(descriptors_[requestCookie].request_.get());
+
 	return 0;
 }