[libcamera-devel] libcamera: v4l2_videodevice: Handle unexpected buffers
diff mbox series

Message ID 20210715142130.139675-1-kieran.bingham@ideasonboard.com
State Superseded
Delegated to: Kieran Bingham
Headers show
Series
  • [libcamera-devel] libcamera: v4l2_videodevice: Handle unexpected buffers
Related show

Commit Message

Kieran Bingham July 15, 2021, 2:21 p.m. UTC
A kernel bug can lead to unexpected buffers being dequeued where we
haven't entered the buffer in our queuedBuffers_ list.

This causes invalid accesses if not handled correctly within libcamera,
and while it is a kernel issue, we must protect against unpatched
kernels.

Handle unexpected buffers by returning a nullptr, and move cache
management after the validation of the buffer.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
---
 src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Paul Elder July 28, 2021, 2:11 a.m. UTC | #1
Hi Kieran,

On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
> A kernel bug can lead to unexpected buffers being dequeued where we
> haven't entered the buffer in our queuedBuffers_ list.
> 
> This causes invalid accesses if not handled correctly within libcamera,
> and while it is a kernel issue, we must protect against unpatched
> kernels.
> 
> Handle unexpected buffers by returning a nullptr, and move cache
> management after the validation of the buffer.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

Looks good.

Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>

> ---
>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
> index 3d2d99b46e4e..6c7f9daf24db 100644
> --- a/src/libcamera/v4l2_videodevice.cpp
> +++ b/src/libcamera/v4l2_videodevice.cpp
> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
>  
>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
>  
> +	auto it = queuedBuffers_.find(buf.index);
> +	/*
> +	 * If the video node fails to stream-on successfully (which can occur
> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
> +	 * returns a failure upon queing, being mistakenely kept in the kernel.
> +	 * This leads to the kernel notifying us that a buffer is available to
> +	 * dequeue, which we have no awareness of being queued, and thus we will
> +	 * not find it in the queuedBuffers_ list.
> +	 *
> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
> +	 * safely ignore buffers which are unexpected to prevent crashes on
> +	 * unpatched kernels.
> +	 */
> +	if (it == queuedBuffers_.end()) {
> +		LOG(V4L2, Error)
> +			<< "Dequeued an unexpected buffer:" << buf.index;
> +
> +		return nullptr;
> +	}
> +
>  	cache_->put(buf.index);
>  
> -	auto it = queuedBuffers_.find(buf.index);
>  	FrameBuffer *buffer = it->second;
>  	queuedBuffers_.erase(it);
>  
> -- 
> 2.30.2
>
Laurent Pinchart July 28, 2021, 6:24 a.m. UTC | #2
Hi Kieran,

Thank you for the patch.

On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
> A kernel bug can lead to unexpected buffers being dequeued where we
> haven't entered the buffer in our queuedBuffers_ list.
> 
> This causes invalid accesses if not handled correctly within libcamera,
> and while it is a kernel issue, we must protect against unpatched
> kernels.
> 
> Handle unexpected buffers by returning a nullptr, and move cache
> management after the validation of the buffer.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> ---
>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
> index 3d2d99b46e4e..6c7f9daf24db 100644
> --- a/src/libcamera/v4l2_videodevice.cpp
> +++ b/src/libcamera/v4l2_videodevice.cpp
> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
>  
>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
>  
> +	auto it = queuedBuffers_.find(buf.index);
> +	/*
> +	 * If the video node fails to stream-on successfully (which can occur
> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
> +	 * returns a failure upon queing, being mistakenely kept in the kernel.

s/queing,/queing/

You've mistakenly spelled mistakenly mistakenely :-)

> +	 * This leads to the kernel notifying us that a buffer is available to
> +	 * dequeue, which we have no awareness of being queued, and thus we will
> +	 * not find it in the queuedBuffers_ list.

It may be worth expanding this a bit to explain about the
stream-on-at-qbuf-time issue. If I recall correctly, given that the
device fails to stream, the issue only occurs when we call
VIDIOC_STREAMOFF and all buffers currently in queue (including the one
erroneously kept in the queue) being signalled as complete in the error
state. Is that right ?

> +	 *
> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
> +	 * safely ignore buffers which are unexpected to prevent crashes on
> +	 * unpatched kernels.

Is the fix on its way upstream ? If so, this could probably be reworded,
and mentioning the target kernel version (and even better, the upstream
commit) would be nice.

> +	 */
> +	if (it == queuedBuffers_.end()) {
> +		LOG(V4L2, Error)
> +			<< "Dequeued an unexpected buffer:" << buf.index;

Missing space after ':'.

> +

You can drop the blank line.

> +		return nullptr;
> +	}
> +
>  	cache_->put(buf.index);
>  
> -	auto it = queuedBuffers_.find(buf.index);
>  	FrameBuffer *buffer = it->second;
>  	queuedBuffers_.erase(it);
>
Kieran Bingham July 28, 2021, 8:51 a.m. UTC | #3
Hi Laurent,

On 28/07/2021 07:24, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
>> A kernel bug can lead to unexpected buffers being dequeued where we
>> haven't entered the buffer in our queuedBuffers_ list.
>>
>> This causes invalid accesses if not handled correctly within libcamera,
>> and while it is a kernel issue, we must protect against unpatched
>> kernels.
>>
>> Handle unexpected buffers by returning a nullptr, and move cache
>> management after the validation of the buffer.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
>> ---
>>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
>> index 3d2d99b46e4e..6c7f9daf24db 100644
>> --- a/src/libcamera/v4l2_videodevice.cpp
>> +++ b/src/libcamera/v4l2_videodevice.cpp
>> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
>>  
>>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
>>  
>> +	auto it = queuedBuffers_.find(buf.index);
>> +	/*
>> +	 * If the video node fails to stream-on successfully (which can occur
>> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
>> +	 * returns a failure upon queing, being mistakenely kept in the kernel.
> 
> s/queing,/queing/
> 
> You've mistakenly spelled mistakenly mistakenely :-)

You're too slow ;-)

Highlighted in my reply to myself already.

> 
>> +	 * This leads to the kernel notifying us that a buffer is available to
>> +	 * dequeue, which we have no awareness of being queued, and thus we will
>> +	 * not find it in the queuedBuffers_ list.
> 
> It may be worth expanding this a bit to explain about the
> stream-on-at-qbuf-time issue. If I recall correctly, given that the
> device fails to stream, the issue only occurs when we call
> VIDIOC_STREAMOFF and all buffers currently in queue (including the one
> erroneously kept in the queue) being signalled as complete in the error
> state. Is that right ?

I don't think we're calling stream off because we haven't called stream
on (by nature of the fact that it was a failure at stream-on-at-qbuf-time?).

Although, perhaps we do as the queueBuffer could lead to an error path
that calls cio2_->stop() which does call streamOff regardless, but it's
a fairly moot point by that stage.

The key issue is that the kernel notifies us of a buffer as completed in
error state, that we believed didn't queue - because it failed to
complete in Q_BUF.


>> +	 *
>> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
>> +	 * safely ignore buffers which are unexpected to prevent crashes on
>> +	 * unpatched kernels.
> 
> Is the fix on its way upstream ? If so, this could probably be reworded,
> and mentioning the target kernel version (and even better, the upstream
> commit) would be nice.

I've given Tested-by: tags to Hans on his proof-of-concept patch, I
haven't seen any updated patch come through from him though, so I don't
currently know the progress.

It may need chasing, or taking over if he doesn't have time.


>> +	 */
>> +	if (it == queuedBuffers_.end()) {
>> +		LOG(V4L2, Error)
>> +			<< "Dequeued an unexpected buffer:" << buf.index;
> 
> Missing space after ':'.

Also in my self-review and already fixed locally ;-)

> 
>> +
> 
> You can drop the blank line.

Ack.

> 
>> +		return nullptr;
>> +	}
>> +
>>  	cache_->put(buf.index);
>>  
>> -	auto it = queuedBuffers_.find(buf.index);
>>  	FrameBuffer *buffer = it->second;
>>  	queuedBuffers_.erase(it);
>>  
>
Laurent Pinchart July 28, 2021, 11:26 a.m. UTC | #4
Hi Kieran,

On Wed, Jul 28, 2021 at 09:51:19AM +0100, Kieran Bingham wrote:
> On 28/07/2021 07:24, Laurent Pinchart wrote:
> > On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
> >> A kernel bug can lead to unexpected buffers being dequeued where we
> >> haven't entered the buffer in our queuedBuffers_ list.
> >>
> >> This causes invalid accesses if not handled correctly within libcamera,
> >> and while it is a kernel issue, we must protect against unpatched
> >> kernels.
> >>
> >> Handle unexpected buffers by returning a nullptr, and move cache
> >> management after the validation of the buffer.
> >>
> >> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> >> ---
> >>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
> >>  1 file changed, 20 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
> >> index 3d2d99b46e4e..6c7f9daf24db 100644
> >> --- a/src/libcamera/v4l2_videodevice.cpp
> >> +++ b/src/libcamera/v4l2_videodevice.cpp
> >> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
> >>  
> >>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
> >>  
> >> +	auto it = queuedBuffers_.find(buf.index);
> >> +	/*
> >> +	 * If the video node fails to stream-on successfully (which can occur
> >> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
> >> +	 * returns a failure upon queing, being mistakenely kept in the kernel.
> > 
> > s/queing,/queing/
> > 
> > You've mistakenly spelled mistakenly mistakenely :-)
> 
> You're too slow ;-)
> 
> Highlighted in my reply to myself already.
> 
> >> +	 * This leads to the kernel notifying us that a buffer is available to
> >> +	 * dequeue, which we have no awareness of being queued, and thus we will
> >> +	 * not find it in the queuedBuffers_ list.
> > 
> > It may be worth expanding this a bit to explain about the
> > stream-on-at-qbuf-time issue. If I recall correctly, given that the
> > device fails to stream, the issue only occurs when we call
> > VIDIOC_STREAMOFF and all buffers currently in queue (including the one
> > erroneously kept in the queue) being signalled as complete in the error
> > state. Is that right ?
> 
> I don't think we're calling stream off because we haven't called stream
> on (by nature of the fact that it was a failure at stream-on-at-qbuf-time?).

Strream-on-at-qbuf means that STREAMON is called and succeeds, but
doesn't reach the driver because vb2 blocks the STREAMON until enough
buffers are available. When a QBUF provides the needed buffer, vb2 calls
the .start_streaming() operation, which starts the device. A failure at
that point is reported as a QBUF failure, not a STREAMON failure has
that as completed long before. From a userspace point of view, the V4L2
video node is streaming, and thus need to the stopped with STREAMOFF.

> Although, perhaps we do as the queueBuffer could lead to an error path
> that calls cio2_->stop() which does call streamOff regardless, but it's
> a fairly moot point by that stage.
> 
> The key issue is that the kernel notifies us of a buffer as completed in
> error state, that we believed didn't queue - because it failed to
> complete in Q_BUF.

rMmy point is that that notification occurs at STREAMOFF time, when all
queued buffers are given back to userspace. The device hasn't been
started due to the .start_streaming() failure, so the kernel doesn't
produce any buffer ready event until STREAMOFF time.

> >> +	 *
> >> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
> >> +	 * safely ignore buffers which are unexpected to prevent crashes on
> >> +	 * unpatched kernels.
> > 
> > Is the fix on its way upstream ? If so, this could probably be reworded,
> > and mentioning the target kernel version (and even better, the upstream
> > commit) would be nice.
> 
> I've given Tested-by: tags to Hans on his proof-of-concept patch, I
> haven't seen any updated patch come through from him though, so I don't
> currently know the progress.
> 
> It may need chasing, or taking over if he doesn't have time.

Let's start the chase :-)

> >> +	 */
> >> +	if (it == queuedBuffers_.end()) {
> >> +		LOG(V4L2, Error)
> >> +			<< "Dequeued an unexpected buffer:" << buf.index;
> > 
> > Missing space after ':'.
> 
> Also in my self-review and already fixed locally ;-)
> 
> >> +
> > 
> > You can drop the blank line.
> 
> Ack.
> 
> >> +		return nullptr;
> >> +	}
> >> +
> >>  	cache_->put(buf.index);
> >>  
> >> -	auto it = queuedBuffers_.find(buf.index);
> >>  	FrameBuffer *buffer = it->second;
> >>  	queuedBuffers_.erase(it);
> >>
Kieran Bingham Aug. 12, 2021, 1:09 p.m. UTC | #5
Hi Laurent,

On 28/07/2021 12:26, Laurent Pinchart wrote:
> Hi Kieran,
> 
> On Wed, Jul 28, 2021 at 09:51:19AM +0100, Kieran Bingham wrote:
>> On 28/07/2021 07:24, Laurent Pinchart wrote:
>>> On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
>>>> A kernel bug can lead to unexpected buffers being dequeued where we
>>>> haven't entered the buffer in our queuedBuffers_ list.
>>>>
>>>> This causes invalid accesses if not handled correctly within libcamera,
>>>> and while it is a kernel issue, we must protect against unpatched
>>>> kernels.
>>>>
>>>> Handle unexpected buffers by returning a nullptr, and move cache
>>>> management after the validation of the buffer.
>>>>
>>>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
>>>> ---
>>>>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
>>>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
>>>> index 3d2d99b46e4e..6c7f9daf24db 100644
>>>> --- a/src/libcamera/v4l2_videodevice.cpp
>>>> +++ b/src/libcamera/v4l2_videodevice.cpp
>>>> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
>>>>  
>>>>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
>>>>  
>>>> +	auto it = queuedBuffers_.find(buf.index);
>>>> +	/*
>>>> +	 * If the video node fails to stream-on successfully (which can occur
>>>> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
>>>> +	 * returns a failure upon queing, being mistakenely kept in the kernel.
>>>
>>> s/queing,/queing/
>>>
>>> You've mistakenly spelled mistakenly mistakenely :-)
>>
>> You're too slow ;-)
>>
>> Highlighted in my reply to myself already.
>>
>>>> +	 * This leads to the kernel notifying us that a buffer is available to
>>>> +	 * dequeue, which we have no awareness of being queued, and thus we will
>>>> +	 * not find it in the queuedBuffers_ list.
>>>
>>> It may be worth expanding this a bit to explain about the
>>> stream-on-at-qbuf-time issue. If I recall correctly, given that the
>>> device fails to stream, the issue only occurs when we call
>>> VIDIOC_STREAMOFF and all buffers currently in queue (including the one
>>> erroneously kept in the queue) being signalled as complete in the error
>>> state. Is that right ?
>>
>> I don't think we're calling stream off because we haven't called stream
>> on (by nature of the fact that it was a failure at stream-on-at-qbuf-time?).
> 
> Strream-on-at-qbuf means that STREAMON is called and succeeds, but
> doesn't reach the driver because vb2 blocks the STREAMON until enough
> buffers are available. When a QBUF provides the needed buffer, vb2 calls
> the .start_streaming() operation, which starts the device. A failure at
> that point is reported as a QBUF failure, not a STREAMON failure has
> that as completed long before. From a userspace point of view, the V4L2
> video node is streaming, and thus need to the stopped with STREAMOFF.
> 
>> Although, perhaps we do as the queueBuffer could lead to an error path
>> that calls cio2_->stop() which does call streamOff regardless, but it's
>> a fairly moot point by that stage.
>>
>> The key issue is that the kernel notifies us of a buffer as completed in
>> error state, that we believed didn't queue - because it failed to
>> complete in Q_BUF.
> 
> rMmy point is that that notification occurs at STREAMOFF time, when all
> queued buffers are given back to userspace. The device hasn't been
> started due to the .start_streaming() failure, so the kernel doesn't
> produce any buffer ready event until STREAMOFF time.
> 
>>>> +	 *
>>>> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
>>>> +	 * safely ignore buffers which are unexpected to prevent crashes on
>>>> +	 * unpatched kernels.
>>>
>>> Is the fix on its way upstream ? If so, this could probably be reworded,
>>> and mentioning the target kernel version (and even better, the upstream
>>> commit) would be nice.
>>
>> I've given Tested-by: tags to Hans on his proof-of-concept patch, I
>> haven't seen any updated patch come through from him though, so I don't
>> currently know the progress.
>>
>> It may need chasing, or taking over if he doesn't have time.
> 
> Let's start the chase :-)


The relevant kernel patch is posted, and headed to the stable trees, so
it will be picked up by CrOS soon enough.

Do you want to explicitly drop this patch, or are you happy to keep it?

I still think it has some merit, as it's better to report the issue
cleanly in the event that it happens ... however, its' a rare kernel bug
(I expect) that is already patched, and thus should become ... rarer ...

--
Kieran


> 
>>>> +	 */
>>>> +	if (it == queuedBuffers_.end()) {
>>>> +		LOG(V4L2, Error)
>>>> +			<< "Dequeued an unexpected buffer:" << buf.index;
>>>
>>> Missing space after ':'.
>>
>> Also in my self-review and already fixed locally ;-)
>>
>>>> +
>>>
>>> You can drop the blank line.
>>
>> Ack.
>>
>>>> +		return nullptr;
>>>> +	}
>>>> +
>>>>  	cache_->put(buf.index);
>>>>  
>>>> -	auto it = queuedBuffers_.find(buf.index);
>>>>  	FrameBuffer *buffer = it->second;
>>>>  	queuedBuffers_.erase(it);
>>>>  
>
Laurent Pinchart Aug. 12, 2021, 1:19 p.m. UTC | #6
Hi Kieran,

On Thu, Aug 12, 2021 at 02:09:18PM +0100, Kieran Bingham wrote:
> On 28/07/2021 12:26, Laurent Pinchart wrote:
> > On Wed, Jul 28, 2021 at 09:51:19AM +0100, Kieran Bingham wrote:
> >> On 28/07/2021 07:24, Laurent Pinchart wrote:
> >>> On Thu, Jul 15, 2021 at 03:21:30PM +0100, Kieran Bingham wrote:
> >>>> A kernel bug can lead to unexpected buffers being dequeued where we
> >>>> haven't entered the buffer in our queuedBuffers_ list.
> >>>>
> >>>> This causes invalid accesses if not handled correctly within libcamera,
> >>>> and while it is a kernel issue, we must protect against unpatched
> >>>> kernels.
> >>>>
> >>>> Handle unexpected buffers by returning a nullptr, and move cache
> >>>> management after the validation of the buffer.
> >>>>
> >>>> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> >>>> ---
> >>>>  src/libcamera/v4l2_videodevice.cpp | 21 ++++++++++++++++++++-
> >>>>  1 file changed, 20 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
> >>>> index 3d2d99b46e4e..6c7f9daf24db 100644
> >>>> --- a/src/libcamera/v4l2_videodevice.cpp
> >>>> +++ b/src/libcamera/v4l2_videodevice.cpp
> >>>> @@ -1519,9 +1519,28 @@ FrameBuffer *V4L2VideoDevice::dequeueBuffer()
> >>>>  
> >>>>  	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
> >>>>  
> >>>> +	auto it = queuedBuffers_.find(buf.index);
> >>>> +	/*
> >>>> +	 * If the video node fails to stream-on successfully (which can occur
> >>>> +	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
> >>>> +	 * returns a failure upon queing, being mistakenely kept in the kernel.
> >>>
> >>> s/queing,/queing/
> >>>
> >>> You've mistakenly spelled mistakenly mistakenely :-)
> >>
> >> You're too slow ;-)
> >>
> >> Highlighted in my reply to myself already.
> >>
> >>>> +	 * This leads to the kernel notifying us that a buffer is available to
> >>>> +	 * dequeue, which we have no awareness of being queued, and thus we will
> >>>> +	 * not find it in the queuedBuffers_ list.
> >>>
> >>> It may be worth expanding this a bit to explain about the
> >>> stream-on-at-qbuf-time issue. If I recall correctly, given that the
> >>> device fails to stream, the issue only occurs when we call
> >>> VIDIOC_STREAMOFF and all buffers currently in queue (including the one
> >>> erroneously kept in the queue) being signalled as complete in the error
> >>> state. Is that right ?
> >>
> >> I don't think we're calling stream off because we haven't called stream
> >> on (by nature of the fact that it was a failure at stream-on-at-qbuf-time?).
> > 
> > Strream-on-at-qbuf means that STREAMON is called and succeeds, but
> > doesn't reach the driver because vb2 blocks the STREAMON until enough
> > buffers are available. When a QBUF provides the needed buffer, vb2 calls
> > the .start_streaming() operation, which starts the device. A failure at
> > that point is reported as a QBUF failure, not a STREAMON failure has
> > that as completed long before. From a userspace point of view, the V4L2
> > video node is streaming, and thus need to the stopped with STREAMOFF.
> > 
> >> Although, perhaps we do as the queueBuffer could lead to an error path
> >> that calls cio2_->stop() which does call streamOff regardless, but it's
> >> a fairly moot point by that stage.
> >>
> >> The key issue is that the kernel notifies us of a buffer as completed in
> >> error state, that we believed didn't queue - because it failed to
> >> complete in Q_BUF.
> > 
> > rMmy point is that that notification occurs at STREAMOFF time, when all
> > queued buffers are given back to userspace. The device hasn't been
> > started due to the .start_streaming() failure, so the kernel doesn't
> > produce any buffer ready event until STREAMOFF time.
> > 
> >>>> +	 *
> >>>> +	 * Whilst this is a kernel bug and should be fixed there, ensure that we
> >>>> +	 * safely ignore buffers which are unexpected to prevent crashes on
> >>>> +	 * unpatched kernels.
> >>>
> >>> Is the fix on its way upstream ? If so, this could probably be reworded,
> >>> and mentioning the target kernel version (and even better, the upstream
> >>> commit) would be nice.
> >>
> >> I've given Tested-by: tags to Hans on his proof-of-concept patch, I
> >> haven't seen any updated patch come through from him though, so I don't
> >> currently know the progress.
> >>
> >> It may need chasing, or taking over if he doesn't have time.
> > 
> > Let's start the chase :-)
> 
> The relevant kernel patch is posted, and headed to the stable trees, so
> it will be picked up by CrOS soon enough.
> 
> Do you want to explicitly drop this patch, or are you happy to keep it?
> 
> I still think it has some merit, as it's better to report the issue
> cleanly in the event that it happens ... however, its' a rare kernel bug
> (I expect) that is already patched, and thus should become ... rarer ...

I'm fine keeping the patch, but I'd like to reword the above comment to
indicate when the issue has been fixed in the kernel. The corresponding
commit ID should be recorded, either in the commit message, or in the
comment.

> >>>> +	 */
> >>>> +	if (it == queuedBuffers_.end()) {
> >>>> +		LOG(V4L2, Error)
> >>>> +			<< "Dequeued an unexpected buffer:" << buf.index;
> >>>
> >>> Missing space after ':'.
> >>
> >> Also in my self-review and already fixed locally ;-)
> >>
> >>>> +
> >>>
> >>> You can drop the blank line.
> >>
> >> Ack.
> >>
> >>>> +		return nullptr;
> >>>> +	}
> >>>> +
> >>>>  	cache_->put(buf.index);
> >>>>  
> >>>> -	auto it = queuedBuffers_.find(buf.index);
> >>>>  	FrameBuffer *buffer = it->second;
> >>>>  	queuedBuffers_.erase(it);
> >>>>

Patch
diff mbox series

diff --git a/src/libcamera/v4l2_videodevice.cpp b/src/libcamera/v4l2_videodevice.cpp
index 3d2d99b46e4e..6c7f9daf24db 100644
--- a/src/libcamera/v4l2_videodevice.cpp
+++ b/src/libcamera/v4l2_videodevice.cpp
@@ -1519,9 +1519,28 @@  FrameBuffer *V4L2VideoDevice::dequeueBuffer()
 
 	LOG(V4L2, Debug) << "Dequeuing buffer " << buf.index;
 
+	auto it = queuedBuffers_.find(buf.index);
+	/*
+	 * If the video node fails to stream-on successfully (which can occur
+	 * when queing a buffer), a vb2 kernel bug can lead to the buffer which
+	 * returns a failure upon queing, being mistakenely kept in the kernel.
+	 * This leads to the kernel notifying us that a buffer is available to
+	 * dequeue, which we have no awareness of being queued, and thus we will
+	 * not find it in the queuedBuffers_ list.
+	 *
+	 * Whilst this is a kernel bug and should be fixed there, ensure that we
+	 * safely ignore buffers which are unexpected to prevent crashes on
+	 * unpatched kernels.
+	 */
+	if (it == queuedBuffers_.end()) {
+		LOG(V4L2, Error)
+			<< "Dequeued an unexpected buffer:" << buf.index;
+
+		return nullptr;
+	}
+
 	cache_->put(buf.index);
 
-	auto it = queuedBuffers_.find(buf.index);
 	FrameBuffer *buffer = it->second;
 	queuedBuffers_.erase(it);