[RFC] delayed_controls: avoid reading undefined control value
diff mbox series

Message ID 20241206112743.95435-1-stanislaw.gruszka@linux.intel.com
State New
Headers show
Series
  • [RFC] delayed_controls: avoid reading undefined control value
Related show

Commit Message

Stanislaw Gruszka Dec. 6, 2024, 11:27 a.m. UTC
Limit ControlRingBuffer index by number of queued entries in order do
avoid reading undefined control value with type ControlTypeNone .

It can happen at the begging of streaming when ControlRingBuffer is
not yet filled and there are errors on CSI-2 bus resulting dropping
frames. Then we can call to DelayedControls::get() with frame number
that exceed number of saved entries and get below assertion failure:

../src/libcamera/delayed_controls.cpp:227:
libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
Assertion `info.type() != ControlTypeNone' failed

Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
---
Notes: When debugging this I notice that the push() and get() are
done in different threads. Maybe mutex should be
used? Not sure how synchronization works for mojom signals
handlers.

Also I'm not sure if similar change like in this patch
is not needed for rpi DelayedControls

Alternative fix would be to pre-fill ControlRingBuffer with
valid control values. Should we care about possibility
of queueCount_ overflowing ? 

 src/libcamera/delayed_controls.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.43.0

Comments

Kieran Bingham Dec. 10, 2024, 10:39 p.m. UTC | #1
Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> Limit ControlRingBuffer index by number of queued entries in order do
> avoid reading undefined control value with type ControlTypeNone .
> 
> It can happen at the begging of streaming when ControlRingBuffer is

s/begging/beginning/

> not yet filled and there are errors on CSI-2 bus resulting dropping
> frames. Then we can call to DelayedControls::get() with frame number
> that exceed number of saved entries and get below assertion failure:
> 
> ../src/libcamera/delayed_controls.cpp:227:
> libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> Assertion `info.type() != ControlTypeNone' failed
> 
> Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> ---
> Notes: When debugging this I notice that the push() and get() are
> done in different threads. Maybe mutex should be
> used? Not sure how synchronization works for mojom signals
> handlers.

I thought they would be in the same thread as the CameraManager event
loop...  Perhaps we could/should have an assertion that the thread is
the same - otherwise - yes it would probably need a lock if there's a
reader and a writer on both ends of the queue.

I'd double check the threading first.

> Also I'm not sure if similar change like in this patch
> is not needed for rpi DelayedControls

Probably. There's very little different between the two implementations,
so we should probably keep them aligned with bugfixes at least until
someone tries to realign them both again.

It looks like the cookie feature was easier to fork than to add, I can't
remember all the details ;_(


> Alternative fix would be to pre-fill ControlRingBuffer with
> valid control values. Should we care about possibility
> of queueCount_ overflowing ? 

I'm not sure what 'valid' control values would be ? I think just
clamping to the most recent data is probably the best we can do ? And 'I
think' would reflect the most up to date information we have about
what's configured on the sensor - so I think it's still even the 'most
correct' (and probably even 'is' correct) data we can return in the
get() call in this event.


I think I can already throw this on here, it seems like a sane input
parameter validation:


Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

> 
>  src/libcamera/delayed_controls.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> index 94d0a575..78b0b8f7 100644
> --- a/src/libcamera/delayed_controls.cpp
> +++ b/src/libcamera/delayed_controls.cpp
> @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
>   */
>  ControlList DelayedControls::get(uint32_t sequence)
>  {
> -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
> 
>         ControlList out(device_->controls());
>         for (const auto &ctrl : values_) {
> --
> 2.43.0
>
Laurent Pinchart Dec. 11, 2024, 7:43 a.m. UTC | #2
(CC'ing Naush for the question related to the Raspberry Pi
implementation)

On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
> Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> > Limit ControlRingBuffer index by number of queued entries in order do
> > avoid reading undefined control value with type ControlTypeNone .
> > 
> > It can happen at the begging of streaming when ControlRingBuffer is
> 
> s/begging/beginning/
> 
> > not yet filled and there are errors on CSI-2 bus resulting dropping
> > frames. Then we can call to DelayedControls::get() with frame number
> > that exceed number of saved entries and get below assertion failure:
> > 
> > ../src/libcamera/delayed_controls.cpp:227:
> > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> > Assertion `info.type() != ControlTypeNone' failed
> > 
> > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> > ---
> > Notes: When debugging this I notice that the push() and get() are
> > done in different threads. Maybe mutex should be
> > used? Not sure how synchronization works for mojom signals
> > handlers.

The comment related to the SoftwareISP::inputBufferReady signal, in the
SimpleCameraData::init() function, possibly applies to other signals of
the SoftwareISP class, although I wouldn't expect it would affect the
setSensorControls signal. Stanislaw, could you provide more information
about which threads the push and get functions are called from ?
Everything should be called from the camera manager thread.

Looking at this more deeply, I think the outputBufferReady and
statsReady signals may be mishandled, as they seem to be emitted from
the soft ISP worker thread but not treated the same way as the
inputBufferReady signal. We shouldn't have to deal with it when
connecting the signals in the user, the SoftwareISP class should handle
thread crossing, and emit all its signals from the camera manager
thread. Milan, is this something you could address ?

> I thought they would be in the same thread as the CameraManager event
> loop...  Perhaps we could/should have an assertion that the thread is
> the same - otherwise - yes it would probably need a lock if there's a
> reader and a writer on both ends of the queue.
> 
> I'd double check the threading first.
> 
> > Also I'm not sure if similar change like in this patch
> > is not needed for rpi DelayedControls
> 
> Probably. There's very little different between the two implementations,
> so we should probably keep them aligned with bugfixes at least until
> someone tries to realign them both again.

Naush, could you check this ?

> It looks like the cookie feature was easier to fork than to add, I can't
> remember all the details ;_(
> 
> 
> > Alternative fix would be to pre-fill ControlRingBuffer with
> > valid control values. Should we care about possibility
> > of queueCount_ overflowing ? 
> 
> I'm not sure what 'valid' control values would be ? I think just
> clamping to the most recent data is probably the best we can do ? And 'I
> think' would reflect the most up to date information we have about
> what's configured on the sensor - so I think it's still even the 'most
> correct' (and probably even 'is' correct) data we can return in the
> get() call in this event.
> 
> 
> I think I can already throw this on here, it seems like a sane input
> parameter validation:
> 
> 
> Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> 
> >  src/libcamera/delayed_controls.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> > index 94d0a575..78b0b8f7 100644
> > --- a/src/libcamera/delayed_controls.cpp
> > +++ b/src/libcamera/delayed_controls.cpp
> > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
> >   */
> >  ControlList DelayedControls::get(uint32_t sequence)
> >  {
> > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
> > 
> >         ControlList out(device_->controls());
> >         for (const auto &ctrl : values_) {
Naushir Patuck Dec. 11, 2024, 9:03 a.m. UTC | #3
On Wed, 11 Dec 2024 at 07:43, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> (CC'ing Naush for the question related to the Raspberry Pi
> implementation)
>
> On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
> > Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> > > Limit ControlRingBuffer index by number of queued entries in order do
> > > avoid reading undefined control value with type ControlTypeNone .
> > >
> > > It can happen at the begging of streaming when ControlRingBuffer is
> >
> > s/begging/beginning/
> >
> > > not yet filled and there are errors on CSI-2 bus resulting dropping
> > > frames. Then we can call to DelayedControls::get() with frame number
> > > that exceed number of saved entries and get below assertion failure:
> > >
> > > ../src/libcamera/delayed_controls.cpp:227:
> > > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> > > Assertion `info.type() != ControlTypeNone' failed
> > >
> > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> > > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> > > ---
> > > Notes: When debugging this I notice that the push() and get() are
> > > done in different threads. Maybe mutex should be
> > > used? Not sure how synchronization works for mojom signals
> > > handlers.
>
> The comment related to the SoftwareISP::inputBufferReady signal, in the
> SimpleCameraData::init() function, possibly applies to other signals of
> the SoftwareISP class, although I wouldn't expect it would affect the
> setSensorControls signal. Stanislaw, could you provide more information
> about which threads the push and get functions are called from ?
> Everything should be called from the camera manager thread.
>
> Looking at this more deeply, I think the outputBufferReady and
> statsReady signals may be mishandled, as they seem to be emitted from
> the soft ISP worker thread but not treated the same way as the
> inputBufferReady signal. We shouldn't have to deal with it when
> connecting the signals in the user, the SoftwareISP class should handle
> thread crossing, and emit all its signals from the camera manager
> thread. Milan, is this something you could address ?
>
> > I thought they would be in the same thread as the CameraManager event
> > loop...  Perhaps we could/should have an assertion that the thread is
> > the same - otherwise - yes it would probably need a lock if there's a
> > reader and a writer on both ends of the queue.
> >
> > I'd double check the threading first.
> >
> > > Also I'm not sure if similar change like in this patch
> > > is not needed for rpi DelayedControls
> >
> > Probably. There's very little different between the two implementations,
> > so we should probably keep them aligned with bugfixes at least until
> > someone tries to realign them both again.
>
> Naush, could you check this ?

Theoretically if our kernel would somehow dequeue buffers without
triggering the frame start event, this could hit us.  However, in that
situation something much worse has gone wrong before the invalid
indexing into the ring buffer.  I don't recall any user raising an
issue like this so I doubt it's a problem for us in practice.  I'm ok
to add a fix in our DelayedControls implementation, but I would have a
slight preference on changing the fix to seed the entire ring buffer
with initial values rather than clamp the index.

Naush


>
> > It looks like the cookie feature was easier to fork than to add, I can't
> > remember all the details ;_(
> >
> >
> > > Alternative fix would be to pre-fill ControlRingBuffer with
> > > valid control values. Should we care about possibility
> > > of queueCount_ overflowing ?
> >
> > I'm not sure what 'valid' control values would be ? I think just
> > clamping to the most recent data is probably the best we can do ? And 'I
> > think' would reflect the most up to date information we have about
> > what's configured on the sensor - so I think it's still even the 'most
> > correct' (and probably even 'is' correct) data we can return in the
> > get() call in this event.
> >
> >
> > I think I can already throw this on here, it seems like a sane input
> > parameter validation:
> >
> >
> > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> >
> > >  src/libcamera/delayed_controls.cpp | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> > > index 94d0a575..78b0b8f7 100644
> > > --- a/src/libcamera/delayed_controls.cpp
> > > +++ b/src/libcamera/delayed_controls.cpp
> > > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
> > >   */
> > >  ControlList DelayedControls::get(uint32_t sequence)
> > >  {
> > > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> > > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
> > >
> > >         ControlList out(device_->controls());
> > >         for (const auto &ctrl : values_) {
>
> --
> Regards,
>
> Laurent Pinchart
Stanislaw Gruszka Dec. 11, 2024, 9:26 a.m. UTC | #4
On Wed, Dec 11, 2024 at 09:03:39AM +0000, Naushir Patuck wrote:
> On Wed, 11 Dec 2024 at 07:43, Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >
> > (CC'ing Naush for the question related to the Raspberry Pi
> > implementation)
> >
> > On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
> > > Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> > > > Limit ControlRingBuffer index by number of queued entries in order do
> > > > avoid reading undefined control value with type ControlTypeNone .
> > > >
> > > > It can happen at the begging of streaming when ControlRingBuffer is
> > >
> > > s/begging/beginning/
> > >
> > > > not yet filled and there are errors on CSI-2 bus resulting dropping
> > > > frames. Then we can call to DelayedControls::get() with frame number
> > > > that exceed number of saved entries and get below assertion failure:
> > > >
> > > > ../src/libcamera/delayed_controls.cpp:227:
> > > > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> > > > Assertion `info.type() != ControlTypeNone' failed
> > > >
> > > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> > > > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> > > > ---
> > > > Notes: When debugging this I notice that the push() and get() are
> > > > done in different threads. Maybe mutex should be
> > > > used? Not sure how synchronization works for mojom signals
> > > > handlers.
> >
> > The comment related to the SoftwareISP::inputBufferReady signal, in the
> > SimpleCameraData::init() function, possibly applies to other signals of
> > the SoftwareISP class, although I wouldn't expect it would affect the
> > setSensorControls signal. Stanislaw, could you provide more information
> > about which threads the push and get functions are called from ?
> > Everything should be called from the camera manager thread.
> >
> > Looking at this more deeply, I think the outputBufferReady and
> > statsReady signals may be mishandled, as they seem to be emitted from
> > the soft ISP worker thread but not treated the same way as the
> > inputBufferReady signal. We shouldn't have to deal with it when
> > connecting the signals in the user, the SoftwareISP class should handle
> > thread crossing, and emit all its signals from the camera manager
> > thread. Milan, is this something you could address ?
> >
> > > I thought they would be in the same thread as the CameraManager event
> > > loop...  Perhaps we could/should have an assertion that the thread is
> > > the same - otherwise - yes it would probably need a lock if there's a
> > > reader and a writer on both ends of the queue.
> > >
> > > I'd double check the threading first.
> > >
> > > > Also I'm not sure if similar change like in this patch
> > > > is not needed for rpi DelayedControls
> > >
> > > Probably. There's very little different between the two implementations,
> > > so we should probably keep them aligned with bugfixes at least until
> > > someone tries to realign them both again.
> >
> > Naush, could you check this ?
> 
> Theoretically if our kernel would somehow dequeue buffers without
> triggering the frame start event, this could hit us.  However, in that

After debug this further on my setup, this is actually wrong for simple
pipeline. We do not correctly subscribe for frameStart events and
never call DelayedControls::applyControls(uint32_t sequence).

I'll debug this a bit more and provide more details and answers
to questions.

Regards
Stanislaw

> indexing into the ring buffer.  I don't recall any user raising an
> issue like this so I doubt it's a problem for us in practice.  I'm ok
> to add a fix in our DelayedControls implementation, but I would have a
> slight preference on changing the fix to seed the entire ring buffer
> with initial values rather than clamp the index.
> 
> Naush
> 
> 
> >
> > > It looks like the cookie feature was easier to fork than to add, I can't
> > > remember all the details ;_(
> > >
> > >
> > > > Alternative fix would be to pre-fill ControlRingBuffer with
> > > > valid control values. Should we care about possibility
> > > > of queueCount_ overflowing ?
> > >
> > > I'm not sure what 'valid' control values would be ? I think just
> > > clamping to the most recent data is probably the best we can do ? And 'I
> > > think' would reflect the most up to date information we have about
> > > what's configured on the sensor - so I think it's still even the 'most
> > > correct' (and probably even 'is' correct) data we can return in the
> > > get() call in this event.
> > >
> > >
> > > I think I can already throw this on here, it seems like a sane input
> > > parameter validation:
> > >
> > >
> > > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > >
> > > >  src/libcamera/delayed_controls.cpp | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> > > > index 94d0a575..78b0b8f7 100644
> > > > --- a/src/libcamera/delayed_controls.cpp
> > > > +++ b/src/libcamera/delayed_controls.cpp
> > > > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
> > > >   */
> > > >  ControlList DelayedControls::get(uint32_t sequence)
> > > >  {
> > > > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> > > > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
> > > >
> > > >         ControlList out(device_->controls());
> > > >         for (const auto &ctrl : values_) {
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
Stefan Klug Dec. 11, 2024, 11:08 a.m. UTC | #5
Hi Stanislaw,

On Wed, Dec 11, 2024 at 10:26:26AM +0100, Stanislaw Gruszka wrote:
> On Wed, Dec 11, 2024 at 09:03:39AM +0000, Naushir Patuck wrote:
> > On Wed, 11 Dec 2024 at 07:43, Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > >
> > > (CC'ing Naush for the question related to the Raspberry Pi
> > > implementation)
> > >
> > > On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
> > > > Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> > > > > Limit ControlRingBuffer index by number of queued entries in order do
> > > > > avoid reading undefined control value with type ControlTypeNone .
> > > > >
> > > > > It can happen at the begging of streaming when ControlRingBuffer is
> > > >
> > > > s/begging/beginning/
> > > >
> > > > > not yet filled and there are errors on CSI-2 bus resulting dropping
> > > > > frames. Then we can call to DelayedControls::get() with frame number
> > > > > that exceed number of saved entries and get below assertion failure:
> > > > >
> > > > > ../src/libcamera/delayed_controls.cpp:227:
> > > > > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> > > > > Assertion `info.type() != ControlTypeNone' failed
> > > > >
> > > > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> > > > > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> > > > > ---
> > > > > Notes: When debugging this I notice that the push() and get() are
> > > > > done in different threads. Maybe mutex should be
> > > > > used? Not sure how synchronization works for mojom signals
> > > > > handlers.
> > >
> > > The comment related to the SoftwareISP::inputBufferReady signal, in the
> > > SimpleCameraData::init() function, possibly applies to other signals of
> > > the SoftwareISP class, although I wouldn't expect it would affect the
> > > setSensorControls signal. Stanislaw, could you provide more information
> > > about which threads the push and get functions are called from ?
> > > Everything should be called from the camera manager thread.
> > >
> > > Looking at this more deeply, I think the outputBufferReady and
> > > statsReady signals may be mishandled, as they seem to be emitted from
> > > the soft ISP worker thread but not treated the same way as the
> > > inputBufferReady signal. We shouldn't have to deal with it when
> > > connecting the signals in the user, the SoftwareISP class should handle
> > > thread crossing, and emit all its signals from the camera manager
> > > thread. Milan, is this something you could address ?
> > >
> > > > I thought they would be in the same thread as the CameraManager event
> > > > loop...  Perhaps we could/should have an assertion that the thread is
> > > > the same - otherwise - yes it would probably need a lock if there's a
> > > > reader and a writer on both ends of the queue.
> > > >
> > > > I'd double check the threading first.
> > > >
> > > > > Also I'm not sure if similar change like in this patch
> > > > > is not needed for rpi DelayedControls
> > > >
> > > > Probably. There's very little different between the two implementations,
> > > > so we should probably keep them aligned with bugfixes at least until
> > > > someone tries to realign them both again.
> > >
> > > Naush, could you check this ?
> > 
> > Theoretically if our kernel would somehow dequeue buffers without
> > triggering the frame start event, this could hit us.  However, in that
> 
> After debug this further on my setup, this is actually wrong for simple
> pipeline. We do not correctly subscribe for frameStart events and
> never call DelayedControls::applyControls(uint32_t sequence).
> 
> I'll debug this a bit more and provide more details and answers
> to questions.
> 
> Regards
> Stanislaw
> 
> > indexing into the ring buffer.  I don't recall any user raising an
> > issue like this so I doubt it's a problem for us in practice.  I'm ok
> > to add a fix in our DelayedControls implementation, but I would have a
> > slight preference on changing the fix to seed the entire ring buffer
> > with initial values rather than clamp the index.

I would even go a step further and log an error and return an empty
ControlsList. If we hit that case something in the pipeline went wrong
and that needs fixing (as you also found out).

Regards,
Stefan

> > 
> > Naush
> > 
> > 
> > >
> > > > It looks like the cookie feature was easier to fork than to add, I can't
> > > > remember all the details ;_(
> > > >
> > > >
> > > > > Alternative fix would be to pre-fill ControlRingBuffer with
> > > > > valid control values. Should we care about possibility
> > > > > of queueCount_ overflowing ?
> > > >
> > > > I'm not sure what 'valid' control values would be ? I think just
> > > > clamping to the most recent data is probably the best we can do ? And 'I
> > > > think' would reflect the most up to date information we have about
> > > > what's configured on the sensor - so I think it's still even the 'most
> > > > correct' (and probably even 'is' correct) data we can return in the
> > > > get() call in this event.
> > > >
> > > >
> > > > I think I can already throw this on here, it seems like a sane input
> > > > parameter validation:
> > > >
> > > >
> > > > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > > >
> > > > >  src/libcamera/delayed_controls.cpp | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> > > > > index 94d0a575..78b0b8f7 100644
> > > > > --- a/src/libcamera/delayed_controls.cpp
> > > > > +++ b/src/libcamera/delayed_controls.cpp
> > > > > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
> > > > >   */
> > > > >  ControlList DelayedControls::get(uint32_t sequence)
> > > > >  {
> > > > > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> > > > > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
> > > > >
> > > > >         ControlList out(device_->controls());
> > > > >         for (const auto &ctrl : values_) {
> > >
> > > --
> > > Regards,
> > >
> > > Laurent Pinchart
Milan Zamazal Dec. 11, 2024, 2:13 p.m. UTC | #6
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> (CC'ing Naush for the question related to the Raspberry Pi
> implementation)
>
> On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
>> Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
>> > Limit ControlRingBuffer index by number of queued entries in order do
>> > avoid reading undefined control value with type ControlTypeNone .
>> > 
>> > It can happen at the begging of streaming when ControlRingBuffer is
>> 
>> s/begging/beginning/
>> 
>> > not yet filled and there are errors on CSI-2 bus resulting dropping
>> > frames. Then we can call to DelayedControls::get() with frame number
>> > that exceed number of saved entries and get below assertion failure:
>> > 
>> > ../src/libcamera/delayed_controls.cpp:227:
>> > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
>> > Assertion `info.type() != ControlTypeNone' failed
>> > 
>> > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
>> > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
>> > ---
>> > Notes: When debugging this I notice that the push() and get() are
>> > done in different threads. Maybe mutex should be
>> > used? Not sure how synchronization works for mojom signals
>> > handlers.
>
> The comment related to the SoftwareISP::inputBufferReady signal, in the
> SimpleCameraData::init() function, possibly applies to other signals of
> the SoftwareISP class, although I wouldn't expect it would affect the
> setSensorControls signal. Stanislaw, could you provide more information
> about which threads the push and get functions are called from ?
> Everything should be called from the camera manager thread.
>
> Looking at this more deeply, I think the outputBufferReady and
> statsReady signals may be mishandled, as they seem to be emitted from
> the soft ISP worker thread but not treated the same way as the
> inputBufferReady signal. We shouldn't have to deal with it when
> connecting the signals in the user, the SoftwareISP class should handle
> thread crossing, and emit all its signals from the camera manager
> thread. Milan, is this something you could address ?

Honestly, I usually get confused easily when trying to follow the
related flows and understanding what's what.  If I understand it
correctly, the pipeline handler instance is created in the camera
manager thread.  Thus inputBufferReady is run there and the other
signals could do the same the same way (there is apparently no reason
why not).  Then yes, it could be fixed easily.

BTW there is software ISP TODO #3 where you suggest that statsReady
could be possibly removed completely.  When I worked on buffer sharing,
I didn't see an obvious way/reason to do it, so let's fix just what's
immediately wrong about it for now.

>> I thought they would be in the same thread as the CameraManager event
>> loop...  Perhaps we could/should have an assertion that the thread is
>> the same - otherwise - yes it would probably need a lock if there's a
>> reader and a writer on both ends of the queue.
>> 
>> I'd double check the threading first.
>> 
>> > Also I'm not sure if similar change like in this patch
>> > is not needed for rpi DelayedControls
>> 
>> Probably. There's very little different between the two implementations,
>> so we should probably keep them aligned with bugfixes at least until
>> someone tries to realign them both again.
>
> Naush, could you check this ?
>
>> It looks like the cookie feature was easier to fork than to add, I can't
>> remember all the details ;_(
>> 
>> 
>> > Alternative fix would be to pre-fill ControlRingBuffer with
>> > valid control values. Should we care about possibility
>> > of queueCount_ overflowing ? 
>> 
>> I'm not sure what 'valid' control values would be ? I think just
>> clamping to the most recent data is probably the best we can do ? And 'I
>> think' would reflect the most up to date information we have about
>> what's configured on the sensor - so I think it's still even the 'most
>> correct' (and probably even 'is' correct) data we can return in the
>> get() call in this event.
>> 
>> 
>> I think I can already throw this on here, it seems like a sane input
>> parameter validation:
>> 
>> 
>> Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
>> 
>> >  src/libcamera/delayed_controls.cpp | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
>> > index 94d0a575..78b0b8f7 100644
>> > --- a/src/libcamera/delayed_controls.cpp
>> > +++ b/src/libcamera/delayed_controls.cpp
>> > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
>> >   */
>> >  ControlList DelayedControls::get(uint32_t sequence)
>> >  {
>> > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
>> > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
>> > 
>> >         ControlList out(device_->controls());
>> >         for (const auto &ctrl : values_) {
Stanislaw Gruszka Dec. 12, 2024, 10:45 a.m. UTC | #7
Hi all, thanks for review and comments.

On Wed, Dec 11, 2024 at 09:43:29AM +0200, Laurent Pinchart wrote:
> (CC'ing Naush for the question related to the Raspberry Pi
> implementation)
> 
> On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
> > Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
> > > Limit ControlRingBuffer index by number of queued entries in order do
> > > avoid reading undefined control value with type ControlTypeNone .
> > > 
> > > It can happen at the begging of streaming when ControlRingBuffer is
> > 
> > s/begging/beginning/
> > 
> > > not yet filled and there are errors on CSI-2 bus resulting dropping
> > > frames. Then we can call to DelayedControls::get() with frame number
> > > that exceed number of saved entries and get below assertion failure:
> > > 
> > > ../src/libcamera/delayed_controls.cpp:227:
> > > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
> > > Assertion `info.type() != ControlTypeNone' failed
> > > 
> > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
> > > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> > > ---
> > > Notes: When debugging this I notice that the push() and get() are
> > > done in different threads. Maybe mutex should be
> > > used? Not sure how synchronization works for mojom signals
> > > handlers.
> 
> The comment related to the SoftwareISP::inputBufferReady signal, in the
> SimpleCameraData::init() function, possibly applies to other signals of
> the SoftwareISP class, although I wouldn't expect it would affect the
> setSensorControls signal. Stanislaw, could you provide more information
> about which threads the push and get functions are called from ?
> Everything should be called from the camera manager thread.

I added debug prints with backtraces to delayedControls push and get.
Here is the output:
https://gist.github.com/sgruszka/cfe3832a907887512a4953a3353c8806

Threads are different - have different tid, but they seems to be
serialized. I added extra:

std::this_thread::sleep_for(std::chrono::milliseconds(500)); 

to push and it did not change the ordering - get() was followed by push().

> Looking at this more deeply, I think the outputBufferReady and
> statsReady signals may be mishandled, as they seem to be emitted from
> the soft ISP worker thread but not treated the same way as the
> inputBufferReady signal. We shouldn't have to deal with it when
> connecting the signals in the user, the SoftwareISP class should handle
> thread crossing, and emit all its signals from the camera manager
> thread. Milan, is this something you could address ?
<snip>
> > I'm not sure what 'valid' control values would be ? I think just
> > clamping to the most recent data is probably the best we can do ? And 'I
> > think' would reflect the most up to date information we have about
> > what's configured on the sensor - so I think it's still even the 'most
> > correct' (and probably even 'is' correct) data we can return in the
> > get() call in this event.
> > 
> > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > 
> > >  src/libcamera/delayed_controls.cpp | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
> > > index 94d0a575..78b0b8f7 100644
> > > --- a/src/libcamera/delayed_controls.cpp
> > > +++ b/src/libcamera/delayed_controls.cpp
> > > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
> > >   */
> > >  ControlList DelayedControls::get(uint32_t sequence)
> > >  {
> > > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
> > > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);

There is some drawback except additional cycles pointed by Naush.
I can see possibility of queueCount_ overlap (for 30 FPS this would be
approximately after 2 years of using camera non-stop :-) )
Then we can have clamp<int> call with high value smaller than low value,
what is undefined behaviour. To prevent overlap issue this need to
be coded differently, what will add additional complexity (since
sequence variable can overlap as well).

After further debug, the actual problem is not handling startFrame
signal and not calling DelayedControls::applyControls(). Having
DeleyadControls::applyControls() will prevent the assertion as
DeleyadControls::get() will have proper control values on requested
frame number, even when some frames are missing.

One issue in current simple pipeline code is that we lack enabling
events by setFrameStartEnabled(true).
Second, we have to enable events for proper device. On my setup this is
CSI-2 receiver. Using video_ results in inappropriate ioctl error.

Below is a draft patch that try to address those problems.

Regards
Stanislaw

From 08d4caa36702da7efe7611365df8d417fedf38ee Mon Sep 17 00:00:00 2001
From: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Date: Thu, 12 Dec 2024 10:55:44 +0100
Subject: [PATCH] pipeline: simple: Use subdevice for frame start events

Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
---
 src/libcamera/pipeline/simple/simple.cpp | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/libcamera/pipeline/simple/simple.cpp b/src/libcamera/pipeline/simple/simple.cpp
index 8ac24e6e..64525b81 100644
--- a/src/libcamera/pipeline/simple/simple.cpp
+++ b/src/libcamera/pipeline/simple/simple.cpp
@@ -277,6 +277,7 @@ public:
 	std::list<Entity> entities_;
 	std::unique_ptr<CameraSensor> sensor_;
 	V4L2VideoDevice *video_;
+	V4L2Subdevice *subdev_;
 
 	std::vector<Configuration> configs_;
 	std::map<PixelFormat, std::vector<const Configuration *>> formats_;
@@ -561,6 +562,13 @@ int SimpleCameraData::init()
 	video_ = pipe->video(entities_.back().entity);
 	ASSERT(video_);
 
+	for (auto it = entities_.rbegin(); it != entities_.rend(); ++it) {
+		if (it->entity->type() == MediaEntity::Type::V4L2Subdevice) {
+			subdev_ = pipe->subdev(it->entity);
+			break;
+		}
+	}
+
 	/*
 	 * Setup links first as some subdev drivers take active links into
 	 * account to propagate TRY formats. Such is life :-(
@@ -1299,7 +1307,7 @@ int SimplePipelineHandler::configure(Camera *camera, CameraConfiguration *c)
 	data->delayedCtrls_ =
 		std::make_unique<DelayedControls>(data->sensor_->device(),
 						  params);
-	data->video_->frameStart.connect(data->delayedCtrls_.get(),
+	data->subdev_->frameStart.connect(data->delayedCtrls_.get(),
 					 &DelayedControls::applyControls);
 
 	StreamConfiguration inputCfg;
@@ -1339,6 +1347,7 @@ int SimplePipelineHandler::start(Camera *camera, [[maybe_unused]] const ControlL
 {
 	SimpleCameraData *data = cameraData(camera);
 	V4L2VideoDevice *video = data->video_;
+	V4L2Subdevice *subdev = data->subdev_;
 	int ret;
 
 	const MediaPad *pad = acquirePipeline(data);
@@ -1367,6 +1376,12 @@ int SimplePipelineHandler::start(Camera *camera, [[maybe_unused]] const ControlL
 	}
 
 	video->bufferReady.connect(data, &SimpleCameraData::imageBufferReady);
+	if (subdev) {
+		LOG(SimplePipeline, Debug)
+			<< "Enable frame start event on "
+			<< subdev->entity()->name();
+		subdev->setFrameStartEnabled(true);
+	}
 
 	ret = video->streamOn();
 	if (ret < 0) {
Milan Zamazal Dec. 18, 2024, 8:50 p.m. UTC | #8
Hi Stanislaw,

Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> writes:

> Hi all, thanks for review and comments.
>
> On Wed, Dec 11, 2024 at 09:43:29AM +0200, Laurent Pinchart wrote:
>> (CC'ing Naush for the question related to the Raspberry Pi
>> implementation)
>> 
>> On Tue, Dec 10, 2024 at 10:39:50PM +0000, Kieran Bingham wrote:
>> > Quoting Stanislaw Gruszka (2024-12-06 11:27:43)
>> > > Limit ControlRingBuffer index by number of queued entries in order do
>> > > avoid reading undefined control value with type ControlTypeNone .
>> > > 
>> > > It can happen at the begging of streaming when ControlRingBuffer is
>> > 
>> > s/begging/beginning/
>> > 
>> > > not yet filled and there are errors on CSI-2 bus resulting dropping
>> > > frames. Then we can call to DelayedControls::get() with frame number
>> > > that exceed number of saved entries and get below assertion failure:
>> > > 
>> > > ../src/libcamera/delayed_controls.cpp:227:
>> > > libcamera::ControlList libcamera::DelayedControls::get(uint32_t):
>> > > Assertion `info.type() != ControlTypeNone' failed
>> > > 
>> > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=241
>> > > Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
>> > > ---
>> > > Notes: When debugging this I notice that the push() and get() are
>> > > done in different threads. Maybe mutex should be
>> > > used? Not sure how synchronization works for mojom signals
>> > > handlers.
>> 
>> The comment related to the SoftwareISP::inputBufferReady signal, in the
>> SimpleCameraData::init() function, possibly applies to other signals of
>> the SoftwareISP class, although I wouldn't expect it would affect the
>> setSensorControls signal. Stanislaw, could you provide more information
>> about which threads the push and get functions are called from ?
>> Everything should be called from the camera manager thread.
>
> I added debug prints with backtraces to delayedControls push and get.
> Here is the output:
> https://gist.github.com/sgruszka/cfe3832a907887512a4953a3353c8806
>
> Threads are different - have different tid, but they seems to be
> serialized. I added extra:
>
> std::this_thread::sleep_for(std::chrono::milliseconds(500)); 
>
> to push and it did not change the ordering - get() was followed by push().

Thank you for testing.  As I understand the source code comment:
  
  The inputBufferReady signal is emitted from the soft ISP thread,
  and needs to be handled in the pipeline handler thread. Signals
  implement queued delivery, but this works transparently only if
  the receiver is bound to the target thread. As the
  SimpleCameraData class doesn't inherit from the Object class, it
  is not bound to any thread, and the signal would be delivered
  synchronously. Instead, connect the signal to a lambda function
  bound explicitly to the pipe, which is bound to the pipeline
  handler thread. The function then simply forwards the call to
  conversionInputDone().

and since ispStatsReady and outputBufferReady handlers are initiated in
the same place:

  stats_->finishFrame(frame, 0);
  outputBufferReady.emit(output);

they are indeed run in the same thread and thus serialized.  But they
block the caller.  I'm not sure how much it matters here; in any case if
I redirect them to the pipeline handler thread similarly to
inputBufferReady, it works and is probably more correct.  I post a patch
tomorrow and we can discuss it there further.

>> Looking at this more deeply, I think the outputBufferReady and
>> statsReady signals may be mishandled, as they seem to be emitted from
>> the soft ISP worker thread but not treated the same way as the
>> inputBufferReady signal. We shouldn't have to deal with it when
>> connecting the signals in the user, the SoftwareISP class should handle
>> thread crossing, and emit all its signals from the camera manager
>> thread. Milan, is this something you could address ?
> <snip>
>> > I'm not sure what 'valid' control values would be ? I think just
>> > clamping to the most recent data is probably the best we can do ? And 'I
>> > think' would reflect the most up to date information we have about
>> > what's configured on the sensor - so I think it's still even the 'most
>> > correct' (and probably even 'is' correct) data we can return in the
>> > get() call in this event.
>> > 
>> > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
>> > 
>> > >  src/libcamera/delayed_controls.cpp | 2 +-
>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > > 
>> > > diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
>> > > index 94d0a575..78b0b8f7 100644
>> > > --- a/src/libcamera/delayed_controls.cpp
>> > > +++ b/src/libcamera/delayed_controls.cpp
>> > > @@ -202,7 +202,7 @@ bool DelayedControls::push(const ControlList &controls)
>> > >   */
>> > >  ControlList DelayedControls::get(uint32_t sequence)
>> > >  {
>> > > -       unsigned int index = std::max<int>(0, sequence - maxDelay_);
>> > > +       unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);
>
> There is some drawback except additional cycles pointed by Naush.
> I can see possibility of queueCount_ overlap (for 30 FPS this would be
> approximately after 2 years of using camera non-stop :-) )
> Then we can have clamp<int> call with high value smaller than low value,
> what is undefined behaviour. To prevent overlap issue this need to
> be coded differently, what will add additional complexity (since
> sequence variable can overlap as well).
>
> After further debug, the actual problem is not handling startFrame
> signal and not calling DelayedControls::applyControls(). Having
> DeleyadControls::applyControls() will prevent the assertion as
> DeleyadControls::get() will have proper control values on requested
> frame number, even when some frames are missing.
>
> One issue in current simple pipeline code is that we lack enabling
> events by setFrameStartEnabled(true).
> Second, we have to enable events for proper device. On my setup this is
> CSI-2 receiver. Using video_ results in inappropriate ioctl error.
>
> Below is a draft patch that try to address those problems.
>
> Regards
> Stanislaw
>
> From 08d4caa36702da7efe7611365df8d417fedf38ee Mon Sep 17 00:00:00 2001
> From: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> Date: Thu, 12 Dec 2024 10:55:44 +0100
> Subject: [PATCH] pipeline: simple: Use subdevice for frame start events
>
> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> ---
>  src/libcamera/pipeline/simple/simple.cpp | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/src/libcamera/pipeline/simple/simple.cpp b/src/libcamera/pipeline/simple/simple.cpp
> index 8ac24e6e..64525b81 100644
> --- a/src/libcamera/pipeline/simple/simple.cpp
> +++ b/src/libcamera/pipeline/simple/simple.cpp
> @@ -277,6 +277,7 @@ public:
>  	std::list<Entity> entities_;
>  	std::unique_ptr<CameraSensor> sensor_;
>  	V4L2VideoDevice *video_;
> +	V4L2Subdevice *subdev_;
>  
>  	std::vector<Configuration> configs_;
>  	std::map<PixelFormat, std::vector<const Configuration *>> formats_;
> @@ -561,6 +562,13 @@ int SimpleCameraData::init()
>  	video_ = pipe->video(entities_.back().entity);
>  	ASSERT(video_);
>  
> +	for (auto it = entities_.rbegin(); it != entities_.rend(); ++it) {
> +		if (it->entity->type() == MediaEntity::Type::V4L2Subdevice) {
> +			subdev_ = pipe->subdev(it->entity);
> +			break;
> +		}
> +	}
> +
>  	/*
>  	 * Setup links first as some subdev drivers take active links into
>  	 * account to propagate TRY formats. Such is life :-(
> @@ -1299,7 +1307,7 @@ int SimplePipelineHandler::configure(Camera *camera, CameraConfiguration *c)
>  	data->delayedCtrls_ =
>  		std::make_unique<DelayedControls>(data->sensor_->device(),
>  						  params);
> -	data->video_->frameStart.connect(data->delayedCtrls_.get(),
> +	data->subdev_->frameStart.connect(data->delayedCtrls_.get(),
>  					 &DelayedControls::applyControls);
>  
>  	StreamConfiguration inputCfg;
> @@ -1339,6 +1347,7 @@ int SimplePipelineHandler::start(Camera *camera, [[maybe_unused]] const ControlL
>  {
>  	SimpleCameraData *data = cameraData(camera);
>  	V4L2VideoDevice *video = data->video_;
> +	V4L2Subdevice *subdev = data->subdev_;
>  	int ret;
>  
>  	const MediaPad *pad = acquirePipeline(data);
> @@ -1367,6 +1376,12 @@ int SimplePipelineHandler::start(Camera *camera, [[maybe_unused]] const ControlL
>  	}
>  
>  	video->bufferReady.connect(data, &SimpleCameraData::imageBufferReady);
> +	if (subdev) {
> +		LOG(SimplePipeline, Debug)
> +			<< "Enable frame start event on "
> +			<< subdev->entity()->name();
> +		subdev->setFrameStartEnabled(true);
> +	}
>  
>  	ret = video->streamOn();
>  	if (ret < 0) {

Patch
diff mbox series

diff --git a/src/libcamera/delayed_controls.cpp b/src/libcamera/delayed_controls.cpp
index 94d0a575..78b0b8f7 100644
--- a/src/libcamera/delayed_controls.cpp
+++ b/src/libcamera/delayed_controls.cpp
@@ -202,7 +202,7 @@  bool DelayedControls::push(const ControlList &controls)
  */
 ControlList DelayedControls::get(uint32_t sequence)
 {
-	unsigned int index = std::max<int>(0, sequence - maxDelay_);
+	unsigned int index = std::clamp<int>(sequence - maxDelay_, 0, queueCount_ - 1);

 	ControlList out(device_->controls());
 	for (const auto &ctrl : values_) {