libcamera: udev: Catch udev notification errors
diff mbox series

Message ID 20240805223148.30605-1-kieran.bingham@ideasonboard.com
State Accepted
Headers show
Series
  • libcamera: udev: Catch udev notification errors
Related show

Commit Message

Kieran Bingham Aug. 5, 2024, 10:31 p.m. UTC
The udev_monitor_receive_device() can return NULL on an error as
detailed in the man pages for the function.

The udevNotify() handler in the DeviceEnumeratorUdev directly uses the
return value of udev_monitor_receive_device() in successive calls to
process the event without having first checked the udev_device.

Ensure we identify, and handle events where the udev_device can not be
returned successfully.

Bug: https://bugs.libcamera.org/show_bug.cgi?id=230
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
---
This is a proposed fix for the reported CI failure caught by Systemd
runners. It's not clear if this failure was an injected failure, startup
race or otherwise - but given the documentation of this function states
it can return null ... it seems valid to null check before passing into
other APIs especially when a trace is reported directly at this point.

I don't presently have a way to replicate or reproduce the error either,
but this patch itself has passed our ci tests (which isn't too
unexpected as the change is quite minimal otherwise).


 src/libcamera/device_enumerator_udev.cpp | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Stefan Klug Aug. 6, 2024, 7:19 a.m. UTC | #1
Hi Kieran,

Thank you for the patch. 

On Mon, Aug 05, 2024 at 11:31:48PM +0100, Kieran Bingham wrote:
> The udev_monitor_receive_device() can return NULL on an error as
> detailed in the man pages for the function.
> 
> The udevNotify() handler in the DeviceEnumeratorUdev directly uses the
> return value of udev_monitor_receive_device() in successive calls to
> process the event without having first checked the udev_device.
> 
> Ensure we identify, and handle events where the udev_device can not be
> returned successfully.
> 
> Bug: https://bugs.libcamera.org/show_bug.cgi?id=230
> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> ---
> This is a proposed fix for the reported CI failure caught by Systemd
> runners. It's not clear if this failure was an injected failure, startup
> race or otherwise - but given the documentation of this function states
> it can return null ... it seems valid to null check before passing into
> other APIs especially when a trace is reported directly at this point.
> 
> I don't presently have a way to replicate or reproduce the error either,
> but this patch itself has passed our ci tests (which isn't too
> unexpected as the change is quite minimal otherwise).
> 
> 
>  src/libcamera/device_enumerator_udev.cpp | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/libcamera/device_enumerator_udev.cpp b/src/libcamera/device_enumerator_udev.cpp
> index 01c70b6daa82..c6c78651baa0 100644
> --- a/src/libcamera/device_enumerator_udev.cpp
> +++ b/src/libcamera/device_enumerator_udev.cpp
> @@ -332,6 +332,12 @@ int DeviceEnumeratorUdev::addV4L2Device(dev_t devnum)
>  void DeviceEnumeratorUdev::udevNotify()
>  {
>  	struct udev_device *dev = udev_monitor_receive_device(monitor_);
> +	if (!dev) {
> +		LOG(DeviceEnumerator, Warning)
> +			<< "Ignoring notfication received without a device";

Is that even worth a warning or only debug?

Anyways:
Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com> 

Cheers,
Stefan

> +		return;
> +	}
> +
>  	std::string_view action(udev_device_get_action(dev));
>  	std::string_view deviceNode(udev_device_get_devnode(dev));
>  
> -- 
> 2.45.2
>
Laurent Pinchart Aug. 6, 2024, 7:40 a.m. UTC | #2
On Tue, Aug 06, 2024 at 09:19:37AM +0200, Stefan Klug wrote:
> Hi Kieran,
> 
> Thank you for the patch. 
> 
> On Mon, Aug 05, 2024 at 11:31:48PM +0100, Kieran Bingham wrote:
> > The udev_monitor_receive_device() can return NULL on an error as
> > detailed in the man pages for the function.
> > 
> > The udevNotify() handler in the DeviceEnumeratorUdev directly uses the
> > return value of udev_monitor_receive_device() in successive calls to
> > process the event without having first checked the udev_device.
> > 
> > Ensure we identify, and handle events where the udev_device can not be
> > returned successfully.
> > 
> > Bug: https://bugs.libcamera.org/show_bug.cgi?id=230
> > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > ---
> > This is a proposed fix for the reported CI failure caught by Systemd
> > runners. It's not clear if this failure was an injected failure, startup
> > race or otherwise - but given the documentation of this function states
> > it can return null ... it seems valid to null check before passing into
> > other APIs especially when a trace is reported directly at this point.
> > 
> > I don't presently have a way to replicate or reproduce the error either,
> > but this patch itself has passed our ci tests (which isn't too
> > unexpected as the change is quite minimal otherwise).
> > 
> > 
> >  src/libcamera/device_enumerator_udev.cpp | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/src/libcamera/device_enumerator_udev.cpp b/src/libcamera/device_enumerator_udev.cpp
> > index 01c70b6daa82..c6c78651baa0 100644
> > --- a/src/libcamera/device_enumerator_udev.cpp
> > +++ b/src/libcamera/device_enumerator_udev.cpp
> > @@ -332,6 +332,12 @@ int DeviceEnumeratorUdev::addV4L2Device(dev_t devnum)
> >  void DeviceEnumeratorUdev::udevNotify()
> >  {
> >  	struct udev_device *dev = udev_monitor_receive_device(monitor_);
> > +	if (!dev) {
> > +		LOG(DeviceEnumerator, Warning)
> > +			<< "Ignoring notfication received without a device";
> 
> Is that even worth a warning or only debug?

As we've never seen this product for years, I wonder why it's happening.
A warning would help noticing the issue. On the other hand, are such
errors means to happen in rare ubt valid cases ? And then, are they
meant to be ignored, or do they indicate something is fundamentally
wrong and action needs to be taken ? If no action is needed and there
are no consequences, then a debug message is more than enough.

If we want to diagnose this, we need more information. The message
should print the value of errno, and it would be nice if the original
reporter of the issues could then report the error number.

> Anyways:
> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com> 
> 
> > +		return;
> > +	}
> > +
> >  	std::string_view action(udev_device_get_action(dev));
> >  	std::string_view deviceNode(udev_device_get_devnode(dev));
Kieran Bingham Aug. 6, 2024, 8:43 a.m. UTC | #3
Quoting Laurent Pinchart (2024-08-06 08:40:53)
> On Tue, Aug 06, 2024 at 09:19:37AM +0200, Stefan Klug wrote:
> > Hi Kieran,
> > 
> > Thank you for the patch. 
> > 
> > On Mon, Aug 05, 2024 at 11:31:48PM +0100, Kieran Bingham wrote:
> > > The udev_monitor_receive_device() can return NULL on an error as
> > > detailed in the man pages for the function.
> > > 
> > > The udevNotify() handler in the DeviceEnumeratorUdev directly uses the
> > > return value of udev_monitor_receive_device() in successive calls to
> > > process the event without having first checked the udev_device.
> > > 
> > > Ensure we identify, and handle events where the udev_device can not be
> > > returned successfully.
> > > 
> > > Bug: https://bugs.libcamera.org/show_bug.cgi?id=230
> > > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > > ---
> > > This is a proposed fix for the reported CI failure caught by Systemd
> > > runners. It's not clear if this failure was an injected failure, startup
> > > race or otherwise - but given the documentation of this function states
> > > it can return null ... it seems valid to null check before passing into
> > > other APIs especially when a trace is reported directly at this point.
> > > 
> > > I don't presently have a way to replicate or reproduce the error either,
> > > but this patch itself has passed our ci tests (which isn't too
> > > unexpected as the change is quite minimal otherwise).
> > > 
> > > 
> > >  src/libcamera/device_enumerator_udev.cpp | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/src/libcamera/device_enumerator_udev.cpp b/src/libcamera/device_enumerator_udev.cpp
> > > index 01c70b6daa82..c6c78651baa0 100644
> > > --- a/src/libcamera/device_enumerator_udev.cpp
> > > +++ b/src/libcamera/device_enumerator_udev.cpp
> > > @@ -332,6 +332,12 @@ int DeviceEnumeratorUdev::addV4L2Device(dev_t devnum)
> > >  void DeviceEnumeratorUdev::udevNotify()
> > >  {
> > >     struct udev_device *dev = udev_monitor_receive_device(monitor_);
> > > +   if (!dev) {
> > > +           LOG(DeviceEnumerator, Warning)
> > > +                   << "Ignoring notfication received without a device";
> > 
> > Is that even worth a warning or only debug?
> 
> As we've never seen this product for years, I wonder why it's happening.
> A warning would help noticing the issue. On the other hand, are such
> errors means to happen in rare ubt valid cases ? And then, are they
> meant to be ignored, or do they indicate something is fundamentally
> wrong and action needs to be taken ? If no action is needed and there
> are no consequences, then a debug message is more than enough.
> 
> If we want to diagnose this, we need more information. The message
> should print the value of errno, and it would be nice if the original
> reporter of the issues could then report the error number.

Oh interesting idea. I didn't notice if the man pages reference setting
errno - but it's worth reporting to help diagnose the issue for sure.

 https://github.com/systemd/systemd/blob/5a5a7093b8e054cf3bfa65716ffe77a5d93647ff/src/libudev/libudev-monitor.c#L239
calls return_with_errno(NULL, r);, so I think errno is indeed
worthwhile reporting.

I'll update in v2.

--
Kieran


> > Anyways:
> > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com> 
> > 
> > > +           return;
> > > +   }
> > > +
> > >     std::string_view action(udev_device_get_action(dev));
> > >     std::string_view deviceNode(udev_device_get_devnode(dev));
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Kieran Bingham Aug. 7, 2024, 1:49 p.m. UTC | #4
Quoting Stefan Klug (2024-08-06 08:19:37)
> Hi Kieran,
> 
> Thank you for the patch. 
> 
> On Mon, Aug 05, 2024 at 11:31:48PM +0100, Kieran Bingham wrote:
> > The udev_monitor_receive_device() can return NULL on an error as
> > detailed in the man pages for the function.
> > 
> > The udevNotify() handler in the DeviceEnumeratorUdev directly uses the
> > return value of udev_monitor_receive_device() in successive calls to
> > process the event without having first checked the udev_device.
> > 
> > Ensure we identify, and handle events where the udev_device can not be
> > returned successfully.
> > 
> > Bug: https://bugs.libcamera.org/show_bug.cgi?id=230
> > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
> > ---
> > This is a proposed fix for the reported CI failure caught by Systemd
> > runners. It's not clear if this failure was an injected failure, startup
> > race or otherwise - but given the documentation of this function states
> > it can return null ... it seems valid to null check before passing into
> > other APIs especially when a trace is reported directly at this point.
> > 
> > I don't presently have a way to replicate or reproduce the error either,
> > but this patch itself has passed our ci tests (which isn't too
> > unexpected as the change is quite minimal otherwise).
> > 
> > 
> >  src/libcamera/device_enumerator_udev.cpp | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/src/libcamera/device_enumerator_udev.cpp b/src/libcamera/device_enumerator_udev.cpp
> > index 01c70b6daa82..c6c78651baa0 100644
> > --- a/src/libcamera/device_enumerator_udev.cpp
> > +++ b/src/libcamera/device_enumerator_udev.cpp
> > @@ -332,6 +332,12 @@ int DeviceEnumeratorUdev::addV4L2Device(dev_t devnum)
> >  void DeviceEnumeratorUdev::udevNotify()
> >  {
> >       struct udev_device *dev = udev_monitor_receive_device(monitor_);
> > +     if (!dev) {
> > +             LOG(DeviceEnumerator, Warning)
> > +                     << "Ignoring notfication received without a device";
> 
> Is that even worth a warning or only debug?
> 
> Anyways:
> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com> 

Thanks, but I'll assume that was "Reviewed-by: " :-)

> 
> Cheers,
> Stefan
> 
> > +             return;
> > +     }
> > +
> >       std::string_view action(udev_device_get_action(dev));
> >       std::string_view deviceNode(udev_device_get_devnode(dev));
> >  
> > -- 
> > 2.45.2
> >

Patch
diff mbox series

diff --git a/src/libcamera/device_enumerator_udev.cpp b/src/libcamera/device_enumerator_udev.cpp
index 01c70b6daa82..c6c78651baa0 100644
--- a/src/libcamera/device_enumerator_udev.cpp
+++ b/src/libcamera/device_enumerator_udev.cpp
@@ -332,6 +332,12 @@  int DeviceEnumeratorUdev::addV4L2Device(dev_t devnum)
 void DeviceEnumeratorUdev::udevNotify()
 {
 	struct udev_device *dev = udev_monitor_receive_device(monitor_);
+	if (!dev) {
+		LOG(DeviceEnumerator, Warning)
+			<< "Ignoring notfication received without a device";
+		return;
+	}
+
 	std::string_view action(udev_device_get_action(dev));
 	std::string_view deviceNode(udev_device_get_devnode(dev));