Message ID | 20240121035948.4226-1-laurent.pinchart@ideasonboard.com |
---|---|
Headers | show |
Series |
|
Related | show |
On Sun, Jan 21, 2024 at 05:59:36AM +0200, Laurent Pinchart via libcamera-devel wrote: > Hello, > > Bug https://bugs.libcamera.org/show_bug.cgi?id=208 reports a race > condition that is ultimately due to incorrect usage of the libcamera > multi-threading infrastructure by the soft ISP code under development. > Instead of blaming the author of that code, I believe it shows we're not > doing well enough at communicating the threading requirements. This > patch series is an attempt to improve the situation. > > libcamera has a threading model that is documented. In particular, the > documentation classifies functions as thread-safe, thread-bound and > reentrant. Patch 01/12 starts by fixing a minor issue in the > documentation of a thread-bound function that does not contain the > correct reference. > > Patch 02/12 is a drive-by improvement that is, strictly speaking, > unrelated to this series, but was developed at the same time. > > The next two patches fix a first thread-related issues with the > Object::deleteLater() function. Patch 03/12 extends a unit test to > display the issue, and patch 04/12 fixes it. > > Patches 05/12 to 10/12 continue with fixing various kinds of incorrect > deletion of objects in unit tests. It turned out that all the race > conditions related to this patch series are in unit tests, libcamera > itself isn't (as far as I could see) affected. This is good news, even > if it means we haven't been careful enough when writing unit tests, > which calls for improvements in that area in the future. > > Patches 11/12 and 12/12 finally add assertions to complain loudly about > incorrect deletion of Object instances (11/12) and incorrect calling > contexts of thread-bound functions (12/12). > > Only the thread-bound members of Object subclasses have been hardened, > more work is needed to extend this to individual members of other > classes (currently only DeviceEnumerator::enumerate()), and to classes > that are marked as thread-bound at the class level (IPCUnixSocket and > V4L2VideoDevice). > > Even if V4L2VideoDevice isn't addressed directly by this series, and the > bug report is related to usage of the V4L2VideoDevice class, I believe > the issue in the soft ISP code will be caught by the assertions added to > the EventNotifier class. Milan, would you be able to test this, and > confirm that libcamera now complains loudly ? Also, I would like more extensive testing on Raspberry Pi to make sure this doesn't cause any regression. Naush, David, would you be able to give this series a try ? > Laurent Pinchart (12): > libcamera: object: Fix thread-bound reference in documentation > libcamera: signal: Replace object.h inclusion with forward declatation > test: object-delete: Test deferred delete just before thread stops > libcamera: thread: Ensure deferred deletion of all objects before > stopping > test: event-thread: Destroy Object from correct thread context > test: message: Remove incorrect slow receiver test > test: message: Destroy Object from correct thread context > test: signal-threads: Destroy Object from correct thread context > test: timer-thread: Move timer start from wrong thread to separate > test > test: timer-thread: Destroy Object from correct thread context > libcamera: object: Document and ensure Object deletion constraints > libcamera: object: Add and use thread-bound assertion > > include/libcamera/base/object.h | 2 + > include/libcamera/base/signal.h | 3 +- > src/libcamera/base/bound_method.cpp | 1 + > src/libcamera/base/event_notifier.cpp | 6 + > src/libcamera/base/object.cpp | 50 +++++++- > src/libcamera/base/signal.cpp | 1 + > src/libcamera/base/thread.cpp | 7 ++ > src/libcamera/base/timer.cpp | 10 +- > test/event-thread.cpp | 38 +++++-- > test/ipa/ipa_interface_test.cpp | 1 + > test/meson.build | 9 +- > test/message.cpp | 54 +++------ > test/object-delete.cpp | 30 ++++- > test/signal-threads.cpp | 24 ++-- > test/timer-fail.cpp | 107 ++++++++++++++++++ > test/timer-thread.cpp | 37 ++---- > .../module_ipa_proxy.h.tmpl | 1 + > 17 files changed, 276 insertions(+), 105 deletions(-) > create mode 100644 test/timer-fail.cpp > > > base-commit: 0d99f2de13faccf199e6c9e806702cb21d0b6c24
Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > Even if V4L2VideoDevice isn't addressed directly by this series, and the > bug report is related to usage of the V4L2VideoDevice class, I believe > the issue in the soft ISP code will be caught by the assertions added to > the EventNotifier class. Milan, would you be able to test this, and > confirm that libcamera now complains loudly ? Nice cleanup, thank you! Yes, this complains immediately and loudly in the given case. Now a related question is how to deal with threads and instances of classes that don't inherit Object. I think V4L2VideoDevice::queueBuffer is a good example. V4L2VideoDevice is thread-bound but doesn't inherit Object. So the caller, unless it has created the given instance itself (and being thread-bound itself or tracking its initial thread), has no idea about the thread a V4L2VideoDevice instance is bound too, even less how to call it there, and even less without worrying about implications of asynchronous calls. This means ensuring most things run in the same thread. This may be easy in some cases and less fine in other ones. It discourages using multiple threads, not sure whether this is a good thing or not. It's not robust in any case. A possible alternative would be making less things thread-bound. I suppose there are reasons why this is not happening in libcamera. Another alternative would be to use Object (as a direct base class or in wrappers) or Thread (together with initial thread tracking) to call the right thread in more places. It would be limited to situations where possibly asynchronous invocations don't harm if nothing else. The call chain in the software ISP branch, DebayerCpu -> SwIspSimple -> SimpleCameraData -> V4L2VideoDevice (thread-bound) -> EventNotifier (inheriting Object), must be changed to eventually invoke V4L2VideoDevice::queueBuffer in the right thread. Is the recommended practice to examine thoroughly what's running where or are there any simpler options? Thanks, Milan
On 22.01.2024 19:30, Milan Zamazal wrote: > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > >> Even if V4L2VideoDevice isn't addressed directly by this series, and the >> bug report is related to usage of the V4L2VideoDevice class, I believe >> the issue in the soft ISP code will be caught by the assertions added to >> the EventNotifier class. Milan, would you be able to test this, and >> confirm that libcamera now complains loudly ? > > Nice cleanup, thank you! +1, thanks Laurent! > Yes, this complains immediately and loudly in the > given case. I was also able to get the assertion on my much faster board. This required to use the highest resolution for capture, and it can take some time (tens of seconds) before the assertion is triggered. > Now a related question is how to deal with threads and instances of classes that > don't inherit Object. > > I think V4L2VideoDevice::queueBuffer is a good example. V4L2VideoDevice is > thread-bound but doesn't inherit Object. So the caller, unless it has created > the given instance itself (and being thread-bound itself or tracking its initial > thread), has no idea about the thread a V4L2VideoDevice instance is bound too, > even less how to call it there, and even less without worrying about > implications of asynchronous calls. I can create a proxy Object (to call V4L2VideoDevice::queueBuffer from) in the pipeline handler thread and connect it to the Soft ISP's signal, but it doesn't look like a nice solution... > This means ensuring most things run in the same thread. This may be easy in > some cases and less fine in other ones. It discourages using multiple threads, > not sure whether this is a good thing or not. It's not robust in any case. > > A possible alternative would be making less things thread-bound. I suppose > there are reasons why this is not happening in libcamera. > > Another alternative would be to use Object (as a direct base class or in > wrappers) or Thread (together with initial thread tracking) to call the right > thread in more places. It would be limited to situations where possibly > asynchronous invocations don't harm if nothing else. ...yes, something along these lines. Thanks, Andrei > The call chain in the software ISP branch, DebayerCpu -> SwIspSimple -> > SimpleCameraData -> V4L2VideoDevice (thread-bound) -> EventNotifier (inheriting > Object), must be changed to eventually invoke V4L2VideoDevice::queueBuffer in > the right thread. Is the recommended practice to examine thoroughly what's > running where or are there any simpler options? > > Thanks, > Milan >
On Tue, Jan 23, 2024 at 02:52:04AM +0300, Andrei Konovalov wrote: > On 22.01.2024 19:30, Milan Zamazal wrote: > > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > > > >> Even if V4L2VideoDevice isn't addressed directly by this series, and the > >> bug report is related to usage of the V4L2VideoDevice class, I believe > >> the issue in the soft ISP code will be caught by the assertions added to > >> the EventNotifier class. Milan, would you be able to test this, and > >> confirm that libcamera now complains loudly ? > > > > Nice cleanup, thank you! > > +1, thanks Laurent! > > > Yes, this complains immediately and loudly in the > > given case. > > I was also able to get the assertion on my much faster board. > This required to use the highest resolution for capture, and it can take some > time (tens of seconds) before the assertion is triggered. > > > Now a related question is how to deal with threads and instances of classes that > > don't inherit Object. > > > > I think V4L2VideoDevice::queueBuffer is a good example. V4L2VideoDevice is > > thread-bound but doesn't inherit Object. So the caller, unless it has created > > the given instance itself (and being thread-bound itself or tracking its initial > > thread), has no idea about the thread a V4L2VideoDevice instance is bound too, > > even less how to call it there, and even less without worrying about > > implications of asynchronous calls. > > I can create a proxy Object (to call V4L2VideoDevice::queueBuffer from) in the pipeline > handler thread and connect it to the Soft ISP's signal, but it doesn't look like a > nice solution... Agreed, it's not very nice :-) > > This means ensuring most things run in the same thread. This may be easy in > > some cases and less fine in other ones. It discourages using multiple threads, > > not sure whether this is a good thing or not. It's not robust in any case. > > > > A possible alternative would be making less things thread-bound. I suppose > > there are reasons why this is not happening in libcamera. Yes, complexity. It's error-prone enough as it is, making more functions callable from different threads will mean more race conditions as all the callers will need to handle the thread-safety requirements. As much as possible of the code base is single-threaded in the sense that developers don't need to think about threads, and thread boundaries are handled with strict rules that need to be considered only when adding a new thread, which should be a rare case. > > Another alternative would be to use Object (as a direct base class or in > > wrappers) or Thread (together with initial thread tracking) to call the right > > thread in more places. It would be limited to situations where possibly > > asynchronous invocations don't harm if nothing else. > > ...yes, something along these lines. A good solution may be to model the soft ISP object like if it was a real hardware device. Give it functions to perform actions, and signals to report events. Internally, create a proxy to the thread to relay the function calls with invokeMethod(). The signals should work out of the box, if you emit them from the ISP thread, and connect the pipeline handler slots to them, you should get queued signal behaviour automatically as PipelineHandler inherits from Object. I think this will also make it easier to implement a GPU-based ISP later. > > The call chain in the software ISP branch, DebayerCpu -> SwIspSimple -> > > SimpleCameraData -> V4L2VideoDevice (thread-bound) -> EventNotifier (inheriting > > Object), must be changed to eventually invoke V4L2VideoDevice::queueBuffer in > > the right thread. Is the recommended practice to examine thoroughly what's > > running where or are there any simpler options? The soft ISP should not deal with V4L2 at all. It should be modelled as a processing component that takes incoming calls and generates events when processing is complete, like a V4L2VideoDevice does. It's then the job of the pipeline handler to connect the building blocks to create a pipeline.
On Mon, Jan 22, 2024 at 05:30:55PM +0100, Milan Zamazal wrote: > Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes: > > > Even if V4L2VideoDevice isn't addressed directly by this series, and the > > bug report is related to usage of the V4L2VideoDevice class, I believe > > the issue in the soft ISP code will be caught by the assertions added to > > the EventNotifier class. Milan, would you be able to test this, and > > confirm that libcamera now complains loudly ? > > Nice cleanup, thank you! Yes, this complains immediately and loudly in the > given case. > > Now a related question is how to deal with threads and instances of classes that > don't inherit Object. > > I think V4L2VideoDevice::queueBuffer is a good example. V4L2VideoDevice is > thread-bound but doesn't inherit Object. So the caller, unless it has created > the given instance itself (and being thread-bound itself or tracking its initial > thread), has no idea about the thread a V4L2VideoDevice instance is bound too, > even less how to call it there, and even less without worrying about > implications of asynchronous calls. That's kind of the point actually, by design: if a class doesn't inherit from Object, unless some of its functions are explicitly marked as being thread-safe, then it's not meant to be called from different threads. > This means ensuring most things run in the same thread. This may be easy in > some cases and less fine in other ones. It discourages using multiple threads, > not sure whether this is a good thing or not. It's not robust in any case. Yes, it discourages usage of multiple threads. libcamera instead encourages usage of event loops without blocking calls. Threads can be used when there's a good reason to do so. As that's uncommon, most development doesn't have to worry about race conditions. Only when adding a thread should the developer know what they're doing. > A possible alternative would be making less things thread-bound. I suppose > there are reasons why this is not happening in libcamera. > > Another alternative would be to use Object (as a direct base class or in > wrappers) or Thread (together with initial thread tracking) to call the right > thread in more places. It would be limited to situations where possibly > asynchronous invocations don't harm if nothing else. > > The call chain in the software ISP branch, DebayerCpu -> SwIspSimple -> > SimpleCameraData -> V4L2VideoDevice (thread-bound) -> EventNotifier (inheriting > Object), must be changed to eventually invoke V4L2VideoDevice::queueBuffer in > the right thread. Is the recommended practice to examine thoroughly what's > running where or are there any simpler options?