From patchwork Thu Jun 30 20:32:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Pinchart X-Patchwork-Id: 16491 Return-Path: X-Original-To: parsemail@patchwork.libcamera.org Delivered-To: parsemail@patchwork.libcamera.org Received: from lancelot.ideasonboard.com (lancelot.ideasonboard.com [92.243.16.209]) by patchwork.libcamera.org (Postfix) with ESMTPS id 1CEEEBD808 for ; Thu, 30 Jun 2022 20:32:51 +0000 (UTC) Received: from lancelot.ideasonboard.com (localhost [IPv6:::1]) by lancelot.ideasonboard.com (Postfix) with ESMTP id 4ACA26564F; Thu, 30 Jun 2022 22:32:50 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=libcamera.org; s=mail; t=1656621170; bh=4jsfxW2utTClHjYT8mD+Cva3IEExR9uqwHWdIYD/iRA=; h=To:Date:In-Reply-To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=n5vVJWBhpk0zvLB+8eK0YTHR5eU9o2Ilv/6a7wN1Q3LDy3pe7aZpWctypG376nqod i0HI9AtIKxEFDYKKFGAEkVyVF4VZaik6Xvl7RZxL8g/4+XQopXS0AwoPvFAOVRtF67 WpJO3AOqj8q2UGmhrc/yCHqi/UwrIpOocnR0zb18VH2LuVSUiHfgT2Y0zZJxdhtza2 NCRWWZ7A8HA0L2DXW6NmDuvULqJqlf5RRmYgJuVPeHlTkhJKCCkMoLKcgdk9GUt0Zt P7R2RGz1xQ8BrrzRythzkLSpjdyzIU7I7Oy/lzvhlhmK/G1lbElqbrzKTEUEFJYRVw ulPlLKwKxaR+w== Received: from perceval.ideasonboard.com (perceval.ideasonboard.com [IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647]) by lancelot.ideasonboard.com (Postfix) with ESMTPS id 82FAA65633 for ; Thu, 30 Jun 2022 22:32:48 +0200 (CEST) Authentication-Results: lancelot.ideasonboard.com; dkim=pass (1024-bit key; unprotected) header.d=ideasonboard.com header.i=@ideasonboard.com header.b="g/cdltVe"; dkim-atps=neutral Received: from pendragon.lan (62-78-145-57.bb.dnainternet.fi [62.78.145.57]) by perceval.ideasonboard.com (Postfix) with ESMTPSA id E2DAC45F; Thu, 30 Jun 2022 22:32:47 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com; s=mail; t=1656621168; bh=4jsfxW2utTClHjYT8mD+Cva3IEExR9uqwHWdIYD/iRA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=g/cdltVe3v96hhRtItPhT2s+91wadEKd5gZ9lvVUATHSxEpNXxao8bB7AZwuxZl0k jmjTELGxAei95qTfAt72RKkwmO0obPCPeD1IR7TmuwWFUM6E/bvvwEVIXyElDfxb1V ig9vpPEsRfoB4ZZPjJpRjONt9MTcIEwBsvyk4oSA= To: libcamera-devel@lists.libcamera.org Date: Thu, 30 Jun 2022 23:32:21 +0300 Message-Id: <20220630203221.12440-1-laurent.pinchart@ideasonboard.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220630000251.31295-13-laurent.pinchart@ideasonboard.com> References: <20220630000251.31295-13-laurent.pinchart@ideasonboard.com> MIME-Version: 1.0 Subject: [libcamera-devel] [PATCH v2.1 12/12] gstreamer: Fix race conditions in task pause/resume X-BeenThere: libcamera-devel@lists.libcamera.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Laurent Pinchart via libcamera-devel From: Laurent Pinchart Reply-To: Laurent Pinchart Cc: Nicolas Dufresne Errors-To: libcamera-devel-bounces@lists.libcamera.org Sender: "libcamera-devel" The task run function races with two other threads that want to resume the task: the requestCompleted() handler and the buffer-notify signal handler. If the former queues completed requests or the latter queues back buffers to the pool, and then resume the task, after the task run handler checks the queues but before it attemps to pause the task, then the task may be paused without noticing that more work is available. The most immediate way to fix this is to take the stream_lock in the requestCompleted() and buffer-notify signal handlers, or cover the whole task run handler with the GstLibcameraSrcState lock. This could cause long delays in the requestCompleted() handler, so that's not a good option. Instead, add a wakeup flag, preotected by the GstLibcameraSrcState lock, that allows detection of a lost race, and retry the task run. Signed-off-by: Laurent Pinchart Reviewed-by: Nicolas Dufresne --- Changes since v2: - Invert the pause/resume logic Changes since v1: - Fix incorrect wakeup and pause logic --- src/gstreamer/gstlibcamerasrc.cpp | 71 ++++++++++++++++++++++--------- 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/src/gstreamer/gstlibcamerasrc.cpp b/src/gstreamer/gstlibcamerasrc.cpp index 9ea59631a9f2..a7a0c4403e28 100644 --- a/src/gstreamer/gstlibcamerasrc.cpp +++ b/src/gstreamer/gstlibcamerasrc.cpp @@ -259,6 +259,7 @@ GstLibcameraSrcState::requestCompleted(Request *request) int GstLibcameraSrcState::processRequest() { std::unique_ptr wrap; + int err = 0; { MutexLocker locker(lock_); @@ -267,10 +268,13 @@ int GstLibcameraSrcState::processRequest() wrap = std::move(completedRequests_.front()); completedRequests_.pop(); } + + if (completedRequests_.empty()) + err = -ENOBUFS; } if (!wrap) - return -ENODATA; + return -ENOBUFS; GstFlowReturn ret = GST_FLOW_OK; gst_flow_combiner_reset(src_->flow_combiner); @@ -310,7 +314,7 @@ int GstLibcameraSrcState::processRequest() return -EPIPE; } - return 0; + return err; } static bool @@ -380,47 +384,72 @@ gst_libcamera_src_task_run(gpointer user_data) GstLibcameraSrc *self = GST_LIBCAMERA_SRC(user_data); GstLibcameraSrcState *state = self->state; + /* + * Start by pausing the task. The task may also get resumed by the + * buffer-notify signal when new buffers are queued back to the pool, + * or by the request completion handler when a new request has + * completed. Both will resume the task after adding the buffers or + * request to their respective lists, which are checked below to decide + * if the task needs to be resumed for another iteration. This is thus + * guaranteed to be race-free, the lock taken by gst_task_pause() and + * gst_task_resume() serves as a memory barrier. + */ + gst_task_pause(self->task); + + bool doResume = false; + /* * Create and queue one request. If no buffers are available the * function returns -ENOBUFS, which we ignore here as that's not a * fatal error. */ int ret = state->queueRequest(); - if (ret == -ENOMEM) { + switch (ret) { + case 0: + /* + * The request was successfully queued, there may be enough + * buffers to create a new one. Don't pause the task to give it + * another try. + */ + doResume = true; + break; + + case -ENOMEM: GST_ELEMENT_ERROR(self, RESOURCE, NO_SPACE_LEFT, ("Failed to allocate request for camera '%s'.", state->cam_->id().c_str()), ("libcamera::Camera::createRequest() failed")); gst_task_stop(self->task); return; + + case -ENOBUFS: + default: + break; } - /* Process one completed request, if available. */ + /* + * Process one completed request, if available, and record if further + * requests are ready for processing. + */ ret = state->processRequest(); switch (ret) { + case 0: + /* Another completed request is available, resume the task. */ + doResume = true; + break; + case -EPIPE: gst_task_stop(self->task); return; - case -ENODATA: - gst_task_pause(self->task); - return; + case -ENOBUFS: + default: + break; } - /* - * Here we need to decide if we want to pause. This needs to - * happen in lock step with the callback thread which may want - * to resume the task and might push pending buffers. - */ - bool do_pause; - - { - MutexLocker locker(state->lock_); - do_pause = state->completedRequests_.empty(); - } - - if (do_pause) - gst_task_pause(self->task); + /* Resume the task for another iteration if needed. */ + if (doResume) + gst_task_resume(self->task); } static void