From patchwork Tue Mar 10 12:01:02 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hans de Goede
X-Patchwork-Id: 26271
From: Hans de Goede
To: libcamera-devel@lists.libcamera.org
Cc: Milan Zamazal, Hans de Goede
Subject: [PATCH v7 1/5] software_isp: swstats_cpu: Prepare for multi-threading support
Date: Tue, 10 Mar 2026 13:01:02 +0100
Message-ID: <20260310120106.79922-2-johannes.goede@oss.qualcomm.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260310120106.79922-1-johannes.goede@oss.qualcomm.com>
References: <20260310120106.79922-1-johannes.goede@oss.qualcomm.com>

Make the storage used to accumulate the RGB sums and the Y histogram
values a vector of SwIspStats objects instead of a single object, so that
with multi-threading every thread can use its own storage to collect
intermediate stats, avoiding cache-line bouncing.

Benchmarking was done with the GPU-ISP, which benchmarks swstats
separately, on the Arduino Uno-Q, whose weak CPU makes it a good target
for performance testing. Generating stats for a 3272x2464 frame takes
20ms both before and after this change.

Reviewed-by: Milan Zamazal
Signed-off-by: Hans de Goede
---
Changes in v4:
- Use const in for (const auto &s : stats_) {} in SwStatsCpu::finishFrame()
- Add Milan's Reviewed-by

Changes in v3:
- Use for (auto &s : stats_) {}

Changes in v2:
- Move the allocation of the vector of SwIspStats objects to inside the
  SwStatsCpu class, controlled by a configure() argument instead of making
  the caller allocate the objects
---
 .../internal/software_isp/swstats_cpu.h       | 25 ++++-----
 src/libcamera/software_isp/swstats_cpu.cpp    | 54 +++++++++++++------
 2 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/include/libcamera/internal/software_isp/swstats_cpu.h b/include/libcamera/internal/software_isp/swstats_cpu.h
index 64b3e23f5..feee92f99 100644
--- a/include/libcamera/internal/software_isp/swstats_cpu.h
+++ b/include/libcamera/internal/software_isp/swstats_cpu.h
@@ -12,6 +12,7 @@
 #pragma once
 
 #include
+#include <vector>
 
 #include
 
@@ -51,13 +52,13 @@ public:
 	const Size &patternSize() { return patternSize_; }
 
-	int configure(const StreamConfiguration &inputCfg);
+	int configure(const StreamConfiguration &inputCfg, unsigned int statsBufferCount = 1);
 	void setWindow(const Rectangle &window);
 	void startFrame(uint32_t frame);
 	void finishFrame(uint32_t frame, uint32_t bufferId);
 	void processFrame(uint32_t frame, uint32_t bufferId, FrameBuffer *input);
 
-	void processLine0(uint32_t frame, unsigned int y, const uint8_t *src[])
+	void processLine0(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
 	{
 		if (frame % kStatPerNumFrames)
 			return;
 
@@ -66,10 +67,10 @@ public:
 		    y >= (window_.y + window_.height))
 			return;
 
-		(this->*stats0_)(src);
+		(this->*stats0_)(src, stats_[statsBufferIndex]);
 	}
 
-	void processLine2(uint32_t frame, unsigned int y, const uint8_t *src[])
+	void processLine2(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
 	{
 		if (frame % kStatPerNumFrames)
 			return;
@@ -78,25 +79,25 @@ public:
 		    y >= (window_.y + window_.height))
 			return;
 
-		(this->*stats2_)(src);
+		(this->*stats2_)(src, stats_[statsBufferIndex]);
 	}
 
 	Signal statsReady;
 
 private:
-	using statsProcessFn = void (SwStatsCpu::*)(const uint8_t *src[]);
+	using statsProcessFn = void (SwStatsCpu::*)(const uint8_t *src[], SwIspStats &stats);
 	using processFrameFn = void (SwStatsCpu::*)(MappedFrameBuffer &in);
 
 	int setupStandardBayerOrder(BayerFormat::Order order);
 	/* Bayer 8 bpp unpacked */
-	void statsBGGR8Line0(const uint8_t *src[]);
+	void statsBGGR8Line0(const uint8_t *src[], SwIspStats &stats);
 	/* Bayer 10 bpp unpacked */
-	void statsBGGR10Line0(const uint8_t *src[]);
+	void statsBGGR10Line0(const uint8_t *src[], SwIspStats &stats);
 	/* Bayer 12 bpp unpacked */
-	void statsBGGR12Line0(const uint8_t *src[]);
+	void statsBGGR12Line0(const uint8_t *src[], SwIspStats &stats);
 	/* Bayer 10 bpp packed */
-	void statsBGGR10PLine0(const uint8_t *src[]);
-	void statsGBRG10PLine0(const uint8_t *src[]);
+	void statsBGGR10PLine0(const uint8_t *src[], SwIspStats &stats);
+	void statsGBRG10PLine0(const uint8_t *src[], SwIspStats &stats);
 
 	void processBayerFrame2(MappedFrameBuffer &in);
 
@@ -116,8 +117,8 @@ private:
 	unsigned int xShift_;
 	unsigned int stride_;
 
+	std::vector<SwIspStats> stats_;
 	SharedMemObject sharedStats_;
-	SwIspStats stats_;
 	Benchmark bench_;
 };
 
diff --git a/src/libcamera/software_isp/swstats_cpu.cpp b/src/libcamera/software_isp/swstats_cpu.cpp
index 1cedcfbc1..ded0dcf1a 100644
--- a/src/libcamera/software_isp/swstats_cpu.cpp
+++ b/src/libcamera/software_isp/swstats_cpu.cpp
@@ -74,11 +74,12 @@ namespace libcamera {
  */
 
 /**
- * \fn void SwStatsCpu::processLine0(uint32_t frame, unsigned int y, const uint8_t *src[])
+ * \fn void SwStatsCpu::processLine0(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
  * \brief Process line 0
  * \param[in] frame The frame number
  * \param[in] y The y coordinate.
  * \param[in] src The input data.
+ * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
  *
  * This function processes line 0 for input formats with
  * patternSize height == 1.
@@ -97,14 +98,18 @@ namespace libcamera {
  * to the line in plane 0, etc.
  *
  * For non Bayer single plane input data only a single src pointer is required.
+ *
+ * The statsBufferIndex value must be less than the statsBufferCount value passed
+ * to configure().
  */
 
 /**
- * \fn void SwStatsCpu::processLine2(uint32_t frame, unsigned int y, const uint8_t *src[])
+ * \fn void SwStatsCpu::processLine2(uint32_t frame, unsigned int y, const uint8_t *src[], unsigned int statsBufferIndex = 0)
  * \brief Process line 2 and 3
  * \param[in] frame The frame number
  * \param[in] y The y coordinate.
  * \param[in] src The input data.
+ * \param[in] statsBufferIndex Index of stats buffer to use for multi-threading.
  *
  * This function processes line 2 and 3 for input formats with
  * patternSize height == 4.
@@ -182,14 +187,14 @@ static constexpr unsigned int kBlueYMul = 29; /* 0.114 * 256 */
 	yVal = r * kRedYMul; \
 	yVal += g * kGreenYMul; \
 	yVal += b * kBlueYMul; \
-	stats_.yHistogram[yVal * SwIspStats::kYHistogramSize / (256 * 256 * (div))]++;
+	stats.yHistogram[yVal * SwIspStats::kYHistogramSize / (256 * 256 * (div))]++;
 
 #define SWSTATS_FINISH_LINE_STATS() \
-	stats_.sum_.r() += sumR; \
-	stats_.sum_.g() += sumG; \
-	stats_.sum_.b() += sumB;
+	stats.sum_.r() += sumR; \
+	stats.sum_.g() += sumG; \
+	stats.sum_.b() += sumB;
 
-void SwStatsCpu::statsBGGR8Line0(const uint8_t *src[])
+void SwStatsCpu::statsBGGR8Line0(const uint8_t *src[], SwIspStats &stats)
 {
 	const uint8_t *src0 = src[1] + window_.x;
 	const uint8_t *src1 = src[2] + window_.x;
@@ -214,7 +219,7 @@ void SwStatsCpu::statsBGGR8Line0(const uint8_t *src[])
 	SWSTATS_FINISH_LINE_STATS()
 }
 
-void SwStatsCpu::statsBGGR10Line0(const uint8_t *src[])
+void SwStatsCpu::statsBGGR10Line0(const uint8_t *src[], SwIspStats &stats)
 {
 	const uint16_t *src0 = (const uint16_t *)src[1] + window_.x;
 	const uint16_t *src1 = (const uint16_t *)src[2] + window_.x;
@@ -240,7 +245,7 @@ void SwStatsCpu::statsBGGR10Line0(const uint8_t *src[])
 	SWSTATS_FINISH_LINE_STATS()
 }
 
-void SwStatsCpu::statsBGGR12Line0(const uint8_t *src[])
+void SwStatsCpu::statsBGGR12Line0(const uint8_t *src[], SwIspStats &stats)
 {
 	const uint16_t *src0 = (const uint16_t *)src[1] + window_.x;
 	const uint16_t *src1 = (const uint16_t *)src[2] + window_.x;
@@ -266,7 +271,7 @@ void SwStatsCpu::statsBGGR12Line0(const uint8_t *src[])
 	SWSTATS_FINISH_LINE_STATS()
 }
 
-void SwStatsCpu::statsBGGR10PLine0(const uint8_t *src[])
+void SwStatsCpu::statsBGGR10PLine0(const uint8_t *src[], SwIspStats &stats)
 {
 	const uint8_t *src0 = src[1] + window_.x * 5 / 4;
 	const uint8_t *src1 = src[2] + window_.x * 5 / 4;
@@ -292,7 +297,7 @@ void SwStatsCpu::statsBGGR10PLine0(const uint8_t *src[])
 	SWSTATS_FINISH_LINE_STATS()
 }
 
-void SwStatsCpu::statsGBRG10PLine0(const uint8_t *src[])
+void SwStatsCpu::statsGBRG10PLine0(const uint8_t *src[], SwIspStats &stats)
 {
 	const uint8_t *src0 = src[1] + window_.x * 5 / 4;
 	const uint8_t *src1 = src[2] + window_.x * 5 / 4;
@@ -332,8 +337,10 @@ void SwStatsCpu::startFrame(uint32_t frame)
 	if (window_.width == 0)
 		LOG(SwStatsCpu, Error) << "Calling startFrame() without setWindow()";
 
-	stats_.sum_ = RGB({ 0, 0, 0 });
-	stats_.yHistogram.fill(0);
+	for (auto &s : stats_) {
+		s.sum_ = RGB({ 0, 0, 0 });
+		s.yHistogram.fill(0);
+	}
 }
 
 /**
@@ -345,8 +352,19 @@ void SwStatsCpu::startFrame(uint32_t frame)
  */
 void SwStatsCpu::finishFrame(uint32_t frame, uint32_t bufferId)
 {
-	stats_.valid = frame % kStatPerNumFrames == 0;
-	*sharedStats_ = stats_;
+	bool valid = frame % kStatPerNumFrames == 0;
+
+	if (valid) {
+		sharedStats_->sum_ = RGB({ 0, 0, 0 });
+		sharedStats_->yHistogram.fill(0);
+		for (const auto &s : stats_) {
+			sharedStats_->sum_ += s.sum_;
+			for (unsigned int j = 0; j < SwIspStats::kYHistogramSize; j++)
+				sharedStats_->yHistogram[j] += s.yHistogram[j];
+		}
+	}
+
+	sharedStats_->valid = valid;
 	statsReady.emit(frame, bufferId);
 }
 
@@ -389,12 +407,14 @@ int SwStatsCpu::setupStandardBayerOrder(BayerFormat::Order order)
 /**
  * \brief Configure the statistics object for the passed in input format
  * \param[in] inputCfg The input format
+ * \param[in] statsBufferCount number of internal stats buffers to use for multi-threading
  *
  * \return 0 on success, a negative errno value on failure
  */
-int SwStatsCpu::configure(const StreamConfiguration &inputCfg)
+int SwStatsCpu::configure(const StreamConfiguration &inputCfg, unsigned int statsBufferCount)
 {
 	stride_ = inputCfg.stride;
+	stats_.resize(statsBufferCount);
 
 	BayerFormat bayerFormat =
 		BayerFormat::fromPixelFormat(inputCfg.pixelFormat);
@@ -504,7 +524,7 @@ void SwStatsCpu::processBayerFrame2(MappedFrameBuffer &in)
 		/* linePointers[0] is not used by any stats0_ functions */
 		linePointers[1] = src;
 		linePointers[2] = src + stride_;
-		(this->*stats0_)(linePointers);
+		(this->*stats0_)(linePointers, stats_[0]);
 		src += stride_ * 2;
 	}
 }
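
For illustration only, the per-thread accumulation and merge pattern this patch prepares for could look roughly like the standalone sketch below. It is not libcamera code: LineStats, StatsAccumulator and accumulateFrame() are hypothetical stand-ins for SwIspStats, SwStatsCpu and a multi-threaded caller, and the input is assumed to be plain 8-bit samples rather than Bayer data. Each worker passes its own buffer index, so no two threads write to the same stats object, and the partial results are summed once at the end of the frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for SwIspStats: one sum and one histogram.
struct LineStats {
	static constexpr std::size_t kHistogramSize = 64;
	uint64_t sum = 0;
	std::array<uint32_t, kHistogramSize> histogram{};
};

// Hypothetical accumulator holding one LineStats slot per worker thread.
class StatsAccumulator
{
public:
	void configure(unsigned int bufferCount) { stats_.resize(bufferCount); }

	// Reset every per-thread slot at the start of a frame.
	void startFrame()
	{
		for (auto &s : stats_)
			s = LineStats{};
	}

	// Each thread passes its own index, so no slot is shared between threads.
	void processLine(const uint8_t *line, std::size_t width, unsigned int index)
	{
		assert(index < stats_.size());
		LineStats &s = stats_[index];
		for (std::size_t x = 0; x < width; x++) {
			s.sum += line[x];
			s.histogram[line[x] * LineStats::kHistogramSize / 256]++;
		}
	}

	// Merge the per-thread partial results into a single result.
	LineStats finishFrame() const
	{
		LineStats total;
		for (const auto &s : stats_) {
			total.sum += s.sum;
			for (std::size_t i = 0; i < LineStats::kHistogramSize; i++)
				total.histogram[i] += s.histogram[i];
		}
		return total;
	}

private:
	std::vector<LineStats> stats_;
};

// Hypothetical caller: lines are interleaved between threadCount workers.
LineStats accumulateFrame(const std::vector<std::vector<uint8_t>> &frame,
			  unsigned int threadCount)
{
	StatsAccumulator acc;
	acc.configure(threadCount);
	acc.startFrame();

	std::vector<std::thread> workers;
	for (unsigned int t = 0; t < threadCount; t++) {
		workers.emplace_back([&acc, &frame, t, threadCount]() {
			for (std::size_t y = t; y < frame.size(); y += threadCount)
				acc.processLine(frame[y].data(), frame[y].size(), t);
		});
	}
	for (auto &w : workers)
		w.join();

	return acc.finishFrame();
}
```

Calling accumulateFrame(frame, 1) and accumulateFrame(frame, 4) on the same input should give identical sums and histograms, since the merge is a plain element-wise addition. The sketch does not pad or align the per-thread objects, so neighbouring slots may still share a cache line at their boundary.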