From patchwork Tue Dec 10 15:34:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Milan Zamazal X-Patchwork-Id: 22284 Return-Path: X-Original-To: parsemail@patchwork.libcamera.org Delivered-To: parsemail@patchwork.libcamera.org Received: from lancelot.ideasonboard.com (lancelot.ideasonboard.com [92.243.16.209]) by patchwork.libcamera.org (Postfix) with ESMTPS id B42F0C3257 for ; Tue, 10 Dec 2024 15:35:24 +0000 (UTC) Received: from lancelot.ideasonboard.com (localhost [IPv6:::1]) by lancelot.ideasonboard.com (Postfix) with ESMTP id 52D1467EAD; Tue, 10 Dec 2024 16:35:24 +0100 (CET) Authentication-Results: lancelot.ideasonboard.com; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.b="MQ2QGgRR"; dkim-atps=neutral Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lancelot.ideasonboard.com (Postfix) with ESMTPS id DBC7C67EA7 for ; Tue, 10 Dec 2024 16:35:20 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1733844920; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=m3U/XD+2rtUCm3T4Y0gxuvTPbLOYpvHFAdvethaoPAI=; b=MQ2QGgRRu+FJ78zdLZHgTLZyjfFPWZDKtsUKqbN7pFbk2XSsxypx9/SK3H8P777KB9lHsR 5okRZbTdjYZffX3gA+6Se77iRy/CfwqeKmUbqpFmJnGet/FYem2VZNJ7XA64jyOzJgdZM9 73AlX4vhXIpx0WIKqdZasS/KwZjVtBM= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-8-ZG24PpS1M1StXhgUgUbDGQ-1; Tue, 10 Dec 2024 10:35:18 -0500 X-MC-Unique: ZG24PpS1M1StXhgUgUbDGQ-1 X-Mimecast-MFC-AGG-ID: ZG24PpS1M1StXhgUgUbDGQ Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8D0F41955F44; Tue, 10 Dec 2024 15:35:17 +0000 (UTC) Received: from nuthatch.redhat.com (unknown [10.45.224.8]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2FB2F1956053; Tue, 10 Dec 2024 15:35:14 +0000 (UTC) From: Milan Zamazal To: libcamera-devel@lists.libcamera.org Cc: Milan Zamazal , Robert Mader , Hans de Goede , Laurent Pinchart , Kieran Bingham Subject: [PATCH v3 9/9] libcamera: software_isp: Apply CCM in debayering Date: Tue, 10 Dec 2024 16:34:39 +0100 Message-ID: <20241210153440.1007470-10-mzamazal@redhat.com> In-Reply-To: <20241210153440.1007470-1-mzamazal@redhat.com> References: <20241210153440.1007470-1-mzamazal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: mKDwDIjKvTKn8NN8bv3CC96t_OqyAAeTqOno-085s-0_1733844917 X-Mimecast-Originator: redhat.com content-type: text/plain; charset="US-ASCII"; x-default=true X-BeenThere: libcamera-devel@lists.libcamera.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libcamera-devel-bounces@lists.libcamera.org Sender: "libcamera-devel" This patch applies color correction matrix (CCM) in debayering if the CCM is specified. Not using CCM must still be supported for performance reasons. The CCM is applied as follows: [r1 r2 r3] [r g b] * [g1 g2 g3] [b1 b2 b3] The CCM matrix (the right side of the multiplication) is constant during single frame processing, while the input pixel (the left side) changes. Because each of the color channels is only 8-bit in software ISP, we can make 9 lookup tables with 256 input values for multiplications of each of the r_i, g_i, b_i values. This way we don't have to multiply each pixel, we can use table lookups and additions instead. Gamma (which is non-linear and thus cannot be a part of the 9 lookup tables values) is applied on the final values rounded to integers using another lookup table. We use int16_t to store the precomputed multiplications. This seems to be noticeably (>10%) faster than `float' for the price of slightly less accuracy and it covers the range of values that sane CCMs produce. The selection and structure of data is performance critical, for example using bytes would add significant (>10%) speedup but would be too short to cover the value range. The color lookup tables are changed to unions to be able to serve as both simple lookup tables or CCM lookup tables. This is arguable but it keeps the code easier to understand if nothing else. It is important to make the unions on the whole lookup tables rather than their values because the latter might cause a noticeable slowdown on simple lookup due to the sheer fact that the table is not compact. Using unions makes the initialization of the tables dubious because it is done before the decision about using CCM is made but the initial values are dubious anyway. Using std::variant instead of unions would be even more difficult, consider e.g. mmap allocation and the followup type-casting of the debayer params. The tables are copied (as before), which is not elegant but also not a big problem. There are patches posted that use shared buffers for parameters passing in software ISP (see software ISP TODO #5) and they can be adjusted for the new parameter format. Color gains from white balance are supposed not to be a part of the specified CCM. They are applied on it using matrix multiplication, which is simple and in correspondence with future additions in the form of matrix multiplication, like saturation adjustment. With this patch, the reported per-frame slowdown when applying CCM is about 45% on Debix Model A and about 75% on TI AM69 SK. Using std::clamp in debayering adds some performance penalty (a few percent). The clamping is necessary to eliminate out of range values possibly produced by the CCM. If it could be avoided by adjusting the precomputed tables some way then performance could be improved a bit. Signed-off-by: Milan Zamazal --- .../internal/software_isp/debayer_params.h | 20 +++++- src/ipa/simple/algorithms/lut.cpp | 63 ++++++++++++++----- src/ipa/simple/algorithms/lut.h | 1 + src/libcamera/software_isp/debayer.cpp | 50 +++++++++++++++ src/libcamera/software_isp/debayer_cpu.cpp | 34 +++++++--- src/libcamera/software_isp/debayer_cpu.h | 7 ++- src/libcamera/software_isp/software_isp.cpp | 6 +- 7 files changed, 147 insertions(+), 34 deletions(-) diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h index 7d8fdd48..f8731eae 100644 --- a/include/libcamera/internal/software_isp/debayer_params.h +++ b/include/libcamera/internal/software_isp/debayer_params.h @@ -18,11 +18,25 @@ namespace libcamera { struct DebayerParams { static constexpr unsigned int kRGBLookupSize = 256; + struct CcmColumn { + int16_t r; + int16_t g; + int16_t b; + }; + using ColorLookupTable = std::array; + using CcmLookupTable = std::array; + using GammaLookupTable = std::array; + + union LookupTable { + ColorLookupTable simple; + CcmLookupTable ccm; + }; - ColorLookupTable red; - ColorLookupTable green; - ColorLookupTable blue; + LookupTable red; + LookupTable green; + LookupTable blue; + GammaLookupTable gammaLut; }; } /* namespace libcamera */ diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp index 243f0818..ddb2adf1 100644 --- a/src/ipa/simple/algorithms/lut.cpp +++ b/src/ipa/simple/algorithms/lut.cpp @@ -80,6 +80,11 @@ void Lut::updateGammaTable(IPAContext &context) context.activeState.gamma.contrast = contrast; } +int16_t Lut::ccmValue(unsigned int i, float ccm) const +{ + return std::round(i * ccm); +} + void Lut::prepare(IPAContext &context, [[maybe_unused]] const uint32_t frame, [[maybe_unused]] IPAFrameContext &frameContext, @@ -91,28 +96,52 @@ void Lut::prepare(IPAContext &context, * observed, it's not permanently prone to minor fluctuations or * rounding errors. */ - if (context.activeState.gamma.blackLevel != context.activeState.blc.level || - context.activeState.gamma.contrast != context.activeState.knobs.contrast) + const bool gammaUpdateNeeded = + context.activeState.gamma.blackLevel != context.activeState.blc.level || + context.activeState.gamma.contrast != context.activeState.knobs.contrast; + if (gammaUpdateNeeded) updateGammaTable(context); auto &gains = context.activeState.awb.gains; auto &gammaTable = context.activeState.gamma.gammaTable; const unsigned int gammaTableSize = gammaTable.size(); - - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { - const double div = static_cast(DebayerParams::kRGBLookupSize) / - gammaTableSize; - /* Apply gamma after gain! */ - unsigned int idx; - idx = std::min({ static_cast(i * gains.red / div), - gammaTableSize - 1 }); - params->red[i] = gammaTable[idx]; - idx = std::min({ static_cast(i * gains.green / div), - gammaTableSize - 1 }); - params->green[i] = gammaTable[idx]; - idx = std::min({ static_cast(i * gains.blue / div), - gammaTableSize - 1 }); - params->blue[i] = gammaTable[idx]; + const double div = static_cast(DebayerParams::kRGBLookupSize) / + gammaTableSize; + + if (!context.activeState.ccm.enabled) { + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + /* Apply gamma after gain! */ + unsigned int idx; + idx = std::min({ static_cast(i * gains.red / div), + gammaTableSize - 1 }); + params->red.simple[i] = gammaTable[idx]; + idx = std::min({ static_cast(i * gains.green / div), + gammaTableSize - 1 }); + params->green.simple[i] = gammaTable[idx]; + idx = std::min({ static_cast(i * gains.blue / div), + gammaTableSize - 1 }); + params->blue.simple[i] = gammaTable[idx]; + } + } else if (context.activeState.ccm.changed || gammaUpdateNeeded) { + Matrix gainCcm = { { gains.red, 0, 0, + 0, gains.green, 0, + 0, 0, gains.blue } }; + auto ccm = gainCcm * context.activeState.ccm.ccm; + auto &red = params->red.ccm; + auto &green = params->green.ccm; + auto &blue = params->blue.ccm; + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + red[i].r = ccmValue(i, ccm[0][0]); + red[i].g = ccmValue(i, ccm[1][0]); + red[i].b = ccmValue(i, ccm[2][0]); + green[i].r = ccmValue(i, ccm[0][1]); + green[i].g = ccmValue(i, ccm[1][1]); + green[i].b = ccmValue(i, ccm[2][1]); + blue[i].r = ccmValue(i, ccm[0][2]); + blue[i].g = ccmValue(i, ccm[1][2]); + blue[i].b = ccmValue(i, ccm[2][2]); + params->gammaLut[i] = gammaTable[i / div]; + } } } diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h index 889f864b..77324800 100644 --- a/src/ipa/simple/algorithms/lut.h +++ b/src/ipa/simple/algorithms/lut.h @@ -33,6 +33,7 @@ public: private: void updateGammaTable(IPAContext &context); + int16_t ccmValue(unsigned int i, float ccm) const; }; } /* namespace ipa::soft::algorithms */ diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp index 595f73f6..47f9996b 100644 --- a/src/libcamera/software_isp/debayer.cpp +++ b/src/libcamera/software_isp/debayer.cpp @@ -23,11 +23,56 @@ namespace libcamera { * \brief Size of a color lookup table */ +/** + * \struct DebayerParams::CcmColumn + * \brief Type of a single column of a color correction matrix + */ + +/** + * \var DebayerParams::CcmColumn::r + * \brief Red (first) component of a CCM column + */ + +/** + * \var DebayerParams::CcmColumn::g + * \brief Green (second) component of a CCM column + */ + +/** + * \var DebayerParams::CcmColumn::b + * \brief Blue (third) component of a CCM column + */ + /** * \typedef DebayerParams::ColorLookupTable + * \brief Type of the simple lookup tables for red, green, blue values + */ + +/** + * \typedef DebayerParams::CcmLookupTable + * \brief Type of the CCM lookup tables for red, green, blue values + */ + +/** + * \typedef DebayerParams::GammaLookupTable + * \brief Type of the gamma lookup tables for CCM + */ + +/** + * \union DebayerParams::LookupTable * \brief Type of the lookup tables for red, green, blue values */ +/** + * \var DebayerParams::LookupTable::simple + * \brief Simple lookup table for red, green, blue values + */ + +/** + * \var DebayerParams::LookupTable::ccm + * \brief CCM lookup table for red, green, blue values + */ + /** * \var DebayerParams::red * \brief Lookup table for red color, mapping input values to output values @@ -43,6 +88,11 @@ namespace libcamera { * \brief Lookup table for blue color, mapping input values to output values */ +/** + * \var DebayerParams::gammaLut + * \brief Gamma lookup table used with color correction matrix + */ + /** * \class Debayer * \brief Base debayering class diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp index 3c6597f9..38715b72 100644 --- a/src/libcamera/software_isp/debayer_cpu.cpp +++ b/src/libcamera/software_isp/debayer_cpu.cpp @@ -11,6 +11,7 @@ #include "debayer_cpu.h" +#include #include #include #include @@ -51,8 +52,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr stats) enableInputMemcpy_ = true; /* Initialize color lookup tables */ - for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) - red_[i] = green_[i] = blue_[i] = i; + for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { + red_.simple[i] = green_.simple[i] = blue_.simple[i] = i; + red_.ccm[i].r = red_.ccm[i].g = red_.ccm[i].b = 0; + green_.ccm[i].r = green_.ccm[i].g = green_.ccm[i].b = 0; + blue_.ccm[i].r = blue_.ccm[i].g = blue_.ccm[i].b = 0; + } } DebayerCpu::~DebayerCpu() = default; @@ -62,12 +67,24 @@ DebayerCpu::~DebayerCpu() = default; const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \ const pixel_t *next = (const pixel_t *)src[2] + xShift_; -#define STORE_PIXEL(b, g, r) \ - *dst++ = blue_[b]; \ - *dst++ = green_[g]; \ - *dst++ = red_[r]; \ - if constexpr (addAlphaByte) \ - *dst++ = 255; \ +#define GAMMA(value) \ + *dst++ = gammaLut_[std::clamp(value, 0, static_cast(gammaLut_.size()) - 1)] + +#define STORE_PIXEL(b_, g_, r_) \ + if constexpr (ccmEnabled) { \ + DebayerParams::CcmColumn &blue = blue_.ccm[b_]; \ + DebayerParams::CcmColumn &green = green_.ccm[g_]; \ + DebayerParams::CcmColumn &red = red_.ccm[r_]; \ + GAMMA(blue.r + blue.g + blue.b); \ + GAMMA(green.r + green.g + green.b); \ + GAMMA(red.r + red.g + red.b); \ + } else { \ + *dst++ = blue_.simple[b_]; \ + *dst++ = green_.simple[g_]; \ + *dst++ = red_.simple[r_]; \ + } \ + if constexpr (addAlphaByte) \ + *dst++ = 255; \ x++; /* @@ -752,6 +769,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output green_ = params.green; red_ = swapRedBlueGains_ ? params.blue : params.red; blue_ = swapRedBlueGains_ ? params.red : params.blue; + gammaLut_ = params.gammaLut; /* Copy metadata from the input buffer */ FrameMetadata &metadata = output->_d()->metadata(); diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h index b2ec8f1b..35c51e4f 100644 --- a/src/libcamera/software_isp/debayer_cpu.h +++ b/src/libcamera/software_isp/debayer_cpu.h @@ -137,9 +137,10 @@ private: /* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */ static constexpr unsigned int kMaxLineBuffers = 5; - DebayerParams::ColorLookupTable red_; - DebayerParams::ColorLookupTable green_; - DebayerParams::ColorLookupTable blue_; + DebayerParams::LookupTable red_; + DebayerParams::LookupTable green_; + DebayerParams::LookupTable blue_; + DebayerParams::GammaLookupTable gammaLut_; debayerFn debayer0_; debayerFn debayer1_; debayerFn debayer2_; diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp index 6a07de85..40e11b9e 100644 --- a/src/libcamera/software_isp/software_isp.cpp +++ b/src/libcamera/software_isp/software_isp.cpp @@ -82,9 +82,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor, for (unsigned int i = 0; i < 256; i++) gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5); for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) { - debayerParams_.red[i] = gammaTable[i]; - debayerParams_.green[i] = gammaTable[i]; - debayerParams_.blue[i] = gammaTable[i]; + debayerParams_.red.simple[i] = gammaTable[i]; + debayerParams_.green.simple[i] = gammaTable[i]; + debayerParams_.blue.simple[i] = gammaTable[i]; } if (!dmaHeap_.isValid()) {