From patchwork Sun Dec 15 23:01:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Pinchart X-Patchwork-Id: 22321 Return-Path: X-Original-To: parsemail@patchwork.libcamera.org Delivered-To: parsemail@patchwork.libcamera.org Received: from lancelot.ideasonboard.com (lancelot.ideasonboard.com [92.243.16.209]) by patchwork.libcamera.org (Postfix) with ESMTPS id DE499C32F6 for ; Sun, 15 Dec 2024 23:02:29 +0000 (UTC) Received: from lancelot.ideasonboard.com (localhost [IPv6:::1]) by lancelot.ideasonboard.com (Postfix) with ESMTP id 8689467F33; Mon, 16 Dec 2024 00:02:27 +0100 (CET) Authentication-Results: lancelot.ideasonboard.com; dkim=pass (1024-bit key; unprotected) header.d=ideasonboard.com header.i=@ideasonboard.com header.b="AQn5DpH8"; dkim-atps=neutral Received: from perceval.ideasonboard.com (perceval.ideasonboard.com [213.167.242.64]) by lancelot.ideasonboard.com (Postfix) with ESMTPS id 2184F67F1D for ; Mon, 16 Dec 2024 00:02:24 +0100 (CET) Received: from pendragon.ideasonboard.com (81-175-209-231.bb.dnainternet.fi [81.175.209.231]) by perceval.ideasonboard.com (Postfix) with ESMTPSA id D8E502C6; Mon, 16 Dec 2024 00:01:47 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com; s=mail; t=1734303708; bh=tl24tobVxMQFnd3EKwnlIOCXTZiAunsUHKJ9v+NMni8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=AQn5DpH8NF8pwcBoEeo4b6gE843BxbQrNOKRvjLBE6F/YFREVT/mKL/SqUk3E+XIw /P+62v27JG0QkZoIxk2rIytmx2GsjXz9fwexG5Uatvv4XryrxefyPCAD5ah3DmchXN rmMcqV2iGoPBjS9JfSjuQYvitdsDqh95TYBYQbJU= From: Laurent Pinchart To: libcamera-devel@lists.libcamera.org Cc: =?utf-8?q?Barnab=C3=A1s_P=C5=91cze?= Subject: [RFC PATCH 1/8] Documentation: coding-style: Document usage of classes for strings Date: Mon, 16 Dec 2024 01:01:59 +0200 Message-ID: <20241215230206.11002-2-laurent.pinchart@ideasonboard.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20241215230206.11002-1-laurent.pinchart@ideasonboard.com> References: <20241215230206.11002-1-laurent.pinchart@ideasonboard.com> MIME-Version: 1.0 X-BeenThere: libcamera-devel@lists.libcamera.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libcamera-devel-bounces@lists.libcamera.org Sender: "libcamera-devel" C++ has three main ways to handle strings: std::string, std::string_view and C-style strings. Document their pros and cons and the rules governing their usage in libcamera. Signed-off-by: Laurent Pinchart --- Documentation/coding-style.rst | 98 ++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) diff --git a/Documentation/coding-style.rst b/Documentation/coding-style.rst index 6ac3a4a0d5175118..dc4b093b58a99006 100644 --- a/Documentation/coding-style.rst +++ b/Documentation/coding-style.rst @@ -269,6 +269,104 @@ the compiler select the right function. This avoids potential errors such as calling abs(int) with a float argument, performing an unwanted implicit integer conversion. For this reason, cmath is preferred over math.h. +Strings +~~~~~~~ + +This section focusses on strings as a sequence of characters represented by the +`char` type, as those are the only strings used in libcamera. The principles +are however equally applicable to other types of strings. + +C++ includes multiple standard ways to represent and handle strings. Each of +them have pros and cons and different usage patterns. + +1. C-style strings + + C-style strings are null-terminated arrays of characters of type `char`. + They are represented by a `char *` pointing to the first character. C string + literals (`"Hello world!"`) produce read-only global null-terminated arrays + of type `const char`. + + Handling strings through `char *` relies on assumptions that are not + enforced at compile time: the memory must not be freed as long as the + pointer remains valid, and the string must be null-terminated. This causes a + risk of use-after-free or buffer out-of-bounds reads. Furthermore, as the + size of the underlying memory is not bundled with the pointer, handling + writable strings as a `char *` risks buffer out-of-bounds writes. + +2. `std::string` class + + The `std::string` class is the main C++ data type to represent strings. The + class holds the string data as a null-terminated array of characters, as + well as the length of the array. `std::string` literals (`"Hello world!"s`) + converts the character array literal into a temporary `std::string` instance + whose lifetime ends with the statement. + + Usage of `std::string` makes string handling safer and convenient thanks to + the integration with the rest of the C++ standard library. This comes at a + cost, as making copies of the class copies the underlying data, requiring + dynamic allocation of memory. This is partially offset by usage of move + constructors or assignment operators. + +3. `std::string_view` class + + This newer addition to the C++ standard library is a non-owning reference to + a characters array. Like the `std::string` class, the `std::string_view` + stores a pointer to the array and a length, but unlike C-style strings and + the `std::string` class, the array is not guaranteed to be null-terminated. + `std::string_view` literals (`"Hello world!"sv`) create a temporary + `std::string_view` instance whose lifetime ends with the statement, but the + underlying character array has global static storage and lifetime. + + String views are useful to represent in a single way strings with + heterogenous underlying storage, as they can be constructed from a `char *` + or a `std::string`. They reduce the risk of buffer out-of-bounds accesses as + they carry the array length, and they speed up handling of substrings as + they don't cause copies, unlike `std::string`. As the string views store a + borrowed reference to the underlying storage, they are prone to + use-after-free issues. The lack of a null terminator furthermore makes them + unsafe to handle strings that need to be passed to C functions that assume + null-terminated strings (such as most of the C standard library functions). + + C++17 lacks many functions that make `std::string_view` usage convenient. + This includes `operator+()` to concatenate a `std::string` and a + `std::string_view`, as well as heterogenous lookup functions for containers + (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2363r5.html). + libcamera works around the former by defining the missing operators, and the + latter by using the `find()` function instead of `at()` or `operator[]()`. + Those issues are properly addressed in C++26. + +With those pros, cons are caveats in mind, libcamera follows a set of rules +that governs selection of appropriate string types for function arguments: + +* If the function is internal to a compilation unit, and all callers use the + same string type for the function argument, use that type (by reference or + value, as appropriate). +* If the function needs to modify the string, make a copy, or convert it to a + `std::string` for any other reason, pass a `std::string` by value. Use + `std::move()` in the caller if the string is not needed after the function + call, as well as inside the function to move the string to a local copy if + needed. This minimizes the number of copies in all use cases. +* If the function needs to pass the string to a C function that takes a + `const char *`, and the caller may reasonably use a C string literal, pass a + `const char *`. This avoids copies when the caller uses a `char *` pointer or + a C string literal, and avoids buffer out-of-bound reads that could occur with + a non null-terminated `std::string_view`. +* If the function only needs to pass the string to other functions that take a + `const std::string &` reference, pass a `const std::string &`. This minimizes + the construction of `std::string` instances when the caller already holds an + instance. +* If the function only reads from the string, doesn't rely on it being + null-terminated, and can reasonably be called with either C strings, C string + literals, or `std::string` instances, pass a `std::string_view` instance. This + improves safety of the code without impacting performance. +* Do not use `std::string_view` in public API function parameters. Previous + rules showed that selection of the optimal string type for function + parameters depends on the internal implementation of the function, which + could change over time. This conflicts with the stability requirements of the + libcamera public API. Furthermore, as C++17 makes `std::string_view` usage + inconvenient in many cases, we don't want to force that inconvenience upon + libcamera users. + Documentation -------------