[{"id":14660,"web_url":"https://patchwork.libcamera.org/comment/14660/","msgid":"<YAnXm/Kakvwmg5S3@pendragon.ideasonboard.com>","date":"2021-01-21T19:35:55","subject":"Re: [libcamera-devel] [PATCH v2 1/9] android: jpeg: exif: Expand\n\tsetString to support different encodings","submitter":{"id":2,"url":"https://patchwork.libcamera.org/api/people/2/","name":"Laurent Pinchart","email":"laurent.pinchart@ideasonboard.com"},"content":"Hi Paul,\n\nThank you for the patch.\n\nOn Thu, Jan 21, 2021 at 07:15:41PM +0900, Paul Elder wrote:\n> GPSProcessingMethod and UserComment in EXIF tags can be in UTF-16.\n> Expand setString to take an encoding when the field type is undefined.\n> Update callers accordingly.\n> \n> Signed-off-by: Paul Elder <paul.elder@ideasonboard.com>\n> \n> ---\n> Changes in v2:\n> - moved from utils into exif\n> - support no-encoding\n> ---\n>  src/android/jpeg/exif.cpp | 107 +++++++++++++++++++++++++++++++-------\n>  src/android/jpeg/exif.h   |  12 ++++-\n>  2 files changed, 99 insertions(+), 20 deletions(-)\n> \n> diff --git a/src/android/jpeg/exif.cpp b/src/android/jpeg/exif.cpp\n> index 33b3fa7f..cff366a4 100644\n> --- a/src/android/jpeg/exif.cpp\n> +++ b/src/android/jpeg/exif.cpp\n> @@ -7,6 +7,9 @@\n>  \n>  #include \"exif.h\"\n>  \n> +#include <map>\n> +#include <uchar.h>\n> +\n>  #include \"libcamera/internal/log.h\"\n>  #include \"libcamera/internal/utils.h\"\n>  \n> @@ -64,7 +67,7 @@ Exif::Exif()\n>  \texif_data_set_byte_order(data_, order_);\n>  \n>  \tsetString(EXIF_IFD_EXIF, EXIF_TAG_EXIF_VERSION,\n> -\t\t  EXIF_FORMAT_UNDEFINED, \"0231\");\n> +\t\t  EXIF_FORMAT_UNDEFINED, NoEncoding, \"0231\");\n>  \n>  \t/* Create the mandatory EXIF fields with default data. */\n>  \texif_data_fix(data_);\n> @@ -178,10 +181,18 @@ void Exif::setRational(ExifIfd ifd, ExifTag tag, ExifRational item)\n>  \texif_entry_unref(entry);\n>  }\n>  \n> -void Exif::setString(ExifIfd ifd, ExifTag tag, ExifFormat format, const std::string &item)\n> +static const std::map<Exif::StringEncoding, std::vector<uint8_t>> StringEncodingCodes = {\n\nstatic const std::map<Exif::StringEncoding, std::array<uint8_t, 8>> StringEncodingCodes\n\nwould be more efficient (if it doesn't cause too many issues in the code\nbelow).\n\nAnd s/StringEncodingCodes/stringEncodingCodes/\n\n> +\t{ Exif::ASCII,     { 0x41, 0x53, 0x43, 0x49, 0x49, 0x00, 0x00, 0x00 } },\n> +\t{ Exif::JIS,       { 0x4a, 0x49, 0x53, 0x00, 0x00, 0x00, 0x00, 0x00 } },\n> +\t{ Exif::Unicode,   { 0x55, 0x4e, 0x49, 0x43, 0x4f, 0x44, 0x45, 0x00 } },\n> +\t{ Exif::Undefined, { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } },\n\nShould we drop JIS and Undefined ? It would be different if we were\nparsing EXIF data, for as an EXIF generator, I doubt we'll ever generate\na string in any of those encodings.\n\n> +};\n> +\n> +void Exif::setString(ExifIfd ifd, ExifTag tag, ExifFormat format,\n> +\t\t     StringEncoding enc, const std::string &item)\n\ns/enc/encoding/ ?\n\n>  {\n>  \tstd::string ascii;\n> -\tsize_t length;\n> +\tsize_t length = 0;\n>  \tconst char *str;\n>  \n>  \tif (format == EXIF_FORMAT_ASCII) {\n> @@ -190,14 +201,44 @@ void Exif::setString(ExifIfd ifd, ExifTag tag, ExifFormat format, const std::str\n>  \n>  \t\t/* Pad 1 extra byte to null-terminate the ASCII string. */\n>  \t\tlength = ascii.length() + 1;\n> -\t} else {\n> -\t\tstr = item.c_str();\n> -\n> -\t\t/*\n> -\t\t * Strings stored in different formats (EXIF_FORMAT_UNDEFINED)\n> -\t\t * are not null-terminated.\n> -\t\t */\n> -\t\tlength = item.length();\n> +\t} else if (format == EXIF_FORMAT_UNDEFINED) {\n\nAny reason to replace else with else if ? If you keep a plain else, you\nwon't have to initialize length to 0.\n\n> +\t\tstd::vector<uint8_t> buf;\n> +\t\tconst std::vector<uint8_t> &constbuf = buf;\n\nI don't think you need constbuf.\n\nYou can move the encoding lookup out of the switch:\n\n\t\tauto encodingString = stringEncodingCodes.find(enc);\n\t\tif (encodingString != stringEncodingCodes.end())\n\t\t\tbuf = {\n\t\t\t\tencodingString->second.begin(),\n\t\t\t\tencodingString->second.end()\n\t\t\t};\n\n(This handles the conversion from std::array to std::vector)\n\n> +\t\tstd::u16string u16str;\n> +\n> +\t\tswitch (enc) {\n> +\t\tcase Unicode:\n> +\t\t\tbuf = StringEncodingCodes.find(enc)->second;\n> +\t\t\tu16str = utf8ToUtf16(item);\n> +\n> +\t\t\t/* Little-endian */\n\nShould this depend on the endianness of the EXIF ?\n\n> +\t\t\tbuf.resize(8 + u16str.size() * 2);\n> +\t\t\tfor (size_t i = 0; i < u16str.size(); i++) {\n\n\t\t\tfor (char16_t c : u16str) {\n\n> +\t\t\t\tbuf[8 + 2 * i] = (u16str[i] & 0xff);\n> +\t\t\t\tbuf[8 + 2 * i + 1] = ((u16str[i] >> 8) & 0xff);\n> +\t\t\t}\n> +\n> +\t\t\tstr = reinterpret_cast<const char *>(constbuf.data());\n> +\t\t\tlength = buf.size();\n> +\n> +\t\t\tbreak;\n> +\t\t/* \\todo Support JIS */\n> +\t\tcase JIS:\n> +\t\tcase ASCII:\n> +\t\tcase Undefined:\n> +\t\t\tbuf = StringEncodingCodes.find(enc)->second;\n> +\t\t\t[[fallthrough]];\n\n\t\tcase NoEncoding:\n\nand you should be able to drop the default case.\n\n> +\t\tdefault:\n> +\t\t\tbuf.insert(buf.end(), item.begin(), item.end());\n> +\t\t\tstr = reinterpret_cast<const char *>(constbuf.data());\n> +\n> +\t\t\t/*\n> +\t\t\t * Strings stored in different formats (EXIF_FORMAT_UNDEFINED)\n> +\t\t\t * are not null-terminated.\n> +\t\t\t */\n> +\t\t\tlength = buf.size();\n> +\t\t\tbreak;\n> +\t\t}\n\nOnce you exit this scope, buf will be destroyed, and str becomes an\ninvalid pointer. You need to declare buf in the top level scope.\n\n>  \t}\n>  \n>  \tExifEntry *entry = createEntry(ifd, tag, format, length, length);\n> @@ -210,12 +251,12 @@ void Exif::setString(ExifIfd ifd, ExifTag tag, ExifFormat format, const std::str\n>  \n>  void Exif::setMake(const std::string &make)\n>  {\n> -\tsetString(EXIF_IFD_0, EXIF_TAG_MAKE, EXIF_FORMAT_ASCII, make);\n> +\tsetString(EXIF_IFD_0, EXIF_TAG_MAKE, EXIF_FORMAT_ASCII, NoEncoding, make);\n>  }\n>  \n>  void Exif::setModel(const std::string &model)\n>  {\n> -\tsetString(EXIF_IFD_0, EXIF_TAG_MODEL, EXIF_FORMAT_ASCII, model);\n> +\tsetString(EXIF_IFD_0, EXIF_TAG_MODEL, EXIF_FORMAT_ASCII, NoEncoding, model);\n>  }\n>  \n>  void Exif::setSize(const Size &size)\n> @@ -233,9 +274,9 @@ void Exif::setTimestamp(time_t timestamp)\n>  \tstrftime(str, sizeof(str), \"%Y:%m:%d %H:%M:%S\", &tm);\n>  \tstd::string ts(str);\n>  \n> -\tsetString(EXIF_IFD_0, EXIF_TAG_DATE_TIME, EXIF_FORMAT_ASCII, ts);\n> -\tsetString(EXIF_IFD_EXIF, EXIF_TAG_DATE_TIME_ORIGINAL, EXIF_FORMAT_ASCII, ts);\n> -\tsetString(EXIF_IFD_EXIF, EXIF_TAG_DATE_TIME_DIGITIZED, EXIF_FORMAT_ASCII, ts);\n> +\tsetString(EXIF_IFD_0, EXIF_TAG_DATE_TIME, EXIF_FORMAT_ASCII, NoEncoding, ts);\n> +\tsetString(EXIF_IFD_EXIF, EXIF_TAG_DATE_TIME_ORIGINAL, EXIF_FORMAT_ASCII, NoEncoding, ts);\n> +\tsetString(EXIF_IFD_EXIF, EXIF_TAG_DATE_TIME_DIGITIZED, EXIF_FORMAT_ASCII, NoEncoding, ts);\n>  \n>  \t/* Query and set timezone information if available. */\n>  \tint r = strftime(str, sizeof(str), \"%z\", &tm);\n> @@ -244,13 +285,13 @@ void Exif::setTimestamp(time_t timestamp)\n>  \t\ttz.insert(3, 1, ':');\n>  \t\tsetString(EXIF_IFD_EXIF,\n>  \t\t\t  static_cast<ExifTag>(_ExifTag::OFFSET_TIME),\n> -\t\t\t  EXIF_FORMAT_ASCII, tz);\n> +\t\t\t  EXIF_FORMAT_ASCII, NoEncoding, tz);\n>  \t\tsetString(EXIF_IFD_EXIF,\n>  \t\t\t  static_cast<ExifTag>(_ExifTag::OFFSET_TIME_ORIGINAL),\n> -\t\t\t  EXIF_FORMAT_ASCII, tz);\n> +\t\t\t  EXIF_FORMAT_ASCII, NoEncoding, tz);\n>  \t\tsetString(EXIF_IFD_EXIF,\n>  \t\t\t  static_cast<ExifTag>(_ExifTag::OFFSET_TIME_DIGITIZED),\n> -\t\t\t  EXIF_FORMAT_ASCII, tz);\n> +\t\t\t  EXIF_FORMAT_ASCII, NoEncoding, tz);\n>  \t}\n>  }\n>  \n> @@ -290,6 +331,34 @@ void Exif::setThumbnail(Span<const unsigned char> thumbnail,\n>  \tsetShort(EXIF_IFD_0, EXIF_TAG_COMPRESSION, compression);\n>  }\n>  \n> +/**\n> + * \\brief Convert UTF-8 string to UTF-16 string\n> + * \\param[in] str String to convert\n> + *\n> + * \\return \\a str in UTF-16\n> + */\n> +std::u16string Exif::utf8ToUtf16(const std::string &str)\n> +{\n> +\tstd::mbstate_t state{};\n\nThis should be mbstate_t as you're including uchar.h.\n\n> +\tchar16_t c16;\n> +\tconst char *ptr = str.data();\n> +\tconst char *end = ptr + str.size();\n> +\n> +\tstd::u16string ret;\n> +\twhile (size_t rc = mbrtoc16(&c16, ptr, end - ptr + 1, &state)) {\n> +\t\tif (rc == static_cast<size_t>(-2) ||\n> +\t\t    rc == static_cast<size_t>(-1))\n> +\t\t\tbreak;\n> +\n> +\t\tret.push_back(c16);\n> +\n> +\t\tif (rc > 0)\n> +\t\t\tptr += rc;\n> +\t}\n> +\n> +\treturn ret;\n> +}\n> +\n>  [[nodiscard]] int Exif::generate()\n>  {\n>  \tif (exifData_) {\n> diff --git a/src/android/jpeg/exif.h b/src/android/jpeg/exif.h\n> index 5cab4559..db98ba63 100644\n> --- a/src/android/jpeg/exif.h\n> +++ b/src/android/jpeg/exif.h\n> @@ -26,6 +26,14 @@ public:\n>  \t\tJPEG = 6,\n>  \t};\n>  \n> +\tenum StringEncoding {\n> +\t\tNoEncoding = 0,\n> +\t\tASCII = 1,\n> +\t\tJIS = 2,\n> +\t\tUnicode = 3,\n> +\t\tUndefined = 4,\n> +\t};\n> +\n>  \tvoid setMake(const std::string &make);\n>  \tvoid setModel(const std::string &model);\n>  \n> @@ -46,9 +54,11 @@ private:\n>  \tvoid setShort(ExifIfd ifd, ExifTag tag, uint16_t item);\n>  \tvoid setLong(ExifIfd ifd, ExifTag tag, uint32_t item);\n>  \tvoid setString(ExifIfd ifd, ExifTag tag, ExifFormat format,\n> -\t\t       const std::string &item);\n> +\t\t       StringEncoding enc, const std::string &item);\n\nWould it make sense to move the encoding parameter last, to default it\nto NoEncoding ? That way most of the callers (and in particular the ones\nthat pass EXIF_FORMAT_ASCII for format) wouldn't need to set enc to\nNoEncoding explicitly.\n\n>  \tvoid setRational(ExifIfd ifd, ExifTag tag, ExifRational item);\n>  \n> +\tstd::u16string utf8ToUtf16(const std::string &str);\n> +\n>  \tbool valid_;\n>  \n>  \tExifData *data_;","headers":{"Return-Path":"<libcamera-devel-bounces@lists.libcamera.org>","X-Original-To":"parsemail@patchwork.libcamera.org","Delivered-To":"parsemail@patchwork.libcamera.org","Received":["from lancelot.ideasonboard.com (lancelot.ideasonboard.com\n\t[92.243.16.209])\n\tby patchwork.libcamera.org (Postfix) with ESMTPS id 0CD4BBD808\n\tfor <parsemail@patchwork.libcamera.org>;\n\tThu, 21 Jan 2021 19:36:17 +0000 (UTC)","from lancelot.ideasonboard.com (localhost [IPv6:::1])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTP id 7A36C68210;\n\tThu, 21 Jan 2021 20:36:16 +0100 (CET)","from perceval.ideasonboard.com (perceval.ideasonboard.com\n\t[IPv6:2001:4b98:dc2:55:216:3eff:fef7:d647])\n\tby lancelot.ideasonboard.com (Postfix) with ESMTPS id ADFCC68204\n\tfor <libcamera-devel@lists.libcamera.org>;\n\tThu, 21 Jan 2021 20:36:14 +0100 (CET)","from pendragon.ideasonboard.com (62-78-145-57.bb.dnainternet.fi\n\t[62.78.145.57])\n\tby perceval.ideasonboard.com (Postfix) with ESMTPSA id 1975350E;\n\tThu, 21 Jan 2021 20:36:14 +0100 (CET)"],"Authentication-Results":"lancelot.ideasonboard.com;\n\tdkim=fail reason=\"signature verification failed\" (1024-bit key;\n\tunprotected) header.d=ideasonboard.com header.i=@ideasonboard.com\n\theader.b=\"FvT1duwj\"; dkim-atps=neutral","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=ideasonboard.com;\n\ts=mail; t=1611257774;\n\tbh=au6zv5Env7RixbT9xD9MJgZQtPME3L40UYPFzHSiiRg=;\n\th=Date:From:To:Cc:Subject:References:In-Reply-To:From;\n\tb=FvT1duwjeQ3cDql+5xTdZUX+LlYZddbmw0PccCBpL/wpl7yXMoUgNe+sSm8FgMubv\n\t1xTrh66MckS9EIqibhGj+kBZh4B1eYeQFDChDPjN+B2sdSL/HcLMzqKh37thOd0kFv\n\tBdzmERX+D77DHDubbILht5sFQ7Io+FDxbXPvK90o=","Date":"Thu, 21 Jan 2021 21:35:55 +0200","From":"Laurent Pinchart <laurent.pinchart@ideasonboard.com>","To":"Paul Elder <paul.elder@ideasonboard.com>","Message-ID":"<YAnXm/Kakvwmg5S3@pendragon.ideasonboard.com>","References":"<20210121101549.134574-1-paul.elder@ideasonboard.com>\n\t<20210121101549.134574-2-paul.elder@ideasonboard.com>","MIME-Version":"1.0","Content-Disposition":"inline","In-Reply-To":"<20210121101549.134574-2-paul.elder@ideasonboard.com>","Subject":"Re: [libcamera-devel] [PATCH v2 1/9] android: jpeg: exif: Expand\n\tsetString to support different encodings","X-BeenThere":"libcamera-devel@lists.libcamera.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"<libcamera-devel.lists.libcamera.org>","List-Unsubscribe":"<https://lists.libcamera.org/options/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=unsubscribe>","List-Archive":"<https://lists.libcamera.org/pipermail/libcamera-devel/>","List-Post":"<mailto:libcamera-devel@lists.libcamera.org>","List-Help":"<mailto:libcamera-devel-request@lists.libcamera.org?subject=help>","List-Subscribe":"<https://lists.libcamera.org/listinfo/libcamera-devel>,\n\t<mailto:libcamera-devel-request@lists.libcamera.org?subject=subscribe>","Cc":"libcamera-devel@lists.libcamera.org","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"7bit","Errors-To":"libcamera-devel-bounces@lists.libcamera.org","Sender":"\"libcamera-devel\" <libcamera-devel-bounces@lists.libcamera.org>"}}]