Message ID | 20200813095726.3497193-3-niklas.soderlund@ragnatech.se |
---|---|
State | Superseded |
Delegated to: | Niklas Söderlund |
Hi Niklas,

On 13/08/2020 10:57, Niklas Söderlund wrote:
> Add method that strips non-ASCII characters from a string.
>
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> ---
>  include/libcamera/internal/utils.h |  2 ++
>  src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
> index 45cd6f120c51586b..5bfd2a8782dbd623 100644
> --- a/include/libcamera/internal/utils.h
> +++ b/include/libcamera/internal/utils.h
> @@ -197,6 +197,8 @@ private:
>
>  details::StringSplitter split(const std::string &str, const std::string &delim);
>
> +std::string stripUnicode(const std::string &str);
> +
>  std::string libcameraBuildPath();
>  std::string libcameraSourcePath();
>
> diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
> index 615df46ac142a2a9..041fdc91a0a35277 100644
> --- a/src/libcamera/utils.cpp
> +++ b/src/libcamera/utils.cpp
> @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
>  	return details::StringSplitter(str, delim);
>  }
>
> +/**
> + * \brief Strip all Unicode characters from a string
> + * \param[in] str The string to strip
> + *
> + * Strip all non-ASCII characters form a string. An Unicode character that spans

s/An/A/

> + * multiple bytes (and therefor is not also an ASCII character) may be

s/therefor/therefore/

> + * identified by the fact that its most significant bit is always set.
> + *
> + * \todo When switching to C++ 20 use std::remove_if.
> + *
> + * \return An ASCII string
> + */
> +std::string stripUnicode(const std::string &str)
> +{
> +	std::string ret;
> +	for (const char &c : str)
> +		if (!(c & 0x80))
> +			ret += c;

Should we replace non-ascii characters with a replacement such as '_'?

Although the name is 'strip' so that does imply we want to remove them.

Also should std::isprint() be used? I presume we want to remove all
non-printable chars - i.e. anything that's not valid for display?

> +	return ret;
> +}
> +
>  /**
>   * \brief Check if libcamera is installed or not
>   *
>
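For illustration, the two alternatives raised in this review could be sketched as below. This is only a sketch and not part of the patch: the function names `replaceNonAscii` and `stripNonPrintable` and the default placeholder argument are hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant 1: substitute a placeholder for every non-ASCII byte
// instead of dropping it. A multi-byte UTF-8 character yields one
// placeholder per byte.
std::string replaceNonAscii(const std::string &str, char placeholder = '_')
{
	std::string ret;
	ret.reserve(str.size());
	for (const char &c : str)
		ret += (c & 0x80) ? placeholder : c;
	return ret;
}

// Hypothetical variant 2: keep only printable characters. The cast to
// unsigned char matters, as passing a negative char value to
// std::isprint() is undefined behaviour.
std::string stripNonPrintable(const std::string &str)
{
	std::string ret;
	ret.reserve(str.size());
	for (const char &c : str)
		if (std::isprint(static_cast<unsigned char>(c)))
			ret += c;
	return ret;
}
```

Note that std::isprint() depends on the current C locale, so the second variant only behaves as an ASCII filter while the program stays in the default "C" locale. It also drops control characters such as tabs and newlines, which the loop in the patch keeps.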
On 13/08/2020 16:21, Kieran Bingham wrote:
> Hi Niklas,
>
> On 13/08/2020 10:57, Niklas Söderlund wrote:
>> Add method that strips non-ASCII characters from a string.
>>
>> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
>> ---
>>  include/libcamera/internal/utils.h |  2 ++
>>  src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
>> index 45cd6f120c51586b..5bfd2a8782dbd623 100644
>> --- a/include/libcamera/internal/utils.h
>> +++ b/include/libcamera/internal/utils.h
>> @@ -197,6 +197,8 @@ private:
>>
>>  details::StringSplitter split(const std::string &str, const std::string &delim);
>>
>> +std::string stripUnicode(const std::string &str);
>> +
>>  std::string libcameraBuildPath();
>>  std::string libcameraSourcePath();
>>
>> diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
>> index 615df46ac142a2a9..041fdc91a0a35277 100644
>> --- a/src/libcamera/utils.cpp
>> +++ b/src/libcamera/utils.cpp
>> @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
>>  	return details::StringSplitter(str, delim);
>>  }
>>
>> +/**
>> + * \brief Strip all Unicode characters from a string
>> + * \param[in] str The string to strip
>> + *
>> + * Strip all non-ASCII characters form a string. An Unicode character that spans
>
> s/An/A/
>
>> + * multiple bytes (and therefor is not also an ASCII character) may be
>
> s/therefor/therefore/
>
>> + * identified by the fact that its most significant bit is always set.
>> + *
>> + * \todo When switching to C++ 20 use std::remove_if.
>> + *
>> + * \return An ASCII string

Throwing out the bikeshed now I've seen it in use:

How about calling this toAscii?

  stripUnicode()
  toAscii()

5 chars shorter, and follows the toString toXXX model we have elsewhere?

>> + */
>> +std::string stripUnicode(const std::string &str)
>> +{
>> +	std::string ret;
>> +	for (const char &c : str)
>> +		if (!(c & 0x80))
>> +			ret += c;
>
> Should we replace non-ascii characters with a replacement such as '_'?
>
> Although the name is 'strip' so that does imply we want to remove them.
>
> Also should std::isprint() be used? I presume we want to remove all
> non-printable chars - i.e. anything that's not valid for display?
>
>
>> +	return ret;
>> +}
>> +
>>  /**
>>   * \brief Check if libcamera is installed or not
>>   *
>>
>
diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
index 45cd6f120c51586b..5bfd2a8782dbd623 100644
--- a/include/libcamera/internal/utils.h
+++ b/include/libcamera/internal/utils.h
@@ -197,6 +197,8 @@ private:
 
 details::StringSplitter split(const std::string &str, const std::string &delim);
 
+std::string stripUnicode(const std::string &str);
+
 std::string libcameraBuildPath();
 std::string libcameraSourcePath();
 
diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
index 615df46ac142a2a9..041fdc91a0a35277 100644
--- a/src/libcamera/utils.cpp
+++ b/src/libcamera/utils.cpp
@@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
 	return details::StringSplitter(str, delim);
 }
 
+/**
+ * \brief Strip all Unicode characters from a string
+ * \param[in] str The string to strip
+ *
+ * Strip all non-ASCII characters form a string. An Unicode character that spans
+ * multiple bytes (and therefor is not also an ASCII character) may be
+ * identified by the fact that its most significant bit is always set.
+ *
+ * \todo When switching to C++ 20 use std::remove_if.
+ *
+ * \return An ASCII string
+ */
+std::string stripUnicode(const std::string &str)
+{
+	std::string ret;
+	for (const char &c : str)
+		if (!(c & 0x80))
+			ret += c;
+	return ret;
+}
+
 /**
  * \brief Check if libcamera is installed or not
  *
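As a side note on the \todo in the patch: std::remove_if has been available in <algorithm> since well before C++20, so the same filtering can be written with the erase-remove idiom today. What C++20 adds is std::erase_if for std::string, which may be what the \todo has in mind. A minimal sketch, assuming a by-value parameter and the hypothetical name `stripUnicodeAlt` (neither is from the patch):

```cpp
#include <algorithm>
#include <string>

// Sketch only: same result as the loop in the patch, using the
// erase-remove idiom. Taking the string by value gives a local copy
// that can be modified in place and returned.
std::string stripUnicodeAlt(std::string str)
{
	str.erase(std::remove_if(str.begin(), str.end(),
				 [](char c) { return (c & 0x80) != 0; }),
		  str.end());
	return str;
}
```

Every byte with the most significant bit set is removed, so each multi-byte UTF-8 sequence disappears entirely while plain ASCII input is returned unchanged.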
Add method that strips non-ASCII characters from a string.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
---
 include/libcamera/internal/utils.h |  2 ++
 src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)