Message ID | 20200813223722.4050835-3-niklas.soderlund@ragnatech.se |
---|---|
State | Superseded |
Delegated to: | Niklas Söderlund |
Headers | show |
Series |
|
Related | show |
Hi Niklas, On 13/08/2020 23:37, Niklas Söderlund wrote: > Add method that strips non-ASCII characters from a string. > > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> I still would have thought the isprint might be more useful, as I don't think we want chars like newlines/or console control characthers getting through these strings. But if you don't think that's an issue here: Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com> I don't want to block the series, and this is an internal helper so it can always be updated if we hit any issue. > --- > * Changes since v3 > - Fix spelling in comment. > - Rename to toAscii() > --- > include/libcamera/internal/utils.h | 2 ++ > src/libcamera/utils.cpp | 21 +++++++++++++++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h > index 45cd6f120c51586b..b27f5a2323552058 100644 > --- a/include/libcamera/internal/utils.h > +++ b/include/libcamera/internal/utils.h > @@ -197,6 +197,8 @@ private: > > details::StringSplitter split(const std::string &str, const std::string &delim); > > +std::string toAscii(const std::string &str); > + > std::string libcameraBuildPath(); > std::string libcameraSourcePath(); > > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp > index 615df46ac142a2a9..726b84bfbae53ff2 100644 > --- a/src/libcamera/utils.cpp > +++ b/src/libcamera/utils.cpp > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim) > return details::StringSplitter(str, delim); > } > > +/** > + * \brief Strip all Unicode characters from a string > + * \param[in] str The string to strip > + * > + * Strip all non-ASCII characters form a string. A Unicode character that spans > + * multiple bytes (and therefore is not also an ASCII character) may be > + * identified by the fact that its most significant bit is always set. > + * > + * \todo When switching to C++ 20 use std::remove_if. > + * > + * \return An ASCII string > + */ > +std::string toAscii(const std::string &str) > +{ > + std::string ret; > + for (const char &c : str) > + if (!(c & 0x80)) > + ret += c; > + return ret; > +} > + > /** > * \brief Check if libcamera is installed or not > * >
Hi Kieran, On 2020-08-14 09:28:59 +0100, Kieran Bingham wrote: > Hi Niklas, > > On 13/08/2020 23:37, Niklas Söderlund wrote: > > Add method that strips non-ASCII characters from a string. > > > > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> > > I still would have thought the isprint might be more useful, as I don't > think we want chars like newlines/or console control characthers getting > through these strings. I thought about it but if we used std::isprint() this would not produce an ASCII string, instead it would produce an printable string :-) I read some post (which I can't find now) that spoke agains using std::isprint() to drop non-ASCII charters, but it might have been targeted on proprietary OS implementations... As you say, this is an internal helper and we can switch to std::isprint() later if need. > > But if you don't think that's an issue here: > > Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Thanks! > > I don't want to block the series, and this is an internal helper so it > can always be updated if we hit any issue. > > > > > --- > > * Changes since v3 > > - Fix spelling in comment. > > - Rename to toAscii() > > --- > > include/libcamera/internal/utils.h | 2 ++ > > src/libcamera/utils.cpp | 21 +++++++++++++++++++++ > > 2 files changed, 23 insertions(+) > > > > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h > > index 45cd6f120c51586b..b27f5a2323552058 100644 > > --- a/include/libcamera/internal/utils.h > > +++ b/include/libcamera/internal/utils.h > > @@ -197,6 +197,8 @@ private: > > > > details::StringSplitter split(const std::string &str, const std::string &delim); > > > > +std::string toAscii(const std::string &str); > > + > > std::string libcameraBuildPath(); > > std::string libcameraSourcePath(); > > > > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp > > index 615df46ac142a2a9..726b84bfbae53ff2 100644 > > --- a/src/libcamera/utils.cpp > > +++ b/src/libcamera/utils.cpp > > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim) > > return details::StringSplitter(str, delim); > > } > > > > +/** > > + * \brief Strip all Unicode characters from a string > > + * \param[in] str The string to strip > > + * > > + * Strip all non-ASCII characters form a string. A Unicode character that spans > > + * multiple bytes (and therefore is not also an ASCII character) may be > > + * identified by the fact that its most significant bit is always set. > > + * > > + * \todo When switching to C++ 20 use std::remove_if. > > + * > > + * \return An ASCII string > > + */ > > +std::string toAscii(const std::string &str) > > +{ > > + std::string ret; > > + for (const char &c : str) > > + if (!(c & 0x80)) > > + ret += c; > > + return ret; > > +} > > + > > /** > > * \brief Check if libcamera is installed or not > > * > > > > -- > Regards > -- > Kieran
On Fri, Aug 14, 2020 at 12:37:17AM +0200, Niklas Söderlund wrote: > Add method that strips non-ASCII characters from a string. > > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> > --- > * Changes since v3 > - Fix spelling in comment. > - Rename to toAscii() > --- > include/libcamera/internal/utils.h | 2 ++ > src/libcamera/utils.cpp | 21 +++++++++++++++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h > index 45cd6f120c51586b..b27f5a2323552058 100644 > --- a/include/libcamera/internal/utils.h > +++ b/include/libcamera/internal/utils.h > @@ -197,6 +197,8 @@ private: > > details::StringSplitter split(const std::string &str, const std::string &delim); > > +std::string toAscii(const std::string &str); > + > std::string libcameraBuildPath(); > std::string libcameraSourcePath(); > > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp > index 615df46ac142a2a9..726b84bfbae53ff2 100644 > --- a/src/libcamera/utils.cpp > +++ b/src/libcamera/utils.cpp > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim) > return details::StringSplitter(str, delim); > } > > +/** > + * \brief Strip all Unicode characters from a string > + * \param[in] str The string to strip > + * > + * Strip all non-ASCII characters form a string. A Unicode character that spans > + * multiple bytes (and therefore is not also an ASCII character) may be "Unicode character" refers to the code points, while "spans multiply bytes" refers to encodings. As std::string has no notion of encoding, this documentation should tell what the expected input encoding is. > + * identified by the fact that its most significant bit is always set. Open question, do we want to remove non-ASCII characters, or replaced them (with a ".", "?" or something else) ? > + * > + * \todo When switching to C++ 20 use std::remove_if. That would be less efficient :-) > + * > + * \return An ASCII string > + */ > +std::string toAscii(const std::string &str) toAscii() makes it sound the function converts the string, while it really removes characters. > +{ > + std::string ret; ret.reserve(str.size()); > + for (const char &c : str) > + if (!(c & 0x80)) > + ret += c; > + return ret; > +} > + > /** > * \brief Check if libcamera is installed or not > *
Hi Laurent, Thanks for your feedback. On 2020-08-16 14:58:13 +0300, Laurent Pinchart wrote: > On Fri, Aug 14, 2020 at 12:37:17AM +0200, Niklas Söderlund wrote: > > Add method that strips non-ASCII characters from a string. > > > > Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> > > --- > > * Changes since v3 > > - Fix spelling in comment. > > - Rename to toAscii() > > --- > > include/libcamera/internal/utils.h | 2 ++ > > src/libcamera/utils.cpp | 21 +++++++++++++++++++++ > > 2 files changed, 23 insertions(+) > > > > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h > > index 45cd6f120c51586b..b27f5a2323552058 100644 > > --- a/include/libcamera/internal/utils.h > > +++ b/include/libcamera/internal/utils.h > > @@ -197,6 +197,8 @@ private: > > > > details::StringSplitter split(const std::string &str, const std::string &delim); > > > > +std::string toAscii(const std::string &str); > > + > > std::string libcameraBuildPath(); > > std::string libcameraSourcePath(); > > > > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp > > index 615df46ac142a2a9..726b84bfbae53ff2 100644 > > --- a/src/libcamera/utils.cpp > > +++ b/src/libcamera/utils.cpp > > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim) > > return details::StringSplitter(str, delim); > > } > > > > +/** > > + * \brief Strip all Unicode characters from a string > > + * \param[in] str The string to strip > > + * > > + * Strip all non-ASCII characters form a string. A Unicode character that spans > > + * multiple bytes (and therefore is not also an ASCII character) may be > > "Unicode character" refers to the code points, while "spans multiply > bytes" refers to encodings. As std::string has no notion of encoding, > this documentation should tell what the expected input encoding is. > > > + * identified by the fact that its most significant bit is always set. > > Open question, do we want to remove non-ASCII characters, or replaced > them (with a ".", "?" or something else) ? I have not strong opinion, dropping them feels more natural to me but I'm open to substitution. What do other think? > > > + * > > + * \todo When switching to C++ 20 use std::remove_if. > > That would be less efficient :-) > > > + * > > + * \return An ASCII string > > + */ > > +std::string toAscii(const std::string &str) > > toAscii() makes it sound the function converts the string, while it > really removes characters. I originally called this stripUnicode() would that name work better for you? > > > +{ > > + std::string ret; > > ret.reserve(str.size()); > > > + for (const char &c : str) > > + if (!(c & 0x80)) > > + ret += c; > > + return ret; > > +} > > + > > /** > > * \brief Check if libcamera is installed or not > > * > > -- > Regards, > > Laurent Pinchart
diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h index 45cd6f120c51586b..b27f5a2323552058 100644 --- a/include/libcamera/internal/utils.h +++ b/include/libcamera/internal/utils.h @@ -197,6 +197,8 @@ private: details::StringSplitter split(const std::string &str, const std::string &delim); +std::string toAscii(const std::string &str); + std::string libcameraBuildPath(); std::string libcameraSourcePath(); diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp index 615df46ac142a2a9..726b84bfbae53ff2 100644 --- a/src/libcamera/utils.cpp +++ b/src/libcamera/utils.cpp @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim) return details::StringSplitter(str, delim); } +/** + * \brief Strip all Unicode characters from a string + * \param[in] str The string to strip + * + * Strip all non-ASCII characters form a string. A Unicode character that spans + * multiple bytes (and therefore is not also an ASCII character) may be + * identified by the fact that its most significant bit is always set. + * + * \todo When switching to C++ 20 use std::remove_if. + * + * \return An ASCII string + */ +std::string toAscii(const std::string &str) +{ + std::string ret; + for (const char &c : str) + if (!(c & 0x80)) + ret += c; + return ret; +} + /** * \brief Check if libcamera is installed or not *
Add method that strips non-ASCII characters from a string. Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> --- * Changes since v3 - Fix spelling in comment. - Rename to toAscii() --- include/libcamera/internal/utils.h | 2 ++ src/libcamera/utils.cpp | 21 +++++++++++++++++++++ 2 files changed, 23 insertions(+)