[libcamera-devel,v3,2/7] libcamera: utils: Add method to strip Unicode characters

Message ID: 20200813095726.3497193-3-niklas.soderlund@ragnatech.se
State: Superseded
Delegated to: Niklas Söderlund
Series
  • libcamera: Allow for user-friendly names in applications

Commit Message

Niklas Söderlund Aug. 13, 2020, 9:57 a.m. UTC
Add method that strips non-ASCII characters from a string.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
---
 include/libcamera/internal/utils.h |  2 ++
 src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

Comments

Kieran Bingham Aug. 13, 2020, 3:21 p.m. UTC | #1
Hi Niklas,

On 13/08/2020 10:57, Niklas Söderlund wrote:
> Add method that strips non-ASCII characters from a string.
> 
> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
> ---
>  include/libcamera/internal/utils.h |  2 ++
>  src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
> index 45cd6f120c51586b..5bfd2a8782dbd623 100644
> --- a/include/libcamera/internal/utils.h
> +++ b/include/libcamera/internal/utils.h
> @@ -197,6 +197,8 @@ private:
>  
>  details::StringSplitter split(const std::string &str, const std::string &delim);
>  
> +std::string stripUnicode(const std::string &str);
> +
>  std::string libcameraBuildPath();
>  std::string libcameraSourcePath();
>  
> diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
> index 615df46ac142a2a9..041fdc91a0a35277 100644
> --- a/src/libcamera/utils.cpp
> +++ b/src/libcamera/utils.cpp
> @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
>  	return details::StringSplitter(str, delim);
>  }
>  
> +/**
> + * \brief Strip all Unicode characters from a string
> + * \param[in] str The string to strip
> + *
> + * Strip all non-ASCII characters form a string. An Unicode character that spans

s/An/A/

> + * multiple bytes (and therefor is not also an ASCII character) may be

s/therefor/therefore/

> + * identified by the fact that its most significant bit is always set.
> + *
> + * \todo When switching to C++ 20 use std::remove_if.
> + *
> + * \return An ASCII string
> + */
> +std::string stripUnicode(const std::string &str)
> +{
> +	std::string ret;
> +	for (const char &c : str)
> +		if (!(c & 0x80))
> +			ret += c;

Should we replace non-ASCII characters with a placeholder such as '_'
instead of dropping them?

Although the name is 'strip', which does imply we want to remove them.

Also, should std::isprint() be used? I presume we want to remove all
non-printable characters, i.e. anything that's not valid for display?
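
To make the two alternatives above concrete, here is a minimal sketch of what they might look like. The names replaceNonAscii() and keepPrintableAscii() are hypothetical and not part of the patch. The sketch assumes the input is UTF-8 and the default "C" locale, and it casts to unsigned char before calling std::isprint(), since passing a negative char value is undefined behaviour.

```cpp
#include <cctype>
#include <string>

/*
 * Hypothetical helper, not part of the patch: substitute every byte that
 * has its most significant bit set instead of dropping it.
 */
std::string replaceNonAscii(const std::string &str, char replacement = '_')
{
	std::string ret;
	ret.reserve(str.size());
	for (unsigned char c : str)
		ret += (c & 0x80) ? replacement : static_cast<char>(c);
	return ret;
}

/*
 * Hypothetical helper, not part of the patch: keep only printable ASCII,
 * dropping control characters as well as non-ASCII bytes.
 */
std::string keepPrintableAscii(const std::string &str)
{
	std::string ret;
	for (unsigned char c : str)
		if (c < 0x80 && std::isprint(c))
			ret += static_cast<char>(c);
	return ret;
}
```

Note that the replacement works per byte, so a character that is encoded as two bytes in UTF-8 would produce two '_' in the output. Whether that is acceptable for user-visible names is a separate question.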


> +	return ret;
> +}
> +
>  /**
>   * \brief Check if libcamera is installed or not
>   *
>

Kieran Bingham Aug. 13, 2020, 3:42 p.m. UTC | #2
On 13/08/2020 16:21, Kieran Bingham wrote:
> Hi Niklas,
> 
> On 13/08/2020 10:57, Niklas Söderlund wrote:
>> Add method that strips non-ASCII characters from a string.
>>
>> Signed-off-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
>> ---
>>  include/libcamera/internal/utils.h |  2 ++
>>  src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
>> index 45cd6f120c51586b..5bfd2a8782dbd623 100644
>> --- a/include/libcamera/internal/utils.h
>> +++ b/include/libcamera/internal/utils.h
>> @@ -197,6 +197,8 @@ private:
>>  
>>  details::StringSplitter split(const std::string &str, const std::string &delim);
>>  
>> +std::string stripUnicode(const std::string &str);
>> +
>>  std::string libcameraBuildPath();
>>  std::string libcameraSourcePath();
>>  
>> diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
>> index 615df46ac142a2a9..041fdc91a0a35277 100644
>> --- a/src/libcamera/utils.cpp
>> +++ b/src/libcamera/utils.cpp
>> @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
>>  	return details::StringSplitter(str, delim);
>>  }
>>  
>> +/**
>> + * \brief Strip all Unicode characters from a string
>> + * \param[in] str The string to strip
>> + *
>> + * Strip all non-ASCII characters form a string. An Unicode character that spans
> 
> s/An/A/
> 
>> + * multiple bytes (and therefor is not also an ASCII character) may be
> 
> s/therefor/therefore/
> 
>> + * identified by the fact that its most significant bit is always set.
>> + *
>> + * \todo When switching to C++ 20 use std::remove_if.
>> + *
>> + * \return An ASCII string

Throwing out a bikeshed now that I've seen it in use:

How about calling this toAscii()?

stripUnicode()
toAscii()

It is 5 characters shorter and follows the toString()/toXXX() model we
have elsewhere.

>> + */
>> +std::string stripUnicode(const std::string &str)
>> +{
>> +	std::string ret;
>> +	for (const char &c : str)
>> +		if (!(c & 0x80))
>> +			ret += c;
> 
> Should we replace non-ASCII characters with a placeholder such as '_'
> instead of dropping them?
> 
> Although the name is 'strip', which does imply we want to remove them.
> 
> Also, should std::isprint() be used? I presume we want to remove all
> non-printable characters, i.e. anything that's not valid for display?
> 
> 
>> +	return ret;
>> +}
>> +
>>  /**
>>   * \brief Check if libcamera is installed or not
>>   *
>>
>

Patch

diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
index 45cd6f120c51586b..5bfd2a8782dbd623 100644
--- a/include/libcamera/internal/utils.h
+++ b/include/libcamera/internal/utils.h
@@ -197,6 +197,8 @@  private:
 
 details::StringSplitter split(const std::string &str, const std::string &delim);
 
+std::string stripUnicode(const std::string &str);
+
 std::string libcameraBuildPath();
 std::string libcameraSourcePath();
 
diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
index 615df46ac142a2a9..041fdc91a0a35277 100644
--- a/src/libcamera/utils.cpp
+++ b/src/libcamera/utils.cpp
@@ -342,6 +342,27 @@  details::StringSplitter split(const std::string &str, const std::string &delim)
 	return details::StringSplitter(str, delim);
 }
 
+/**
+ * \brief Strip all Unicode characters from a string
+ * \param[in] str The string to strip
+ *
+ * Strip all non-ASCII characters form a string. An Unicode character that spans
+ * multiple bytes (and therefor is not also an ASCII character) may be
+ * identified by the fact that its most significant bit is always set.
+ *
+ * \todo When switching to C++ 20 use std::remove_if.
+ *
+ * \return An ASCII string
+ */
+std::string stripUnicode(const std::string &str)
+{
+	std::string ret;
+	for (const char &c : str)
+		if (!(c & 0x80))
+			ret += c;
+	return ret;
+}
+
 /**
  * \brief Check if libcamera is installed or not
  *
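
On the \todo in the comment block: std::remove_if() itself predates C++20 (C++20 adds std::erase_if() for std::string), so an algorithm-based version should already be possible with the erase-remove idiom. The following is only a sketch under that assumption. The name stripUnicodeAlgo() is hypothetical, and the function takes its argument by value, which differs from the signature in the patch.

```cpp
#include <algorithm>
#include <string>

/*
 * Sketch only: same result as the loop in the patch, expressed with the
 * erase-remove idiom. Bytes with the most significant bit set are removed.
 */
std::string stripUnicodeAlgo(std::string str)
{
	str.erase(std::remove_if(str.begin(), str.end(),
				 [](unsigned char c) { return c & 0x80; }),
		  str.end());
	return str;
}
```

As with the loop in the patch, this operates on bytes, so every byte of a multi-byte UTF-8 sequence is dropped and pure ASCII input is returned unchanged.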