[RFC,v2,1/2] py: gen-py-formats.py: Open input file in binary mode

Message ID 20250912125528.1963619-1-barnabas.pocze@ideasonboard.com
State New

Commit Message

Barnabás Pőcze Sept. 12, 2025, 12:55 p.m. UTC
Other code generation scripts do that already and let pyyaml deal with
decoding utf-8, etc. So do the same here as well.

Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
---
 src/py/libcamera/gen-py-formats.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Laurent Pinchart Sept. 12, 2025, 2 p.m. UTC | #1
On Fri, Sep 12, 2025 at 02:55:27PM +0200, Barnabás Pőcze wrote:
> Other code generation scripts do that already and let pyyaml deal with
> decoding utf-8, etc. So do the same here as well.

How does pyyaml determine the encoding ? Does it just hardcode utf-8 ?

> Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
> ---
>  src/py/libcamera/gen-py-formats.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/py/libcamera/gen-py-formats.py b/src/py/libcamera/gen-py-formats.py
> index 0ff1d12ac..6323e237f 100755
> --- a/src/py/libcamera/gen-py-formats.py
> +++ b/src/py/libcamera/gen-py-formats.py
> @@ -37,7 +37,7 @@ def main(argv):
>                          help='Template file name.')
>      args = parser.parse_args(argv[1:])
>  
> -    with open(args.input, encoding='utf-8') as f:
> +    with open(args.input, 'rb') as f:
>          formats = yaml.safe_load(f)['formats']
>  
>      data = generate(formats)
Barnabás Pőcze Sept. 12, 2025, 2:08 p.m. UTC | #2
On 2025-09-12 at 16:00, Laurent Pinchart wrote:
> On Fri, Sep 12, 2025 at 02:55:27PM +0200, Barnabás Pőcze wrote:
>> Other code generation scripts do that already and let pyyaml deal with
>> decoding utf-8, etc. So do the same here as well.
> 
> How does pyyaml determine the encoding ? Does it just hardcode utf-8 ?

https://yaml.org/spec/1.2.2/#52-character-encodings says that if there is no
BOM, then it is utf-8. And additionally:

   If a character stream begins with a byte order mark, the character encoding will be
   taken to be as indicated by the byte order mark. Otherwise, the stream must begin
   with an ASCII character. This allows the encoding to be deduced by the pattern of
   null (x00) characters.

So for our purposes it will deduce UTF-8, since no YAML file used here starts
with a BOM or with an ASCII character in a wider (UTF-16/UTF-32) encoding, as far as I can tell.

Due to this special behaviour, I'd say opening it in binary mode is the correct choice.
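
For illustration, the deduction table from that section of the spec can be sketched
in a few lines of Python. This is only a sketch of the spec's rules, not PyYAML's
code: the function name deduce_yaml_encoding is made up for the example, and as far
as I know PyYAML itself only looks for a UTF-16 BOM on byte input and otherwise
falls back to UTF-8.

```python
def deduce_yaml_encoding(head: bytes) -> str:
    """Deduce a stream encoding from its first bytes.

    Hypothetical helper following the table in YAML 1.2.2 section 5.2;
    it is not part of PyYAML or of gen-py-formats.py.
    """
    b = head[:4]

    # UTF-32 is checked first: the UTF-32LE BOM (FF FE 00 00) begins with
    # the UTF-16LE BOM (FF FE), so the order of the checks matters.
    if b.startswith(b'\x00\x00\xfe\xff'):           # explicit BOM
        return 'utf-32-be'
    if len(b) == 4 and b[:3] == b'\x00\x00\x00':    # ASCII first character
        return 'utf-32-be'
    if b.startswith(b'\xff\xfe\x00\x00'):           # explicit BOM
        return 'utf-32-le'
    if len(b) == 4 and b[1:] == b'\x00\x00\x00':    # ASCII first character
        return 'utf-32-le'

    if b.startswith(b'\xfe\xff'):                   # explicit BOM
        return 'utf-16-be'
    if len(b) >= 2 and b[0] == 0:                   # ASCII first character
        return 'utf-16-be'
    if b.startswith(b'\xff\xfe'):                   # explicit BOM
        return 'utf-16-le'
    if len(b) >= 2 and b[1] == 0:                   # ASCII first character
        return 'utf-16-le'

    # Either a UTF-8 BOM (EF BB BF) or no recognisable pattern: UTF-8.
    return 'utf-8'
```

A file starting with plain `formats:` has no null bytes in its first four bytes, so
it ends up in the final UTF-8 branch.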


Regards,
Barnabás Pőcze


> 
>> Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
>> ---
>>   src/py/libcamera/gen-py-formats.py | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/py/libcamera/gen-py-formats.py b/src/py/libcamera/gen-py-formats.py
>> index 0ff1d12ac..6323e237f 100755
>> --- a/src/py/libcamera/gen-py-formats.py
>> +++ b/src/py/libcamera/gen-py-formats.py
>> @@ -37,7 +37,7 @@ def main(argv):
>>                           help='Template file name.')
>>       args = parser.parse_args(argv[1:])
>>   
>> -    with open(args.input, encoding='utf-8') as f:
>> +    with open(args.input, 'rb') as f:
>>           formats = yaml.safe_load(f)['formats']
>>   
>>       data = generate(formats)
>
Laurent Pinchart Sept. 12, 2025, 2:23 p.m. UTC | #3
On Fri, Sep 12, 2025 at 04:08:50PM +0200, Barnabás Pőcze wrote:
> On 2025-09-12 at 16:00, Laurent Pinchart wrote:
> > On Fri, Sep 12, 2025 at 02:55:27PM +0200, Barnabás Pőcze wrote:
> >> Other code generation scripts do that already and let pyyaml deal with
> >> decoding utf-8, etc. So do the same here as well.
> > 
> > How does pyyaml determine the encoding ? Does it just hardcode utf-8 ?
> 
> https://yaml.org/spec/1.2.2/#52-character-encodings says that if there is no
> BOM, then it is utf-8. And additionally:
> 
>    If a character stream begins with a byte order mark, the character encoding will be
>    taken to be as indicated by the byte order mark. Otherwise, the stream must begin
>    with an ASCII character. This allows the encoding to be deduced by the pattern of
>    null (x00) characters.
> 
> So for our purposes it will deduce UTF-8, since no YAML file used here starts
> with a BOM or with an ASCII character in a wider (UTF-16/UTF-32) encoding, as far as I can tell.
> 
> Due to this special behaviour, I'd say opening it in binary mode is the correct choice.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> >> Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
> >> ---
> >>   src/py/libcamera/gen-py-formats.py | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/py/libcamera/gen-py-formats.py b/src/py/libcamera/gen-py-formats.py
> >> index 0ff1d12ac..6323e237f 100755
> >> --- a/src/py/libcamera/gen-py-formats.py
> >> +++ b/src/py/libcamera/gen-py-formats.py
> >> @@ -37,7 +37,7 @@ def main(argv):
> >>                           help='Template file name.')
> >>       args = parser.parse_args(argv[1:])
> >>   
> >> -    with open(args.input, encoding='utf-8') as f:
> >> +    with open(args.input, 'rb') as f:
> >>           formats = yaml.safe_load(f)['formats']
> >>   
> >>       data = generate(formats)

Patch

diff --git a/src/py/libcamera/gen-py-formats.py b/src/py/libcamera/gen-py-formats.py
index 0ff1d12ac..6323e237f 100755
--- a/src/py/libcamera/gen-py-formats.py
+++ b/src/py/libcamera/gen-py-formats.py
@@ -37,7 +37,7 @@  def main(argv):
                         help='Template file name.')
     args = parser.parse_args(argv[1:])
 
-    with open(args.input, encoding='utf-8') as f:
+    with open(args.input, 'rb') as f:
         formats = yaml.safe_load(f)['formats']
 
     data = generate(formats)
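
The practical effect of the one-line change can be sketched without the script
itself. The example below is an illustration under assumptions, not code from the
patch: the helper name read_both_ways and the file name formats.yaml are made up.
It writes a payload to a temporary file, then reads it back once in text mode with
a hardcoded UTF-8 decoder (the old behaviour) and once in binary mode (the new
behaviour), where the raw bytes, BOM included, are handed on untouched so that the
consumer of the stream can pick the decoder.

```python
import codecs
import os
import tempfile


def read_both_ways(payload: bytes):
    """Return (text-mode result or None on decode failure, raw bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'formats.yaml')  # hypothetical file name
        with open(path, 'wb') as f:
            f.write(payload)

        # Old behaviour: Python decodes as UTF-8 before anything else
        # gets to look at the data.
        try:
            with open(path, encoding='utf-8') as f:
                text = f.read()
        except UnicodeDecodeError:
            text = None

        # New behaviour: the bytes are passed on as they are on disk.
        with open(path, 'rb') as f:
            raw = f.read()

    return text, raw


utf8_payload = 'formats: []\n'.encode('utf-8')
utf16_payload = codecs.BOM_UTF16_LE + 'formats: []\n'.encode('utf-16-le')

utf8_text, utf8_raw = read_both_ways(utf8_payload)
utf16_text, utf16_raw = read_both_ways(utf16_payload)
```

For the UTF-8 payload both modes work. For the UTF-16 payload the text-mode read
fails to decode, while the binary read still returns the original bytes.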