[v2,09/25] libtuning: Improve filename parsing
diff mbox series

Message ID 20240628104828.2928109-10-stefan.klug@ideasonboard.com
State Superseded
Headers show
Series
  • Add ccm calibration to libtuning
Related show

Commit Message

Stefan Klug June 28, 2024, 10:47 a.m. UTC
In the tuning datasets, the files had names like
'imx335_1600l_3000k_1.dng'. That failed on the old filename parsing
function. As there is no need to dictate the order of the tags, split
the big regex into chunks and parse them one by one. This also makes
the code easier to digest.

Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
---
 utils/tuning/libtuning/utils.py | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

Comments

Laurent Pinchart June 28, 2024, 11:05 p.m. UTC | #1
Hi Stefan,

Thank you for the patch.

On Fri, Jun 28, 2024 at 12:47:02PM +0200, Stefan Klug wrote:
> In the tuning datasets, the files had names like
> 'imx335_1600l_3000k_1.dng'. That failed on the old filename parsing
> function. As there is no need to dictate the order of the tags, split
> the big regex into chunks and parse them one by one. This also makes
> the code easier to digest.

Is there a risk we'll later add tags we can't differentiate without
considering their position ? I suppose we can decide how to format them,
so it should be fine ?

> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
> ---
>  utils/tuning/libtuning/utils.py | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/utils/tuning/libtuning/utils.py b/utils/tuning/libtuning/utils.py
> index 178c6957c581..00cf5a57512f 100644
> --- a/utils/tuning/libtuning/utils.py
> +++ b/utils/tuning/libtuning/utils.py
> @@ -43,14 +43,28 @@ def _list_image_files(directory):
>  
>  
>  def _parse_image_filename(fn: Path):
> -    result = re.search(r'^(alsc_)?(\d+)[kK]_(\d+)?[lLuU]?.\w{3,4}$', fn.name)
> -    if result is None:
> -        logger.error(f'The file name of {fn.name} is incorrectly formatted')
> -        return None, None, None
> -
> -    color = int(result.group(2))
> -    lsc_only = result.group(1) is not None
> -    lux = None if lsc_only else int(result.group(3))
> +    lsc_only = False
> +    color = None

color_temperature ?

> +    lux = None
> +
> +    parts = fn.stem.split('_')
> +    for part in parts:
> +        if part == 'alsc':

Should we rename this to 'lsc' at some point ? The *adaptive* LSC is
specific to Raspberry Pi.

> +            lsc_only = True
> +            continue

Maybe a blank line here and after the next continue to let the code
breathe a bit more.

> +        r = re.match(r'(\d+)[kK]', part)

Would

        r = re.match(r'([0-9]+)[kK]', part)

be more readable ?

Do we need to be case-insensitive here, or could we standardize on one
option and stick to it ?

> +        if r:
> +            color = int(r.group(1))
> +            continue
> +        r = re.match(r'(\d+)[lLuU]', part)

Same here, and additionally, do we need to support both L and U ? Is
there a difference ?

> +        if r:
> +            lux = int(r.group(1))
> +
> +    if color is None:
> +        logger.error(f'The file name of {fn.name} does not contain a color temperature')

        logger.error(f'The file name "{fn.name}" does not contain a color temperature')

Same below.

> +
> +    if lux is None and lsc_only is False:
> +        logger.error(f'The file name of {fn.name} must either contain alsc or a lux level')
>  
>      return color, lux, lsc_only
>
Stefan Klug July 1, 2024, 3:31 p.m. UTC | #2
Hi Laurent,

On Sat, Jun 29, 2024 at 02:05:54AM +0300, Laurent Pinchart wrote:
> Hi Stefan,
> 
> Thank you for the patch.
> 
> On Fri, Jun 28, 2024 at 12:47:02PM +0200, Stefan Klug wrote:
> > In the tuning datasets, the files had names like
> > 'imx335_1600l_3000k_1.dng'. That failed on the old filename parsing
> > function. As there is no need to dictate the order of the tags, split
> > the big regex into chunks and parse them one by one. This also makes
> > the code easier to digest.
> 
> Is there a risk we'll later add tags we can't differentiate without
> considering their position ? I suppose we can decide how to format them,
> so it should be fine ?

Yes, I guess the risk is limited. The only thing I see right now is the
addition of a blc tag. But that works nicely with the scheme.

> 
> > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> > Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
> > ---
> >  utils/tuning/libtuning/utils.py | 30 ++++++++++++++++++++++--------
> >  1 file changed, 22 insertions(+), 8 deletions(-)
> > 
> > diff --git a/utils/tuning/libtuning/utils.py b/utils/tuning/libtuning/utils.py
> > index 178c6957c581..00cf5a57512f 100644
> > --- a/utils/tuning/libtuning/utils.py
> > +++ b/utils/tuning/libtuning/utils.py
> > @@ -43,14 +43,28 @@ def _list_image_files(directory):
> >  
> >  
> >  def _parse_image_filename(fn: Path):
> > -    result = re.search(r'^(alsc_)?(\d+)[kK]_(\d+)?[lLuU]?.\w{3,4}$', fn.name)
> > -    if result is None:
> > -        logger.error(f'The file name of {fn.name} is incorrectly formatted')
> > -        return None, None, None
> > -
> > -    color = int(result.group(2))
> > -    lsc_only = result.group(1) is not None
> > -    lux = None if lsc_only else int(result.group(3))
> > +    lsc_only = False
> > +    color = None
> 
> color_temperature ?

Yes, I tried to modify as little as possible. Maybe too little :-)

> 
> > +    lux = None
> > +
> > +    parts = fn.stem.split('_')
> > +    for part in parts:
> > +        if part == 'alsc':
> 
> Should we rename this to 'lsc' at some point ? The *adaptive* LSC is
> specific to Raspberry Pi.
> 
> > +            lsc_only = True
> > +            continue
> 
> Maybe a blank line here and after the next continue to let the code
> breathe a bit more.
> 
> > +        r = re.match(r'(\d+)[kK]', part)
> 
> Would
> 
>         r = re.match(r'([0-9]+)[kK]', part)
> 
> be more readable ?

I guess that is a personal preference. I prefer \d because it's less
characters.

> 
> Do we need to be case-insensitive here, or could we standardize on one
> option and stick to it ?

At the point of writing that (and I would say still now) I didn't have a
full overview on the tuning datasets that are floating out there and
that are expected to work. So I didn't question that and just kept it
compatible. I don't have a problem with case-insensitive here (maybe we
should just do a lower() on the whole string before parsing). If we want
to decide for a casing I (as a programmer) would prefer the lower k, but
the official SI unit is K...grrr

> 
> > +        if r:
> > +            color = int(r.group(1))
> > +            continue
> > +        r = re.match(r'(\d+)[lLuU]', part)
> 
> Same here, and additionally, do we need to support both L and U ? Is
> there a difference ?

Good question. As above I wanted to stay compatible, because there was
no other defined standard (Or I wasn't aware of it). If we go for SI
units it would be lx...

Writing all that I think I'd lean for case-insensistive and SI-units (So
lx instead of l). The u still tbd.

> 
> > +        if r:
> > +            lux = int(r.group(1))
> > +
> > +    if color is None:
> > +        logger.error(f'The file name of {fn.name} does not contain a color temperature')
> 
>         logger.error(f'The file name "{fn.name}" does not contain a color temperature')
> 
> Same below.

This and the color_temperature above are fixed locally.

Regards,
Stefan

> 
> > +
> > +    if lux is None and lsc_only is False:
> > +        logger.error(f'The file name of {fn.name} must either contain alsc or a lux level')
> >  
> >      return color, lux, lsc_only
> >  
> 
> -- 
> Regards,
> 
> Laurent Pinchart
Laurent Pinchart July 2, 2024, 10:09 a.m. UTC | #3
On Mon, Jul 01, 2024 at 05:31:33PM +0200, Stefan Klug wrote:
> Hi Laurent,
> 
> On Sat, Jun 29, 2024 at 02:05:54AM +0300, Laurent Pinchart wrote:
> > Hi Stefan,
> > 
> > Thank you for the patch.
> > 
> > On Fri, Jun 28, 2024 at 12:47:02PM +0200, Stefan Klug wrote:
> > > In the tuning datasets, the files had names like
> > > 'imx335_1600l_3000k_1.dng'. That failed on the old filename parsing
> > > function. As there is no need to dictate the order of the tags, split
> > > the big regex into chunks and parse them one by one. This also makes
> > > the code easier to digest.
> > 
> > Is there a risk we'll later add tags we can't differentiate without
> > considering their position ? I suppose we can decide how to format them,
> > so it should be fine ?
> 
> Yes, I guess the risk is limited. The only thing I see right now is the
> addition of a blc tag. But that works nicely with the scheme.
> 
> > > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>
> > > Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
> > > ---
> > >  utils/tuning/libtuning/utils.py | 30 ++++++++++++++++++++++--------
> > >  1 file changed, 22 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/utils/tuning/libtuning/utils.py b/utils/tuning/libtuning/utils.py
> > > index 178c6957c581..00cf5a57512f 100644
> > > --- a/utils/tuning/libtuning/utils.py
> > > +++ b/utils/tuning/libtuning/utils.py
> > > @@ -43,14 +43,28 @@ def _list_image_files(directory):
> > >  
> > >  
> > >  def _parse_image_filename(fn: Path):
> > > -    result = re.search(r'^(alsc_)?(\d+)[kK]_(\d+)?[lLuU]?.\w{3,4}$', fn.name)
> > > -    if result is None:
> > > -        logger.error(f'The file name of {fn.name} is incorrectly formatted')
> > > -        return None, None, None
> > > -
> > > -    color = int(result.group(2))
> > > -    lsc_only = result.group(1) is not None
> > > -    lux = None if lsc_only else int(result.group(3))
> > > +    lsc_only = False
> > > +    color = None
> > 
> > color_temperature ?
> 
> Yes, I tried to modify as little as possible. Maybe too little :-)
> 
> > 
> > > +    lux = None
> > > +
> > > +    parts = fn.stem.split('_')
> > > +    for part in parts:
> > > +        if part == 'alsc':
> > 
> > Should we rename this to 'lsc' at some point ? The *adaptive* LSC is
> > specific to Raspberry Pi.
> > 
> > > +            lsc_only = True
> > > +            continue
> > 
> > Maybe a blank line here and after the next continue to let the code
> > breathe a bit more.
> > 
> > > +        r = re.match(r'(\d+)[kK]', part)
> > 
> > Would
> > 
> >         r = re.match(r'([0-9]+)[kK]', part)
> > 
> > be more readable ?
> 
> I guess that is a personal preference. I prefer \d because it's less
> characters.
> 
> > Do we need to be case-insensitive here, or could we standardize on one
> > option and stick to it ?
> 
> At the point of writing that (and I would say still now) I didn't have a
> full overview on the tuning datasets that are floating out there and
> that are expected to work. So I didn't question that and just kept it
> compatible. I don't have a problem with case-insensitive here (maybe we
> should just do a lower() on the whole string before parsing). If we want
> to decide for a casing I (as a programmer) would prefer the lower k, but
> the official SI unit is K...grrr
> 
> > > +        if r:
> > > +            color = int(r.group(1))
> > > +            continue
> > > +        r = re.match(r'(\d+)[lLuU]', part)
> > 
> > Same here, and additionally, do we need to support both L and U ? Is
> > there a difference ?
> 
> Good question. As above I wanted to stay compatible, because there was
> no other defined standard (Or I wasn't aware of it). If we go for SI
> units it would be lx...
> 
> Writing all that I think I'd lean for case-insensistive and SI-units (So
> lx instead of l). The u still tbd.

SI unit sound good.

The reason I like case-sensitive is mostly consistency, as it avoids
proliferation of files with different cases, which can mess up sorting.
It's a minor issue in the end, so up to you.

> > > +        if r:
> > > +            lux = int(r.group(1))
> > > +
> > > +    if color is None:
> > > +        logger.error(f'The file name of {fn.name} does not contain a color temperature')
> > 
> >         logger.error(f'The file name "{fn.name}" does not contain a color temperature')
> > 
> > Same below.
> 
> This and the color_temperature above are fixed locally.
> 
> > > +
> > > +    if lux is None and lsc_only is False:
> > > +        logger.error(f'The file name of {fn.name} must either contain alsc or a lux level')
> > >  
> > >      return color, lux, lsc_only
> > >

Patch
diff mbox series

diff --git a/utils/tuning/libtuning/utils.py b/utils/tuning/libtuning/utils.py
index 178c6957c581..00cf5a57512f 100644
--- a/utils/tuning/libtuning/utils.py
+++ b/utils/tuning/libtuning/utils.py
@@ -43,14 +43,28 @@  def _list_image_files(directory):
 
 
 def _parse_image_filename(fn: Path):
-    result = re.search(r'^(alsc_)?(\d+)[kK]_(\d+)?[lLuU]?.\w{3,4}$', fn.name)
-    if result is None:
-        logger.error(f'The file name of {fn.name} is incorrectly formatted')
-        return None, None, None
-
-    color = int(result.group(2))
-    lsc_only = result.group(1) is not None
-    lux = None if lsc_only else int(result.group(3))
+    lsc_only = False
+    color = None
+    lux = None
+
+    parts = fn.stem.split('_')
+    for part in parts:
+        if part == 'alsc':
+            lsc_only = True
+            continue
+        r = re.match(r'(\d+)[kK]', part)
+        if r:
+            color = int(r.group(1))
+            continue
+        r = re.match(r'(\d+)[lLuU]', part)
+        if r:
+            lux = int(r.group(1))
+
+    if color is None:
+        logger.error(f'The file name of {fn.name} does not contain a color temperature')
+
+    if lux is None and lsc_only is False:
+        logger.error(f'The file name of {fn.name} must either contain alsc or a lux level')
 
     return color, lux, lsc_only