[1/3] utils: checkstyle.py: Rework commit message parsing
diff mbox series

Message ID 20240810005840.20841-1-laurent.pinchart@ideasonboard.com
State Accepted
Commit eeb435bbda98da9f03d56e23722e4ac741d6107c
Headers show
Series
  • [1/3] utils: checkstyle.py: Rework commit message parsing
Related show

Commit Message

Laurent Pinchart Aug. 10, 2024, 12:58 a.m. UTC
When parsing commit messages, the Commit class tries to optimize the
process by invoking git-show once only, extracting both the commit
author, title and modified files in one go. As a result, the derived
Amendment class needs to implement the commit parsing separately, as the
modified files need to be extracted differently for amendments (and the
commit ID also needs to be retrieved differently). Furthermore, because
of the list of named files, extracting the trailers needs to invoke
git-show separately.

Improve the situation by reworking the commit message parsing in three
steps. In the first step, use git-show to extract the commit ID, author,
title and body. In the second step, invoke git-interpret-trailers to
extract the trailers from the body that was previously extracted. The
third and final step extracts the list of modified files, using
different methods for regular commits and amendments.

This allows sharing code for the first two steps between the Commit and
Amendment classes, making the code simpler. The Commit class still
invokes git three times, while the Amendment class runs it three times
instead of four, improving performance slightly.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 utils/checkstyle.py | 60 +++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 27 deletions(-)


base-commit: 62760bd2605a83e663b9003244ff42f8946f8955

Comments

Milan Zamazal Aug. 12, 2024, 6:44 a.m. UTC | #1
Hi Laurent,

thank you for the improvement.

Laurent Pinchart <laurent.pinchart@ideasonboard.com> writes:

> When parsing commit messages, the Commit class tries to optimize the
> process by invoking git-show once only, extracting both the commit
> author, title and modified files in one go. As a result, the derived
> Amendment class needs to implement the commit parsing separately, as the
> modified files need to be extracted differently for amendments (and the
> commit ID also needs to be retrieved differently). Furthermore, because
> of the list of named files, extracting the trailers needs to invoke
> git-show separately.
>
> Improve the situation by reworking the commit message parsing in three
> steps. In the first step, use git-show to extract the commit ID, author,
> title and body. In the second step, invoke git-interpret-trailers to
> extract the trailers from the body that was previously extracted. The
> third and final step extracts the list of modified files, using
> different methods for regular commits and amendments.
>
> This allows sharing code for the first two steps between the Commit and
> Amendment classes, making the code simpler. The Commit class still
> invokes git three times, while the Amendment class runs it three times
> instead of four, improving performance slightly.
>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

Reviewed-by: Milan Zamazal <mzamazal@redhat.com>

> ---
>  utils/checkstyle.py | 60 +++++++++++++++++++++++++--------------------
>  1 file changed, 33 insertions(+), 27 deletions(-)
>
> diff --git a/utils/checkstyle.py b/utils/checkstyle.py
> index dae5d518920a..aa0731abdb5a 100755
> --- a/utils/checkstyle.py
> +++ b/utils/checkstyle.py
> @@ -211,33 +211,42 @@ class CommitFile:
>  
>  class Commit:
>      def __init__(self, commit):
> -        self.commit = commit
> +        self._commit = commit
>          self._author = None
>          self._trailers = []
>          self._parse()
>  
> -    def _parse_trailers(self):
> -        proc_show = subprocess.run(['git', 'show', '--format=%b',
> -                                    '--no-patch', self.commit],
> -                                   stdout=subprocess.PIPE)
> +    def _parse_commit(self):
> +        # Get and parse the commit message.
> +        ret = subprocess.run(['git', 'show', '--format=%H%n%an <%ae>%n%s%n%b',
> +                              '--no-patch', self.commit],
> +                             stdout=subprocess.PIPE).stdout.decode('utf-8')
> +        lines = ret.splitlines()
> +
> +        self._commit = lines[0]
> +        self._author = lines[1]
> +        self._title = lines[2]
> +        self._body = lines[3:]
> +
> +        # Parse the trailers. Feed git-interpret-trailers with a full commit
> +        # message that includes both the title and the body, as it otherwise
> +        # fails to find trailers when the body contains trailers only.
> +        message = self._title + '\n\n' + '\n'.join(self._body)
>          trailers = subprocess.run(['git', 'interpret-trailers', '--parse'],
> -                                  input=proc_show.stdout,
> +                                  input=message.encode('utf-8'),
>                                    stdout=subprocess.PIPE).stdout.decode('utf-8')
>  
>          self._trailers = trailers.splitlines()
>  
>      def _parse(self):
> -        # Get the commit title and list of files.
> -        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%s',
> -                              '--name-status', self.commit],
> +        self._parse_commit()
> +
> +        # Get the list of files. Use an empty format specifier to suppress the
> +        # commit message completely.
> +        ret = subprocess.run(['git', 'show', '--format=', '--name-status',
> +                              self.commit],
>                               stdout=subprocess.PIPE).stdout.decode('utf-8')
> -        lines = ret.splitlines()
> -
> -        self._author = lines[0]
> -        self._title = lines[1]
> -        self._files = [CommitFile(f) for f in lines[2:] if f]
> -
> -        self._parse_trailers()
> +        self._files = [CommitFile(f) for f in ret.splitlines()]
>  
>      def files(self, filter='AMR'):
>          return [f.filename for f in self._files if f.status in filter]
> @@ -246,6 +255,10 @@ class Commit:
>      def author(self):
>          return self._author
>  
> +    @property
> +    def commit(self):
> +        return self._commit
> +
>      @property
>      def title(self):
>          return self._title
> @@ -284,21 +297,14 @@ class StagedChanges(Commit):
>  
>  class Amendment(Commit):
>      def __init__(self):
> -        Commit.__init__(self, '')
> +        Commit.__init__(self, 'HEAD')
>  
>      def _parse(self):
> -        # Create a title using HEAD commit and parse the trailers.
> -        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%H %s',
> -                             '--no-patch'],
> -                             stdout=subprocess.PIPE).stdout.decode('utf-8')
> -        lines = ret.splitlines()
> +        self._parse_commit()
>  
> -        self._author = lines[0]
> -        self._title = 'Amendment of ' + lines[1]
> +        self._title = f'Amendment of "{self.title}"'
>  
> -        self._parse_trailers()
> -
> -        # Extract the list of modified files
> +        # Extract the list of modified files.
>          ret = subprocess.run(['git', 'diff', '--staged', '--name-status', 'HEAD~'],
>                               stdout=subprocess.PIPE).stdout.decode('utf-8')
>          self._files = [CommitFile(f) for f in ret.splitlines()]
>
> base-commit: 62760bd2605a83e663b9003244ff42f8946f8955
Dan Scally Aug. 12, 2024, 12:42 p.m. UTC | #2
Hi Laurent

On 10/08/2024 01:58, Laurent Pinchart wrote:
> When parsing commit messages, the Commit class tries to optimize the
> process by invoking git-show once only, extracting both the commit
> author, title and modified files in one go. As a result, the derived
> Amendment class needs to implement the commit parsing separately, as the
> modified files need to be extracted differently for amendments (and the
> commit ID also needs to be retrieved differently). Furthermore, because
> of the list of named files, extracting the trailers needs to invoke
> git-show separately.
>
> Improve the situation by reworking the commit message parsing in three
> steps. In the first step, use git-show to extract the commit ID, author,
> title and body. In the second step, invoke git-interpret-trailers to
> extract the trailers from the body that was previously extracted. The
> third and final step extracts the list of modified files, using
> different methods for regular commits and amendments.
>
> This allows sharing code for the first two steps between the Commit and
> Amendment classes, making the code simpler. The Commit class still
> invokes git three times, while the Amendment class runs it three times
> instead of four, improving performance slightly.
>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> ---


Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>

>   utils/checkstyle.py | 60 +++++++++++++++++++++++++--------------------
>   1 file changed, 33 insertions(+), 27 deletions(-)
>
> diff --git a/utils/checkstyle.py b/utils/checkstyle.py
> index dae5d518920a..aa0731abdb5a 100755
> --- a/utils/checkstyle.py
> +++ b/utils/checkstyle.py
> @@ -211,33 +211,42 @@ class CommitFile:
>   
>   class Commit:
>       def __init__(self, commit):
> -        self.commit = commit
> +        self._commit = commit
>           self._author = None
>           self._trailers = []
>           self._parse()
>   
> -    def _parse_trailers(self):
> -        proc_show = subprocess.run(['git', 'show', '--format=%b',
> -                                    '--no-patch', self.commit],
> -                                   stdout=subprocess.PIPE)
> +    def _parse_commit(self):
> +        # Get and parse the commit message.
> +        ret = subprocess.run(['git', 'show', '--format=%H%n%an <%ae>%n%s%n%b',
> +                              '--no-patch', self.commit],
I should read the whole commit when I get confused; it took me way too long to spot that this now 
references the property.
> +                             stdout=subprocess.PIPE).stdout.decode('utf-8')
> +        lines = ret.splitlines()
> +
> +        self._commit = lines[0]
> +        self._author = lines[1]
> +        self._title = lines[2]
> +        self._body = lines[3:]
> +
> +        # Parse the trailers. Feed git-interpret-trailers with a full commit
> +        # message that includes both the title and the body, as it otherwise
> +        # fails to find trailers when the body contains trailers only.
> +        message = self._title + '\n\n' + '\n'.join(self._body)
>           trailers = subprocess.run(['git', 'interpret-trailers', '--parse'],
> -                                  input=proc_show.stdout,
> +                                  input=message.encode('utf-8'),
>                                     stdout=subprocess.PIPE).stdout.decode('utf-8')
>   
>           self._trailers = trailers.splitlines()
>   
>       def _parse(self):
> -        # Get the commit title and list of files.
> -        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%s',
> -                              '--name-status', self.commit],
> +        self._parse_commit()
> +
> +        # Get the list of files. Use an empty format specifier to suppress the
> +        # commit message completely.
> +        ret = subprocess.run(['git', 'show', '--format=', '--name-status',
> +                              self.commit],
>                                stdout=subprocess.PIPE).stdout.decode('utf-8')
> -        lines = ret.splitlines()
> -
> -        self._author = lines[0]
> -        self._title = lines[1]
> -        self._files = [CommitFile(f) for f in lines[2:] if f]
> -
> -        self._parse_trailers()
> +        self._files = [CommitFile(f) for f in ret.splitlines()]
>   
>       def files(self, filter='AMR'):
>           return [f.filename for f in self._files if f.status in filter]
> @@ -246,6 +255,10 @@ class Commit:
>       def author(self):
>           return self._author
>   
> +    @property
> +    def commit(self):
> +        return self._commit
> +
>       @property
>       def title(self):
>           return self._title
> @@ -284,21 +297,14 @@ class StagedChanges(Commit):
>   
>   class Amendment(Commit):
>       def __init__(self):
> -        Commit.__init__(self, '')
> +        Commit.__init__(self, 'HEAD')
>   
>       def _parse(self):
> -        # Create a title using HEAD commit and parse the trailers.
> -        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%H %s',
> -                             '--no-patch'],
> -                             stdout=subprocess.PIPE).stdout.decode('utf-8')
> -        lines = ret.splitlines()
> +        self._parse_commit()
>   
> -        self._author = lines[0]
> -        self._title = 'Amendment of ' + lines[1]
> +        self._title = f'Amendment of "{self.title}"'
>   
> -        self._parse_trailers()
> -
> -        # Extract the list of modified files
> +        # Extract the list of modified files.
>           ret = subprocess.run(['git', 'diff', '--staged', '--name-status', 'HEAD~'],
>                                stdout=subprocess.PIPE).stdout.decode('utf-8')
>           self._files = [CommitFile(f) for f in ret.splitlines()]
>
> base-commit: 62760bd2605a83e663b9003244ff42f8946f8955

Patch
diff mbox series

diff --git a/utils/checkstyle.py b/utils/checkstyle.py
index dae5d518920a..aa0731abdb5a 100755
--- a/utils/checkstyle.py
+++ b/utils/checkstyle.py
@@ -211,33 +211,42 @@  class CommitFile:
 
 class Commit:
     def __init__(self, commit):
-        self.commit = commit
+        self._commit = commit
         self._author = None
         self._trailers = []
         self._parse()
 
-    def _parse_trailers(self):
-        proc_show = subprocess.run(['git', 'show', '--format=%b',
-                                    '--no-patch', self.commit],
-                                   stdout=subprocess.PIPE)
+    def _parse_commit(self):
+        # Get and parse the commit message.
+        ret = subprocess.run(['git', 'show', '--format=%H%n%an <%ae>%n%s%n%b',
+                              '--no-patch', self.commit],
+                             stdout=subprocess.PIPE).stdout.decode('utf-8')
+        lines = ret.splitlines()
+
+        self._commit = lines[0]
+        self._author = lines[1]
+        self._title = lines[2]
+        self._body = lines[3:]
+
+        # Parse the trailers. Feed git-interpret-trailers with a full commit
+        # message that includes both the title and the body, as it otherwise
+        # fails to find trailers when the body contains trailers only.
+        message = self._title + '\n\n' + '\n'.join(self._body)
         trailers = subprocess.run(['git', 'interpret-trailers', '--parse'],
-                                  input=proc_show.stdout,
+                                  input=message.encode('utf-8'),
                                   stdout=subprocess.PIPE).stdout.decode('utf-8')
 
         self._trailers = trailers.splitlines()
 
     def _parse(self):
-        # Get the commit title and list of files.
-        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%s',
-                              '--name-status', self.commit],
+        self._parse_commit()
+
+        # Get the list of files. Use an empty format specifier to suppress the
+        # commit message completely.
+        ret = subprocess.run(['git', 'show', '--format=', '--name-status',
+                              self.commit],
                              stdout=subprocess.PIPE).stdout.decode('utf-8')
-        lines = ret.splitlines()
-
-        self._author = lines[0]
-        self._title = lines[1]
-        self._files = [CommitFile(f) for f in lines[2:] if f]
-
-        self._parse_trailers()
+        self._files = [CommitFile(f) for f in ret.splitlines()]
 
     def files(self, filter='AMR'):
         return [f.filename for f in self._files if f.status in filter]
@@ -246,6 +255,10 @@  class Commit:
     def author(self):
         return self._author
 
+    @property
+    def commit(self):
+        return self._commit
+
     @property
     def title(self):
         return self._title
@@ -284,21 +297,14 @@  class StagedChanges(Commit):
 
 class Amendment(Commit):
     def __init__(self):
-        Commit.__init__(self, '')
+        Commit.__init__(self, 'HEAD')
 
     def _parse(self):
-        # Create a title using HEAD commit and parse the trailers.
-        ret = subprocess.run(['git', 'show', '--format=%an <%ae>%n%H %s',
-                             '--no-patch'],
-                             stdout=subprocess.PIPE).stdout.decode('utf-8')
-        lines = ret.splitlines()
+        self._parse_commit()
 
-        self._author = lines[0]
-        self._title = 'Amendment of ' + lines[1]
+        self._title = f'Amendment of "{self.title}"'
 
-        self._parse_trailers()
-
-        # Extract the list of modified files
+        # Extract the list of modified files.
         ret = subprocess.run(['git', 'diff', '--staged', '--name-status', 'HEAD~'],
                              stdout=subprocess.PIPE).stdout.decode('utf-8')
         self._files = [CommitFile(f) for f in ret.splitlines()]