[2/2] bitbake-diffsigs: make finding of changed signatures more robust

Message ID 20220413063534.799526-3-adriaan.schmidt@siemens.com
State Superseded, archived
Headers show
Series Sstate maintenance script | expand

Commit Message

Schmidt, Adriaan April 12, 2022, 10:35 p.m. UTC
In `runtaskhashes`, the keys contain the absolute paths to the recipe. When
working with shared sstate caches (where these absolute paths can be different)
we see that compare_sigfiles does not identifiy a changed hash of a dependent
task as "changed", but instead as "removed"&"added", preventing the function
from recursing and continuing the comparison.

By calling `clean_basepaths` before comparing the `runtaskhashes` dicts, we
avoid this.

Submitted upstream: https://lists.openembedded.org/g/bitbake-devel/message/13603

Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>
---
 bitbake/lib/bb/siggen.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Henning Schild April 13, 2022, 12:19 a.m. UTC | #1
Am Wed, 13 Apr 2022 08:35:34 +0200
schrieb Adriaan Schmidt <adriaan.schmidt@siemens.com>:

> In `runtaskhashes`, the keys contain the absolute paths to the
> recipe. When working with shared sstate caches (where these absolute
> paths can be different) we see that compare_sigfiles does not
> identifiy a changed hash of a dependent task as "changed", but
> instead as "removed"&"added", preventing the function from recursing
> and continuing the comparison.
> 
> By calling `clean_basepaths` before comparing the `runtaskhashes`
> dicts, we avoid this.
> 
> Submitted upstream:
> https://lists.openembedded.org/g/bitbake-devel/message/13603

Submitted does not count, we will have to wait till it is merged and
then we can think about a backport.

How important is that? If it is just "nice to have" i think we should
wait and not even do the backport once merged into bitbake.

Henning
 
> Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>
> ---
>  bitbake/lib/bb/siggen.py | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> index 0d88c6ec..8b23fd04 100644
> --- a/bitbake/lib/bb/siggen.py
> +++ b/bitbake/lib/bb/siggen.py
> @@ -944,8 +944,8 @@ def compare_sigfiles(a, b, recursecb=None,
> color=False, collapsed=False): 
>  
>      if 'runtaskhashes' in a_data and 'runtaskhashes' in b_data:
> -        a = a_data['runtaskhashes']
> -        b = b_data['runtaskhashes']
> +        a = clean_basepaths(a_data['runtaskhashes'])
> +        b = clean_basepaths(b_data['runtaskhashes'])
>          changed, added, removed = dict_diff(a, b)
>          if added:
>              for dep in added:
> @@ -956,7 +956,7 @@ def compare_sigfiles(a, b, recursecb=None,
> color=False, collapsed=False): #output.append("Dependency on task %s
> was replaced by %s with same hash" % (dep, bdep)) bdep_found = True
>                  if not bdep_found:
> -
> output.append(color_format("{color_title}Dependency on task %s was
> added{color_default} with hash %s") % (clean_basepath(dep), b[dep]))
> +
> output.append(color_format("{color_title}Dependency on task %s was
> added{color_default} with hash %s") % (dep, b[dep])) if removed: for
> dep in removed: adep_found = False
> @@ -966,11 +966,11 @@ def compare_sigfiles(a, b, recursecb=None,
> color=False, collapsed=False): #output.append("Dependency on task %s
> was replaced by %s with same hash" % (adep, dep)) adep_found = True
>                  if not adep_found:
> -
> output.append(color_format("{color_title}Dependency on task %s was
> removed{color_default} with hash %s") % (clean_basepath(dep), a[dep]))
> +
> output.append(color_format("{color_title}Dependency on task %s was
> removed{color_default} with hash %s") % (dep, a[dep])) if changed:
> for dep in changed: if not collapsed:
> -                    output.append(color_format("{color_title}Hash
> for dependent task %s changed{color_default} from %s to %s") %
> (clean_basepath(dep), a[dep], b[dep]))
> +                    output.append(color_format("{color_title}Hash
> for dependent task %s changed{color_default} from %s to %s") % (dep,
> a[dep], b[dep])) if callable(recursecb): recout = recursecb(dep,
> a[dep], b[dep]) if recout:
Schmidt, Adriaan April 14, 2022, 12:03 a.m. UTC | #2
Schild, Henning, Mittwoch, 13. April 2022 10:20
> Am Wed, 13 Apr 2022 08:35:34 +0200
> schrieb Adriaan Schmidt <adriaan.schmidt@siemens.com>:
> 
> > In `runtaskhashes`, the keys contain the absolute paths to the
> > recipe. When working with shared sstate caches (where these absolute
> > paths can be different) we see that compare_sigfiles does not
> > identifiy a changed hash of a dependent task as "changed", but
> > instead as "removed"&"added", preventing the function from recursing
> > and continuing the comparison.
> >
> > By calling `clean_basepaths` before comparing the `runtaskhashes`
> > dicts, we avoid this.
> >
> > Submitted upstream:
> > https://lists.openembedded.org/g/bitbake-devel/message/13603
> 
> Submitted does not count, we will have to wait till it is merged and
> then we can think about a backport.

It's now on bitbake's `master-next`:

https://git.openembedded.org/bitbake/commit/?h=master-next&id=01b2b300901dc8b93973318127f8eb3c29b9a168

> How important is that? If it is just "nice to have" i think we should
> wait and not even do the backport once merged into bitbake.

In our specific setup on a GitLab K8s CI runner, we need this to get
useful analysis. Other setups might be fine without.

Adriaan

> Henning
> 
> > Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>
> > ---
> >  bitbake/lib/bb/siggen.py | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> > index 0d88c6ec..8b23fd04 100644
> > --- a/bitbake/lib/bb/siggen.py
> > +++ b/bitbake/lib/bb/siggen.py
> > @@ -944,8 +944,8 @@ def compare_sigfiles(a, b, recursecb=None,
> > color=False, collapsed=False):
> >
> >      if 'runtaskhashes' in a_data and 'runtaskhashes' in b_data:
> > -        a = a_data['runtaskhashes']
> > -        b = b_data['runtaskhashes']
> > +        a = clean_basepaths(a_data['runtaskhashes'])
> > +        b = clean_basepaths(b_data['runtaskhashes'])
> >          changed, added, removed = dict_diff(a, b)
> >          if added:
> >              for dep in added:
> > @@ -956,7 +956,7 @@ def compare_sigfiles(a, b, recursecb=None,
> > color=False, collapsed=False): #output.append("Dependency on task %s
> > was replaced by %s with same hash" % (dep, bdep)) bdep_found = True
> >                  if not bdep_found:
> > -
> > output.append(color_format("{color_title}Dependency on task %s was
> > added{color_default} with hash %s") % (clean_basepath(dep), b[dep]))
> > +
> > output.append(color_format("{color_title}Dependency on task %s was
> > added{color_default} with hash %s") % (dep, b[dep])) if removed: for
> > dep in removed: adep_found = False
> > @@ -966,11 +966,11 @@ def compare_sigfiles(a, b, recursecb=None,
> > color=False, collapsed=False): #output.append("Dependency on task %s
> > was replaced by %s with same hash" % (adep, dep)) adep_found = True
> >                  if not adep_found:
> > -
> > output.append(color_format("{color_title}Dependency on task %s was
> > removed{color_default} with hash %s") % (clean_basepath(dep), a[dep]))
> > +
> > output.append(color_format("{color_title}Dependency on task %s was
> > removed{color_default} with hash %s") % (dep, a[dep])) if changed:
> > for dep in changed: if not collapsed:
> > -                    output.append(color_format("{color_title}Hash
> > for dependent task %s changed{color_default} from %s to %s") %
> > (clean_basepath(dep), a[dep], b[dep]))
> > +                    output.append(color_format("{color_title}Hash
> > for dependent task %s changed{color_default} from %s to %s") % (dep,
> > a[dep], b[dep])) if callable(recursecb): recout = recursecb(dep,
> > a[dep], b[dep]) if recout:
Henning Schild April 14, 2022, 12:22 a.m. UTC | #3
Am Thu, 14 Apr 2022 10:03:39 +0200
schrieb "Schmidt, Adriaan (T CED SES-DE)" <adriaan.schmidt@siemens.com>:

> Schild, Henning, Mittwoch, 13. April 2022 10:20
> > Am Wed, 13 Apr 2022 08:35:34 +0200
> > schrieb Adriaan Schmidt <adriaan.schmidt@siemens.com>:
> >   
> > > In `runtaskhashes`, the keys contain the absolute paths to the
> > > recipe. When working with shared sstate caches (where these
> > > absolute paths can be different) we see that compare_sigfiles
> > > does not identifiy a changed hash of a dependent task as
> > > "changed", but instead as "removed"&"added", preventing the
> > > function from recursing and continuing the comparison.
> > >
> > > By calling `clean_basepaths` before comparing the `runtaskhashes`
> > > dicts, we avoid this.
> > >
> > > Submitted upstream:
> > > https://lists.openembedded.org/g/bitbake-devel/message/13603  
> > 
> > Submitted does not count, we will have to wait till it is merged and
> > then we can think about a backport.  
> 
> It's now on bitbake's `master-next`:
> 
> https://git.openembedded.org/bitbake/commit/?h=master-next&id=01b2b300901dc8b93973318127f8eb3c29b9a168

Cool. I guess the commit message should be changed to include that
upstream sha.

Henning

> > How important is that? If it is just "nice to have" i think we
> > should wait and not even do the backport once merged into bitbake.  
> 
> In our specific setup on a GitLab K8s CI runner, we need this to get
> useful analysis. Other setups might be fine without.
> 
> Adriaan
> 
> > Henning
> >   
> > > Signed-off-by: Adriaan Schmidt <adriaan.schmidt@siemens.com>
> > > ---
> > >  bitbake/lib/bb/siggen.py | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> > > index 0d88c6ec..8b23fd04 100644
> > > --- a/bitbake/lib/bb/siggen.py
> > > +++ b/bitbake/lib/bb/siggen.py
> > > @@ -944,8 +944,8 @@ def compare_sigfiles(a, b, recursecb=None,
> > > color=False, collapsed=False):
> > >
> > >      if 'runtaskhashes' in a_data and 'runtaskhashes' in b_data:
> > > -        a = a_data['runtaskhashes']
> > > -        b = b_data['runtaskhashes']
> > > +        a = clean_basepaths(a_data['runtaskhashes'])
> > > +        b = clean_basepaths(b_data['runtaskhashes'])
> > >          changed, added, removed = dict_diff(a, b)
> > >          if added:
> > >              for dep in added:
> > > @@ -956,7 +956,7 @@ def compare_sigfiles(a, b, recursecb=None,
> > > color=False, collapsed=False): #output.append("Dependency on task
> > > %s was replaced by %s with same hash" % (dep, bdep)) bdep_found =
> > > True if not bdep_found:
> > > -
> > > output.append(color_format("{color_title}Dependency on task %s was
> > > added{color_default} with hash %s") % (clean_basepath(dep),
> > > b[dep])) +
> > > output.append(color_format("{color_title}Dependency on task %s was
> > > added{color_default} with hash %s") % (dep, b[dep])) if removed:
> > > for dep in removed: adep_found = False
> > > @@ -966,11 +966,11 @@ def compare_sigfiles(a, b, recursecb=None,
> > > color=False, collapsed=False): #output.append("Dependency on task
> > > %s was replaced by %s with same hash" % (adep, dep)) adep_found =
> > > True if not adep_found:
> > > -
> > > output.append(color_format("{color_title}Dependency on task %s was
> > > removed{color_default} with hash %s") % (clean_basepath(dep),
> > > a[dep])) +
> > > output.append(color_format("{color_title}Dependency on task %s was
> > > removed{color_default} with hash %s") % (dep, a[dep])) if changed:
> > > for dep in changed: if not collapsed:
> > > -                    output.append(color_format("{color_title}Hash
> > > for dependent task %s changed{color_default} from %s to %s") %
> > > (clean_basepath(dep), a[dep], b[dep]))
> > > +                    output.append(color_format("{color_title}Hash
> > > for dependent task %s changed{color_default} from %s to %s") %
> > > (dep, a[dep], b[dep])) if callable(recursecb): recout =
> > > recursecb(dep, a[dep], b[dep]) if recout:  
>

Patch

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 0d88c6ec..8b23fd04 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -944,8 +944,8 @@  def compare_sigfiles(a, b, recursecb=None, color=False, collapsed=False):
 
 
     if 'runtaskhashes' in a_data and 'runtaskhashes' in b_data:
-        a = a_data['runtaskhashes']
-        b = b_data['runtaskhashes']
+        a = clean_basepaths(a_data['runtaskhashes'])
+        b = clean_basepaths(b_data['runtaskhashes'])
         changed, added, removed = dict_diff(a, b)
         if added:
             for dep in added:
@@ -956,7 +956,7 @@  def compare_sigfiles(a, b, recursecb=None, color=False, collapsed=False):
                             #output.append("Dependency on task %s was replaced by %s with same hash" % (dep, bdep))
                             bdep_found = True
                 if not bdep_found:
-                    output.append(color_format("{color_title}Dependency on task %s was added{color_default} with hash %s") % (clean_basepath(dep), b[dep]))
+                    output.append(color_format("{color_title}Dependency on task %s was added{color_default} with hash %s") % (dep, b[dep]))
         if removed:
             for dep in removed:
                 adep_found = False
@@ -966,11 +966,11 @@  def compare_sigfiles(a, b, recursecb=None, color=False, collapsed=False):
                             #output.append("Dependency on task %s was replaced by %s with same hash" % (adep, dep))
                             adep_found = True
                 if not adep_found:
-                    output.append(color_format("{color_title}Dependency on task %s was removed{color_default} with hash %s") % (clean_basepath(dep), a[dep]))
+                    output.append(color_format("{color_title}Dependency on task %s was removed{color_default} with hash %s") % (dep, a[dep]))
         if changed:
             for dep in changed:
                 if not collapsed:
-                    output.append(color_format("{color_title}Hash for dependent task %s changed{color_default} from %s to %s") % (clean_basepath(dep), a[dep], b[dep]))
+                    output.append(color_format("{color_title}Hash for dependent task %s changed{color_default} from %s to %s") % (dep, a[dep], b[dep]))
                 if callable(recursecb):
                     recout = recursecb(dep, a[dep], b[dep])
                     if recout: