[v3,1/1] image.bbclass: fix non-reproducible file time-stamps inside rootfs

Message ID 20230105151241.21348-2-venkata.pyla@toshiba-tsip.com
State Superseded, archived
Series Fix for reproducible build issue

Commit Message

venkata.pyla@toshiba-tsip.com Jan. 5, 2023, 3:12 p.m. UTC
From: venkata pyla <venkata.pyla@toshiba-tsip.com>

As part of the reproducible-build work, rootfs images generated from the
same source should be identical between two builds.

This commit addresses one source of non-reproducibility: the time-stamps
of rootfs files generated at build time. It adopts a solution from the
Debian live-build project (see [1]): find all files and folders that are
newer than `SOURCE_DATE_EPOCH` and set their time-stamps to the value of
that environment variable.

[1] https://salsa.debian.org/live-team/live-build/-/merge_requests/218

Signed-off-by: venkata pyla <venkata.pyla@toshiba-tsip.com>
---
 meta-isar/conf/local.conf.sample | 10 ++++++++++
 meta/classes/image.bbclass       | 10 ++++++++++
 2 files changed, 20 insertions(+)

Comments

MOESSBAUER, Felix Jan. 6, 2023, 9:45 a.m. UTC | #1
On Thu, 2023-01-05 at 20:42 +0530, venkata.pyla@toshiba-tsip.com wrote:
> From: venkata pyla <venkata.pyla@toshiba-tsip.com>
> 
> As part of reproducible-build work, the rootfs images generated on
> same
> source should be identical between two builds.
> 
> In this commit it tries to solve one of the non-reproducible problem
> i.e. the rootfs file time-stamps generated during build time are not
> reproducible, it uses one of the solution provided in the debian
> live-build image project (refer [1]), it fixes by finding all the
> files/folders that are gernerated newly and set the time-stamp
> provided
> by `SOURCE_DATE_EPOCH` environment variable.
> 
> [1] 
> https://salsa.debian.org/live-team/live-build/-/merge_requests/218
> 
> Signed-off-by: venkata pyla <venkata.pyla@toshiba-tsip.com>
> ---
>  meta-isar/conf/local.conf.sample | 10 ++++++++++
>  meta/classes/image.bbclass       | 10 ++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/meta-isar/conf/local.conf.sample b/meta-
> isar/conf/local.conf.sample
> index 57d0620..3c4a473 100644
> --- a/meta-isar/conf/local.conf.sample
> +++ b/meta-isar/conf/local.conf.sample
> @@ -255,3 +255,13 @@ USER_isar[flags] += "clear-text-password"
>  #CCACHE_TOP_DIR ?= "${TMPDIR}/ccache"
>  # Enable ccache debug mode
>  #CCACHE_DEBUG = "1"
> +
> +# Uncommnet and add value to it to build images reproducibly
> +#
> +# The value for `SOURCE_DATE_EPOCH` should be latest source change
> time in
> +# seconds since the Epoch.
> +# Git repository users can use value from 'git log -1 --pretty=%ct'
> +# Non git repository users can use value from 'stat -c%Y ChangeLog'
> +# To know more details about this variable and how to set the value
> refer below
> +# https://reproducible-builds.org/docs/source-date-epoch/
> +#SOURCE_DATE_EPOCH =
> diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
> index 813e1f3..8371ecd 100644
> --- a/meta/classes/image.bbclass
> +++ b/meta/classes/image.bbclass
> @@ -431,6 +431,16 @@ do_rootfs_finalize() {
>  
>          rm -f "${ROOTFSDIR}/etc/apt/sources-list"
>  EOSUDO
> +
> +    # Set same time-stamps to the newly generated file/folders in
> the
> +    # rootfs image for the purpose of reproducible builds.
> +    test ! -z "${SOURCE_DATE_EPOCH}" && \
> +        sudo find ${ROOTFSDIR} -newermt \
> +            "$(date -d@${SOURCE_DATE_EPOCH} '+%Y-%m-%d %H:%M:%S')" \
> +            -printf "%y %p\n" \
> +            -exec touch '{}' -h -d@${SOURCE_DATE_EPOCH} ';' >
> ${DEPLOY_DIR_IMAGE}/files.modified_timestamps && \
> +            bbwarn "$(cat
> ${DEPLOY_DIR_IMAGE}/files.modified_timestamps) \nModified above file
> timestamps to build image reproducibly"
> +

Hi, I just tested this code and found the following issues:

This does not rebuild cleanly because, in general, do_rootfs_finalize
cannot be re-executed. IMHO this is an ISAR bug and not a problem of
your patch:

do_rootfs_finalize
mv: cannot stat '/build/tmp/work/debian-bookworm-amd64/img-<...>/1.0-r0/rootfs/etc/apt/sources-list': No such file or directory

Second, it is a bit hard to feed in the value automatically from git.
I solved it the following way, which works so far:

SOURCE_DATE_EPOCH := "${@ bb.process.run('git -C ${LAYERDIR_project} log -1 --pretty=\%ct')[0].strip() }"

One problem is that this can hardly be generalized, as we have to feed
in the path of the main layer, which is not known to ISAR.
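
For use outside BitBake, the same lookup can be sketched in plain shell. This is only an illustration: the helper name `source_date_epoch_for` is hypothetical, the layer path is passed in by the caller (which sidesteps, but does not solve, the problem that ISAR does not know it), and the ChangeLog fallback follows the hint in the sample config comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of ISAR): print a SOURCE_DATE_EPOCH
# candidate for the layer directory given as $1.
source_date_epoch_for() {
    layer_dir="$1"
    if git -C "$layer_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        # Commit time of the most recent commit, in seconds since the Epoch.
        git -C "$layer_dir" log -1 --pretty=%ct
    elif [ -f "$layer_dir/ChangeLog" ]; then
        # Fallback for non-git sources: modification time of ChangeLog.
        stat -c %Y "$layer_dir/ChangeLog"
    else
        echo "cannot derive SOURCE_DATE_EPOCH for $layer_dir" >&2
        return 1
    fi
}
```

A caller would then run something like `SOURCE_DATE_EPOCH="$(source_date_epoch_for /path/to/project-layer)"`, where the path is a placeholder for the main layer.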

The list of touched files is quite long, but most of them fall into the
following classes:

- PKG info: /var/lib/dpkg/info/
- Pycache: /usr/lib/python3.10/curses/__pycache__. This needs further
investigation anyway, as my understanding is that we do not want to
distribute the pycache.
- Directories and symlinks: these always have to be fixed, as the
creation date depends on the install order of packages. I would not
warn on these.
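
One way to keep directories and symlinks out of the warning is to clamp them in a separate, silent pass. The sketch below assumes GNU find, date and touch (as the patch already does); the function name `clamp_mtimes` is made up for illustration, and the sudo and BitBake plumbing are left out.

```shell
#!/bin/sh
# Hypothetical helper: clamp mtimes newer than EPOCH ($2) under ROOTFS ($1).
# Only entries that are neither directories nor symlinks are reported.
clamp_mtimes() {
    rootfs="$1"
    epoch="$2"
    # Same reference-time conversion as in the patch.
    ref="$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S')"

    # Pass 1: directories and symlinks are always expected, so fix them silently.
    find "$rootfs" \( -type d -o -type l \) -newermt "$ref" \
        -exec touch -h -d "@$epoch" '{}' +

    # Pass 2: everything else is printed as "<type> <path>" and then fixed.
    find "$rootfs" ! -type d ! -type l -newermt "$ref" \
        -printf '%y %p\n' -exec touch -h -d "@$epoch" '{}' ';'
}
```

The output of the second pass could then be fed to bbwarn, which would shorten the list to regular files and similar entries.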

Best regards,
Felix

>  }
>  addtask rootfs_finalize before do_rootfs after do_rootfs_postprocess
>  
> -- 
> 2.20.1
> 
>
venkata.pyla@toshiba-tsip.com Jan. 6, 2023, 10:17 a.m. UTC | #2
>-----Original Message-----
>From: isar-users@googlegroups.com <isar-users@googlegroups.com> On Behalf
>Of Moessbauer, Felix
>Sent: 06 January 2023 15:15
>To: isar-users@googlegroups.com; pyla venkata(TSIP TMIEC ODG Porting)
><Venkata.Pyla@toshiba-tsip.com>
>Cc: amikan@ilbers.de; Kiszka, Jan <jan.kiszka@siemens.com>; hayashi kazuhiro(
>林 和宏 □SWC◯ACT) <kazuhiro3.hayashi@toshiba.co.jp>; dinesh kumar(T
>SIP TMIEC ODG Porting) <dinesh.kumar@toshiba-tsip.com>; Schild, Henning
><henning.schild@siemens.com>
>Subject: Re: [PATCH v3 1/1] image.bbclass: fix non-reproducible file time-stamps
>inside rootfs
>
>On Thu, 2023-01-05 at 20:42 +0530, venkata.pyla@toshiba-tsip.com wrote:
>> From: venkata pyla <venkata.pyla@toshiba-tsip.com>
>>
>> As part of reproducible-build work, the rootfs images generated on
>> same source should be identical between two builds.
>>
>> In this commit it tries to solve one of the non-reproducible problem
>> i.e. the rootfs file time-stamps generated during build time are not
>> reproducible, it uses one of the solution provided in the debian
>> live-build image project (refer [1]), it fixes by finding all the
>> files/folders that are gernerated newly and set the time-stamp
>> provided by `SOURCE_DATE_EPOCH` environment variable.
>>
>> [1]
>> https://salsa.debian.org/live-team/live-build/-/merge_requests/218
>>
>> Signed-off-by: venkata pyla <venkata.pyla@toshiba-tsip.com>
>> ---
>>  meta-isar/conf/local.conf.sample | 10 ++++++++++
>>  meta/classes/image.bbclass       | 10 ++++++++++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/meta-isar/conf/local.conf.sample b/meta-isar/conf/local.conf.sample
>> index 57d0620..3c4a473 100644
>> --- a/meta-isar/conf/local.conf.sample
>> +++ b/meta-isar/conf/local.conf.sample
>> @@ -255,3 +255,13 @@ USER_isar[flags] += "clear-text-password"
>>  #CCACHE_TOP_DIR ?= "${TMPDIR}/ccache"
>>  # Enable ccache debug mode
>>  #CCACHE_DEBUG = "1"
>> +
>> +# Uncommnet and add value to it to build images reproducibly
>> +#
>> +# The value for `SOURCE_DATE_EPOCH` should be latest source change time in
>> +# seconds since the Epoch.
>> +# Git repository users can use value from 'git log -1 --pretty=%ct'
>> +# Non git repository users can use value from 'stat -c%Y ChangeLog'
>> +# To know more details about this variable and how to set the value
>> refer below
>> +# https://reproducible-builds.org/docs/source-date-epoch/
>> +#SOURCE_DATE_EPOCH =
>> diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
>> index 813e1f3..8371ecd 100644
>> --- a/meta/classes/image.bbclass
>> +++ b/meta/classes/image.bbclass
>> @@ -431,6 +431,16 @@ do_rootfs_finalize() {
>>
>>          rm -f "${ROOTFSDIR}/etc/apt/sources-list"
>>  EOSUDO
>> +
>> +    # Set same time-stamps to the newly generated file/folders in
>> the
>> +    # rootfs image for the purpose of reproducible builds.
>> +    test ! -z "${SOURCE_DATE_EPOCH}" && \
>> +        sudo find ${ROOTFSDIR} -newermt \
>> +            "$(date -d@${SOURCE_DATE_EPOCH} '+%Y-%m-%d %H:%M:%S')" \
>> +            -printf "%y %p\n" \
>> +            -exec touch '{}' -h -d@${SOURCE_DATE_EPOCH} ';' >
>> ${DEPLOY_DIR_IMAGE}/files.modified_timestamps && \
>> +            bbwarn "$(cat
>> ${DEPLOY_DIR_IMAGE}/files.modified_timestamps) \nModified above file
>> timestamps to build image reproducibly"
>> +
>
>Hi, I just tested this code and found the following issues:
>
>This does not rebuild cleanly, as in general the do_rootfs_finalize cannot be re-
>executed. IMHO this is an ISAR bug and not a problem of your patch:
>
>do_rootfs_finalize
>mv: cannot stat '/build/tmp/work/debian-bookworm-amd64/img-<...>/1.0-
>r0/rootfs/etc/apt/sources-list': No such file or directory

Yes, I also faced this problem. For the reproducible-build test, however, I have to delete build/tmp and build/sstate-cache anyway, so the issue does not show up in my case.

>
>Second, it is a bit hard to use with automatic data from git.
>I solved it the following way, which works so far:
>
>SOURCE_DATE_EPOCH := "${@ bb.process.run('git -C ${LAYERDIR_project} log -1 --pretty=\%ct')[0].strip() }"
>
>One problem here is, that this can hardly be generalized as we have to feed in
>the path of the main layer - which is not known by ISAR.

Your solution is good but, as you mentioned, it is difficult to get the layer path. I think it is better to document this in user_manual.md and let the main layer define the variable.

>
>The list of touched files is quite long, but basically most of it falls in the
>following classes:
>
>- PKG info: /var/lib/dpkg/info/
>- Pycache: /usr/lib/python3.10/curses/__pycache__ This anyways needs to be
>further investigated as my understanding is that we do not want to distribute
>the pycache
I think this should be deleted instead of having its time-stamp modified. It may be generated when Python scripts run as part of package installation (postinst scripts).
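
A minimal sketch of such a deletion step, assuming GNU find and assuming that shipping the image without byte-code caches is acceptable (which is still open in this thread). The function name `remove_pycache` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove every __pycache__ directory below ROOTFS ($1).
remove_pycache() {
    rootfs="$1"
    # -prune keeps find from descending into directories that are about
    # to be removed by rm.
    find "$rootfs" -type d -name '__pycache__' -prune -exec rm -rf '{}' +
}
```

Removing a directory updates the mtime of its parent, so a step like this would have to run before the time-stamps are clamped.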

>- directories and symlinks:  These always have to be fixed as the creation date
>depends on the install order of packages. I would not warn on these
Filtering these out will be a little complex, but let me try.

>
>Best regards,
>Felix
>
>>  }
>>  addtask rootfs_finalize before do_rootfs after do_rootfs_postprocess
>>
>> --
>> 2.20.1
>>
>>
>

Patch

diff --git a/meta-isar/conf/local.conf.sample b/meta-isar/conf/local.conf.sample
index 57d0620..3c4a473 100644
--- a/meta-isar/conf/local.conf.sample
+++ b/meta-isar/conf/local.conf.sample
@@ -255,3 +255,13 @@  USER_isar[flags] += "clear-text-password"
 #CCACHE_TOP_DIR ?= "${TMPDIR}/ccache"
 # Enable ccache debug mode
 #CCACHE_DEBUG = "1"
+
+# Uncomment and set a value to build images reproducibly
+#
+# The value of `SOURCE_DATE_EPOCH` should be the latest source change time in
+# seconds since the Epoch.
+# Git repository users can use the value from 'git log -1 --pretty=%ct'
+# Non-git repository users can use the value from 'stat -c%Y ChangeLog'
+# For more details about this variable and how to set its value, see:
+# https://reproducible-builds.org/docs/source-date-epoch/
+#SOURCE_DATE_EPOCH =
diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
index 813e1f3..8371ecd 100644
--- a/meta/classes/image.bbclass
+++ b/meta/classes/image.bbclass
@@ -431,6 +431,16 @@  do_rootfs_finalize() {
 
         rm -f "${ROOTFSDIR}/etc/apt/sources-list"
 EOSUDO
+
+    # Set same time-stamps to the newly generated file/folders in the
+    # rootfs image for the purpose of reproducible builds.
+    test ! -z "${SOURCE_DATE_EPOCH}" && \
+        sudo find ${ROOTFSDIR} -newermt \
+            "$(date -d@${SOURCE_DATE_EPOCH} '+%Y-%m-%d %H:%M:%S')" \
+            -printf "%y %p\n" \
+            -exec touch '{}' -h -d@${SOURCE_DATE_EPOCH} ';' > ${DEPLOY_DIR_IMAGE}/files.modified_timestamps && \
+            bbwarn "$(cat ${DEPLOY_DIR_IMAGE}/files.modified_timestamps) \nModified above file timestamps to build image reproducibly"
+
 }
 addtask rootfs_finalize before do_rootfs after do_rootfs_postprocess