[v2] If expand last partition fails, unsupervised systems reboot

Message ID CAJGKYO4jQ=b43n+qASC0U8bAzON6R0WSgJhzWhR3Or9ZXyZBKA@mail.gmail.com
State Not Applicable, archived
Headers show
Series [v2] If expand last partition fails, unsupervised systems reboot | expand

Commit Message

Roberto A. Foglietta Dec. 14, 2022, 7:52 a.m. UTC
Hi all,

 in the expand last partition script I read this code

ROOT_DEV="$(findmnt / -o source -n)"
[...]
BOOT_DEV="$(echo "${ROOT_DEV}" | sed 's/p\?[0-9]*$//')"
if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
        echo "Boot device equals root device - no partitioning found" >&2
        trap - EXIT
        exit 1
fi

this means that there are no partitions because the boot partition
/dev/sda is equal to root partition /dev/sda, which happens only when
there is no partition table.

roberto$ echo /dev/sda2 | sed 's/p\?[0-9]*$//'
/dev/sda (which is different than /dev/sda2)
roberto$ echo /dev/sda | sed 's/p\?[0-9]*$//'
/dev/sda (there is no partition table)

That's ok but not exit 1 because IMHO, it will retry again the next
time but nothing will change the next time so it will continue to try
and fail at every boot. Because one-time-only in systemd means
one-successful-time-only and exit 1 prevents disabling the operation.

So, exit 0 would be the correct code to execute and this is the reason
because I removed the trap which would make an unsupervised system
reboot.

The patch v2 has been sent using the elastic mail SMTP and went in
moderation, here in attachment.

Best regards, R-

Comments

Henning Schild Dec. 14, 2022, 8:47 a.m. UTC | #1
Patch can not be reviewed so i can as well top-post.

We should maybe separate the discussion with the reboot trap and the
unsupervised file from the "aligning the nothing to do exit code to 0"

And i think we can only really have the discussion if the patches are
inline and not attached.

Good catch, please send a patch that just turns that exit 1 into an
exit 0. Ideally without elastically stretching git send-email workflow
rules.

Henning 

Am Wed, 14 Dec 2022 08:52:00 +0100
schrieb "Roberto A. Foglietta" <roberto.foglietta@gmail.com>:

> Hi all,
> 
>  in the expand last partition script I read this code
> 
> ROOT_DEV="$(findmnt / -o source -n)"
> [...]
> BOOT_DEV="$(echo "${ROOT_DEV}" | sed 's/p\?[0-9]*$//')"
> if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
>         echo "Boot device equals root device - no partitioning found"
> >&2 trap - EXIT
>         exit 1
> fi
> 
> this means that there are no partitions because the boot partition
> /dev/sda is equal to root partition /dev/sda, which happens only when
> there is no partition table.
> 
> roberto$ echo /dev/sda2 | sed 's/p\?[0-9]*$//'
> /dev/sda (which is different than /dev/sda2)
> roberto$ echo /dev/sda | sed 's/p\?[0-9]*$//'
> /dev/sda (there is no partition table)
> 
> That's ok but not exit 1 because IMHO, it will retry again the next
> time but nothing will change the next time so it will continue to try
> and fail at every boot. Because one-time-only in systemd means
> one-successful-time-only and exit 1 prevents disabling the operation.
> 
> So, exit 0 would be the correct code to execute and this is the reason
> because I removed the trap which would make an unsupervised system
> reboot.
> 
> The patch v2 has been sent using the elastic mail SMTP and went in
> moderation, here in attachment.
> 
> Best regards, R-
>
Henning Schild Dec. 14, 2022, 9:05 a.m. UTC | #2
Am Wed, 14 Dec 2022 09:47:07 +0100
schrieb Henning Schild <henning.schild@siemens.com>:

> Patch can not be reviewed so i can as well top-post.
> 
> We should maybe separate the discussion with the reboot trap and the
> unsupervised file from the "aligning the nothing to do exit code to 0"

I think it might also be a good idea to set

ExecStopPost=-/bin/systemctl disable expand-on-first-boot.service

in addition to ExecStartPost doing the same. So no matter if we succeed
or not, we only give it one try. But i did not test that.

Henning

> And i think we can only really have the discussion if the patches are
> inline and not attached.
> 
> Good catch, please send a patch that just turns that exit 1 into an
> exit 0. Ideally without elastically stretching git send-email workflow
> rules.
> 
> Henning 
> 
> Am Wed, 14 Dec 2022 08:52:00 +0100
> schrieb "Roberto A. Foglietta" <roberto.foglietta@gmail.com>:
> 
> > Hi all,
> > 
> >  in the expand last partition script I read this code
> > 
> > ROOT_DEV="$(findmnt / -o source -n)"
> > [...]
> > BOOT_DEV="$(echo "${ROOT_DEV}" | sed 's/p\?[0-9]*$//')"
> > if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
> >         echo "Boot device equals root device - no partitioning
> > found"  
> > >&2 trap - EXIT  
> >         exit 1
> > fi
> > 
> > this means that there are no partitions because the boot partition
> > /dev/sda is equal to root partition /dev/sda, which happens only
> > when there is no partition table.
> > 
> > roberto$ echo /dev/sda2 | sed 's/p\?[0-9]*$//'
> > /dev/sda (which is different than /dev/sda2)
> > roberto$ echo /dev/sda | sed 's/p\?[0-9]*$//'
> > /dev/sda (there is no partition table)
> > 
> > That's ok but not exit 1 because IMHO, it will retry again the next
> > time but nothing will change the next time so it will continue to
> > try and fail at every boot. Because one-time-only in systemd means
> > one-successful-time-only and exit 1 prevents disabling the
> > operation.
> > 
> > So, exit 0 would be the correct code to execute and this is the
> > reason because I removed the trap which would make an unsupervised
> > system reboot.
> > 
> > The patch v2 has been sent using the elastic mail SMTP and went in
> > moderation, here in attachment.
> > 
> > Best regards, R-
> >   
>
Roberto A. Foglietta Dec. 14, 2022, 9:24 a.m. UTC | #3
On Wed, 14 Dec 2022 at 10:05, Henning Schild <henning.schild@siemens.com> wrote:
>
> Am Wed, 14 Dec 2022 09:47:07 +0100
> schrieb Henning Schild <henning.schild@siemens.com>:
>
> > Patch can not be reviewed so i can as well top-post.
> >
> > We should maybe separate the discussion with the reboot trap and the
> > unsupervised file from the "aligning the nothing to do exit code to 0"
>
> I think it might also be a good idea to set
>
> ExecStopPost=-/bin/systemctl disable expand-on-first-boot.service
>
> in addition to ExecStartPost doing the same. So no matter if we succeed
> or not, we only give it one try. But i did not test that.

if it fails, it is a good thing to retry the next time. Some issues
might depend on boot timing which might be variable. if you want to
test what happens put a "false" as first instruction after set -e and
you will see. In any case the script should be written to be able to
cope with a different behaviour of the systemd between failure and
success because tomorrow we can change that. (reasonable
generalisation).

By the way my patch does not only change the exit 1 with exit 0 but
set a trap on exit to reboot the unsupervised systems to go working in
production without the planned filesystem resize = underprovided =
nasty failures at future unpredictable time. It is easy to revise a
patch that is just a text file attached to an e-mail... LOL

Best, R-
Roberto A. Foglietta Dec. 14, 2022, 9:26 a.m. UTC | #4
On Wed, 14 Dec 2022 at 08:52, Roberto A. Foglietta
<roberto.foglietta@gmail.com> wrote:
>
> Hi all,
>
>  in the expand last partition script I read this code
>
> ROOT_DEV="$(findmnt / -o source -n)"
> [...]
> BOOT_DEV="$(echo "${ROOT_DEV}" | sed 's/p\?[0-9]*$//')"
> if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
>         echo "Boot device equals root device - no partitioning found" >&2
>         trap - EXIT
>         exit 1
> fi
>
> this means that there are no partitions because the boot partition
> /dev/sda is equal to root partition /dev/sda, which happens only when
> there is no partition table.
>
> roberto$ echo /dev/sda2 | sed 's/p\?[0-9]*$//'
> /dev/sda (which is different than /dev/sda2)
> roberto$ echo /dev/sda | sed 's/p\?[0-9]*$//'
> /dev/sda (there is no partition table)
>
> That's ok but not exit 1 because IMHO, it will retry again the next
> time but nothing will change the next time so it will continue to try
> and fail at every boot. Because one-time-only in systemd means
> one-successful-time-only and exit 1 prevents disabling the operation.
>
> So, exit 0 would be the correct code to execute and this is the reason
> because I removed the trap which would make an unsupervised system
> reboot.
>
> The patch v2 has been sent using the elastic mail SMTP and went in
> moderation, here in attachment.

Hi all,

Moreover, I tried to resize a volume without a partition table and I
did successfully in both cases ext4 and btrfs.

So, if we are able to resize the last partition when that partition is
the rootfs on which the system is running, why are we not able to do
the same with the entire volume?

##### BTRFS #####
losetup -P $(losetup -f) image.wic
losetup -l | grep image.wic | cut -f1 -d' '
dd if=/dev/loop3p2 bs=1M of=/dev/sdb status=progress
mkdir /mnt/sdb
mount /dev/sdb /mnt/sdb
df -h | grep sdb
btrfs filesystem resize max /mnt/sdb
df -h | grep sdb
umount /mnt/sdb
losetup -d /dev/loop3

##### EXT4 #####
losetup -P $(losetup -f) image.wic
losetup -l | grep image.wic | cut -f1 -d' '
dd if=/dev/loop3p2 bs=1M of=/dev/sdb status=progress
resize2fs /dev/sdb
mount /dev/sdb /mnt/sdb
df -h | grep sdb
umount /mnt/sdb
losetup -d /dev/loop3

IMHO, that code above should be just set properly the LAST_PART
variable and then skip a lot of code useful only for a partitioned
disk

if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
LAST_PART="${BOOT_DEV}"
else
LAST_PART="$(sfdisk -d "${BOOT_DEV}" 2>/dev/null | tail -1 | cut -d ' ' -f 1)"
[ and everything else until the export of EXT2FS_NO_MTAB_OK ]
fi

This because trying to resize a volume full does not hurt nor fails:

root# resize2fs /dev/sdb
resize2fs 1.46.5 (30-Dec-2021)
The filesystem is already 30041088 (4k) blocks long.  Nothing to do!
root# echo $?
0

root# btrfs filesystem resize max /mnt/sdb
Resize device id 1 (/dev/sdb) from 2.04GiB to max
root# btrfs filesystem resize max /mnt/sdb; echo $?
Resize device id 1 (/dev/sdb) from 114.60GiB to max
0

However, to check these changes we need a system template with a
single volume to test the code. I left this task to you.

Best regards, R-

Patch

From 14cff2bc150b1be77d5de4253ac3e92df2b4b0f9 Mon Sep 17 00:00:00 2001
From: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
Date: Tue, 13 Dec 2022 07:14:25 +0100
Subject: [PATCH v2] If expand last partition fails, unsupervised systems
 reboot

Improvement for unsupervised embedded devices which need to reboot if expand last
partition fails because a under-provided system should not even complete the boot
in the field in order to avoid functioning failures at unexpected future time.
This patch do not affect the previous behaviour because requirs /etc/unsupervised

This patch applies after the following other patches:

 - In expand last partition script btrfs support added
 - In expand last partition wait for udev have finished

v2: if there is no partition table exit 0 instead of exit 1

Signed-off-by: Roberto A. Foglietta <roberto.foglietta@gmail.com>
---
 .../expand-on-first-boot/files/expand-last-partition.sh   | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
index f5bcad1..2c52b16 100755
--- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
+++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
@@ -9,6 +9,8 @@ 
 
 set -e
 
+test -e /etc/unsupervised && trap reboot EXIT
+
 udevadm settle
 
 ROOT_DEV="$(findmnt / -o source -n)"
@@ -21,7 +23,8 @@  fi
 BOOT_DEV="$(echo "${ROOT_DEV}" | sed 's/p\?[0-9]*$//')"
 if [ "${ROOT_DEV}" = "${BOOT_DEV}" ]; then
 	echo "Boot device equals root device - no partitioning found" >&2
-	exit 1
+	trap - EXIT
+	exit 0
 fi
 
 # this value is in blocks. Normally a block has 512 bytes.
@@ -37,6 +40,7 @@  done
 MINIMAL_SIZE=$((ALL_PARTS_SIZE + BUFFER_SIZE))
 if [ "$DISK_SIZE" -lt "$MINIMAL_SIZE" ]; then
 	echo "Disk is practically already full, doing nothing." >&2
+	trap - EXIT
 	exit 0
 fi
 
@@ -72,3 +76,5 @@  case $(lsblk -fno FSTYPE "${LAST_PART}") in
 		umount $tmpdir && rmdir $tmpdir
 		;;
 esac
+
+trap - EXIT
-- 
2.34.1