Last Saturday, I decided it was time to upgrade my NAS server from Ubuntu 22.04 LTS to 24.04 LTS. I’d been putting it off for ages, worried that the upgrade might not go as planned. Since 24.04 is already in its fourth point release, I figured the risks should be manageable and it was time to take the plunge.
I back up my system nightly, so the insurance was in place. After performing a final regular update to the system, I started with the following:
sudo apt update && sudo apt upgrade && sudo apt dist-upgrade
I then rebooted the system and executed:
sudo do-release-upgrade
After answering a few questions about keeping my custom configuration files for various services, the upgrader said it was done. I then rebooted the system, but BOOM! It wouldn’t boot.
The BIOS could see the bootable drive, but when I tried to boot from it, the machine just dropped back into the BIOS. I didn’t even get a GRUB prompt or menu.
I figured this wasn’t a big deal, so I booted the system with the 24.04 LTS Live USB. The plan was to just reinstall GRUB and hope that would fix the system.
Once I had booted into the Live USB and picked English as my language, I could jump into a command shell by pressing ALT-F2. Alternatively, you can press F1 and choose the shell option from the help menu. However, I found that the first method opens a shell with command-line completion, so I went with that.
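One thing worth checking at this point is that the live session itself was booted in UEFI mode, since reinstalling GRUB for EFI depends on it. The sketch below is only an illustration: the function name booted_in_uefi is made up for this post, and the optional argument exists only so the check can be tried against a test directory. It relies on the fact that the kernel exposes /sys/firmware/efi only when the system was booted through UEFI.

```shell
# Hypothetical helper (name invented for this example): succeed if the
# running system was booted via UEFI. The sysfs root is a parameter only
# so the check can be pointed at a test directory; the default is /sys.
booted_in_uefi() {
    sysroot=${1:-/sys}
    [ -d "$sysroot/firmware/efi" ]
}

if booted_in_uefi; then
    echo "Live session booted in UEFI mode"
else
    echo "Live session booted in legacy BIOS mode"
fi
```

If this reports legacy BIOS mode, rebooting the Live USB and picking its UEFI entry in the firmware boot menu is probably the first thing to try.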
The boot disk had the following layout (output from both fdisk and parted):
sudo fdisk -l /dev/nvme1n1
Disk /dev/nvme1n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 980 PRO 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 90B9F208-2D05-484D-8C8C-B3AE71475167
Device              Start        End    Sectors   Size Type
/dev/nvme1n1p1       2048    2203647    2201600     1G EFI System
/dev/nvme1n1p2    2203648 1921875000 1919671353 915.4G Linux filesystem
/dev/nvme1n1p3 1921875968 1953523711   31647744  15.1G Linux swap
sudo parted /dev/nvme1n1
GNU Parted 3.4
Using /dev/nvme1n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Samsung SSD 980 PRO 1TB (nvme)
Disk /dev/nvme1n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system     Name  Flags
 1      1049kB  1128MB  1127MB  fat32                 boot, esp
 2      1128MB  984GB   983GB   ext4
 3      984GB   1000GB  16.2GB  linux-swap(v1)  swap  swap
As I described in this post, we want to make sure that the first partition is marked for EFI boot. This can be done in parted with:
set 1 boot on
set 1 esp on
I didn’t have to perform the above since the first partition (/dev/nvme1n1p1) was already recognized as EFI System. We also need to ensure that this partition is formatted as FAT32. If it is not, it can be formatted (which erases everything on the partition) with:
sudo mkfs.vfat -F 32 /dev/nvme1n1p1
Since this was already the case, I also did not have to perform this formatting step.
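To confirm the filesystem type without touching the partition, something like the following sketch could be used. The function name esp_is_vfat is invented for this example. It relies on lsblk reporting FAT filesystems (including FAT32) as vfat.

```shell
# Hypothetical helper (name invented for this example): succeed only if
# the given block device reports a vfat filesystem. This only reads the
# filesystem type; it does not modify the partition.
esp_is_vfat() {
    fstype=$(lsblk -no FSTYPE "$1" 2>/dev/null)
    [ "$fstype" = "vfat" ]
}

# Example call, run as root in the live session:
#   esp_is_vfat /dev/nvme1n1p1 && echo "ESP is FAT, no need to format"
```

Only if this check fails would the mkfs.vfat step above be worth considering.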
The next step is to mount the root filesystem and the EFI partition.
mount /dev/nvme1n1p2 /mnt
mount /dev/nvme1n1p1 /mnt/boot/efi
We now need to bind several directories under /mnt before changing our root directory to /mnt. The recursive binds below already cover /dev/pts, so a separate bind loop over /dev, /dev/pts, /proc and /run is not needed (running both would mount the same directories twice).
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
mount --rbind /run /mnt/run
mount -t proc /proc /mnt/proc
chroot /mnt
grub-install --efi-directory=/boot/efi /dev/nvme1n1
update-grub
exit
After leaving the chroot with exit, clean up the mounts from the live session:
mount --make-rslave /mnt/dev
umount -R /mnt
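If we do not use the --rbind option for /sys, we may get an EFI error when running grub-install. This is because efivarfs is a separate filesystem mounted at /sys/firmware/efi/efivars, and a plain --bind of /sys does not carry it into the chroot. There are two less common alternatives that solve the same issue; choose one of them, but not BOTH. The first is run from the live session before chroot; the second, given its path, is meant to be run inside the chroot: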
mount --bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars
mount -t efivarfs none /sys/firmware/efi/efivars
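Before entering the chroot, it can help to confirm that everything is mounted where grub-install expects it. The following is a minimal sketch, not part of my original procedure; the function name missing_chroot_mounts is made up, and it uses the mountpoint utility from util-linux.

```shell
# Hypothetical helper (name invented for this example): report which of
# the expected mount points under a chroot target are not mounted yet.
# The target defaults to /mnt, as used in this post.
missing_chroot_mounts() {
    root=${1:-/mnt}
    for sub in "" /boot/efi /dev /sys /run /proc; do
        # mountpoint -q is silent and fails if the path is not a mount point
        mountpoint -q "$root$sub" || echo "not mounted: $root$sub"
    done
}

# Example call from the live session, before chroot:
#   missing_chroot_mounts /mnt
```

No output means all six locations are mount points. The EFI variables themselves can then be checked inside the chroot by listing /sys/firmware/efi/efivars, which should not be empty.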
The reinstallation of GRUB did not solve the problem, so I had to perform a full system restore from my backup. The backup was created using rsync as described in this post. However, I learned that this backup had been done incorrectly! I had excluded certain directories using name instead of /name. A pattern without the leading slash matches a file or directory of that name at any depth, so more was excluded than intended; with the leading slash, the pattern is anchored to the root of the transfer. The correct backup command should be:
sudo rsync --delete \
--exclude '/dev' \
--exclude '/proc' \
--exclude '/sys' \
--exclude '/tmp' \
--exclude '/run' \
--exclude '/mnt' \
--exclude '/media' \
--exclude '/cdrom' \
--exclude 'lost+found' \
-aAXv / "${BACKUP}"
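The difference between name and /name is easy to reproduce on a throwaway directory tree. The sketch below is purely illustrative: the function name exclude_demo and the file names are invented, and it works only inside a temporary directory.

```shell
# Hypothetical demo (names invented for this example): build a small
# tree with a top-level dev directory and a nested home/user/dev
# directory, copy it with a single --exclude pattern, then list the
# files that arrived at the destination.
exclude_demo() {
    pattern=$1
    work=$(mktemp -d)
    mkdir -p "$work/src/dev" "$work/src/home/user/dev" "$work/dst"
    touch "$work/src/dev/placeholder" "$work/src/home/user/dev/project.txt"
    rsync -a --exclude "$pattern" "$work/src/" "$work/dst/"
    (cd "$work/dst" && find . -type f | sort)
    rm -rf "$work"
}

# Try both forms:
#   exclude_demo 'dev'     # unanchored pattern
#   exclude_demo '/dev'    # anchored pattern
```

With 'dev', no files are copied at all, because the nested home/user/dev directory is excluded along with the top-level one. With '/dev', only the top-level directory is skipped and home/user/dev/project.txt is copied.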
The restoration command is very similar:
mount /dev/sdt1 /mnt/backup
mount /dev/nvme1n1p2 /mnt/system
sudo rsync --delete \
--exclude '/dev' \
--exclude '/proc' \
--exclude '/sys' \
--exclude '/tmp' \
--exclude '/run' \
--exclude '/mnt' \
--exclude '/media' \
--exclude '/cdrom' \
--exclude 'lost+found' \
-aAXv /mnt/backup/ /mnt/system/
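Because --delete removes anything on the target that is missing from the backup, a dry run is a cheap way to see what would be deleted before doing the real restore. This is a sketch under my own assumptions rather than a step I actually ran; the function name preview_deletions is made up. It relies on rsync's --dry-run option and on the "deleting ..." lines that rsync prints with -v.

```shell
# Hypothetical helper (name invented for this example): show what a
# restore with --delete would remove, without changing anything.
# Usage: preview_deletions SRC DST [extra rsync options...]
preview_deletions() {
    src=$1
    dst=$2
    shift 2
    # --dry-run makes rsync report its actions without performing them
    rsync --dry-run --delete -av "$@" "$src" "$dst" | grep '^deleting '
}

# Example call, passing the same options as the real restore:
#   preview_deletions /mnt/backup/ /mnt/system/ -AX \
#       --exclude '/dev' --exclude '/proc' --exclude '/sys'
```

Passing the same --exclude list as the real command matters, since excluded paths are also protected from deletion on the target. If nothing would be deleted, the function prints nothing and returns a non-zero status from grep.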
After the restore, double-check that /var/run is a symbolic link to /run.
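A quick way to do that check from the live session is sketched below. The function name var_run_is_symlink is invented for this example, and the default path assumes the restored root is still mounted at /mnt/system as in the commands above.

```shell
# Hypothetical helper (name invented for this example): succeed if
# <root>/var/run is a symbolic link and print where it points.
# The restored root defaults to /mnt/system.
var_run_is_symlink() {
    root=${1:-/mnt/system}
    if [ -L "$root/var/run" ]; then
        echo "$root/var/run -> $(readlink "$root/var/run")"
    else
        echo "$root/var/run is not a symlink" >&2
        return 1
    fi
}

# Example call:
#   var_run_is_symlink /mnt/system
```

On a healthy system the printed target should be /run.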
Once the restoration was completed, I followed the above instructions again to reinstall GRUB, and I was able to boot back into my boot disk.
Since this upgrade attempt failed, I now have to figure out a way to move my system forward. I think I will port all of the services on my NAS to rootless Podman quadlets, and then just move the services onto a brand new, clean Ubuntu installation. This is probably easier to manage in the future.
