Replacing VDEV in a ZFS Pool

Several months ago an old 3TB hard drive (HDD) crashed on me. Luckily it was a drive used primarily for backup purposes, so the lost data could quickly be duplicated from the source by performing another backup. Since it was not critical to replace the damaged drive immediately, the job was left to fester until today.

Recently I acquired four additional WD Red 6TB HDDs, and I wanted to install these new drives into my NAS chassis. Since I am opening the chassis anyway, I will discard the damaged drive and also take this opportunity to swap some old drives out of the ZFS pool that I created earlier and add these new drives to the pool.

I first used the following command (once per pair) to add two additional mirror vdevs, each composed of two of the new WD Red drives.

sudo zpool add vault mirror {id_of_drive_1} {id_of_drive_2}

The drive IDs are located under /dev/disk/by-id and are typically prefixed with ata or wwn.
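
For example, a quick way to list the candidate IDs is something like this (a sketch; the grep pattern simply filters for the two common prefixes):

ls -l /dev/disk/by-id/ | grep -E 'ata-|wwn-'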

This added two vdevs to the pool, and I can now remove an existing vdev. Doing so automatically starts redistributing the data on the vdev being removed to the other vdevs in the pool. All of this is performed while the pool is still online and servicing the NAS. To remove the old vdev, I execute the following command:

sudo zpool remove vault {vdev_name}

In my case, the old vdev’s name is mirror-5.

Once the remove command is given, the copying of data from the old vdev to the other vdev’s begins. You can check the status with:

sudo zpool status -v vault

The above will show the copying status and the approximate time it will take to complete the job.

Once the removal is completed, the old HDDs of mirror-5 are still labeled for ZFS use. I had to use the labelclear command to clean each drive so that I could repurpose them for backup duty. Below is an example of the command.

sudo zpool labelclear sdb1

The resulting pool now looks like this:

sudo zpool list -v vault

(Output truncated)
NAME                                                    SIZE  ALLOC   FREE
vault                                                  52.7T  38.5T  14.3T
  mirror-0                                             9.09T  9.00T  92.4G
    ata-ST10000VN0008-2JJ101_ZHZ1KMA0-part1                -      -      -
    ata-WDC_WD101EFAX-68LDBN0_VCG6VRWN-part1               -      -      -
  mirror-1                                             7.27T  7.19T  73.7G
    wwn-0x5000c500b41844d9-part1                           -      -      -
    ata-ST8000VN0022-2EL112_ZA1E8S0V-part1                 -      -      -
  mirror-2                                             9.09T  9.00T  93.1G
    wwn-0x5000c500c3d33191-part1                           -      -      -
    ata-ST10000VN0004-1ZD101_ZA2964KD-part1                -      -      -
  mirror-3                                             10.9T  10.8T   112G
    wwn-0x5000c500dc587450-part1                           -      -      -
    wwn-0x5000c500dcc525ab-part1                           -      -      -
  mirror-4                                             5.45T  1.74T  3.72T
    wwn-0x50014ee2b9f82b35-part1                           -      -      -
    wwn-0x50014ee2b96dac7c-part1                           -      -      -
  indirect-5                                               -      -      -
  mirror-6                                             5.45T   372G  5.09T
    wwn-0x50014ee265d315cd-part1                           -      -      -
    wwn-0x50014ee2bb37517e-part1                           -      -      -
  mirror-7                                             5.45T   373G  5.09T
    wwn-0x50014ee265d315b1-part1                           -      -      -
    wwn-0x50014ee2bb2898c2-part1                           -      -      -
cache                                                      -      -      -
  nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D   466G   462G  4.05G

The indirect-5 entry above can be safely ignored; it is just a reference to the old mirror-5.

This time we replaced an entire vdev; another technique is to replace the individual drives within a vdev. To do this, we use the zpool replace command, possibly preceded by a zpool offline of the old drive. Replacing each old drive in a mirror, one after the other, with larger-capacity drives is a way to increase an existing vdev's size.
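
As a rough sketch of that drive-by-drive approach (the device names below are placeholders, not my actual drives):

# take the first old drive offline and replace it with a new, larger drive
sudo zpool offline vault ata-OLD_DRIVE_1-part1
sudo zpool replace vault ata-OLD_DRIVE_1-part1 /dev/disk/by-id/ata-NEW_DRIVE_1-part1

# wait for the resilver to finish before touching the second drive
sudo zpool status vault

# then repeat for the second drive in the mirror
sudo zpool replace vault ata-OLD_DRIVE_2-part1 /dev/disk/by-id/ata-NEW_DRIVE_2-part1

# with the autoexpand property on, the vdev grows once both drives are replaced
sudo zpool get autoexpand vault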

LVM to ZFS Migration

In a previous post, I described the hardware changes that I made to facilitate additional drive slots on my NAS Media Server.

We now need to migrate from an LVM system consisting of 40TB of redundant mirrored storage using mdadm to a ZFS system consisting of a single pool and a dataset. Below is a diagram depicting the logical layout of the old and the intended new system.

Old storage system (diagram): 40TB of redundant storage, built from four mdadm mirror pairs (10TB, 8TB, 10TB, 12TB) serving as physical volumes (PVs) in an LVM volume group, with a logical volume (LV) carrying an EXT4 file system for media storage.

New storage system (diagram): 50TB of redundant storage, built as a single ZFS pool of mirror vdevs (12TB, 10TB, 8TB, 10TB, 6TB, 4TB) plus a 512GB L2ARC cache, with a ZFS dataset on top.

Before the migration, we must back up all the data from the LVM system. I cobbled together a collection of old hard drives and created another LVM volume as temporary storage for the content. This temporary volume has no redundancy, so if any one of the old hard drives fails, out goes all the content. The original LVM system is mounted on /mnt/airvideo and the temporary LVM volume is mounted on /mnt/av2.

I used the command below to proceed with the backup.

sudo rsync --delete -aAXv /mnt/airvideo /mnt/av2 > ~/nohup.avs.rsync.out 2>&1 &

I can then monitor the progress of the backup with:

tail -f ~/nohup.avs.rsync.out

The backup took a little more than 7 days to copy around 32 TB of data from our NAS server. During this entire process, all of the NAS services continued to run, so that downtime was almost non-existent.

Once the backup was completed, I wanted to move all the services over to the backup volume before I started to dismantle the old LVM volume. The following steps were performed:

  • Stop all services on other machines that were using the NAS;
  • Stop all services on the NAS that were using the /mnt/airvideo LVM volume;
    • sudo systemctl stop apache2 smbd nmbd plexmediaserver
  • Unmount the /mnt/airvideo volume, and create a soft-link of the same name to the backup volume at /mnt/av2;
    • sudo umount /mnt/airvideo
    • sudo ln -s /mnt/av2 /mnt/airvideo
  • Restart all services on the NAS and the other machines;
    • sudo systemctl start apache2 smbd nmbd plexmediaserver
  • Once again, the downtime here was minimal;
  • Remove or comment out the entry in the /etc/fstab file that automatically mounts the old LVM volume on boot. The entry is no longer necessary because ZFS mounts its datasets automatically by default;

Now that the services are all up and running, we can start destroying the old LVM volume (airvideovg2/airvideo) and volume group (airvideovg2). We first obtain a list of all the physical volumes that make up the volume group:

sudo pvdisplay -C --separator ' | ' -o pv_name,vg_name

  PV | VG
  /dev/md1 | airvideovg2
  /dev/md2 | airvideovg2
  /dev/md3 | airvideovg2
  /dev/md4 | airvideovg2
  /dev/nvme0n1p1 | airvideovg2

The /dev/mdX devices are the mdadm mirror devices, each consisting of a pair of hard drives.

sudo lvremove airvideovg2/airvideo
Do you really want to remove and DISCARD active logical volume airvideovg2/airvideo? [y/n]: y
  Flushing 0 blocks for cache airvideovg2/airvideo.
Do you really want to remove and DISCARD logical volume airvideovg2/lv_cache_cpool? [y/n]: y
  Logical volume "lv_cache_cpool" successfully removed
  Logical volume "airvideo" successfully removed

sudo vgremove airvideovg2
  Volume group "airvideovg2" successfully removed

At this point, both the logical volume and the volume group are removed. We say a little prayer that nothing happens to our temporary volume (/mnt/av2), which is currently in operation.

We now have to disassociate the mdadm devices from LVM.

sudo pvremove /dev/md1
Labels on physical volume "/dev/md1" successfully wiped.
sudo pvremove /dev/md2
Labels on physical volume "/dev/md2" successfully wiped.
sudo pvremove /dev/md3
Labels on physical volume "/dev/md3" successfully wiped.
sudo pvremove /dev/md4
Labels on physical volume "/dev/md4" successfully wiped.
sudo pvremove /dev/nvme0n1p1
Labels on physical volume "/dev/nvme0n1p1" successfully wiped.

You can find the physical hard drives associated with each mdadm device using the following:

sudo mdadm --detail /dev/md1
#or
sudo cat /proc/mdstat

We then have to stop all the mdadm devices and zero their superblock so that we can reuse the hard drives to set up our ZFS pool.

sudo mdadm --stop /dev/md1
mdadm: stopped /dev/md1
sudo mdadm --stop /dev/md2
mdadm: stopped /dev/md2
sudo mdadm --stop /dev/md3
mdadm: stopped /dev/md3
sudo mdadm --stop /dev/md4
mdadm: stopped /dev/md4

# Normally you also need to do a --remove after the --stop,
# but it looks like the 6.5 kernel did the remove automatically.
#
# For all partitions used in the md device

for i in sdb1 sdc1 sdp1 sda1 sdo1 sdd1 sdg1 sdn1
do
	sudo mdadm --zero-superblock /dev/${i}
done

Now with all of the old hard drives freed up, we can repurpose them to create our ZFS pool. Instead of using the /dev/sdX reference of the physical device, it is recommended to use /dev/disk/by-id with the manufacturer’s model and serial number so that the ZFS pool can be moved to another machine in the future. We also used the -f switch to let ZFS know that it is okay to erase the existing content on those devices. The command to create the pool we named vault is this:

zpool create -f vault mirror /dev/disk/by-id/ata-ST10000VN0008-2JJ101_ZHZ1KMA0-part1 /dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG6VRWN-part1 mirror /dev/disk/by-id/ata-ST8000VN0022-2EL112_ZA1E8GW4-part1 /dev/disk/by-id/ata-ST8000VN0022-2EL112_ZA1E8S0V-part1 mirror /dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA2C69FN-part1 /dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA2964KD-part1 mirror /dev/disk/by-id/ata-ST12000VN0008-2YS101_ZRT008SC-part1 /dev/disk/by-id/ata-ST12000VN0008-2YS101_ZV701XQV-part1

# The above created the pool with the old drives from the old LVM volume group
# We then added 4 more drives, 2 x 6TB, and 2 x 4TB drives to the pool

# Adding another 6TB mirror:

sudo zpool add -f vault mirror /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX31D87HDU09-part1 /dev/disk/by-id/ata-WDC_WD60EZRZ-00GZ5B1_WD-WX11D374490J-part1

# Adding another 4TB mirror:

sudo zpool add -f vault mirror /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN0GTAK-part1 /dev/disk/by-id/ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E0354579-part1

We also want to add the old NVMe drive as a ZFS L2ARC cache.

ls -lh /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D

lrwxrwxrwx 1 root root 13 Mar  2 16:02 /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D -> ../../nvme0n1

sudo zpool add vault cache /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D 

We can see the pool using this command:

sudo zpool list -v vault

NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
vault                                                  45.4T  31.0T  14.4T        -         -     0%    68%  1.00x    ONLINE  -
  mirror-0                                             9.09T  8.05T  1.04T        -         -     0%  88.5%      -    ONLINE
    ata-ST10000VN0008-2JJ101_ZHZ1KMA0-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD101EFAX-68LDBN0_VCG6VRWN-part1               -      -      -        -         -      -      -      -    ONLINE
  mirror-1                                             7.27T  6.49T   796G        -         -     0%  89.3%      -    ONLINE
    ata-ST8000VN0022-2EL112_ZA1E8GW4-part1                 -      -      -        -         -      -      -      -    ONLINE
    ata-ST8000VN0022-2EL112_ZA1E8S0V-part1                 -      -      -        -         -      -      -      -    ONLINE
  mirror-2                                             9.09T  7.54T  1.55T        -         -     0%  82.9%      -    ONLINE
    ata-ST10000VN0004-1ZD101_ZA2C69FN-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-ST10000VN0004-1ZD101_ZA2964KD-part1                -      -      -        -         -      -      -      -    ONLINE
  mirror-3                                             10.9T  8.91T  2.00T        -         -     0%  81.7%      -    ONLINE
    ata-ST12000VN0008-2YS101_ZRT008SC-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-ST12000VN0008-2YS101_ZV701XQV-part1                -      -      -        -         -      -      -      -    ONLINE
  mirror-4                                             5.45T  23.5G  5.43T        -         -     0%  0.42%      -    ONLINE
    ata-WDC_WD60EFRX-68L0BN1_WD-WX31D87HDU09-part1         -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD60EZRZ-00GZ5B1_WD-WX11D374490J-part1         -      -      -        -         -      -      -      -    ONLINE
  mirror-5                                             3.62T  17.2G  3.61T        -         -     0%  0.46%      -    ONLINE
    ata-ST4000DM004-2CV104_ZFN0GTAK-part1                  -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E0354579-part1         -      -      -        -         -      -      -      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -  -
  nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D   466G  3.58G   462G        -         -     0%  0.76%      -    ONLINE

Once the pool was created, we wanted to set a pool property so that in the future, when we replace these drives with bigger ones, the pool will automatically expand.

zpool set autoexpand=on vault
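
To double-check the property (and to see any untapped expandable space once larger drives are in), a quick query works:

zpool get autoexpand,expandsize vault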

With the pool created, we can then create our dataset (filesystem) and its associated mount point. We also want to ensure that the filesystem supports POSIX ACLs (posixacl).

zfs create vault/airvideo
zfs set mountpoint=/mnt/av vault/airvideo
zfs set acltype=posixacl vault
zfs set acltype=posixacl vault/airvideo

We mount the new ZFS filesystem on /mnt/av because /mnt/airvideo is soft-linked to the temporary /mnt/av2 volume that is still in operation. We first have to re-copy all our content from the temporary volume to the new ZFS filesystem.

sudo rsync --delete -aAXv /mnt/av2/ /mnt/av > ~/nohup.avs.rsync.out 2>&1 &

This took around 4 days to complete. We can all breathe easy because all the data now has redundancy again! We can now bring the new ZFS filesystem live.

sudo systemctl stop apache2.service smbd nmbd plexmediaserver.service
sudo rm /mnt/airvideo
sudo zfs set mountpoint=/mnt/airvideo vault/airvideo
sudo systemctl start apache2.service smbd nmbd plexmediaserver.service

zfs list

NAME             USED  AVAIL     REFER  MOUNTPOINT
vault           31.0T  14.2T       96K  /vault
vault/airvideo  31.0T  14.2T     31.0T  /mnt/airvideo

The above did not take long, and the migration is complete!

df -h /mnt/airvideo

Filesystem      Size  Used Avail Use% Mounted on
vault/airvideo   46T   32T   15T  69% /mnt/airvideo

Getting the capacity of our new ZFS filesystem shows that we now have 46TB to work with! This should last for at least a couple of years I hope.

I also did a quick reboot of the system to ensure it comes back up with the ZFS filesystem intact and without issues. It has now been running for 2 days. I have not collected any performance statistics, but the services all feel faster.

Wake-on-LAN for Linux

My sons upgraded their gaming computers last Christmas, and I ended up using their old parts to build a couple of Linux servers running Ubuntu Server. The idea is to use these extra servers as video encoders since they will have dedicated GPUs. However, the GPUs are also pretty power-hungry. Since they don't need to be up 24 hours a day, I thought keeping these servers asleep until they are required would be good. At the same time, it would be pretty inconvenient to go to the servers and physically power them up when I needed them. The thought of configuring Wake-on-LAN came to mind.

I found this helpful article online. I first confirmed that the network interfaces on the old motherboards support Wake-on-LAN (WOL). Below is the series of commands that I executed to find out whether WOL is supported and, if so, to enable it.

% sudo nmcli connection show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  d46c707a-307b-3cb2-8976-f127168f80e6  ethernet  enp2s0

% sudo ethtool enp2s0 | grep -i wake
	Supports Wake-on: pumbg
	Wake-on: d

The line that reads,

Supports Wake-on: pumbg

indicates the WOL capabilities, and the line that reads,

Wake-on: d

indicates its current status. Each letter has a meaning:

  • d (disabled), or
  • triggered by
    • p (PHY activity), 
    • u (unicast activity),
    • m (multicast activity),
    • b (broadcast activity), 
    • a (ARP activity),
    • g (magic packet activity)

We will use the magic packet method. Below are the commands used to enable WOL based on the magic packet trigger.

% sudo nmcli connection modify d46c707a-307b-3cb2-8976-f127168f80e6 802-3-ethernet.wake-on-lan magic

% sudo nmcli connection up d46c707a-307b-3cb2-8976-f127168f80e6
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)

% sudo ethtool enp2s0 | grep -i wake
	Supports Wake-on: pumbg
	Wake-on: g

The above changes will persist even after the machine reboots. We put the machine to sleep by using the following command:

% sudo systemctl suspend

We need the IP address and the MAC address of the machine to wake the computer up using the wakeonlan utility.

% ifconfig
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.168.185  netmask 255.255.255.0  broadcast 192.168.168.255
        inet6 fd1a:ee9:b47:e840:6cd0:bf9b:2b7e:afb6  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::41bc:2081:3903:5288  prefixlen 64  scopeid 0x20<link>
        inet6 fd1a:ee9:b47:e840:21b9:4a98:dafd:27ee  prefixlen 64  scopeid 0x0<global>
        ether 1c:1b:0d:70:80:84  txqueuelen 1000  (Ethernet)
        RX packets 33852015  bytes 25769211052 (25.7 GB)
        RX errors 0  dropped 128766  overruns 0  frame 0
        TX packets 3724164  bytes 4730498904 (4.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

We use the above ifconfig output to find the required IP and MAC addresses (the inet and ether fields of enp2s0). Once we have this information, we can wake the computer up remotely by executing the wakeonlan command from another computer.

% wakeonlan -i 192.168.168.255 -p 4343 1c:1b:0d:70:80:84
Sending magic packet to 192.168.168.185:4343 with 1c:1b:0d:70:80:84

Note that the IP address used above is the broadcast address and not the machine's direct IP address. Now I can put these servers to sleep and only wake them remotely when I need them.

EXT4-fs Errors on NVME SSD

In my previous post, I replaced the NVMe boot disk on our media server, thinking that the disk was defective because the file system (EXT4-fs) was reporting numerous htree_dirblock_to_tree:1080 errors.

The errors persist with the new disk, so I can eliminate hardware as the cause of the issue.

I noticed that the htree_dirblock_to_tree:1080 errors were attributed to the tar command, and the times at which they occurred coincided with the media server being backed up. Apparently, the backup process, which uses tar, is triggering these errors.

This backup process has remained unchanged for quite some time and has worked really well for us. I guess for some reason there is a bug in the kernel or in the tar command that is not quite compatible with NVMe devices.

I had to resort to finding an alternative backup methodology. I ended up using the rsync method instead.

sudo rsync --delete \
  --exclude 'dev' \
  --exclude 'proc' \
  --exclude 'sys' \
  --exclude 'tmp' \
  --exclude 'run' \
  --exclude 'mnt' \
  --exclude 'media' \
  --exclude 'cdrom' \
  --exclude 'lost+found' \
  --exclude 'home/kang/log' \
  -aAXv / /mnt/backup

It looks like this method is faster and can perform incremental backups. However, instead of backing up to an archive file, which I would later need to extract and prepare during a restoration, I have to back up to a dedicated backup device. Since the old NVMe disk is perfectly fine, I reused it as my backup device. I partitioned this backup device with the same layout as the current boot disk.

Device          Start        End    Sectors   Size Type
/dev/sdi1        2048    2203647    2201600     1G Microsoft basic data
/dev/sdi2     2203648 1921875967 1919672320 915.4G Linux filesystem
/dev/sdi3  1921875968 1953523711   31647744  15.1G Linux swap

The only exception is that the first partition is not marked as boot and esp, so during a restoration I will have to flag that partition accordingly with the parted command, using the following subcommands:

set 1 boot on
set 1 esp on
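
Equivalently, parted can take these as one-shot commands from the shell; a sketch, assuming the backup device is /dev/sdi as in the listing above:

sudo parted -s /dev/sdi set 1 boot on
sudo parted -s /dev/sdi set 1 esp on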

The idea is that at 3am every morning, I will back up the root filesystem to the second partition of the backup drive. If anything happens to the current boot disk, the backup drive can act as an immediately available replacement, after a grub-install preparation as mentioned in the previous article.
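
A minimal sketch of how the schedule could look, assuming the rsync invocation above is wrapped in a hypothetical /usr/local/sbin/backup-root.sh script and added to root's crontab (sudo crontab -e):

# m h dom mon dow  command
0 3 * * * /usr/local/sbin/backup-root.sh >> /var/log/backup-root.log 2>&1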

Let us see how this new backup process works; hopefully, we can bid a final farewell to the htree_dirblock_to_tree:1080 errors!

Update: 2023-12-22

It looks like even with the rsync command, the htree_dirblock_to_tree:1080 errors still came back during the backup process. I decided to upgrade the kernel from vmlinuz-5.15.0-91-generic to vmlinuz-6.2.0-39-generic. Last night (2023-12-23 early morning) was the first backup after the kernel upgrade, and no errors were recorded. I hope this behavior persists and it is not a one-off.

Replacing NVME Boot Disk

A few months ago, the boot disk of our media server began to incur errors such as the ones below:

Dec 17 03:01:35 avs kernel: [32515.068669] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #10354778: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183005] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500463: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183438] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500427: comm tar: Directory block failed checksum

The boot disk is an NVMe device, and I thought the errors might be due to overheating, so I purchased and installed a heat sink. Unfortunately, the errors persisted after the heat sink was installed.

I decided to replace the boot disk with the exact same model, a Samsung 980 Pro 1TB. This should have been a pretty easy maintenance task: clone the drive, then swap in the new drive. However, Murphy is sure to strike!

My usual go-to cloning utility is Clonezilla; unfortunately, it did not like cloning NVMe drives. The utility resulted in a kernel panic even after I tried multiple versions. I am not sure what the problem was. It could be Clonezilla or the USB 3.0 NVMe enclosure that I was using for the new disk.

I resigned myself to using the dd command:

dd if=/dev/source of=/dev/target status=progress

Unfortunately, this would have taken way too long, something like 20+ hours, so I gave up on this approach.

I decided to do a good old restore of the nightly backup. I started by cloning the partition table:

sfdisk -d /dev/olddisk | sfdisk /dev/newdisk

I then proceeded with the restore of the nightly backup. Murphy strikes twice! The nightly backup was corrupted! I guess that is not surprising when the root directory's integrity is in question, which is the whole reason we are doing this exercise.

Without the nightly backup, I had to resort to a live backup. I booted the system again and performed:

sudo su -
mount /dev/new_disk_root_partition /mnt/newboot
cd /
tar -cvpf - --exclude=/tmp --exclude=/home/kang/log --exclude=/span \
    --exclude="/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache" \
    --one-file-system / | tar -xvpf - -C /mnt/newboot --numeric-owner

The above took about an hour. I then copied the /span directory manually, because this directory tends to change while the server is up and running.
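
For that manual copy, something along these lines does the job (a sketch, assuming the new root is still mounted at /mnt/newboot):

sudo rsync -aAXv /span/ /mnt/newboot/span/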

With all the contents copied, I realized I had forgotten how to install GRUB and had to re-teach myself. I used a live Ubuntu USB to boot up the machine, and then mounted both the root and EFI partitions.

nvme1n1                              259:0    0 931.5G  0 disk
├─nvme1n1p1                          259:1    0     1G  0 part  /boot/efi
├─nvme1n1p2                          259:2    0 915.4G  0 part  /
└─nvme1n1p3                          259:3    0  15.1G  0 part  [SWAP]

And install GRUB.

sudo su -
mkdir /efi
mount /dev/nvme1n1p1 /efi
mount /dev/nvme1n1p2 /mnt
grub-install --efi-directory /efi --root-directory /mnt

I also had to fix /etc/fstab to ensure the root partition and the /boot/efi partition are referenced by their correct UUIDs. The blkid command came in handy to find the UUIDs. For the swap partition, I had to run mkswap before I could get its UUID.
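
Roughly, the lookups went like this (device names match the partition listing above; mkswap prints the new UUID to use for the swap entry):

sudo blkid /dev/nvme1n1p1 /dev/nvme1n1p2   # UUIDs for /boot/efi and /
sudo mkswap /dev/nvme1n1p3                 # re-create swap and note its UUID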

After I rebooted, I reinstalled GRUB one more time with the following as super user:

grub-install /dev/nvme1n1

I also updated the initramfs using:

update-initramfs -c -k all

For something that should have taken less than an hour, it took the majority of the day. The server is now running with the new NVMe replacement disk. Hopefully this resolves the file system corruption. We will have to wait and see!

Update: The Day After

The same errors occurred again! I noticed that these corruptions occur when we do a system backup. How ironic! I later confirmed that running the tar command on the root directory during the backup process can cause such errors. I now have to figure out why. I will disable the system backup for the next few days to see whether the errors come back.

Media Server Upgrade 2022 (Part 2)

Part 1

In the first part of this post, I talked about making sure all the new hardware that I recently purchased works. Yesterday's upgrade from Ubuntu 20.04 LTS to 22.04 LTS was super simple. Unfortunately, that was the end of the easy part.

I thought I could just image my old boot drive and make a carbon copy of it on my new boot drive. My old boot drive is a simple 512GB SATA SSD, and my new boot drive is a 1TB NVMe M.2 SSD plugged directly into the motherboard. The copying was pretty simple, but because the drives differ in size, I had to re-lay out the partition table on the new drive once the copy was completed. I did this with the parted command.

Unfortunately the new boot drive did not want to boot. At this point I had to do some research. The most helpful articles were:

Both of the above articles were an excellent refresher on how GRUB works. I have used GRUB since the beginning, but one gets super rusty when these types of tasks are only performed once every three or six years!

Instead of detailing what went wrong, I will just explain what I should have done. This way if I need it again in the future, it is here for my reference.

Step 1: Perform a backup of the old boot drive from a Live USB in shell mode. This is done on my server on a nightly basis. This method is clearly described on the Ubuntu Community Help Wiki.

Following this method, I end up with a compressed tar archive of my entire root directory, skipping some runtime and other unwanted directories.
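
For reference, the wiki method boils down to something like the following (a sketch run as root from the old system; the archive path and exclude list are illustrative, not my exact invocation):

tar -cvpzf /media/backup/old-root.tar.gz \
    --exclude=/media/backup/old-root.tar.gz \
    --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp \
    --exclude=/run --exclude=/mnt --exclude=/media --exclude=/lost+found \
    --one-file-system /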

Step 2: After performing a fresh install of the new Ubuntu LTS Server operating system on the new server and boot drive, I proceeded to back up the new boot drive with the same technique used in Step 1. I stored the backup of the new install on another external SSD drive that I had lying around. Also, it is important that the new boot drive's partition layout contains a swap partition.

Step 3: I then restored the most recent backup of the old boot drive (done in Step 1) to the new boot drive, and replaced the /boot/grub directory with the contents from the new install that was backed up in Step 2. The new GRUB was already installed when we performed the brand-new installation on the drive; we just want to make sure the boot partition matches the /boot/grub contents.
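
In command form, that step looks roughly like this (a sketch; /mnt/newroot and the archive names are placeholders, and the exact member path for /boot/grub depends on how the archive was created):

# restore the old system's backup onto the new root partition
tar -xvpzf /media/backup/old-root.tar.gz -C /mnt/newroot --numeric-owner

# replace /boot/grub with the copy from the fresh install (backed up in Step 2)
rm -rf /mnt/newroot/boot/grub
tar -xvpzf /media/backup/new-install.tar.gz -C /mnt/newroot --numeric-owner boot/grub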

Step 4: We also need to fix up the /etc/fstab file because it contains references to drive devices from the old hardware. Pay special attention to the main data partition and the swap partition. It should look something like this:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/nvme1n1p2 during curtin installation
UUID=fc939be4-5292-4252-8120-7ef59b177e5b / ext4 defaults 0 1

# /boot/efi was on /dev/nvme0n1p1 during curtin installation
UUID=5187-A8C6 /boot/efi vfat defaults 0 1

# Swap partition
UUID=512d611e-6944-4a57-9748-ea68e9ec3fad	none	swap	sw	0	0

# /dev/mapper/airvideovg2-airvideo /mnt/airvideo ext4 rw,noatime 0 0
UUID=9e78425c-c1f3-4285-9fa1-96cac9114c55 /mnt/airvideo ext4 rw,noatime 0 0

Notice that I also added the LVM logical volume for /mnt/airvideo, which is my RAID 1 array. The UUID can be obtained with the blkid command. Below is a sample output:

% blkid
/dev/sdf1: UUID="60024298-9915-3ad8-ae6c-ed7adc98ee62" UUID_SUB="fe08d23c-8e11-e02b-63f9-1bb806046db7" LABEL="avs:4" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="552bdff7-182f-40f0-a378-844fdb549f07"
/dev/nvme0n1p1: UUID="r2rLMD-BEnc-wcza-yvro-chkB-1vB6-6Jtzgz" TYPE="LVM2_member" PARTLABEL="primary" PARTUUID="6c85af69-19a0-4720-9588-808bc0d818f7"
/dev/sdd1: UUID="34c6a19f-98ea-0188-bb3f-a5f5c3be238d" UUID_SUB="4174d106-cae4-d934-3ed4-5057531acb3c" LABEL="avs:3" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="2fc4e9ad-be4b-48aa-8115-f32472e61005"
/dev/sdb1: UUID="ac438ac6-344a-656b-387f-017036b0fafa" UUID_SUB="0924dc67-cd3f-dec5-1814-ab46ebdf2fbe" LABEL="avs:1" TYPE="linux_raid_member" PARTUUID="29e7cfce-9e7b-4067-a0ca-453b39e0bd3d"
/dev/md4: UUID="gjbtdL-homY-wyRG-rUBw-lFgm-t0vZ-Gi8gSz" TYPE="LVM2_member"
/dev/md2: UUID="0Nky5e-52t6-b1uZ-GAIl-4Ior-XWTz-wFpHh1" TYPE="LVM2_member"
/dev/sdi1: UUID="5b483ac2-5b7f-4951-84b2-08adc602f705" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="data" PARTUUID="e0515517-9fbb-4d8a-88ad-674622f20e00"
/dev/sdg1: UUID="3d1afb64-8785-74e6-f9be-b68600eebdd5" UUID_SUB="c146cd05-8ee8-5804-b921-6d87cdd4a092" LABEL="avs:2" TYPE="linux_raid_member" PARTLABEL="lvm" PARTUUID="2f25ec17-83c4-4c0b-8653-600283d58109"
/dev/sde1: UUID="34c6a19f-98ea-0188-bb3f-a5f5c3be238d" UUID_SUB="8aabfe5b-af16-6e07-17c2-3f3ceb1514e3" LABEL="avs:3" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="2fc4e9ad-be4b-48aa-8115-f32472e61005"
/dev/sdc1: UUID="ac438ac6-344a-656b-387f-017036b0fafa" UUID_SUB="c188f680-01a8-d5b2-f8bc-9f1cc1fc3598" LABEL="avs:1" TYPE="linux_raid_member" PARTUUID="29e7cfce-9e7b-4067-a0ca-453b39e0bd3d"
/dev/nvme1n1p2: UUID="fc939be4-5292-4252-8120-7ef59b177e5b" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="912e805d-fe68-48f8-b845-9bba0e3e8c78"
/dev/nvme1n1p3: UUID="512d611e-6944-4a57-9748-ea68e9ec3fad" TYPE="swap" PARTLABEL="swap" PARTUUID="04ac46ff-74f3-499a-814d-32082f6596d2"
/dev/nvme1n1p1: UUID="5187-A8C6" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="fe91a6b2-9cd3-46af-813a-b053a181af52"
/dev/sda1: UUID="3d1afb64-8785-74e6-f9be-b68600eebdd5" UUID_SUB="87fe80a1-4a79-67f3-273e-949e577dd5ee" LABEL="avs:2" TYPE="linux_raid_member" PARTUUID="c8dce45e-5134-4957-aee9-769fa9d11d1f"
/dev/md3: UUID="XEJI0m-PEmZ-VFiI-o4h0-bnQc-Y3Be-3QHB9n" TYPE="LVM2_member"
/dev/md1: UUID="usz0sA-yO01-tlPL-12j2-2C5r-Ukhc-9RLCaX" TYPE="LVM2_member"
/dev/mapper/airvideovg2-airvideo: UUID="9e78425c-c1f3-4285-9fa1-96cac9114c55" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdh1: UUID="60024298-9915-3ad8-ae6c-ed7adc98ee62" UUID_SUB="a1291844-6587-78b0-fcd1-65bc367068e5" LABEL="avs:4" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="ed0274b9-21dc-49bf-bdda-566b2727ddc2"

Step 4B (potentially): If the system boots into the "grub>" prompt, then we will have to persuade GRUB to boot manually by providing the following at the prompt:

grub> set root=(hd9,gpt2)
grub> linux /boot/vmlinuz root=/dev/nvme1n1p2
grub> initrd /boot/initrd.img
grub> boot

To find the root value on the first line, you have to use the ls command, which is explained in this article. The root parameter on the linux line references the partition on which the root directory is mounted. In my case, it was /dev/nvme1n1p2.
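
The hunt at the prompt looks something like this (illustrative output only; the device numbering will differ per machine):

grub> ls
(hd0) (hd0,gpt1) ... (hd9,gpt1) (hd9,gpt2) (hd9,gpt3)
grub> ls (hd9,gpt2)/
bin/ boot/ etc/ home/ lib/ usr/ var/ ...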

After I rebooted, I reinstalled GRUB with the following as super user:

grub-install /dev/nvme1n1

It may also be required to update our initramfs using:

update-initramfs -c -k all

Step 5: At this point the system should reboot, and all of the old server's content should now be on the new hardware. Unfortunately, we will need to fix the network interface configuration.

First obtain the MAC address of the network interface using:

% sudo lshw -C network | grep serial   
    serial: 04:42:1a:05:d3:c4

And then we will have to edit the /etc/netplan/00-installer-config.yaml file.

% cat /etc/netplan/00-installer-config.yaml 
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp6s0:
      dhcp4: true
      match:
        macaddress: 04:42:1a:05:d3:c4
      set-name: enp6s0
  version: 2

Ensure the MAC address matches the one reported by lshw and that the interface name is the same as on the old system. The name in this example is enp6s0. We then need to execute the following commands to generate and apply the configuration.

netplan generate
netplan apply

We need to ensure the name matches because many services on the server have configurations that reference the interface name (a quick grep, shown after the list below, helps find them), such as:

  • Configurations in /etc/network/interfaces
  • Samba (SMB) (/etc/samba/smb.conf)
  • Pihole (/etc/pihole/setupVars.conf)
  • Homebridge (/var/lib/homebridge/config.json)
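
That grep is nothing fancy; a sketch that lists any configuration file still referring to the old interface name:

sudo grep -rl 'enp6s0' /etc /var/lib/homebridge 2>/dev/null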

Step 6: Fix the router's DHCP reservations so that the new server is provisioned with the same fixed IP address as the old server. This is important because there may be firewall rules referencing this IP address directly. The hostname should have been restored automatically when we restored the partition in Step 3.

Step 7: Our final step is to test the various services and ensure they are working properly. These include:

  • Mail
  • Our web site lufamily.ca
  • Homebridge
  • Plex
  • Pihole (DNS server)
  • SMB (File sharing)

Finally, the new system is complete!

New system all up and running!

Media Server Upgrade 2022

On May 15th, 2019 (more than three years ago), I performed a performance boost to my media server by upgrading its CPU, Motherboard, and Memory. You can read that experience in this post.

Today, I am going to be doing the same. It looks like we are on a cadence of every three years or so for a spec bump. This time around we are changing the same items, but will include the power supply in the swap as well. I also decided to swap the boot drive from an old SATA SSD to an NVMe drive. All of this resulted in the following hardware acquisitions, all from Amazon, which I find to have lower pricing than Newegg (when factoring in free shipping through Prime), even during Black Friday and Cyber Monday offers.

  • AMD Ryzen 7 5700G 8-Core, 16-Thread Unlocked Desktop Processor with Radeon Graphics
  • ASUS TUF GAMING B550-PLUS AMD AM4 (3rd Gen Ryzen™)
  • G.SKILL Ripjaws V Series DDR4 3600MHz 32GB(16GBx2) Memory Kit
  • ASUS ROG Strix 850W Gold PSU
  • Samsung 980 PRO SSD 1TB – M.2 NVMe

The above totalled $1045.60 CAD.

The plan is to spend the time today to roughly test out all the new hardware.

Test Setup

I quickly did a skeleton setup to make sure Ubuntu 22.04.1 Server Edition works with all the hardware involved, especially the networking.

Memory Test

Once I knew Ubuntu Server was working well, I started testing the server's new 32GB of DDR4 memory. The test is running as I write this post, and I will let it run overnight.

The plan for tomorrow is to upgrade the current media server from Ubuntu 20.04.5 LTS to Ubuntu 22.04.1 LTS. Once this is done, I can back up everything, move the new hardware into the old casing, and hope everything works.

Part 2

Another NAS Storage Upgrade

Our home Network Attached Storage (NAS) media server is dropping below 4 terabytes of free space. The Seagate IronWolf 12TB hard drives were on sale, with Amazon offering them below $300. I figured I would swap out two old 6TB drives for these new 12TB drives, resulting in a net increase of a further 6TB of storage.

The last time this was done was around two years ago when I replaced 4TB and 6TB hard drives with 10TB hard drives.

So far, the mdadm and LVM storage architecture has proven to be very flexible. I am able to mix drives of different sizes and to grow our media storage volume over time.

Previously, I had to make two swaps, one for each drive in the array. Effectively, I am changing two 6TB drives for two 12TB drives because they are in a RAID 1 array. I cannot swap both at the same time, because I have to incrementally sync the data from the old drives to the new ones.

This has always been inconvenient because it means opening the physical server twice. However, this time I used my USB 3.0 HDD dock. I inserted one of the two new 12TB drives into the dock and temporarily created a three-disk RAID 1 array. Once that sync completed, which took 10+ hours, I removed one 6TB drive from the array configuration. I then physically replaced both 6TB drives in the server chassis with the two new 12TB drives, and placed one old 6TB drive into the dock; the drive in the dock is the one still in the array configuration. Next, I added the second 12TB drive, already in the server chassis, to the three-disk array. Once again, a sync was required to accommodate the second 12TB drive, and this also took 10+ hours. Once the second sync completed, I could finally remove the second 6TB drive (in the dock) from the array and return the array to a two-disk RAID 1 configuration.

The above description is probably quite confusing, but this technique meant a single downtime for the server instead of two when swapping hard drives in the server chassis.
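
In command form, the general shape of the technique is below (a sketch only; /dev/md4 is assumed to be the array holding the 6TB pair, and sdX1/sdY1 are placeholders for the new and old partitions):

# add the new drive (in the USB dock) and turn the mirror into a 3-way mirror
sudo mdadm --add /dev/md4 /dev/sdX1
sudo mdadm --grow /dev/md4 --raid-devices=3

# watch the resync (took 10+ hours each time)
cat /proc/mdstat

# once synced, drop one of the old 6TB drives from the array
sudo mdadm --fail /dev/md4 /dev/sdY1
sudo mdadm --remove /dev/md4 /dev/sdY1

# after the second new drive is synced and the last 6TB drive is removed,
# shrink the array back to a regular two-disk mirror
sudo mdadm --grow /dev/md4 --raid-devices=2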

There will be an additional downtime when I grow or resize the LVM volume and file system.
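
When that time comes, the growth steps are roughly these (a sketch using the volume names from my setup; the md device is assumed):

sudo mdadm --grow /dev/md4 --size=max              # let the array use the full 12TB mirrors
sudo pvresize /dev/md4                             # tell LVM the physical volume grew
sudo lvextend -l +100%FREE airvideovg2/airvideo    # grow the logical volume
sudo resize2fs /dev/mapper/airvideovg2-airvideo    # grow the EXT4 file system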

After this upgrade, I should have the following RAID 1 (fully mirrored) arrays:

  • An array with 2 x 8TB
  • An array with 2 x 10TB
  • An array with 2 x 10TB
  • An array with 2 x 12TB

The above four arrays are combined into a logical volume using LVM, resulting in a total volume size of 40TB (fully mirrored), or a little over 36TiB of usable space (up from the old 31TiB).

% df -h
Filesystem                        Size  Used Avail Use% Mounted on
udev                              7.7G     0  7.7G   0% /dev
tmpfs                             1.6G  3.4M  1.6G   1% /run
/dev/sdj1                         454G   64G  367G  15% /
tmpfs                             7.7G   37M  7.7G   1% /dev/shm
tmpfs                             5.0M  4.0K  5.0M   1% /run/lock
tmpfs                             7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/mapper/airvideovg2-airvideo   37T   26T  9.1T  74% /mnt/airvideo
tmpfs                             1.6G     0  1.6G   0% /run/user/997
tmpfs                             1.6G     0  1.6G   0% /run/user/1000
/dev/sda1                         5.5T  548G  4.7T  11% /mnt/6tb

As you can see above, /mnt/airvideo now has 9.1TiB free!

The NAS motherboard and CPU are now over three years old. I may give them a couple more years before considering another hardware upgrade.

Upgrading USG to UDM Pro

As I indicated in a previous post, I wanted to upgrade my old USG firewall to the new UDM Pro. In that post, I outlined my reasons, so I will not repeat them here.

I wanted to perform the upgrade with a minimum of downtime. With several household members online constantly, for both school work and entertainment, any perceived downtime results in the typical earful, which I would rather avoid.

In preparation for the upgrade, I performed the required backups for both the Unifi Video and the Unifi Network Controller applications, which were previously running on my NAS server. I copied the backups to an old MacBook Air that I would use as my console to set up the UDM Pro. I also made sure to stop both the unifi and unifi-video services on my Ubuntu NAS; we do not want to risk some sort of conflict when we plug in the UDM Pro.

I also configured the old USG's LAN2 port to provision a network on an IP subnet separate from all of my existing networks; in my case, it was 10.10.11.1. I did this so that I could connect the UDM Pro to the LAN2 port of the old USG, which mimics my Internet Service Provider (ISP). This way, while I configured and set up the UDM Pro, my existing home network could continue to function and provide the required services to my household.

Connections used for UDM Pro setup

The setup connection layout looks like the above diagram. After the UDM Pro booted up, I used Safari to point to 192.168.1.1, which hosted the UDM Pro's web administration interface. Since I already had a Unifi account, I used that to log in initially. Once logged in, I proceeded to upgrade both the Network and Protect applications on the device. The Network application was upgraded to 6.5.53, and the Protect application was upgraded to 1.20.0. I am not planning to use the Access and Talk applications, but upgraded those anyway in case of any outstanding security holes.

I then opened the Network application and performed a restore from the backup that I had previously copied onto the MacBook Air. This restore worked without a hitch: all of my configurations, network, and WiFi settings were ported over. The UDM Pro also automatically created another user from my old restore, which I promptly switched over to, and I disabled external access to the UDM Pro (call me paranoid). I restarted the device to make sure that it boots back up smoothly. Once I had confirmed that everything was still okay, I proceeded to shut down the UDM Pro.

Before I made the physical swap, replacing the USG with the UDM Pro, I SSHed into each Unifi-managed device (switches and access points) to ensure that the inform URL was reachable. Once I was satisfied, I proceeded with the swap. Now my network looks like this:

Post-upgrade layout

My expectation was that once the UDM Pro booted up and the Network application came online, all the Unifi networking devices would automatically be managed by the UDM Pro. The reality, however, was slightly different. On the plus side, the network was fully functional, and the only downtime was the time it took to perform the swap and boot up the UDM Pro, no more than 5 minutes. On the down side, when I inspected the devices (not clients) within the Network application, only one of them had successfully registered with the new Network application. I was a bit puzzled and miffed.

After about an hour's investigation, it appeared that some devices did not like the inform URL being http://unifi:8080/inform. I specifically picked the hostname instead of the IP address because the controller was moving from my NAS server to the UDM Pro. Unfortunately, for some unknown reason, only a single Unifi access point, a UAP-AC-Pro, made the jump and was successfully adopted by the UDM Pro. All the other devices went into a repeated adoption loop. Even rebooting the devices did not help.

To remedy the situation, I had to SSH into each of the remaining devices and manually perform the following command (twice):

set-inform http://192.168.168.1:8080/inform

I am uncertain why I had to do it twice, but the first attempt did not register with the Network application. Once all the devices were adopted, I then proceeded to configure the Override Inform Host setting on the UDM Pro. See below:

Note that the above settings were only in the New UI and missing from the Classic UI

I had to set the above rather than the default, because the default ubnt hostname did not work; perhaps this had something to do with my Pi-hole configuration. Also, although by this point all the devices were connected, two remaining access points were complaining about a STUN issue. When I checked their respective logs, the STUN URL was incorrect because it retained the old unifi hostname. It was at this point that I decided to let the Network controller push out an IP-address-based inform URL to all the devices.

I performed another reboot of the UDM Pro just to make sure everything would come back cleanly after a power outage, and everything seems to be running smoothly now.

The restore of my Unifi Video configuration into the new Unifi Protect application on the UDM Pro was much smoother, I dare say flawless. However, there was a minor hiccup. I run Homebridge at home to connect my Unifi Protect G3 Flex cameras to my HomeKit environment. This relied on an RTSP stream that was previously provided by Unifi Video. The new Protect application, however, uses RTSPS, an encrypted version of the stream. Long story short, I had to switch from the Camera FFmpeg plugin to the Unifi Protect plugin. Not a big deal, but I was super glad that the Unifi Protect plugin exists for Homebridge.

Now our indoor security footage is recorded (up to a month's worth), and motion detection is a lot faster. The application for finding and viewing the videos is now more convenient. On top of that, live previews are also available within the Home app on my iPhone, iPad, and Apple TV.

A gallery view from my 4K TV using Apple TV and HomeKit

The outdoor cameras are still Kuna, but I plan a future upgrade to Unifi G4 Bullet cameras for the outside as well.

The last configuration change I made was to finally turn on Unifi's Threat Management. I set it to level 4 to begin with, and we may adjust it in the future.


Last and certainly not least, my NAS server is now more of a NAS and less of a network controller, since all networking-related monitoring and control is now performed by the dedicated UDM Pro device. The UDM Pro can also handle threat management at my full 1Gbps ISP download speed.

I will continue to post my upgrades here. If nothing else, they keep my future self informed.