Tag: linux
Ubuntu 24.04 Upgrade
Wake-on-LAN for Linux
My sons upgraded their gaming computers last Christmas, and I ended up using their old parts to build a couple of Linux servers running Ubuntu Server. The idea is to use these extra servers as video encoders, since they have dedicated GPUs. However, the GPUs are also pretty power-hungry. Since the servers don’t need to be up 24 hours a day, I thought it would be good to keep them asleep until they are required. At the same time, it would be pretty inconvenient to walk over and physically power them up whenever I needed them. Configuring Wake-on-LAN came to mind.
I found this helpful article online. I first confirmed that the network interface on the old motherboards supports Wake-on-LAN (WOL). Below is the series of commands I ran to check whether WOL is supported and, if so, to enable it.
% sudo nmcli connection show
NAME UUID TYPE DEVICE
Wired connection 1 d46c707a-307b-3cb2-8976-f127168f80e6 ethernet enp2s0
% sudo ethtool enp2s0 | grep -i wake
Supports Wake-on: pumbg
Wake-on: d
The line that reads,
Supports Wake-on: pumbg
indicates the WOL capabilities, and the line that reads,
Wake-on: d
indicates its current status. Each letter has a meaning:
- d (disabled), or
- triggered by
  - p (PHY activity),
  - u (unicast activity),
  - m (multicast activity),
  - b (broadcast activity),
  - a (ARP activity),
  - g (magic packet activity)
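As a quick reference, the flag letters can be decoded with a few lines of shell. This is a minimal sketch: decode_wol is a hypothetical helper name (not part of ethtool), and it only knows the letters listed above.

```shell
# Decode an ethtool "Wake-on" flag string (e.g. "pumbg") one letter at a time.
# decode_wol is a hypothetical helper name, not an ethtool feature.
decode_wol() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        letter=${flags%"$rest"}  # the first character itself
        case $letter in
            d) echo "d: disabled" ;;
            p) echo "p: PHY activity" ;;
            u) echo "u: unicast activity" ;;
            m) echo "m: multicast activity" ;;
            b) echo "b: broadcast activity" ;;
            a) echo "a: ARP activity" ;;
            g) echo "g: magic packet" ;;
            *) echo "$letter: unknown flag" ;;
        esac
        flags=$rest
    done
}

# The "Supports Wake-on" value reported for enp2s0.
decode_wol pumbg
```

Passing the current "Wake-on" value instead (d before the change, g after it) shows at a glance which trigger is active.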
We will use the magic packet method. Below are the commands used to enable WOL based on the magic packet trigger.
% sudo nmcli connection modify d46c707a-307b-3cb2-8976-f127168f80e6 802-3-ethernet.wake-on-lan magic
% sudo nmcli connection up d46c707a-307b-3cb2-8976-f127168f80e6
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
% sudo ethtool enp2s0 | grep -i wake
Supports Wake-on: pumbg
Wake-on: g
The above changes will persist even after the machine reboots. We put the machine to sleep by using the following command:
% sudo systemctl suspend
To wake the computer up with the wakeonlan utility, we need the MAC address of the machine and the broadcast address of its network.
% ifconfig
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.168.185 netmask 255.255.255.0 broadcast 192.168.168.255
inet6 fd1a:ee9:b47:e840:6cd0:bf9b:2b7e:afb6 prefixlen 64 scopeid 0x0<global>
inet6 fe80::41bc:2081:3903:5288 prefixlen 64 scopeid 0x20<link>
inet6 fd1a:ee9:b47:e840:21b9:4a98:dafd:27ee prefixlen 64 scopeid 0x0<global>
ether 1c:1b:0d:70:80:84 txqueuelen 1000 (Ethernet)
RX packets 33852015 bytes 25769211052 (25.7 GB)
RX errors 0 dropped 128766 overruns 0 frame 0
TX packets 3724164 bytes 4730498904 (4.7 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
From the ifconfig output above, we need two values: the broadcast address (192.168.168.255) and the MAC address shown in the ether field (1c:1b:0d:70:80:84). With these, we can wake the computer up remotely by running the wakeonlan command from another computer.
% wakeonlan -i 192.168.168.255 -p 4343 1c:1b:0d:70:80:84
Sending magic packet to 192.168.168.255:4343 with 1c:1b:0d:70:80:84
Note that the IP address used above is the broadcast address, not the machine’s own IP address. Now I can put these servers to sleep and wake them remotely only when I need them.
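For the curious, the magic packet itself is simple: 6 bytes of 0xff followed by the target MAC address repeated 16 times, 102 bytes in total. The sketch below only builds that payload as a hex string and does not send anything. magic_packet_hex is a hypothetical helper name, not part of the wakeonlan utility.

```shell
# Print the magic packet payload for a MAC address as a hex string:
# 6 bytes of ff, then the MAC repeated 16 times (102 bytes = 204 hex digits).
# magic_packet_hex is a hypothetical helper name for illustration only.
magic_packet_hex() {
    # Strip ":" or "-" separators and normalise to lower case.
    mac_hex=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    # Reject anything that is not exactly 12 hex digits.
    case $mac_hex in
        *[!0-9a-f]*) echo "invalid MAC: $1" >&2; return 1 ;;
    esac
    [ "${#mac_hex}" -eq 12 ] || { echo "invalid MAC: $1" >&2; return 1; }
    packet=ffffffffffff
    i=0
    while [ "$i" -lt 16 ]; do
        packet=$packet$mac_hex
        i=$((i + 1))
    done
    printf '%s\n' "$packet"
}

magic_packet_hex 1c:1b:0d:70:80:84
```

The wakeonlan utility sends this payload as a UDP datagram to the address and port given with -i and -p, which is why the broadcast address is the one that matters: a sleeping machine cannot answer ARP for its own IP address.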
EXT4-fs Errors on NVME SSD
In my previous post, I replaced the NVME boot disk on our media server, thinking that the disk was defective because the file system (EXT4-fs) was reporting numerous htree_dirblock_to_tree:1080 errors.
The errors persisted with the new disk, so I can rule out the disk itself as the cause of the issue.
I noticed that the htree_dirblock_to_tree:1080 errors were triggered by the tar command, and that their timestamps coincided with the media server’s backup. Apparently, the backup process is triggering these errors through the tar command.
This backup process has remained unchanged for quite some time and has worked really well for us. I guess that, for some reason, there is a bug in the kernel or in the tar command that is not quite compatible with NVME devices.
I had to find an alternative backup method, and I ended up using rsync instead.
sudo rsync --delete \
--exclude 'dev' \
--exclude 'proc' \
--exclude 'sys' \
--exclude 'tmp' \
--exclude 'run' \
--exclude 'mnt' \
--exclude 'media' \
--exclude 'cdrom' \
--exclude 'lost+found' \
--exclude 'home/kang/log' \
-aAXv / /mnt/backup
This method appears to be faster and can perform incremental backups. However, instead of backing up to an archive file, which I would later need to extract and prepare during the restoration process, I have to back up to a dedicated backup device. Since the old NVME disk is perfectly fine, I reused it as my backup device. I have partitioned this backup device with the same layout as the current boot disk.
Device Start End Sectors Size Type
/dev/sdi1 2048 2203647 2201600 1G Microsoft basic data
/dev/sdi2 2203648 1921875967 1919672320 915.4G Linux filesystem
/dev/sdi3 1921875968 1953523711 31647744 15.1G Linux swap
The only exception is that the first partition is not marked as boot and esp, so during the restoration process I will have to mark that partition accordingly with the following parted commands:
set 1 boot on
set 1 esp on
The idea is that at 3 a.m. every night, I will back up the root filesystem to the second partition of the backup drive. If anything happens to the current boot disk, the backup drive can act as an immediately available replacement, after the grub-install preparation mentioned in the previous article.
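A minimal sketch of how the nightly run could be scripted is shown below. The script path, the cron line, and the --run switch are hypothetical. It also differs from the command above in three ways, all of them assumptions: -v is dropped for unattended runs, each exclude pattern has a leading "/" so that rsync anchors it to the root of the transfer instead of matching a directory of that name anywhere in the tree, and the patterns end in "/*" so the empty directories themselves are still created on the backup.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/sbin/root-backup.sh and
# scheduled from root's crontab with:
#   0 3 * * * /usr/local/sbin/root-backup.sh --run
# Assumption: the backup partition is mounted at /mnt/backup.
BACKUP_TARGET=${BACKUP_TARGET:-/mnt/backup}

# Print the rsync command line. "/name/*" is anchored to the transfer root
# and skips only the contents, so /dev, /proc, etc. still exist as empty
# directories on the backup.
backup_command() {
    printf 'rsync -aAX --delete'
    for path in dev proc sys tmp run mnt media cdrom lost+found home/kang/log; do
        printf " --exclude '/%s/*'" "$path"
    done
    printf ' / %s\n' "$BACKUP_TARGET"
}

if [ "${1:-}" = "--run" ]; then
    # Refuse to run if the backup drive is not mounted; otherwise rsync
    # would fill the root filesystem under the mount point.
    mountpoint -q "$BACKUP_TARGET" || { echo "$BACKUP_TARGET is not mounted" >&2; exit 1; }
    backup_command | sh
else
    # Dry run: only show what would be executed.
    backup_command
fi
```

Running the script without arguments prints the command for review; only the --run form touches the disk.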
Let us see how this new backup process works. Hopefully, we can bid a final farewell to the htree_dirblock_to_tree:1080 errors!
Update: 2023-12-22
It looks like even with the rsync command, the htree_dirblock_to_tree:1080 errors still came back during the backup process. I decided to upgrade the kernel from vmlinuz-5.15.0-91-generic to vmlinuz-6.2.0-39-generic. Last night (2023-12-23 early morning) was the first backup after the kernel upgrade, and no errors were recorded. I hope this behavior persists and is not a one-off.
Replacing NVME Boot Disk
A few months ago, the boot disk of our media server began to report errors such as the ones below:
Dec 17 03:01:35 avs kernel: [32515.068669] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #10354778: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183005] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500463: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183438] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500427: comm tar: Directory block failed checksum
The boot disk is an NVME device, and I thought the errors might be due to overheating, so I purchased a heat sink and installed it. Unfortunately, the errors persisted after the heat sink was installed.
I decided to replace the boot disk with the exact same model, a Samsung 980Pro 1TB. This should have been a pretty easy maintenance task: clone the drive and swap in the new one. However, Murphy is sure to strike!
My usual go-to cloning utility is Clonezilla. Unfortunately, it did not like cloning NVME drives: every version I tried ended in a kernel panic. I am not sure what the problem is. It could be Clonezilla or the USB 3.0 NVME enclosure that I was using for the new disk.
I resigned myself to using the dd command:
dd if=/dev/source of=/dev/target status=progress
Unfortunately, this would have taken way too long, something like 20+ hours, so I gave up on this approach.
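For a rough sense of scale, 1 TB in 20 hours works out to an average of under 14 MB/s. The sketch below does that arithmetic; implied_mb_per_s is a hypothetical helper, and the inputs are the approximate figures from above. One possible contributor, which was not tested here, is that dd copies in 512-byte blocks unless a larger bs= value is given.

```shell
# Average throughput in MB/s (integer, rounded down) implied by copying
# size_gb gigabytes in the given number of hours.
# implied_mb_per_s is a hypothetical helper name for illustration only.
implied_mb_per_s() {
    size_gb=$1
    hours=$2
    echo $(( size_gb * 1000 / (hours * 3600) ))
}

# 1 TB (1000 GB) in 20 hours.
implied_mb_per_s 1000 20   # prints 13

# Untested variant with a larger block size (an assumption, not what was run):
#   dd if=/dev/source of=/dev/target bs=4M status=progress
```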
I decided to do a good old restore of the nightly backup. I started by cloning the partition table:
sfdisk -d /dev/olddisk | sfdisk /dev/newdisk
I then proceeded with the restore of the nightly backup. Murphy strikes twice! The nightly backup was corrupted! I guess that is not surprising when the root directory’s integrity is in question, which is the whole reason we are doing this exercise.
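In hindsight, a cheap sanity check after each nightly run would have flagged the unreadable archive much earlier. The sketch below is one way to do that, assuming the backup is a tar archive that tar can list; verify_archive is a hypothetical helper name. Listing the archive only proves that it can be read end to end, not that the files inside it are intact.

```shell
# Return success only if tar can list every member of the archive.
# verify_archive is a hypothetical helper name; tar detects the compression
# format on its own when reading from a file.
verify_archive() {
    tar -tf "$1" > /dev/null 2>&1
}

# Example with a hypothetical path:
#   verify_archive /backups/nightly.tar.gz || echo "nightly backup is unreadable" >&2
```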
Without the nightly backup, I had to resort to a live backup. I booted the system again and ran:
sudo su -
mount /dev/new_disk_root_partition /mnt/newboot
cd /
tar -cvpf - --exclude=/tmp --exclude=/home/kang/log --exclude=/span --exclude="/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache" --one-file-system / | tar xvpf - -C /mnt/newboot --numeric-owner
The above took about an hour. I then copied the /span directory manually, because this directory tends to change while the server is up and running.
With all the contents copied, I realized I had forgotten how to install GRUB and had to teach myself again. I booted the machine from an Ubuntu live USB, and then mounted both the root and EFI partitions.
nvme1n1 259:0 0 931.5G 0 disk
├─nvme1n1p1 259:1 0 1G 0 part /boot/efi
├─nvme1n1p2 259:2 0 915.4G 0 part /
└─nvme1n1p3 259:3 0 15.1G 0 part [SWAP]
I then installed GRUB:
sudo su -
mkdir /efi
mount /dev/nvme1n1p1 /efi
mount /dev/nvme1n1p2 /mnt
grub-install --efi-directory /efi --root-directory /mnt
I also had to fix /etc/fstab to ensure that the root partition and the /boot/efi partition are referenced by their correct UUIDs. The blkid command came in handy for finding the UUIDs. For the swap partition, I had to run mkswap before I could get its UUID.
After I rebooted, I reinstalled GRUB one more time by running the following as superuser:
grub-install /dev/nvme1n1
I also updated the initramfs using:
update-initramfs -c -k all
For something that should have taken less than an hour, it took the majority of the day. The server is now running with the new NVME replacement disk. Hopefully this resolves the file system corruptions. We have to wait and see!
Update: The Day After
The same errors occurred again! I noticed that these corruptions occur when we do a system backup. How ironic! I later confirmed that running the tar command on the root directory during the backup process can cause such an error. I now have to find out why. I will disable the system backup for the next few days to see whether the errors come back.
Old Media Server with OpenVPN
I am in the process of building and configuring a media server for my parents. After my recent media server upgrade, I have extra gear lying around. By purchasing a power supply and a small case, I can cobble together another media server with my old processor and motherboard. I will call this my parents’ media server. The goal is to replace the Raspberry Pi unit that is currently running OSMC as their media server. Although the OSMC solution on the Raspberry Pi has been working really well, it is underpowered for playing HEVC-encoded video at full 1080p HD resolution.
I wanted to convert the majority of our video media to HEVC simply to save storage space. If I do this with my media library, I will not be able to share our media with them because of their underpowered Raspberry Pi.
To solve this issue, I installed Ubuntu 18.04 along with Kodi on the parents’ media server that I just built. I have been testing this solution for the past couple of weeks, and both the hardware and the media player work really well.
I also configured the box to automount USB disks, and installed Samba so that both video and music files can be shared with other devices on the same network. Samba is primarily used by my parents with their SONOS speakers.
With this media server at their location, I can also consider future upgrades such as replacing their WiFi network with a Ubiquiti solution, and even ponder a site-to-site VPN solution between our two networks.
Perhaps that is looking too far into the future. My immediate concern is how to remotely administer the box. With the Raspberry Pi, I just had a simple SSH setup. However, with the extra horsepower and a full-blown Ubuntu distribution, I can now set up OpenVPN.
I followed these instructions on the DigitalOcean site, and they worked flawlessly. However, I made one major error during the setup: I skipped the firewall (ufw) setup on the box, thinking that I didn’t need a firewall because an external firewall already exists. OpenVPN will not route external traffic to the internal private network if IP masquerading (NAT) is not set up properly. Thanks to a coworker’s advice, I configured the firewall with IP forwarding and NAT, but also changed all default actions to ACCEPT so that the firewall only functions as a NAT router. Lesson learned!
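For reference, the NAT part boils down to a short stanza near the top of /etc/ufw/before.rules, together with net.ipv4.ip_forward=1 in /etc/sysctl.conf and DEFAULT_FORWARD_POLICY="ACCEPT" in /etc/default/ufw. The sketch below only prints such a stanza. nat_rules is a hypothetical helper, and both the interface name eth0 and the 10.8.0.0/24 client subnet are assumptions to be replaced with the values of the actual box.

```shell
# Print a NAT (masquerade) stanza suitable for /etc/ufw/before.rules.
# nat_rules is a hypothetical helper; both arguments are assumptions:
#   $1  outbound interface (see: ip route show default)
#   $2  VPN client subnet (defaults to 10.8.0.0/24)
nat_rules() {
    iface=$1
    subnet=${2:-10.8.0.0/24}
    cat <<EOF
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s $subnet -o $iface -j MASQUERADE
COMMIT
EOF
}

nat_rules eth0
```

After editing the files, ufw has to be reloaded (for example with ufw disable followed by ufw enable) for the rules to take effect.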
Since this VPN will only be used by me for remote management, I will not configure any HTTPS tunnelling or install and configure ObfsProxy. We will continue to use UDP and stick with the default 1194 port.
We will do some final testing before finally deploying it to my parents’ place.