Found Two HBA Cards for My NAS

About three weeks ago, I was casually browsing eBay and found this little gem, a Host Bus Adapter that can do PCIe 2.0 x8 (~4 to 8GB/s). This is way better than the one that I purchased earlier (GLOTRENDS SA3116J PCIe SATA Adapter Card) which can operate on a single lane of PCIe 3.0 yielding only 1GB/s. I could not pass it up at a price of only $ 40 CAD, so I purchased two of these to replace the old adapter card I had.

LSI 6Gbps SAS HBA 9200-8i IT Mode ZFS FreeNAS unRAID + 2*SFF-8087 SATA

This new card LSI 6Gbps SAS HBA 9200-8i only supports 8 SATA ports per card, so I had to get two of them to support all of the hard drives that I have. These SAS HBA cards must have the IT (initiator target) mode firmware because the default firmware (IR mode) supports a version of hardware RAID, which I did not want. With the IT mode, the hard drives will be logically separated on the card and only share the physical bandwidth of the PCIe bus. This is a must for ZFS.

With these new cards, my write throughput to my NAS hard drives now averages around 500MB/s. Previously, I was only getting about half of this.

I wish I would have found these sooner. Now I have two spare PCIe SATA expansion cards, one supporting 8 ports, and the other supporting 16 ports. I will place them on another server. Perhaps in a future Proxmox cluster project.

Tesla FSD Trial

Thanks to my cousins who also purchased Tesla using my Tesla referral code, I have some referral credits that I can use to select a 3 months trial of Tesla’s Full Self Driving (FSD) capability. On March 1st, 2024, I turned this feature on.

Today, we went out for lunch at Mr. Congee in Richmond Hill, Ontario. I thought I give this FSD a try.

It was pretty easy to set up. After agreeing with all the legal stuff and enable the feature, I just set the destination, set it to drive mode, wait for the autopilot icon to show, and double tap the right stalk and away we go!

There are three settings to FSD: chill, average, aggressive/assertive. I just left it on the default, average mode.

Pulling out of the driveway, the car was a bit jerky, but once it got on the road, it made all the right decisions. I override the mode on our neighbourhood feeder road from Leslie Street just to make sure that I can override the mode, and then quickly re-engage the FSD.

On this occasion, all traveling was done on regular roads, no highways, so it was more challenging for the car. It made all the turns correctly, but I did have to override it once when it did not recognize the restricted Via Bus Lane on Yonge Street. It even pulled into the parking area at Mr. Congee, but did not fully complete the trip by parking the car. I had to park it manually.

On the way back, it hesitated too much on a left-hand turn. I had to press the accelerator to help it along. Doing this did not override the FSD mode.

I will be driving to Montreal in about 4 weeks, so I will be looking forward to testing FSD on the highway.

My initial assessment is that I probably would not have paid any more money to gain this feature. Once again, thanks to my cousins who allowed me to experience this through the use of my Tesla redemption credits.

LVM to ZFS Migration

In a previous post, I described the hardware changes that I made to facilitate additional drive slots on my NAS Media Server.

We now need to migrate from an LVM system consisting of 40TB of redundant mirrored storage using mdadm to a ZFS system consisting of a single pool and a dataset. Below is a diagram depicting the logical layout of the old and the intended new system.

40TB of Redundant Storage
40TB of Redundant Storage
Logical Volume (LV)
Logical Volume (LV)
EXT 4 File System for Media Storage
EXT 4 File System for Media Storage
Old Storage System
Old Storage System
50TB of Redundant Storage
50TB of Redundant Storage
ZFS Dataset
ZFS Dataset
New Storage System
New Storage System
LV Group of 4 Physical Volumes (PV)
LV Group of 4 Physical Volumes (PV)
PV
PV
10TB
mdadm mirror
10TB…
PV
PV
8TB
mdadm mirror
8TB…
PV
PV
10TB
mdadm mirror
10TB…
PV
PV
12TB
mdadm mirror
12TB…
ZFS Pool
ZFS Pool
12TB
Mirror VDEV
12TB…
10TB
Mirror VDEV
10TB…
8TB
Mirror VDEV
8TB…
10TB
Mirror VDEV
10TB…
6TB
Mirror VDEV
6TB…
4TB
Mirror VDEV
4TB…
512GB
L2ARC Cache
512GB…

Before the migration, we must backup all the data from the LVM system. I cobbled together a collection of old hard drives and then proceeded to create another LVM volume as the temporary storage of the content. This temporary volume will not have any redundancy capability, so if any one of the old hard drives fails, then out goes all the content. The original LVM system is mounted on /mnt/airvideo and the temporary LVM volume is mounted on /mnt/av2.

I used the command below to proceed with the backup.

sudo rsync --delete -aAXv /mnt/airvideo /mnt/av2 > ~/nohup.avs.rsync.out 2>&1 &

I can then monitor the progress of the backup with:

tail -f ~/nohup.avs.rsync.out

The backup took a little more than 7 days to copy around 32 TB of data from our NAS server. During this entire process, all of the NAS services continued to run, so that downtime was almost non-existent.

Once the backup is completed, I wanted to move all the services to the backup before I started to dismantle the old LVM volume. The following steps were done:

  • Stop all services on other machines that were using the NAS;
  • Stop all services on the NAS that were using the /mnt/airvideo LVM volume;
    • sudo systemctl stop apache2 smbd nmbd plexmediaserver
  • Unmount the /mnt/airvideo volume, and create a soft-link of the same name to the backup volume at /mnt/av2;
    • sudo umount /mnt/airvideo
    • sudo ln -s /mnt/av2 /mnt/airvideo
  • Restart all services on the NAS and the other machines;
    • sudo systemctl start apache2 smbd nmbd plexmediaserver
  • Once again, the downtime here was minimal;
  • Remove or comment out the entry in the /etc/fstab file that automatically mounts the old LVM volume on boot. This is no longer necessary because ZFS is remounted by default;

Now that the services are all up and running, we can then start destroying the old LVM volume (airvideovg2/airvideo) and volume group (airvideovg2). We can obtain a list of all the physical volumes that make up the volume group.

sudo pvdisplay -C --separator ' | ' -o pv_name,vg_name

  PV | VG
  /dev/md1 | airvideovg2
  /dev/md2 | airvideovg2
  /dev/md3 | airvideovg2
  /dev/md4 | airvideovg2
  /dev/nvme0n1p1 | airvideovg2

The /dev/mdX devices are the mdadm mirror devices, each consisting of a pair of hard drives.

sudo lvremove airvideovg2/airvideo
Do you really want to remove and DISCARD active logical volume airvideovg2/airvideo? [y/n]: y
  Flushing 0 blocks for cache airvideovg2/airvideo.
Do you really want to remove and DISCARD logical volume airvideovg2/lv_cache_cpool? [y/n]: y
  Logical volume "lv_cache_cpool" successfully removed
  Logical volume "airvideo" successfully removed

sudo vgremove airvideovg2
  Volume group "airvideovg2" successfully removed

At this point, both the logical volume and the volume group are removed. We say a little prayer to ensure nothing happens with our temporary volume (/mnt/av2), that is currently in operation.

We now have to disassociate the mdadm devices from LVM.

sudo pvremove /dev/md1
Labels on physical volume "/dev/md1" successfully wiped.
sudo pvremove /dev/md2
Labels on physical volume "/dev/md2" successfully wiped.
sudo pvremove /dev/md3
Labels on physical volume "/dev/md3" successfully wiped.
sudo pvremove /dev/md4
Labels on physical volume "/dev/md4" successfully wiped.
sudo pvremove /dev/nvme0n1p1
Labels on physical volume "/dev/nvme0n1p1" successfully wiped.

You can find the physical hard drives associated with each mdadm device using the following:

sudo mdadm --detail /dev/md1
#or
sudo cat /proc/mdstat

We then have to stop all the mdadm devices and zero their superblock so that we can reuse the hard drives to set up our ZFS pool.

sudo mdadm --stop /dev/md1
mdadm: stopped /dev/md1
sudo mdadm --stop /dev/md2
mdadm: stopped /dev/md2
sudo mdadm --stop /dev/md3
mdadm: stopped /dev/md3
sudo mdadm --stop /dev/md4
mdadm: stopped /dev/md4

# Normally you also need to do a --remove after the --stop,
# but it looks like the 6.5 kernel did the remove automatically.
#
# For all partitions used in the md device

for i in sdb1 sdc1 sdp1 sda1 sdo1 sdd1 sdg1 sdn1
do
	sudo mdadm --zero-superblock /dev/${i}
done

Now with all of the old hard drives freed up, we can repurpose them to create our ZFS pool. Instead of using the /dev/sdX reference of the physical device, it is recommended to use /dev/disk/by-id with the manufacturer’s model and serial number so that the ZFS pool can be moved to another machine in the future. We also used the -f switch to let ZFS know that it is okay to erase the existing content on those devices. The command to create the pool we named vault is this:

zpool create -f vault mirror /dev/disk/by-id/ata-ST10000VN0008-2JJ101_ZHZ1KMA0-part1 /dev/disk/by-id/ata-WDC_WD101EFAX-68LDBN0_VCG6VRWN-part1 mirror /dev/disk/by-id/ata-ST8000VN0022-2EL112_ZA1E8GW4-part1 /dev/disk/by-id/ata-ST8000VN0022-2EL112_ZA1E8S0V-part1 mirror /dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA2C69FN-part1 /dev/disk/by-id/ata-ST10000VN0004-1ZD101_ZA2964KD-part1 mirror /dev/disk/by-id/ata-ST12000VN0008-2YS101_ZRT008SC-part1 /dev/disk/by-id/ata-ST12000VN0008-2YS101_ZV701XQV-part1

# The above created the pool with the old drives from the old LVM volume group
# We then added 4 more drives, 2 x 6TB, and 2 x 4TB drives to the pool

# Adding another 6TB mirror:

sudo zpool add -f vault mirror /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX31D87HDU09-part1 /dev/disk/by-id/ata-WDC_WD60EZRZ-00GZ5B1_WD-WX11D374490J-part1

# Adding another 4TB mirror:

sudo zpool add -f vault mirror /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN0GTAK-part1 /dev/disk/by-id/ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E0354579-part1

We also want to add the old NVMe as ZFS L2ARC cache.

ls -lh /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D

lrwxrwxrwx 1 root root 13 Mar  2 16:02 /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D -> ../../nvme0n1

sudo zpool add vault cache /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D 

We can see the pool using this command:

sudo zpool list -v vault

NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
vault                                                  45.4T  31.0T  14.4T        -         -     0%    68%  1.00x    ONLINE  -
  mirror-0                                             9.09T  8.05T  1.04T        -         -     0%  88.5%      -    ONLINE
    ata-ST10000VN0008-2JJ101_ZHZ1KMA0-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD101EFAX-68LDBN0_VCG6VRWN-part1               -      -      -        -         -      -      -      -    ONLINE
  mirror-1                                             7.27T  6.49T   796G        -         -     0%  89.3%      -    ONLINE
    ata-ST8000VN0022-2EL112_ZA1E8GW4-part1                 -      -      -        -         -      -      -      -    ONLINE
    ata-ST8000VN0022-2EL112_ZA1E8S0V-part1                 -      -      -        -         -      -      -      -    ONLINE
  mirror-2                                             9.09T  7.54T  1.55T        -         -     0%  82.9%      -    ONLINE
    ata-ST10000VN0004-1ZD101_ZA2C69FN-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-ST10000VN0004-1ZD101_ZA2964KD-part1                -      -      -        -         -      -      -      -    ONLINE
  mirror-3                                             10.9T  8.91T  2.00T        -         -     0%  81.7%      -    ONLINE
    ata-ST12000VN0008-2YS101_ZRT008SC-part1                -      -      -        -         -      -      -      -    ONLINE
    ata-ST12000VN0008-2YS101_ZV701XQV-part1                -      -      -        -         -      -      -      -    ONLINE
  mirror-4                                             5.45T  23.5G  5.43T        -         -     0%  0.42%      -    ONLINE
    ata-WDC_WD60EFRX-68L0BN1_WD-WX31D87HDU09-part1         -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD60EZRZ-00GZ5B1_WD-WX11D374490J-part1         -      -      -        -         -      -      -      -    ONLINE
  mirror-5                                             3.62T  17.2G  3.61T        -         -     0%  0.46%      -    ONLINE
    ata-ST4000DM004-2CV104_ZFN0GTAK-part1                  -      -      -        -         -      -      -      -    ONLINE
    ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E0354579-part1         -      -      -        -         -      -      -      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -  -
  nvme-Samsung_SSD_970_EVO_Plus_500GB_S4P2NF0M419555D   466G  3.58G   462G        -         -     0%  0.76%      -    ONLINE

Once the pool is created, we wanted to set some pool properties so that in the future when we replace these drives with bigger drives, the pool will automatically expand.

zpool set autoexpand=on vault

With the pool created, we can then create our dataset or filesystem and its associated mount point. We also want to ensure that the filesystem also supports posixacl.

zfs create vault/airvideo
zfs set mountpoint=/mnt/av vault/airvideo
zfs set acltype=posixacl vault
zfs set acltype=posixacl vault/airvideo

We mount the new ZFS filesystem on /mnt/av because the /mnt/airvideo is soft-linked to the temporary /mnt/av2 volume that is still in operation. We first have to re-copy all our content from the temporary volume to the new ZFS filesystem.

sudo rsync --delete -aAXv /mnt/av2/ /mnt/av > ~/nohup.avs.rsync.out 2>&1 &

This took around 4 days to complete. We can all breathe easy again because all the data now have redundancy again! We can now bring the new ZFS filesystem live.

sudo systemctl stop apache2.service smbd nmbd plexmediaserver.service
sudo rm /mnt/airvideo
sudo zfs set mountpoint=/mnt/airvideo vault/airvideo
sudo systemctl start apache2.service smbd nmbd plexmediaserver.service

zfs list

NAME             USED  AVAIL     REFER  MOUNTPOINT
vault           31.0T  14.2T       96K  /vault
vault/airvideo  31.0T  14.2T     31.0T  /mnt/airvideo

The above did not take long and the migration is completed!

df -h /mnt/airvideo

Filesystem      Size  Used Avail Use% Mounted on
vault/airvideo   46T   32T   15T  69% /mnt/airvideo

Getting the capacity of our new ZFS filesystem shows that we now have 46TB to work with! This should last for at least a couple of years I hope.

I also did a quick reboot of the system to ensure it can come back up with the ZFS filesystem in tack and without issues. It has now been running for 2 days. I have not collected any performance statistics, but the services all feel faster.

Media Server Storage Hardware Reconfiguration

Our media server has reached 89% utilization and needs storage expansion. The storage makeup on the server uses Logical Volume Manager (LVM) and software RAID called mdadm. I can expand the storage by swapping out the hard drives with the least capacity with new hard drives with a larger capacity like I have previously done.

I thought I try something different this time around. I would like to switch from LVM to ZFS, an LVM alternative that is very popular with modern mass storage systems, especially with TrueNAS.

All filled with 8 drives

Before I can attempt the conversion, I will first need to backup all of the content from the media server. The second issue is that I needed more physical expansion space on the server to house more hard drives. The existing housings are all filled except for a single slot, which is going to be insufficient.

A related issue is that I no longer have any free SATA slots available for the new hard drives, so I purchased GLOTRENDS SA3116J PCIe SATA Adapter Card with 16 SATA Ports. Once this is installed, I have more than enough SATA ports for additional storage.

PCIe SATA Card Installed

One downside of the SATA card is that it is limited to PCIe 3.0 x1 speed. This means data transfer is limited to a theoretical maximum of 1GB/s. Given that the physical hard drives top out at 200MB/s, I don’t think we need to be too concerned about this bottleneck. We will see in terms of practical usage in the future.

I am so lucky to have extra SATA power cables and extension cables laying around and my 850W existing power supply has ample power for the additional hard drives.

How do we store the additional hard drives with a full cabinet? I went to Amazon again and purchased a hard drive cage, Jaquiain 3.5 Inch HDD Hard Drive Cage 8X3.5 Inch HDD Cage. I did not have to buy any new hard drives yet, because I had plenty of old hard drives laying around. After I put together the cage with 8 really old and used hard drives, it looks something like this:

With this new additional storage, I am now able to backup the media content from my media server. However, before I do that there is one last thing that I need to do, and that is to experiment with an optimal ZFS pool configuration that will work with my content and usage. I will perform this experimentation with the additional storage before reconfiguring the old storage with ZFS. Please stay tuned for my findings.

After booting the system with 16 hard drives, I measured the power usage and it was hovering around 180W. This is not too bad, less than 2 traditional incandescent light bulbs.


Addendum:

During my setup, I had to spend hours deciphering an issue. My system did not recognize my old hard drives. After many trials, I finally narrowed down that the GLOTRENDS card is not compatible with an old 2TB Western Digital Enterprise Drive. This is the first time that I came across SATA incompatibilities.

There is another possibility that these drives were damaged by the usage of an incorrect modular power cable. I found that these drives also do not work with my USB3.0 HDD external dock as well. This gives additional credence that the physical drive has been damaged.

The troubled drive with the SATA card

All my other drives worked fine with the card.

Another discovery is that not all modular power cables will work with my ASUS ROG STRIX 850W power supply. Initially, I thought I would use an 8-pin PCIe to 6-pin adapter along with a 6-pin to SATA power cable designed for Corsair power supplies.

PCIe Power Adapter from Amazon
OwlTree PCI-e 6 Pin Male to 4 SATA 1 to 4 SATA Female Power Supply Splitter Supply Cable for Corsair Modular RM650X RM750X RM850X RM1000X

Using the above cables will cause the power supply not to start. I had to hunt for the original cables that came with the STRIX power supply.

Learned a lot from rejigging this media server. My reward is to see my server boot up with 16 hard drives and 2 NVMe SSD drives recognized. I have never built a system with so many drives and storage before.

Wake-on-LAN for Linux

My sons upgraded their gaming computers last Christmas, and I ended up using their old parts to build a couple of Linux servers running Ubuntu servers. The idea is to use these extra servers as video encoders since they will have dedicated GPUs. However, the GPUs are also pretty power-hungry. Since they don’t need to be up 24 hours a day, I thought keeping these servers asleep until they are required would be good. At the same time, it would be pretty inconvenient to go to the servers to physically power them up when I needed them. The thought of configuring their Wake-on-LAN came to mind.

I found this helpful article online. I first found out that the Network Interface on the old motherboards supports Wake-on-LAN (WOL). Below is a series of commands that I executed to find out whether WOL is supported or not, and if so, then enable the support.

% sudo nmcli connection show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  d46c707a-307b-3cb2-8976-f127168f80e6  ethernet  enp2s0

% sudo ethtool enp2s0 | grep -i wake
	Supports Wake-on: pumbg
	Wake-on: d

The line that reads,

Supports Wake-on: pumbg

indicates the WOL capabilities, and the line that reads,

Wake-on: d

indicates its current status. Each letter has a meaning:

  • d (disabled), or
  • triggered by
    • p (PHY activity), 
    • u (unicast activity),
    • m (multicast activity),
    • b (broadcast activity), 
    • a (ARP activity),
    • g (magic packet activity)

We will use the magic packet method. Below are the commands used to enable WOL based on the magic packet trigger.

% sudo nmcli connection modify d46c707a-307b-3cb2-8976-f127168f80e6 802-3-ethernet.wake-on-lan magic

% sudo nmcli connection up d46c707a-307b-3cb2-8976-f127168f80e6
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)

% sudo ethtool enp2s0 | grep -i wake
	Supports Wake-on: pumbg
	Wake-on: g

The above changes will persist even after the machine reboots. We put the machine to sleep by using the following command:

% sudo systemctl suspend

We need the IP address and the MAC address of the machine to wake the computer up using the wakeonlan utility.

% ifconfig
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.168.185  netmask 255.255.255.0  broadcast 192.168.168.255
        inet6 fd1a:ee9:b47:e840:6cd0:bf9b:2b7e:afb6  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::41bc:2081:3903:5288  prefixlen 64  scopeid 0x20<link>
        inet6 fd1a:ee9:b47:e840:21b9:4a98:dafd:27ee  prefixlen 64  scopeid 0x0<global>
        ether 1c:1b:0d:70:80:84  txqueuelen 1000  (Ethernet)
        RX packets 33852015  bytes 25769211052 (25.7 GB)
        RX errors 0  dropped 128766  overruns 0  frame 0
        TX packets 3724164  bytes 4730498904 (4.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

We use the above ifconfig to find the addresses highlighted in bold. Once we have the required information, we can then wake the computer up remotely by executing the wakeonlan command from another computer.

% wakeonlan -i 192.168.168.255 -p 4343 1c:1b:0d:70:80:84
Sending magic packet to 192.168.168.185:4343 with 1c:1b:0d:70:80:84

Note that the above IP address used is the broadcast address and not the machine’s direct IP address. Now I can place these servers to sleep and only turn them on remotely when I need them.

EXT4-fs Errors on NVME SSD

In my previous post, I replaced my NVME boot disk on our media server thinking that the disk was defective because the file system (EXT4-fs) was reporting numerous htree_dirblock_to_tree:1080 errors.

The errors continue to persist with the new disk, so I can eliminate the possibility of hardware as the cause of the issue.

I noticed that the htree_dirblock_to_tree:1080 errors were caused by the tar command and the time in which these errors occur coincided when the media server is being backed up. Apparently, the backup process is causing these errors with the tar command.

This backup process has remained unchanged for quite some time and has worked really well for us. I guess for some reason there is a bug in the kernel or in the tar command that is not quite compatible with NVME devices.

I had to resort to finding an alternative backup methodology. I ended up using the rsync method instead.

sudo rsync --delete \
  --exclude 'dev' \
  --exclude 'proc' \
  --exclude 'sys' \
  --exclude 'tmp' \
  --exclude 'run' \
  --exclude 'mnt' \
  --exclude 'media' \
  --exclude 'cdrom' \
  --exclude 'lost+found' \
  --exclude 'home/kang/log' \
  -aAXv / /mnt/backup

It looks like this method is faster and can perform incremental backup. However, instead of backing up to an archive file, which I later need to extract and prepare during the restoration process, I have to back it up to a dedicated backup device. Since the old NVME disk is perfectly fine, I reused it as my backup device. I have partitioned this backup device in the same layout as the current boot disk.

Device          Start        End    Sectors   Size Type
/dev/sdi1        2048    2203647    2201600     1G Microsoft basic data
/dev/sdi2     2203648 1921875967 1919672320 915.4G Linux filesystem
/dev/sdi3  1921875968 1953523711   31647744  15.1G Linux swap

The only exception is that the first partition is not marked as boot and esp, so during the restoration process I will have to mark that partition accordingly with the parted command by using the following commands:

set 1 boot on
set 1 esp on

The idea is that at 3am every night/morning, I will backup the root filesystem to the second partition of the backup drive. If anything happens with the current boot disk, the backup drive can act as an immediately available replacement, after a grub-install preparation as mentioned in the previous article.

Let us see how this new backup process works and hopefully, we can bid a final farewell to the htree_dirblock_to_tree:1080 errors!

Update: 2023-12-22

It looks like even with the rsync command, the htree_dirblock_to_tree:1080 errors still came back during the backup process. I decided to upgrade the kernel from vmlinuz-5.15.0-91-generic to vmlinuz-6.2.0-39-generic. Last night (2023-12-23 early morning) was the first backup after the kernel upgrade, and no errors were recorded. I hope this behavior persists and it is not a one-off.

Replacing NVME Boot Disk

A few months ago, the boot disk of our media server begin to incur some errors, such as the ones below:

Dec 17 03:01:35 avs kernel: [32515.068669] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #10354778: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183005] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500463: comm tar: Directory block failed checksum
Dec 17 03:02:35 avs kernel: [32575.183438] EXT4-fs error (device nvme1n1p2): htree_dirblock_to_tree:1080: inode #13500427: comm tar: Directory block failed checksum

The boot disk is a NVME device and I thought it may be due to over heating, so I purchased a heat sink and installed it. Unfortunately the errors persisted after the heat sink.

I decided to replace the boot disk with the exact same model which was the Samsung 980Pro 1TB. This should have been a pretty easy maintenance task. We clone the drive, and swap in the new drive. However, Murphy is sure to strike!

My usual goto cloning utility is Clonezilla, unfortunately this utility did not like cloning NVME drives. The utility resulted in a kernel panic after trying multiple versions. I am not sure what is the problem here. It could be Clonezilla or the USB 3.0 NVME enclosure that I was using for the new disk.

I resigned to using the dd command:

dd if=/dev/source of=/dev/target status=progress

Unfortunately this would have taken way too long something like 20+ hours, so I gave up with this approach.

I decided to do a good old restore of the nightly backup. I started by cloning the partition table:

sfdisk -d /dev/olddisk | sfdisk /dev/newdisk

I then proceeded with the restore of the nightly backup. Murphy strikes twice! The nightly backup was corrupted! I guess it is not surprising when the root directory’s integrity is in question. The whole reason why we are doing this exercise.

Without the nightly backup, I had to resort to a live backup. I booted system again, and performed:

sudo su -
mount /dev/new_disk_root_partition /mnt/newboot
cd /
tar -cvpf - --exclude=/tmp --exclude=/home/kang/log --exclude=/span --exclude="/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache" --one-file-system / | tar xvpzf - -C /mnt/newboot --numeric-owner

The above took about an hour. I then copy the /span directory manually, because this directory tends to change while the server is up and running.

With all the contents copied, I forgot how to install grub and had to re-teach myself again. I had to use a live copy Ubuntu USB and use that to boot up the machine, and then mount both the root and efi partitions respectively.

nvme1n1                              259:0    0 931.5G  0 disk
├─nvme1n1p1                          259:1    0     1G  0 part  /boot/efi
├─nvme1n1p2                          259:2    0 915.4G  0 part  /
└─nvme1n1p3                          259:3    0  15.1G  0 part  [SWAP]

And install GRUB.

sudo su -
mkdir /efi
mount /dev/nvme1n1p1 /efi
mount /dev/nvme1n1p2 /mnt
grub-install --efi-directory /efi --root-directory /mnt

I also have fix the /etc/fstab to ensure the root partition and /boot/efi partition are properly referenced by their corresponding, correct UUID. The blkid command came in handy to find the UUID. For the swap partition, I had to use the mkswap command before I get the UUID.

After I rebooted, I reinstalled GRUB one more time with the following as super user:

grub-install /dev/nvme1n1

I also updated the initramfs using:

update-initramfs -c -k all

For something that should have taken less than an hour, it took the majority of the day. The server is now running with the new NVME replacement disk. Hopefully this resolves the file system corruptions. We have to wait and see!

Update: The Day After

The same errors occurred again! I noticed that these corruptions occur when we do a system backup. How ironic! I later confirmed that performing the tar command on the root directory during the backup process can cause such an error. I now have to see why this is. I will disable the system backup for the next few days to see if the errors come back or not.

Experimental Machine for AI

The advent of the Large Language Model (LLM) is in full swing within the tech community since the debut of ChatGPT by openAI. Platforms such as Google Colab, and similar variants from Amazon and Facebook allows software developer to experiment with LLM’s. The hosted model of the data center based GPU’s makes training and refinement of LLM’s tolerable.

What about using LLM on a local computer away from the cloud?

Projects such as llama.cpp by Georgi Gerganov makes it possible to run the Facebook open sourced Llama 2 model on a single MacBook. The existence of llama.cpp gives hope on creating a desktop that is powerful enough to some local development with LLM’s away from the cloud. This post documents an experimental procedure in building a desktop machine using parts readily available from the Internet to see if we can do some AI development with LLM’s.

Below is a list of sourced parts from EBay, Amazon and CanadaComputers, a local computer store. All prices are in Canadian dollars and includes relevant taxes.

ItemsPrice
NVIDIA Tesla P40 24GB GDDR5 Graphics Card (sourced from EBay)$275.70
Lian-Li Case O11D Mini -X Mid Tower Black (sourced from Amazon)$168.49
GDSTIME 7530 75mm x 30mm 7cm 3in 12V DC Brushless Small Mini Blower Cooling Fan for Projector, Sleeve Bearing 2PIN (sourced from Amazon)$16.94
CORSAIR Vengeance LPX 64GB (4 x 32GB) DDR4 3200 (PC4-25600) C16 1.35V Desktop Memory – Black (sourced from Amazon)$350.28
AMD Ryzen 7 5700G 8-Core, 16-Thread Unlocked Desktop Processor with Radeon Graphics (sourced from Amazon)$281.35
Noctua NH-D15 chromax.Black, Dual-Tower CPU Cooler (140mm, Black) (sourced from Amazon)$158.14
Asus AM4 TUF Gaming X570-Plus (Wi-Fi) ATX motherboard with PCIe 4.0, dual M.2, 12+2 with Dr. MOS power stage, HDMI, DP, SATA 6Gb/s, USB 3.2 Gen 2 and Aura Sync RGB lighting (sourced from Amazon)$305.09
Samsung 970 EVO Plus 2TB NVMe M.2 Internal SSD (MZ-V7S2T0B/AM) (sourced from Amazon)$217.72
Lian Li PS SP850 850W APFC 80+ GOLD Full modular SFX Power Supply, Black (sourced from CanadaComputers)$225.99
Miscellaneous 120mm case fans and cables purchased from CanadaComputers$63.17

The total cost of the above materials is $2,062.87 CAD.

The Nvidia Tesla P40 (Pascal Architecture) specializes for Inferencing limited to INT8 based operations and does not support any FP related operations, so it may not be optimal for machine learning. However recent claims have been made that INT8 / Q8_0 quantization can yield some promising results. Let us see what our experimentation will yield once the machine is built.

A custom design 3D fan shroud has to be designed and 3D printed because the P40 does not natively come with active cooling. The P40 is originally designed to operate in a data center so cooling is provided by the server chassis. The custom shroud design is posted on Thingiverse and some photos of the finished shroud is shown below.

Note that M3 screws were used to secure the shroud to the P40 GPU card. The GDSTIME fan came with the screws.

I also made a mistake by initially getting a 1000W ATX power supply that ended not fitting the case, because the case is built for SFX and SFX-L power supplies. Lesson learned!

Once the machine is built I performed a 12 hours MemTest86+. It turned out that running the memory at the XMP profile was a bit unstable. I had to clock the memory back from its 3200MHz rating to 3000MHz.

After more than 12 hours with 3 passes.

The BIOS settings had to be configured so that Resize BAR is ON. This is required for the P40 to function properly.

Turn on Resize BAR

The next step is to install Ubuntu 22.04.3 LTS with Nvidia GPU and CUDA drivers. The latter was quite challenging. The traditional way of installing using the package manager did not work. The best way is to goto this site, and pick the run file like below:

Beside to use the runfile

The run file had to be run in recovery mode using the console because the installation will fail if an X11 window manager is running. Also all previous Nvidia drivers had to be removed and purged. The Ubuntu default installation process may have installed them.

A detail that was left out of the instructions is to set the appropriate shell paths once the installation is completed. The following changes were made with /etc/profile.d so that all users can benefit. If the login shell is using zsh, then /etc/zsh/zshenv has to be changed. Without this change, commands such as nvcc and other CUDA toolkit commands will not be found. The same is true for CUDA related share libraries.

$cat /etc/profile.d/cuda-path.sh

export CUDA_HOME="/usr/local/cuda"

if [[ ! ${PATH} =~ .*cuda/bin.* ]]
then
    export PATH="${PATH}:/usr/local/cuda/bin"
fi

if [[ ! ${LD_LIBRARY_PATH} =~ .*cuda/lib64.* ]]
then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64"
fi

if [[ ! ${LD_LIBRARY_PATH} =~ .*/usr/local/lib.* ]]
then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"
fi

In this hardware configuration the AMD CPU has integrated graphics, and the P40 does not have any HDMI or DisplayPort connections. We need to change the X11 configuration so that it will only use the AMD CPU while dedicating the P40 GPU for CUDA based computation. The following configurations have to be made in /etc/X11/xorg.conf:

$cat /etc/X11/xorg.conf

Section "Device"
    Identifier      "AMD"
    Driver          "amdgpu"
    BusId           "PCI:10:0:0"
EndSection

Section "Screen"
    Identifier      "AMD"
    Device          "AMD"
EndSection

The BusId can be obtained using the lspci command and be sure to change any hexadecimal notations to decimal in the configuration file. Without this xorg.conf configuration, the Ubuntu desktop will not start properly.

When everything is done properly, the command nvidia-smi should show the following:

Fri Aug 25 17:33:31 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:01:00.0 Off |                  Off |
| N/A   22C    P8               9W / 250W |      0MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

The machine is now ready for user account configurations.

A quick video encoding using ffmpeg with hardware acceleration and CUDA was performed to test the GPU usage. It was a bit of a challenge to compile ffmpeg with CUDA support. This is when I found out that I was missing the PATH configurations made above.

For good measure, gpu-burn was run for an hour to ensure that the GPU is functioning correctly.

Next step is to download and setup the tool chain for LLM development. We will save that for another posting.

Update: The runfile (local) method did not preserve through a system update using apt. I had to re-perform the installation with deb (local) methodology. I guess after not using the GPU for the desktop, we no longer have to run the operating system in recovery mode to install using the deb (local) method.

Tesla Referral

The Tesla Electric Vehicles (EV’s) are getting really popular. Don’t take my word for it, check it out for yourself with the chart below:

Statistic: Number of Tesla vehicles delivered worldwide from 1st quarter 2016 to 2nd quarter 2023 (in 1,000 units) | Statista
Find more statistics at Statista

The above trajectory is pretty clear.

We have had our 2023 Model Y for almost three months now and are very happy with the driving and ownership experience. Many of our family, friends, and neighbours have been asking about the vehicle.

I have always referred them to my blog here for our account of the Tesla purchasing and driving experience.

Now for a very shameless plug. Anyone who is interested enough to start the path of purchase, below is my referral code:

Short Road Trip to University of Waterloo

This morning we wanted to show our son what the University of Waterloo campus looked like. This is also an excellent opportunity to get use to our new Tesla Model Y along with its Autopilot functionality.

As the above pictures show, the campus is a ghost town, but it was nice that most of the buildings were still opened and we took the opportunity to walk around.

Aside from the typical Toronto traffic, the drive itself was pretty uneventful. While on Highway 401, I had the car on both Auto Steer and Traffic Aware Cruise Control (TACC) for more than 80% of the trip. For the most part it worked well, but I did notice a few points:

  • Passing trucks seem to be a lot closer than I like when I manually pass a semi on the highway;
  • When in Auto Steer, the car seem to initiate the turn a little later than what I would normally do; and
  • Finally, one can get really use to TACC and forget to press the brake when TACC is off;
  • I love TACC when in traffic – it is a godsend!

We also took the opportunity to do a quick test supercharge in Cambridge, Ontario at 22 Pinebush Rd, Cambridge, ON N1R 8K5.

We really did not have to charge, since our Long Range model had enough juice to go there and back. We just wanted to experience what supercharging is like. It was super simple! We added 10kWh for $3.93, and that took about 6 minutes. During this time, I showed my wife how to do it.

Another experiment that we did was I ran Waze on my iPhone in concurrent with Tesla’s navigation routing and mapping. Albeit I currently have a sample size of one, but my feeling is that Waze is still better at this point in terms of traffic awareness and routing abilities. Also Tesla maps do not show obstructions, speed traps, and other goodness of Waze. I will be doing more of this comparison on our trip to Montreal next Friday.