AI Server Bytes the Dust

I have an old Raspberry Pi running Volumio to stream my music library to my living room home theatre. This morning, I needed to upgrade it from Volumio 3 to Volumio 4. After the upgrade, the Raspberry Pi acquired a new IP address, which I needed to discover through my Unifi Dream Machine Pro (UDMPro) Max web-based user interface. It was then that I noticed that all the virtual machines hosted with Proxmox on our AI Server had dropped off my network. This is the AI Server that I built back in August of 2023 and discussed in this post.

I thought all I needed was a reboot, but there was still no network connection; the networking interface seemed to be down. I plugged a keyboard into the server and added a monitor. No video signal, and the keyboard did not respond; not even the NumLock LED worked. This was not good. All signs pointed to a hardware failure.

I pulled out the PCIe cards one by one and tried to resuscitate the server. No good. With a bare-bones motherboard, memory, and CPU, it still did not respond. I couldn’t even get into the BIOS. The fans were spinning, and the motherboard diagnostic LEDs pointed to an error while initializing video/VGA.

I ended up finding a possible replacement motherboard, a Gigabyte B550 Gaming X V2, at a local Canada Computers for $129 (before tax), along with some thermal paste for $9.99 (before tax) to reseat the CPU and its cooler.

Replacement Board

The good news is that after replacing the motherboard, I was able to get into the BIOS. However, when I tried to boot the machine with the used Nvidia P40 card installed, it failed to boot again. I had to forgo this card. The GPU could have been damaged by the old mainboard, or the GPU could have been damaged first and caused the mainboard to fail. At this point I was too tired to play the chicken-or-the-egg game. I simply left the card out and restored Proxmox on the server. It will no longer be an AI server, but at least the virtual machines on it could be recovered.

Proxmox booted but would not shut down. I had to undo the PCIe passthrough configuration that I set up when I built the AI Server. This involved editing the GRUB configuration in /etc/default/grub so that all the special options were removed:

GRUB_CMDLINE_LINUX_DEFAULT=""

Previously, this line contained options enabling IOMMU and the vfio modules. After this change, I had to run the following commands:

update-grub
update-initramfs -u -k all
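
For reference, a typical Proxmox passthrough setup on an AMD board adds kernel options and vfio modules along the lines below, and undoing it means clearing both. The exact flags here are illustrative rather than a record of my original configuration.

# /etc/default/grub -- typical passthrough options that get removed
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

# /etc/modules -- vfio modules that can be commented out or deleted
vfio
vfio_iommu_type1
vfio_pci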

I then rebooted the system, and it behaved normally. During this process I also found out that Proxmox will not start normally if any of the mounts configured in /etc/fstab are unavailable. This threw me for a loop, because the regular USB backup drive was disconnected while I was trying to resolve this issue.
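
One way to avoid this in the future is to mark non-critical mounts with the nofail option, so that booting proceeds even when the drive is absent. A sketch of such an entry, with a hypothetical UUID and mount point:

# /etc/fstab -- backup drive entry that will not block booting
UUID=0a1b2c3d-1111-2222-3333-444455556666  /mnt/usb-backup  ext4  defaults,nofail,x-systemd.device-timeout=10s  0  2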

Since the new motherboard presents a different set of PCIe peripherals, I knew from my past experience, which I detailed here, that I had to edit the /etc/network/interfaces file with the new interface name. The following command really helped me identify the new name and which NIC I should pick, because there were multiple interfaces, and I wanted the 2.5Gbps one.

lshw -class network
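
The output lists each NIC with its logical name and link capacity, which is enough to spot the 2.5Gbps port. Abridged, hypothetical output looks something like this:

*-network
     description: Ethernet interface
     product: RTL8125 2.5GbE Controller
     logical name: enp5s0
     capacity: 2.5Gbit/s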

In the end, all of the virtual hosts are now up and running. I hope this new motherboard proves to be more stable without the used P40 GPU. Fingers crossed!

Linux Boot with No Networking

GLOTRENDS PA09-HS M.2 NVMe to PCIe 4.0 X4 Adapter

I recently wanted to install an M.2 NVMe to PCIe 4.0 X4 adapter in an existing server. The idea was to add a new NVMe SSD drive, but the motherboard had no more M.2 sockets available.

The server is running Proxmox with Linux kernel 6.8.12. I thought this would be a 15-minute exercise. How wrong I was. After installing all the hardware, the system booted up, but there was no networking access. This was especially painful because I could no longer remote into the server. I had to pull out an old monitor and keyboard and perform diagnostics.

I used the journalctl command to diagnose the issue and found the following entries:

Feb 01 13:36:21 pvproxmox networking[1338]: error: vmbr0: bridge port enp6s0 does not exist
Feb 01 13:36:21 pvproxmox networking[1338]: warning: vmbr0: apply bridge ports settings: bridge configuration failed (missing ports)
Feb 01 13:36:21 pvproxmox /usr/sbin/ifup[1338]: error: vmbr0: bridge port enp6s0 does not exist
Feb 01 13:36:21 pvproxmox /usr/sbin/ifup[1338]: warning: vmbr0: apply bridge ports settings: bridge configuration failed (missing ports)
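
For reference, a query along these lines surfaces those boot-time networking messages; the unit name assumes ifupdown2's networking.service, which Proxmox uses:

journalctl -b -u networking.service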

These error messages indicate that enp6s0 no longer exists. When I looked at earlier messages, I noticed this one:

Feb 01 13:36:15 pvproxmox kernel: r8169 0000:07:00.0 enp7s0: renamed from eth0

It looks like the interface name changed from enp6s0 to enp7s0. This makes sense: predictable interface names encode the PCIe bus number, and adding the adapter card shifted the NIC from bus 6 to bus 7. Therefore the correct remedy is to edit /etc/network/interfaces to reflect the name change. Below is the new content of the file.

# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp7s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.188.2/24
        gateway 192.168.188.1
        bridge-ports enp7s0
        bridge-stp off
        bridge-fd 0

iface wlp5s0 inet manual
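
After saving the file, the change can be applied without a full reboot. Proxmox ships ifupdown2, so reloading the interfaces should be enough:

ifreload -a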

This would have been very annoying if the old interface name had been used in many other configuration files. I found one other reference on the Internet (https://www.baeldung.com/linux/rename-network-interface) detailing a way to change the network interface name using udev rules. I did not try this, but it is something to keep in mind for the future.
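
Untested on my part, but based on that reference, a udev rule along these lines pins the name to the NIC's MAC address; the address and name below are hypothetical:

# /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="lan0"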

In a previous post, on another home server, I fixed the name using netplan, but Proxmox does not use it.
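
For completeness, the netplan approach matches the interface by MAC address and renames it. A minimal sketch, with a hypothetical MAC address and name:

# /etc/netplan/01-rename.yaml
network:
  version: 2
  ethernets:
    lan0:
      match:
        macaddress: "aa:bb:cc:dd:ee:ff"
      set-name: lan0
      dhcp4: true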