Adventures in Virtualization

I currently pay ~$150/mo for my ISP. It’s a legacy commercial account through my local cable provider: 50mbit down and 10mbit up, with a static IP. Gigabit fiber is available for under $100/mo, but without a static IP. The cable provider has “helpfully” suggested I move to a higher-speed (and higher-priced) tier, but it occurred to me that my home doesn’t need a static IP at all if I set up a tunnel from home to my VPS, which already has one. Instead of reverse-proxying my public services through my local router, I can proxy them over a VPN to the VPS.

The first step, of course, is updating my VPS to the “latest and greatest” (it’s still running a very svelte, firewalled, but outdated NetBSD-7), but also to pick up some hopefully improved VPN capability: while it would be neat to get an IPsec tunnel running (possibly over IPv6), I suspect it will be a lot easier in practice to get the newly available WireGuard support working.
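
To make the plan concrete, here’s a rough sketch of the tunnel in wg-quick-style INI. (As I understand it, NetBSD’s native wg(4) is configured with ifconfig/wgconfig rather than a file like this, and every key, address, and hostname below is a placeholder, not my real setup.)

```
# wg0.conf on the VPS (static public IP)
[Interface]
PrivateKey = <vps-private-key>
Address    = 10.77.0.1/24
ListenPort = 51820

[Peer]
# the home server behind the dynamic cable IP
PublicKey  = <home-public-key>
AllowedIPs = 10.77.0.2/32

# wg0.conf at home
[Interface]
PrivateKey = <home-private-key>
Address    = 10.77.0.2/24

[Peer]
PublicKey  = <vps-public-key>
Endpoint   = vps.example.net:51820
AllowedIPs = 10.77.0.1/32
PersistentKeepalive = 25
```

The VPS then reverse-proxies (or DNATs) the public-facing ports to 10.77.0.2, so the home side only ever makes outbound connections and the keepalive holds the NAT mapping open.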

My VPS provider is one of the few that doesn’t mind running NetBSD, and started out with 32-bit PV instances running on a NetBSD dom0. They shifted to Debian Linux as the company grew, but the lowest-spec instances were still 32-bit PV when I first signed up. They didn’t directly advertise them; you had to “view source” on the HTML to find the obscured link, but a single vCPU, 1.4G of memory, and 18G of storage is plenty for my needs as a DNS server, and soon as a VPN server. (The disk space and the physical CPU backing the vCPU have grown over the years.)

Linux dropped 32-bit PV support in 5.9, and Xen in general has been moving from PV towards PVH, which leans on hardware virtualization features that arrived after PV was designed (nested paging, for instance) instead of trapping things like page-table updates through the hypervisor. I was a bit surprised to discover that my home server, running Ubuntu 22.04 with kernel 5.15.0 and Xen 4.16.0, supports PVH but no longer supports 32-bit PV. A freshly compiled NetBSD-9 32-bit PV kernel crashed outright on my VPS, a NetBSD-10 32-bit PV kernel got a little farther but crashed with MMU errors, and the lack of 32-bit PV support on my home server keeps me from doing more extensive debugging. The solution seems to be to update my VPS to 64-bit PV, which at least makes things consistent between the VPS and home.
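
For reference, the 64-bit PV guest is just a conventional xl config; something along these lines, where the name, paths, and sizes are hypothetical stand-ins rather than the provider’s actual setup:

```
# 64-bit PV NetBSD domU (sketch)
name   = "netbsd-pv"
type   = "pv"
kernel = "/var/xen/netbsd-XEN3_DOMU"   # the amd64 Xen domU kernel
memory = 1024
vcpus  = 1
disk   = [ "phy:/dev/vg0/netbsd-pv,xvda,w" ]
vif    = [ "bridge=xenbr0" ]
```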

PVH seems to be the future, though. A directly specified NetBSD-10 GENERIC kernel seemed to boot, but the Xen docs and some searching indicate there’s a build of OVMF that can be used as well? As a UEFI developer who has been hacking on the build system to greatly improve build times as a side project, this is interesting. It looks like I’d have to build it from scratch, which I do frequently at work, but doing it under NetBSD instead of Linux or Windows (gasp) has some challenges.
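
If the OVMF route pans out, the edk2 side is presumably the usual dance; the Xen-targeted platform file below is named from memory, and getting BaseTools to build under NetBSD is exactly where I expect the fun to be:

```
# build OVMF for Xen guests from the edk2 tree (sketch)
git clone --recurse-submodules https://github.com/tianocore/edk2.git
cd edk2
make -C BaseTools        # GNU make; gmake on NetBSD
. edksetup.sh
build -a X64 -t GCC5 -p OvmfPkg/OvmfXen.dsc
```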

Instead of booting UEFI, there’s also the possibility of pvhgrub. Ubuntu ships a 32-bit PVH build of grub, but I can’t seem to find a 64-bit version? Maybe, since PVH is essentially a stripped-down HVM, it runs grub2 in 32-bit mode and then flips to 64-bit as part of the boot process? I have no idea how to boot a NetBSD kernel from a grub2> interactive prompt; it seems to reject my kernel’s magic number, but at least it can see the GPT partitions and knows about the UFS2 filesystem. Maybe with a menu.lst or grub.cfg it will magically start working.
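
If it comes down to a grub.cfg, my guess is that the trick is grub’s dedicated NetBSD loader command rather than the linux/multiboot ones, which would also explain the magic-number complaint. Something like this, where the device name is a guess (the ls command at the grub prompt shows what’s really there):

```
# sketch of a grub.cfg entry for a NetBSD kernel (untested)
menuentry "NetBSD-10" {
    insmod part_gpt
    insmod ufs2
    knetbsd (hd0,gpt1)/netbsd
}
```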

For now I’m booting a kernel directly from the xl.cfg side and getting on with bootstrapping everything. Maybe ye olde pygrub bootloader will still work here?
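
The direct-boot config is the same shape as the PV sketch above; only the type and kernel lines change (paths again hypothetical):

```
# PVH domU booting a NetBSD GENERIC kernel directly (sketch)
type   = "pvh"
kernel = "/var/xen/netbsd-GENERIC"
# memory/vcpus/disk/vif as before; swapping in bootloader = "pygrub" is the thing to try next
```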

coalescing the VMs

I got a Romley box (dual e5-2670 Jaketowns) last November with the plan of pulling in the VMs from the three Xen hosts I currently run. I’ve named it “Luxor.” It idles at around 150W, which should save me a bit on the power bill, and even though it currently has only 1TB of mirrored storage, thin LVM provisioning should let me stretch that a bit. It’s easily the fastest system in my house now, with the possible exception of my wife’s haswell macbook pro for single-threaded performance.
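
The thin-provisioning side is just a thin pool carved out of the mirrored volume group, with per-VM volumes whose virtual sizes are allowed to add up to more than the 1TB actually present; roughly (VG and LV names are made up):

```
# create a thin pool, then a thin volume per VM
lvcreate --type thin-pool -L 900G -n vmpool vg_luxor
lvcreate --thin -V 60G -n anaximenes-www vg_luxor/vmpool
lvs -a vg_luxor    # watch Data% so the pool doesn't silently fill up
```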

Luxor has 96GiB [now 128GiB] of memory. I think this may exceed the combined sum of every other system I have in the house. I figured that the price of the RAM alone justified the purchase. Kismet. Looking at the memory configuration, I have six 8GiB DIMMs per socket, but the uneven DIMMs-per-channel count prevents optimal interleaving across the four channels; adding two more identical DIMMs per socket, or moving two DIMMs from one socket to the other, should alleviate this. I doubt it’s causing performance regressions, but the DIMMs are cheap and available, and since I plan on keeping this machine around until it becomes uneconomical to run (or past that point, if history is an indicator), DIMMs to expand it to 128GiB should be arriving soon.

In mid-December, the first olde sun x2200m2 opteron (“Anaximander”) had its two VMs migrated and was shut down. A second x2200m2 (“Anaximenes,” which hosts the bulk of my infrastructure, including this site) remains. While writing this post, a phenom II x2 545 (“Pythagoras”) had its 2TB NFS/CIFS storage migrated to my FreeBSD storage server (“Memphis”), along with some pkgsrc build VMs and secondary internal services.

Bootloader barf-bag for x86 is still in full effect. I couldn’t figure out how to PXE-boot without putting the system in legacy BIOS mode, and I gave up trying to get the Ubuntu installer to do a GPT layout, let alone boot from one. I figure I can migrate the LVM volumes to new GPT-partitioned disk(s), install EFI grub, switch the system to EFI mode, and Bob’s your uncle. (He’s my brother-in-law, but close enough.) At least that’s the plan.
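
In rough strokes, the plan looks something like the following; device names are placeholders and the details will no doubt shift once I’m actually doing it:

```
# partition the new disk: EFI system partition + LVM (sketch; devices hypothetical)
sgdisk -n 1:0:+512M -t 1:ef00 -n 2:0:0 -t 2:8e00 /dev/sdb
pvcreate /dev/sdb2
vgextend vg_luxor /dev/sdb2
pvmove /dev/sda2                 # shuffle all extents off the old MBR disk
vgreduce vg_luxor /dev/sda2
# set up the ESP and EFI grub, then flip the firmware to UEFI mode
mkfs.vfat -F 32 /dev/sdb1
mount /dev/sdb1 /boot/efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi
update-grub
```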

The VMs on Anaximenes have been a little trickier to move, since I need to make sure I’m not creating any circular dependencies between the infrastructure VMs and being able to boot Luxor itself. Can VMs start without DHCP and DNS being up, for instance?
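
One way to dodge the chicken-and-egg problem is to keep the dom0 and the infrastructure guests free of any runtime dependence on those services: static addresses inside the critical VMs, plus enough /etc/hosts entries on the dom0 that nothing blocks on a resolver at boot. A sketch, with names and addresses invented for illustration:

```
# /etc/hosts on the Luxor dom0 -- just enough to boot without DNS
10.0.0.2    ns1.internal        # DNS VM
10.0.0.3    dhcp1.internal      # DHCP VM
10.0.0.10   anaximenes-www.internal
```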

Systemd is a huge PITA and isn’t able to shut down the VMs cleanly, even after fiddling with the unit files to add some dependency ordering. The current theory is that it’s killing off the underlying qemu instances first, so the VMs essentially get stuck. Running the shutdown script manually works fine and the VMs come down cleanly.
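
If that theory is right, the fix is probably just ordering: systemd stops units in the reverse of their After= ordering, so having xendomains order itself after the qemu backend service should keep qemu alive until the domains have actually shut down. A drop-in along these lines (the backend unit name is from memory and may differ between distros):

```
# /etc/systemd/system/xendomains.service.d/ordering.conf (sketch)
[Unit]
# keep the dom0 qemu disk backend running until the guests are down
After=xen-qemu-dom0-disk-backend.service
Wants=xen-qemu-dom0-disk-backend.service
```

A systemctl daemon-reload afterwards picks up the drop-in; whether it actually fixes the hang remains to be seen.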

sunset

The first of three sun V20zs was decommissioned tonight. These were all surplus, and they sat for probably a year before I actually got around to dispositioning them. At the time, I think the failures of my sparc sun hardware were lining up: the sparc 2 I had been using as my gateway router failed due to bad cache, and I had replaced it with an ultra 5; the dual 50MHz sparc 20 was already long in the tooth, and a drive was starting to fail. I needed some place to land a new mail server, and I figured I’d make it a virtual one, so I could collapse it onto another VM host when the time came.

It served well while I got faster and more capable machines online. I even ran Xen on it for a time, until I got the newer x2200m2s online. Although the Opteron 250s in the v20z were 64-bit with a whopping 8GB of RAM, they didn’t support hardware virtualization, so I was limited to paravirtualization.

I recall the I/O under Xen PV being decent, with build.sh times comparable between bare metal and a DomU. Building NetBSD-6.0 took a little over 2hrs on bare metal and just under 2hrs in a Linux DomU running under a NetBSD Dom0; obviously I/O-bound, with Linux’s filesystem caching doing a better job than NetBSD’s. 🙂
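
For context, the sort of run being timed is the standard build.sh release build, roughly (flags approximate, -j sized to the two cores):

```
# full NetBSD release build from source (sketch)
./build.sh -O ../obj -T ../tools -j2 -m amd64 release
```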

The service processor (remote management module) on the v20z was an embedded PPC running Linux, with an oddball CLI on it. I still had to deal with stupid PCisms like having to attach to the console and go to BIOS to change settings, but it was definitely an improvement over run-of-the-mill PC hardware.

One of the two remaining V20zs is running Joyent’s SmartOS, primarily for fiddling with ZFS on a pile of SCA drives. The other V20z is unpowered; before it gets added to the stack later this week, I’ll check whether it has any 2GB DIMMs to donate to the SmartOS cause (and maybe grab a dmesg and an openssl benchmark).

The V20z was an example of PC architecture taking a step up into serverhood, with the first-generation Opteron kicking Intel while it was floundering with the Pentium 4. These machines still seem plenty fast to me, and one of the reasons I’m ditching them is that I’m finally getting a handle on just how much more CPU power newer systems have, not to mention the gains in power efficiency. I/O continues to be a sore point, and these are still SCA systems, so they’re not trivially upgraded to SSDs. My kill-a-watt measurements showed 230W idle and 275W under load.

Heh. The second system on the kill-list has only four 512M DIMMs. It will head onto the ice floe tomorrow after a disk yank and SP reset. But not before yanking 4GB of RAM from the first system to put into the remaining V20z. 🙂