Plenty of technical baggage for Xeon

Reading through Xeon Bang Per Buck, Nehalem to Broadwell, I came across this gem:

Nehalem is the touchstone against which all Xeons should be measured because this is when Intel got rid of so much technical baggage that had been holding the Xeons back.

Sure, FSB was ditched for QPI, and the memory controllers were brought on-die, but there’s still a heck of a lot of technical baggage being carried around. I started writing this in 2016. It’s now 2020. Do modern Xeons really need to continue carrying around legacy in hardware from the original 8086 and other x86 “mileposts” (80286, 80386, 80486, pentium, pentium pro, etc…)? I bet there’s a Greenlow or Denlow board somewhere in embedded land with ISA slots plumbed in through eSPI, booting DOS 6.

EFI has been shipping since 2000, and Apple started shipping Intel-based Macs with EFI in 2006. Why are we still booting in 16-bit real mode? Why did it take so long for option ROMs for modern on-board ethernet on Intel reference hardware (and PCSD’s commercial versions) to be re-coded for EFI in the intervening decade? (Whitley finally dropped real-mode opROM support.) Why not ship a software-based 16-bit emulator for legacy option ROMs, optionally loaded during the DXE phase? (DEC had a MIPS emulator in turbochannel Alphas for running turbochannel option ROMs. Sun and other openfirmware systems went one better and used architecture-independent FORTH bytecode.)

The necessity of direct hardware support for legacy 32-bit code is questionable — BIOS claims it needs 32-bit to meet code-size requirements, but the code is not exactly tidy or concise. There seems to be a vicious circle between IBVs and BIOS developers unwilling to update crufty code: long-aged code bases suffer from a tragedy of the commons, with no clear stewardship and no short-term monetary payout for cleaning things up. I do have a few BIOS buddies who grok this and are trying to do what they can, but there’s a lot of momentum in technical debt that’s coming up on two decades old, in a company that has historically viewed software as an enabler for hardware.

Surely 32-bit backwards compatibility could be handled at the OS and software layer rather than in hardware? IA64 bungled its implementation of IA32 backwards compatibility by not making it performant, but that doesn’t make it an inherently bad idea.

Don’t get me started on SMM… or maybe that’s a rant for another post.

(posted from the depths of my drafts years after it was started…)

coalescing the VMs

I got a Romley (dual e5-2670 Jaketowns) last November with the plan to pull in the VMs from the three Xen hosts I currently run. I’ve named it “Luxor.” It idles at around 150W, which should save me a bit on the power bill, and even though it currently has only 1TB of mirrored storage, thin LVM provisioning should let me stretch that a bit. It’s easily the fastest system in my house now, with the possible exception of my wife’s haswell macbook pro for single-threaded performance.

Luxor has 96GiB [now 128GiB] of memory. I think this may exceed the combined total of every other system I have in the house; I figured that the price of the RAM alone justified the purchase. Kismet. Looking at the memory configuration, I have six 8GiB DIMMs per socket, but the uneven DIMMs-per-channel prevents optimal interleaving across the four channels. Adding two identical DIMMs or moving two DIMMs from one socket to the other should alleviate this. I doubt it’s causing performance regressions, but given that the DIMMs are cheap and available, and that I plan on keeping this machine around until it becomes uneconomical to run (or past that point, if history is an indicator), DIMMs to expand it to 128GiB should be arriving soon.
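
Back-of-the-envelope on the population (four channels per socket on Jaketown):

    # per-socket channel loading with 8GiB DIMMs:
    #   6 DIMMs -> 2,2,1,1 across the four channels (mixed, spoils interleave)
    #   8 DIMMs -> 2,2,2,2 (uniform again)
    echo '2 * 6 * 8' | bc    # 96 GiB installed today
    echo '2 * 8 * 8' | bc    # 128 GiB with two more DIMMs per socket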

In mid-December, the first olde sun x2200m2 opteron (“Anaximander”) had its two VMs migrated and was shut down. A second x2200m2 (“Anaximenes,” which hosts the bulk of my infrastructure, including this site) remains. While writing this post, a phenom II x2 545 (“Pythagoras”) had its 2TB NFS/CIFS storage migrated to my FreeBSD storage server (“Memphis”), along with some pkgsrc build VMs and secondary internal services.

Bootloader barf-bag for x86 is still in full effect. I couldn’t figure out how to PXE without booting the system in legacy BIOS mode, and I gave up trying to get the Ubuntu installer to do a GPT layout, let alone boot from it. I figure I can migrate the LVM volumes to new GPT-backed disk(s), install EFI grub, switch the system to EFI mode, and Bob’s your uncle. (He’s my brother-in-law, but close enough.) At least that’s the plan.
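
The rough shape of that migration, as a hedged sketch: the device names here are made up, and the grub-install flags are the Ubuntu ones.

    # /dev/sdc is the hypothetical new disk; vg0 the existing volume group.
    sgdisk -n1:0:+512M -t1:ef00 -n2:0:0 -t2:8e00 /dev/sdc   # ESP + LVM partitions
    mkfs.vfat -F32 /dev/sdc1                                # EFI system partition
    pvcreate /dev/sdc2
    vgextend vg0 /dev/sdc2
    pvmove /dev/sda2 /dev/sdc2       # drain extents off the old MBR disk, live
    vgreduce vg0 /dev/sda2
    mount /dev/sdc1 /boot/efi
    grub-install --target=x86_64-efi --efi-directory=/boot/efi
    update-grub                      # then flip the firmware over to EFI boot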

The VMs on Anaximenes have been a little trickier to move, since I need to make sure I’m not creating any circular dependencies between the infrastructure VMs and Luxor’s own ability to boot. Can we start VMs without DHCP and DNS being up, for instance?
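
One way out of the chicken-and-egg (names and addresses here are hypothetical): pin the infrastructure guests to static addresses, teach the host to resolve them without the DNS VM, and let libvirt autostart them.

    # host-side resolution that doesn't depend on the DNS VM being up
    printf '10.0.0.2 dns0\n10.0.0.3 dhcp0\n' >> /etc/hosts
    virsh autostart dns0     # libvirt starts these at boot, no DHCP required
    virsh autostart dhcp0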

Systemd is a huge PITA, and isn’t able to shut down the VMs cleanly, even after fiddling with the unit files to add some dependency ordering. My current theory is that it’s killing off the underlying qemu instances, so the VMs essentially get stuck. Running the shutdown script manually works fine, and the VMs come down cleanly.
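
If the killer really is systemd reaping the qemu processes out from under the shutdown script, the stock libvirt-guests service (which orders itself after libvirtd and does an orderly guest shutdown) might be the path of least resistance. A sketch, assuming libvirt is managing the guests; the config path shown is the Debian/Ubuntu one:

    # /etc/default/libvirt-guests  (RH-family puts this in /etc/sysconfig)
    ON_SHUTDOWN=shutdown      # ACPI-shutdown the guests rather than suspending them
    SHUTDOWN_TIMEOUT=300      # seconds to wait for guests before giving up

    # then: systemctl enable libvirt-guests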

what happened to the minicomputer?

In a presentation by Gordon Bell (formatting his):

Minicomputers (for minimal computers) are a state of mind; the current logic technology, …, are combined into a package which has the smallest cost. Almost the sole design goal is to make the cost low; …. Alternatively stated: the hardware-software tradeoffs for minicomputer design have, in the past, favored software.
HARDWARE CHARACTERISTICS
Minicomputer may be classified at least two ways:

  • It is the minimum computer (or very near it) that can be built with the state of the art technology
  • It is that computer that can be purchased for a given, relatively minimal, fixed cost (e.g., $10K in 1970.)

Does that still hold? $10k in 1970 dollars is over $61k in 2016 dollars, which would buy a comfortably equipped four-socket brickland (E7 broadwell) server, or two four-socket grantleys (E5 broadwell). We’re at least in the right order-of-magnitude.
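
Sanity-checking the conversion (CPI figures are approximate, roughly 38.8 for 1970 and 240 for 2016):

    echo 'scale=0; 10000 * 240.0 / 38.8' | bc    # ~61855, i.e. over $61k as claimed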

Perhaps a better question: are modern intel xeon platforms (like grantley or the upcoming purley) minimal computers? Bell identified midi- and maxicomputers as categories beyond the minicomputer, with the supercomputer at the top.

We are definitely in the favoring-software world — modern x86 is microcoded, and microcontrollers are everywhere in server designs: power supplies, voltage regulators, fan controllers, the BMC. The Xeon itself has the power control unit (PCU), and the chipset has the management engine (ME). Most of these are closed, and not directly programmable by a platform owner. Part of this is security-related — you don’t want an application rewriting your voltage regulator settings or hanging the thermal management functions of your CPU. Part of it is keeping proprietary trade secrets, though. The bringup flow between the Xeon and chipset (ME) is heavily proprietary, and Intel’s deliberate decision not to support third-party chipsets keeps it in trade-secret land.

However, I argue that modern servers have grown to the midi- if not maxicomputer level of complexity. Even in the embedded world, the level of integration on modern ARM parts seems to put most of them in the midicomputer category. Even AVRs seem to be climbing out of the microcomputer level.

On the server side, what if we could stop partitioning into multiple microcontrollers and coalesce their functionality? How minimal could we make a server system and still retain ring 3 ia32e (x64) compatibility? Would we still need the console-in-system BMC? Could a real-time OS on the main CPU handle its own power and thermal telemetry? What is minimally needed for bootstrapping in a secure fashion?
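
As a toy existence proof that the main CPU can read its own thermal telemetry without asking a BMC (msr-tools, from ring 0 via the msr driver):

    modprobe msr
    rdmsr -p 0 0x19c               # IA32_THERM_STATUS, core 0
    rdmsr -p 0 -f 22:16 -d 0x19c   # digital readout: degrees C below TjMax
    rdmsr -p 0 0x1b1               # IA32_PACKAGE_THERM_STATUS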

I’ll stop wondering about these things when I have answers, and I don’t see any. So I continue to jump down the platform architecture rabbit hole…