Why Bytemark didn’t use Xen

We’re circling in on a test release of our new hosting system, called BigV.  Bytemark have never thrown more hardware or programmers at a project for so long, so if you’re in our fan club you might like the results.  I’m a little shy of making promises just yet, but I’ll write about a few of the problems we’re solving, and give you a bit of history, and why I think Xen is fading away.

Bytemark got started in the hosting business through virtualisation, and through knowing how emulators work.  My personal background was contributing to the PC emulator product on Acorn’s RISC OS (back in 1997), as well as a couple of Java virtual machines, so I know how to build an emulator, and what’s magic about them.

VMware was king of the hill for years: they pulled off a genuinely amazing technical feat, and had a monopoly on practical PC-on-PC virtualisation from about 1998 to 2002, building a huge business from it.  We couldn’t afford their fees when we started in 2002, so we went with User-Mode Linux.  That was a very practical (and free) way of running Linux on Linux, and along with a few other companies, we sold lots of virtual machines to geeks.

In 2004, Xen looked a-mazing, providing virtualisation on Linux by some clever source modifications, and a new hypervisor.

If you jumped through the necessary hoops, did a lot of work rebuilding your kernel and selected your hardware carefully, it absolutely flew.  We even built a trial version of our virtual machine platform in 2005 that used Xen, and tried it with real customers for a few months.  It almost worked, but we were never confident enough to roll it out.  There were a few reasons, and bear in mind I’m talking ancient history here:

  1. We’ve always been picky about our hardware, and habitually had to build our own Linux kernels to take advantage of newer chips.  Xen’s slow integration into Linux put the brakes on this, and made us nervous that we would have to stick with particular hardware.  For years, it was really hard to make Xen boot reliably on a variety of hardware, and our virtual machine hosts changed fairly regularly.
  2. Xen was never happy providing its unique virtualisation magic separate from its other aspirations.  There was (still is?) a compulsory management layer, a program called xend, which used to crash and leave you with no idea what the hypervisor was doing.  It had undocumented layers of code, and the only safe way to interact with it was through the xm command line.  When xend died, we were lucky if it restarted; normally we had to reboot the whole system, and all the virtual machines with it.  So it seemed impossible to make our control software reliable, and I needed some help.
  3. I collared Ian Pratt, one of Xen’s creators, after a conference talk in 2005.  I asked whether XenSource were interested in helping a real hosting provider solve some problems with Xen and our few hundred customers.  Knowing the answer, I think, he asked, “Sure, how many data centres and how much money do you have?”  Pffft, thanks.  I’d chipped in a little but assumed they had all the help they needed.
  4. I spoke to a prominent kernel hacker at (I think) LUGRadio Live 2007.  I asked about virtualisation, and Red Hat’s recently-announced integration of Xen into their operating system.  To paraphrase, he did not think the results were pretty, and confirmed my suspicion that Red Hat’s libvirt project was simply paving a migration path for something better to come along.  It was built to avoid a lock-in to a technology the developers simply didn’t like.  A year later, Red Hat bought Qumranet and the KVM developers.  That deal was probably already being prepared the year before.

It was after this last conversation that I felt OK about leaving our systems on User-Mode Linux for a bit longer; even though it was slower than Xen, I didn’t want to compromise our management tools.  We kicked off our migration to KVM in 2009 and apart from a scheduled reboot each, our customers only noticed a better speed compared to User-Mode Linux.  We got a few more VMs onto each host as well, and didn’t have to make huge changes to our tools either.

I’m not an economic crystal ball gazer, but I do know code and coders, and when something new like this comes along, a better-integrated solution will win over the fast one every time.

In 2011, Xen imposes too much overhead on developers and sysadmins compared to the alternatives.  It had two unique selling points, and they’ve not been unique for a while:

Firstly, Xen brought speed through paravirtualisation, but integration into Linux came too late.  While it was still a unique feature, it was a pain to use.  Between virtio in 2008 and Intel/AMD’s on-chip virtualisation features, Xen’s paravirtualisation is now completely unnecessary, even if the integration work is done.

Secondly, Xen offered live migration, but KVM brought that too, also in 2007/8.

Xen’s “missing” feature, unmodified operating system support (i.e. installing Windows etc. off a CD), was based on the exact same code that KVM used, the completely brilliant qemu project.  So with the state of those three features, there is just no technical advantage to the Xen codebase.  That’s why I think in the next few years, most companies with Xen deployments will be replacing them with KVM, because KVM can do everything that Xen does, but makes it easier to supervise and write tools around.

There are even more options than that: VirtualBox works nicely for desktops and small setups, Hyper-V works better for Windows, and of course VMware is a lot cheaper than it was; they clearly have the lead in management tools if you don’t mind paying for them.  At the hackier end of the spectrum, lguest appears to replace most of what User-Mode Linux did; I’d be very surprised if some hosting companies weren’t using that already.

So that’s where we are with Xen – it was a great idea, it didn’t work for us, but obviously works very nicely for Amazon and lots of smaller hosts.  I’m glad Bytemark gave it a miss; we intend to provide the absolute best of the KVM emulator to hosting customers when BigV launches later this year.