Ruby gems, and when we’ll be shot of them

There comes a time in every sufficiently ambitious Ruby programmer’s life when one will butt up against gems, Ruby’s own packaging system.  Outside of the cosy confines of one’s laptop, on servers and embedded systems, the system contains design mistakes that make gems unmanageable.  Here’s how we cope at Bytemark.  But first, how did it get here?

The world of programmers

So computer languages have always had some idea of a library, a set of functions, classes and other extensions that make the language more useful.  At the earliest and most basic, #include <stdio.h> tells a C compiler that you wish to load a library of functions for input and output called stdio – printing lines of text, opening files and so on.

In Ruby, there are two sets of libraries which have shipped with the language.  These are the core and standard libraries, and Ruby programmers know that they’re always available.  If you want to program with network sockets, you say require ‘socket’ and you know that you can use TCPServer and friends to start writing high-level network programs.  In common with other scripting languages, if a programmer wanted to use another library on his system, he would install its files into /usr/local/lib/ruby, and could then ‘require’ those files in any program.
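
As a rough sketch of that older model, the snippet below drops a library file into a directory, puts that directory on Ruby’s load path, and requires it by name.  The library name mylib and the temporary directory are made up for illustration; on a real system the directory would be somewhere like /usr/local/lib/ruby.

```ruby
require 'tmpdir'

# Stand-in for a site-wide library directory (an assumption for this sketch).
library_dir = Dir.mktmpdir

# "Install" a hypothetical library by writing a single file into place.
File.write(File.join(library_dir, 'mylib.rb'), <<~RUBY)
  module Mylib
    def self.greeting
      'hello from mylib'
    end
  end
RUBY

# Ruby searches every directory in $LOAD_PATH when it sees a require.
$LOAD_PATH.unshift(library_dir)
require 'mylib'

puts Mylib.greeting
```

Nothing here knows or cares who put the file there, which is exactly what makes this model easy to package.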

That’s the same simple method that Perl, Python and many other scripting languages have used, and it’s easy to describe and understand.  It’s also easy for Operating System vendors and Linux distributions to package.

The world of system administrators

Since the early 90s, the Debian project has defined the state of the art in large-scale, long-term system administration.  Before Debian (and yum, and the other systems it inspired), admins either routinely did their own software builds, or ran crude packaging systems which were no more complex than a tar file.  Debian rejected the idea that system administrators needed to constantly reinstall to "blow away the cobwebs", and hammered out packages and a distribution structure that has allowed admins to avoid reinstalling some individual systems for 10+ years.

Debian have packages for everything from word processors to hardware drivers.  An administrator can define the state of a whole computer system just through listing what packages are installed, and a handful of configuration files – no clicking through installers, or manual stages.  Debian’s thousands of volunteers have built and tested stable packages based on the cream of free software.  Users of a Debian-based OS can have their pick of the best of it, and know it will work together.  So in 2009 you can say "I want a system with openoffice installed", and Debian’s package management system will perform a complex set of resolutions to find out what libraries openoffice depends on.  It will install those in the correct order, right down to the operating system kernel and graphics drivers, so that after downloading, unpacking and running setup scripts, the user goes from having an empty system to a working word processor.

This process works the same way on a hundred Debian servers, even when those servers have very different hardware underneath.  This is still an awesome achievement, and Debian are still leading the way in defining how software installation on computers should work.

Crucially, Debian package programming libraries for lots of languages, and authors writing for Debian systems can rely on scores of them being available in a predictable way.  But right now, in Debian and most other Linux distributions, there are some major gaps in coverage for widely-used Ruby libraries, which means dependent applications can’t be packaged.  How did this situation come about?

Blame the web app startups!

Well, at some point in the last 10 years, the rise of Google caused all programmers everywhere to lose their minds.  They decided that users shouldn’t install their own programs on their own computers, but instead should trust installation and data storage to the programmer exclusively.  Sophisticated, fast, reliable programs started being replaced by simple, slow, flaky ones as a result.

Just a joke!  We all love web applications really.

No – more reasonably speaking, a handful of smart programmers, all at once, had found a way out of the dreary mire of web applications in 2004-5, and a frontier was formed with Rails and various other programming libraries leading the charge.

On this frontier, distributing and installing applications doesn’t matter.  Paul Graham was probably the first to say that when you’re selling a web-based application, you have an advantage that you can use any language you want because you don’t have to distribute it.  As long as the programmer can install his program on his own little set of servers, and handle his users’ data storage, nobody else ever has to see his code, what libraries he used, or how it was put together.  But it also means he doesn’t have to conform to traditional system administration practices, and I think this is the root of the problem.

Within the Ruby community, packaging for widespread distribution has been an afterthought to pushing forward the frontier of what the language can do.  Distributable Ruby applications are thin on the ground (because right now, most people writing web apps want to sell access to them, rather than distribute them for free), but there are lots of great libraries out there.

Problem solved?

But instead of settling on a simple method of managing local installations like Perl folk did with CPAN, Rubyists settled on Rubygems.

In theory, a system administrator types gem install hoopystuff to install a gem called ‘hoopystuff’.  The gem program goes away, finds the hoopystuff gem, and installs it on his system.  If there’s anything that needs compiling, gem compiles it and puts it in the right place.  Then any program that wants hoopystuff can start to use it.  There is a master list of gems maintained, a network of mirrors and a signing mechanism, a lot like other packaging systems.  There is some wheel re-inventing going on, but it means Ruby programmers don’t have to worry about supporting every possible system, which seems like a win.  It even allows programmers to install libraries on a shared system, without needing full privileges over it, which is a useful feature.

But the biggest mistake made in Gems was to add to the language.  In Java, or C, or Python, or any other language, to include a library, you do the same thing, regardless of who installed the library, or where.  But in Ruby, a gem command was added to the language.  And you need the rubygems library included first in order to use that command.  So if a programmer wants to use the hoopystuff library he’s installed as a gem, the obvious doesn’t work any more:

require 'hoopystuff'

Instead he has to do:

require 'rubygems'
gem 'hoopystuff'
require 'hoopystuff'

But if he has installed hoopystuff through his system distribution, rather than Rubygems, this will fail!  So a thorough way of including the library has to be a full six lines of code:

begin
  require 'rubygems'
  gem 'hoopystuff'
rescue LoadError => no_gems_error
  # no Rubygems library installed, or no 'hoopystuff' gem
end
# either way we need to do this, if this fails the library definitely isn't here
require 'hoopystuff'

This is now the only portable way of asking for a library in Ruby – hardly the principle of least surprise.  And through the fog, many library authors assume that the ‘gem’ command is available where it’s not.  So packaging almost any contemporary Ruby application involves altering its code.  The Debian packagers are asking Ruby authors very nicely to bear this in mind, but to little effect.

To add to the confusion, the gem command also allows multiple versions of the same package on a single system.  This means that instead of simply asking for gem ‘hoopystuff’, the programmer can ask for ‘hoopystuff’ version 1.23.  Unfortunately, if another part of the same program then asks for ‘hoopystuff’ version 1.5, Ruby will die at that point, saying that it can’t load two versions of the same gem in the same program.  I don’t think I’m going out on much of a limb when I say nobody needs this feature, and if you think you do, you’re not clever enough to use it properly.  I have "fixed" plenty of conflicting gem invocations in live apps where two pieces of code demand different versions of a gem, even though a single version works fine for both.
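
The conflict can be sketched without installing anything, using the version classes that ship with Rubygems.  This doesn’t reproduce the crash itself; it only shows why two exact pins can never be satisfied by one loaded version, while a looser requirement can.  The version numbers are the ones from the paragraph above, and the ">= 1.5" requirement is my own illustration (note that Rubygems treats 1.23 as newer than 1.5, since it compares 23 against 5).

```ruby
require 'rubygems'

# Two hypothetical callers pinning exact, different versions of the same gem.
first_pin  = Gem::Requirement.new('= 1.23')
second_pin = Gem::Requirement.new('= 1.5')

installed = [Gem::Version.new('1.23'), Gem::Version.new('1.5')]

# No single version can satisfy both exact pins at once.
both_pins = installed.select do |version|
  first_pin.satisfied_by?(version) && second_pin.satisfied_by?(version)
end

# A looser requirement is satisfied by either version, so one install would do.
loose = Gem::Requirement.new('>= 1.5')
loose_matches = installed.select { |version| loose.satisfied_by?(version) }

puts "satisfy both pins: #{both_pins.map(&:to_s).inspect}"
puts "satisfy '>= 1.5':  #{loose_matches.map(&:to_s).inspect}"
```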

Why can’t we convert?

It’s this last requirement to allow multiple versions that makes gems fundamentally incompatible with every other package management system – Debian, Redhat, SuSE … all of them allow only one version to be installed on a system.  So it’s impossible to do a clean mapping of gems onto .debs or .rpms or any other mature packaging structure, because once you add enough applications into the mix, they can all be demanding different versions of the same Gem, and these demands are expected to be met.

At the start of 2009, the talented guys at Phusion made an attempt called DebGem where they took every version of every gem they could find, and made a Debian package out of it, baking the Gem version number into the name of the package.  It looked weird, and appeared to work.  But the project has been silent since April, and the Linux distributions they supported are fading into irrelevance.  My guess is they couldn’t stomach the amount of manual work needed to tweak every Ruby programmer’s misunderstandings about Gems.  (But Phusion dudes, if it was just the expensive hosting, Bytemark will still donate as many mirrors & build hosts as you need.)

In contrast, the Perl folk have a simple site that routinely converts Perl packages into compatible debs, and allows them to be installed, and integrated into official distributions easily.  All because the packaging system is simpler.

So what are the options left for a Ruby programmer who wants to ship portable software to a wide variety of users?

Option 1: A traditional build system

What I’m doing with a couple of our projects is to forget that Gems ever existed.  Every shared library is frozen and checked into the project under an external directory where they stay unless I need newer versions.  Then I have a Rakefile which runs two jobs more usually seen in compiled programs – a build, and an install.

build compiles any native extensions that the program needs, and install copies the whole program, and all its built dependencies into /usr/lib/myprogram.  Finally it adds the binaries that I want to run to /usr/bin/myprogram but these are just stubs which set up the load paths to my "pure" Ruby environment.
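
A loader stub along these lines might look like the sketch below.  The directory layout (a lib directory for the program, plus one external/<name>/lib directory per frozen library) and the method names are assumptions for illustration, not a description of the actual Bytemark Rakefile.

```ruby
# Hypothetical layout, assumed for this sketch:
#   <root>/lib                   the program's own code
#   <root>/external/<name>/lib   each frozen dependency
def frozen_load_paths(root)
  [File.join(root, 'lib')] +
    Dir.glob(File.join(root, 'external', '*', 'lib')).sort
end

# Put the frozen directories ahead of anything else on the load path,
# so a plain require finds the bundled copies first.
def use_frozen_environment(root, load_path = $LOAD_PATH)
  load_path.unshift(*frozen_load_paths(root))
end

# A stub installed in /usr/bin would then do something like:
#   use_frozen_environment('/usr/lib/myprogram')
#   require 'myprogram'
```

Because the frozen directories go to the front of the load path, the program never depends on whatever versions happen to be installed elsewhere on the machine.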

In addition, because I don’t want to have to fix the source code of all the gems I’m using (usually around 10-15), these loader stubs actually load a fake Rubygems library.  The library just ensures that the gem command does nothing, and stops me having to worry about changes to the libraries’ code.
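
The fake library can be tiny.  A minimal sketch, assuming it is saved as rubygems.rb in a directory that comes first on the load path, so that require 'rubygems' picks it up instead of the real thing:

```ruby
# Hypothetical stand-in for rubygems.rb: it defines the gem command that
# library code expects, but the command deliberately does nothing.
module Kernel
  private

  # Accept and ignore any gem name and version requirements.
  def gem(_name, *_requirements)
    false
  end
end
```

With this in place, a library that calls gem 'hoopystuff', '= 1.23' carries on to its plain require, which then finds the frozen copy on the load path.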

The down sides are pretty well understood – my 2000 lines of code, which would have been a tiny, architecture-independent package, have to be built as one large package, once for each architecture (we need two at Bytemark).  If I wanted to distribute it any further I would probably need a wider variety of packages, for a few more distributions.  I also have all the usual down sides of managing my own packages – checking for security bugs and doing my own code updates, much larger & slower-to-install packages, and so on.  But the major up side: I can use Debian and apt-get to install and maintain it reliably on hundreds of servers.

Option 2: Repackage each library

For the long term, Patrick is working out how to gently modify the source of around 50 popular Ruby libraries so that they form normal Debian packages without needing Rubygems installed.  That will help me kick all this duplicated library code out of individual projects and back into packages where they belong.

This is semi-automated but still has many manual elements that Pat is working through.  The DebGem folk had given up on doing this in the general case, so we’re focussing on just the set of Gems we need to run all our code, and trying to integrate our work with Debian’s.  I notice that an Ubuntu team has also got a bunch of reasonably new packages into Ubuntu’s universe package list – I hope this will make it back into Debian, or that we can use them for Bytemark’s systems.

I can even see scope for going all the way back to the start, and making a small fork of the core Ruby language & interpreter to fix the packaging problems – that is an extreme possibility, but with at least two new Ruby implementations becoming more relevant lately, it might not be the MRI (Matz’s Ruby Implementation, the original one) that stays relevant in the long run, and leadership on packaging could easily change.

But right now this approach is going against the grain of what almost all Ruby authors are doing; surgery is needed on library authors’ code, which is the cause of all this rot.  Unfortunately that’s the price that we have to pay to go back to a simpler, working library system.

Option 3: Wait… about five years?

I think that the worst mistake of Rubygems is being undone – from the next major Ruby version, the gem command is no longer necessary in the common case.  So if you require ‘hoopystuff’ in Ruby 1.9, the require statement will implicitly look for a gem called ‘hoopystuff’.  This might seem like a trivial timesaver for library and application authors; after all, it was only six lines I was complaining about.  But but but… it means that the sanctioned way of including libraries is back to how it used to be, just one require statement, one namespace.

That means a lot less work in repackaging gems, but only when library authors have got the message and Ruby 1.8 installations cease to be relevant – so that’s where my five year estimate comes from.

Start now, avoid the gem traps

If you’re a Ruby author who cares about distributing your software to more than just other programmers’ laptops, you only need to take some simple action with your existing Gems to make them compatible, following the advice that the Debian folk wrote years ago.  I’d add the following to these tips though:

  1. don’t use the gem command in your main code at all, use a loader program that pulls it in if you need it.  In almost all cases it is going away in 1.9, and good riddance;
  2. if you provide a Ruby library called foobar, make sure your gem is also called foobar, and preferably only provides a single module called Foobar;
  3. don’t use capital letters in your gem name – amazingly there are already some gems in the namespace that differ only by case!
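
For the first tip, a loader might look something like the sketch below.  The method name and the idea of keeping it in a separate loader file are my own illustration; the point is only that this is the one place that mentions Rubygems, and the main code sticks to plain require.

```ruby
# Hypothetical loader: activates the named gems if Rubygems is around,
# and quietly does nothing otherwise.  Returns the names it activated.
def activate_gems_if_available(*gem_names)
  require 'rubygems'
  gem_names.select do |name|
    begin
      gem name
      true
    rescue LoadError
      false # not installed as a gem; a plain require may still find it
    end
  end
rescue LoadError
  [] # Rubygems itself is missing, so there is nothing to activate
end

# The main program then asks for its libraries the ordinary way:
#   activate_gems_if_available('hoopystuff')
#   require 'hoopystuff'
```

If the library came from a system package instead of a gem, the loader returns an empty list and the plain require still works.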

And finally, try building a native package for your favourite system!  Debian at least is quite easy.  It will take you a few more hours, but your library will simply be easier to manage for system administrators when you’re done.  Let’s not wait years for these design mistakes to atrophy away – fix your libraries and help make Ruby a first-class component of every OS.