Ruby gems, and when we'll be shot of them
Posted by Matthew Bloch
There comes a time in every sufficiently ambitious Ruby programmer's life when one butts up against gems, Ruby's own packaging system. Outside the cosy confines of one's laptop, on servers and embedded systems, Rubygems has design mistakes that make gems unmanageable. Here's how we cope at Bytemark. But first, how did we get here?
The world of programmers
So computer languages have always had some idea of a library: a set of functions, classes and other extensions that make the language more useful. At the earliest and most basic, #include <stdio.h> tells a C compiler that you wish to use a library of input and output functions called stdio: printing lines of text, opening files and so on.
In Ruby, there are two sets of libraries which have shipped with the language. These are the core and standard libraries, and Ruby programmers know that they're always available. If you want to program with network sockets, you say require 'socket' and you know that you can use TCPServer and friends to start writing high-level network programs. In common with other scripting languages, if a programmer wanted to use another library on his system, he would install its files into /usr/local/lib/ruby, and could then 'require' those files in any program.
That's the same simple method that Perl, Python and many other scripting languages have used, and it's easy to describe and understand. It's also easy for Operating System vendors and Linux distributions to package.
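To make that simple method concrete, here is a minimal sketch of what installing a library used to amount to: put a file in a directory on Ruby's search path, then require it by name. A temporary directory stands in for /usr/local/lib/ruby, and hoopystuff.rb with its greet method is invented for the example.

```ruby
require 'tmpdir'

# A scratch directory stands in for /usr/local/lib/ruby (hypothetical location).
site_dir = Dir.mktmpdir('site_ruby')

# "Installing" the library is just putting its file there...
File.write(File.join(site_dir, 'hoopystuff.rb'), <<~RUBY)
  module Hoopystuff
    def self.greet
      'hello from a plain library file'
    end
  end
RUBY

# ...with the directory on Ruby's search path, as a distribution's Ruby
# already arranges for its own site directories.
$LOAD_PATH.unshift(site_dir)

# From then on, any program loads it with one line, wherever it came from.
require 'hoopystuff'
greeting = Hoopystuff.greet
```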
The world of system administrators
Since the early 90s, the Debian project has defined the state of the art in large-scale, long-term system administration. Before Debian (and yum, and the other systems it inspired), admins either routinely did their own software builds, or ran crude packaging systems which were no more complex than a tar file. Debian rejected the idea that system administrators needed to constantly reinstall to "blow away the cobwebs", and hammered out packages and a distribution structure that has allowed admins to avoid reinstalling some individual systems for 10+ years.
Debian have packages for everything from word processors to hardware drivers. An administrator can define the state of a whole computer system just by listing what packages are installed, plus a handful of configuration files - no clicking through installers, or manual stages. Debian's thousands of volunteers have built and tested stable packages based on the cream of free software. Users of a Debian-based OS can have their pick of the best of it, and know it will work together. So in 2009 you can say "I want a system with openoffice installed", and Debian's package management system will perform a complex set of resolutions to find out what libraries openoffice depends on. It will install those in the correct order, right down to the operating system kernel and graphics drivers, so that after downloading, unpacking and running setup scripts, the user goes from having an empty system to a working word processor.
The same process works identically on a hundred Debian servers, even when those servers have very different hardware underneath. This is still an awesome achievement, and Debian are still leading the way in defining how software installation on computers should work.
Crucially, Debian packages programming libraries for lots of languages, and authors writing for Debian systems can rely on scores of them being available in a predictable way. But right now, in Debian and most other Linux distributions, there are some major gaps in coverage for widely-used Ruby libraries, which means dependent applications can't be packaged. How did this situation come about?
Blame the web app startups!
Well, at some point in the last 10 years, the rise of Google caused all programmers everywhere to lose their minds. They decided that users shouldn't install their own programs on their own computers, but instead should trust installation and data storage to the programmer exclusively. Sophisticated, fast, reliable programs started being replaced by simple, slow flakey ones as a result.
Just a joke! We all love web applications really.
No - more reasonably speaking, a handful of smart programmers, all at once, had found a way out of the dreary mire of web applications in 2004-5, and a frontier was formed with Rails and various other programming libraries leading the charge.
On this frontier, distributing and installing applications doesn't matter. Paul Graham was probably the first to point out that when you're selling a web-based application, you have the advantage of being able to use any language you want, because you never have to distribute it. As long as the programmer can install his program on his own little set of servers, and handle his users' data storage, nobody else ever has to see his code, what libraries he used, or how it was put together. But it also means he doesn't have to conform to traditional system administration practices, and I think this is the root of the problem.
Within the Ruby community, packaging for widespread distribution has been an afterthought to pushing forward the frontier of what the language can do. Distributable Ruby applications are thin on the ground (because right now, most people writing web apps want to sell access to them, rather than distribute them for free), but there are lots of great libraries out there.
Problem solved?
But instead of settling on a simple method of managing local installations like Perl folk did with CPAN, Rubyists settled on Rubygems.
In theory, a system administrator types gem install hoopystuff to install a gem called 'hoopystuff'. The gem program goes away, finds the hoopystuff gem, and installs it on his system. If there's anything that needs compiling, gem compiles it and puts it in the right place. Then any program that wants hoopystuff can start to use it. There is a master list of gems, a network of mirrors and a signing mechanism, a lot like other packaging systems. There is some wheel re-inventing going on, but it means Ruby programmers don't have to worry about supporting every possible system, which seems like a win. It even allows programmers to install libraries on a shared system without needing full privileges over it, which is a useful feature.
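RubyGems will tell you from Ruby itself where it looks for gems. A small sketch, assuming a reasonably recent RubyGems (one that has Gem::Specification.find_all_by_name); 'hoopystuff' is still the made-up gem name, so the version list will normally be empty.

```ruby
require 'rubygems'

# Directories RubyGems searches for installed gems: system-wide ones and,
# usually, a per-user one as well.
search_path = Gem.path

# Where `gem install --user-install hoopystuff` would put a gem for a user
# without root on a shared machine.
user_dir = Gem.user_dir

# Installed versions of a given gem; 'hoopystuff' is the article's made-up
# name, so on most machines this is an empty list.
versions = Gem::Specification.find_all_by_name('hoopystuff').map { |spec| spec.version.to_s }
```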
But the biggest mistake made in Gems was adding to the language. In Java, or C, or Python, or any other language, you include a library the same way regardless of who installed it, or where. But Rubygems added a gem command (strictly, a method on Kernel) to the language, and you need to require the rubygems library first in order to use it. So if a programmer wants to use the hoopystuff library he's installed as a gem, the obvious doesn't work any more:
require 'hoopystuff'
Instead he has to do:
require 'rubygems'
gem 'hoopystuff'
require 'hoopystuff'
But if he has installed hoopystuff through his system distribution, rather than Rubygems, this will fail! So a thorough way of including the library has to be a full six lines of code:
begin
  require 'rubygems'
  gem 'hoopystuff'
rescue LoadError => no_gems_error
  # no Rubygems library installed, or no 'hoopystuff' gem
end
# either way we need to do this; if this fails the library definitely isn't here
require 'hoopystuff'
This is now the only portable way of asking for a library in Ruby - hardly the principle of least surprise. And through the fog, many library authors assume that the 'gem' command is available where it's not. So packaging almost any contemporary Ruby application involves altering its code. The Debian packagers are asking Ruby authors very nicely to bear this in mind, but to little effect.
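One way to stop repeating that begin/rescue dance at every call site is to wrap it once. A sketch: require_library is a hypothetical helper name, and it assumes the gem and the file it provides share a name, which is exactly what the tips at the end of this post ask for. It works because RubyGems' own load errors are subclasses of LoadError.

```ruby
# Hypothetical helper wrapping the portable idiom: prefer a gem if RubyGems
# has one, otherwise fall back to whatever is already on $LOAD_PATH.
def require_library(name, *requirements)
  begin
    require 'rubygems'
    gem(name, *requirements) # e.g. require_library('hoopystuff', '>= 1.0')
  rescue LoadError
    # No RubyGems, no matching gem, or a version clash: use plain require.
  end
  require name # raises LoadError if the library really isn't installed
end
```

Usage is then a single line per library, such as require_library('hoopystuff').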
To add to the confusion, the gem command also allows multiple versions of the same package on a single system. This means that instead of simply asking for gem 'hoopystuff', the programmer can ask for 'hoopystuff' version 1.23. Unfortunately, another part of the same program can ask for 'hoopystuff' version 1.5, and when it does, Ruby will die, saying that it can't load two versions of the same gem in the same program. I don't think I'm going out on much of a limb when I say nobody needs this feature, and if you think you do, you're not clever enough to use it properly. I have "fixed" plenty of conflicting gem invocations in live apps where two pieces of code demand different versions of a gem when a single version works fine for both.
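The clash can be reproduced without installing anything, using RubyGems' own version classes. A sketch, assuming the two installed versions listed (hypothetical): it asks which single version could satisfy every part of the program at once, since only one can be active. Note that RubyGems compares version segments numerically, so 1.23 counts as newer than 1.5.

```ruby
require 'rubygems'

# Hypothetical: the versions of 'hoopystuff' installed on this machine.
INSTALLED = %w[1.5 1.23].map { |v| Gem::Version.new(v) }

# Installed versions that satisfy every demand made by the program.
def satisfying(installed, demands)
  reqs = demands.map { |d| Gem::Requirement.new(d) }
  installed.select { |v| reqs.all? { |r| r.satisfied_by?(v) } }
end

# Two parts of one program pin different exact versions: nothing satisfies both.
pinned = satisfying(INSTALLED, ['= 1.23', '= 1.5'])

# The same two parts asking for "1.5 or newer, below 2.0" can share one version.
relaxed = satisfying(INSTALLED, ['~> 1.5', '>= 1.5'])
newest = relaxed.max # the newest version acceptable to both
```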
Why can't we convert?
It's this last requirement to allow multiple versions that makes gems fundamentally incompatible with every other package management system - Debian, Redhat, SuSE ... all of them allow only one version of a given package to be installed on a system. So it's impossible to do a clean mapping of gems onto .debs or .rpms or any other mature packaging structure, because once you add enough applications into the mix, they can all be demanding different versions of the same Gem, and these demands are expected to be met.
At the start of 2009, the talented guys at Phusion made an attempt called DebGem where they took every version of every gem they could find, and made a Debian package out of it, baking the Gem version number into the name of the package. It looked weird, and appeared to work. But the project has been silent since April, and the Linux distributions they supported are fading into irrelevance. My guess is they couldn't stomach the amount of manual work needed to tweak every Ruby programmer's misunderstandings about Gems. (but Phusion dudes, if it was just the expensive hosting, Bytemark will still donate as many mirrors & build hosts as you need).
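Baking the version into the package name is what lets several versions coexist as ordinary packages. A sketch of the idea only: the lib<name>-ruby-<version> scheme here is a hypothetical example, not DebGem's actual naming. The character check follows Debian policy (lowercase letters, digits, plus, minus and full stops, at least two characters, starting alphanumeric), and it also shows how gems whose names differ only by case collide.

```ruby
# Debian policy for package names: [a-z0-9] first, then [a-z0-9+.-], 2+ chars.
DEBIAN_NAME = /\A[a-z0-9][a-z0-9+.-]+\z/

# Hypothetical naming scheme for illustration: lib<name>-ruby[-<version>].
def debian_name(gem_name, version = nil)
  base = gem_name.downcase.tr('_', '-') # no capitals or underscores allowed
  name = version ? "lib#{base}-ruby-#{version}" : "lib#{base}-ruby"
  raise ArgumentError, "not a valid Debian package name: #{name}" unless DEBIAN_NAME.match?(name)
  name
end

# Versions become part of the name, so 1.23 and 1.5 are separate packages...
v123 = debian_name('hoopystuff', '1.23') # => "libhoopystuff-ruby-1.23"
v15  = debian_name('hoopystuff', '1.5')  # => "libhoopystuff-ruby-1.5"

# ...but gems whose names differ only by case map to the same package.
collide = debian_name('HoopyStuff') == debian_name('hoopystuff') # => true
```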
In contrast, the Perl folk have a simple site that routinely converts Perl packages into compatible debs, and allows them to be installed, and integrated into official distributions easily. All because the packaging system is simpler.
So what are the options left for a Ruby programmer who wants to ship portable software to a wide variety of users?
Option 1: A traditional build system
What I'm doing with a couple of our projects is to forget that Gems ever existed. Every shared library is frozen and checked into the project under an external directory where they stay unless I need newer versions. Then I have a Rakefile which runs two jobs more usually seen in compiled programs - a build, and an install.
build compiles any native extensions that the program needs, and install copies the whole program, and all its built dependencies into /usr/lib/myprogram. Finally it adds the binaries that I want to run to /usr/bin/myprogram but these are just stubs which set up the load paths to my "pure" Ruby environment.
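A sketch of those build and install jobs as plain Ruby methods that a Rakefile's tasks could call. The directory layout (lib/, external/*/lib, external/*/ext, bin/), the method names and the stub's contents are all assumptions for illustration, not the actual Bytemark Rakefile; copying compiled extensions into each library's lib/ directory is left out.

```ruby
require 'fileutils'
require 'rbconfig'

# Assumed project layout (hypothetical):
#   lib/                     our own code
#   external/<name>/lib/     frozen copies of third-party libraries
#   external/<name>/ext/     their native extensions, if any
#   bin/<program>            the real entry point

# "build": run extconf.rb and make for each native extension, in place.
def build(src)
  Dir[File.join(src, 'external', '*', 'ext', '**', 'extconf.rb')].sort.each do |extconf|
    dir = File.dirname(extconf)
    system(RbConfig.ruby, 'extconf.rb', chdir: dir, exception: true)
    system('make', chdir: dir, exception: true)
  end
end

# The stub installed as <prefix>/bin/<program>: put our private copies of every
# library on $LOAD_PATH, then hand over to the real entry point.
def stub_for(program, libdir)
  <<~STUB
    #!/usr/bin/env ruby
    LIBDIR = #{libdir.inspect}
    Dir[File.join(LIBDIR, 'external', '*', 'lib')].sort.each { |d| $LOAD_PATH.unshift(d) }
    $LOAD_PATH.unshift(File.join(LIBDIR, 'lib'))
    load File.join(LIBDIR, 'bin', #{program.inspect})
  STUB
end

# "install": copy everything to <prefix>/lib/<program> and write the stub.
def install(src, prefix, program)
  libdir = File.join(prefix, 'lib', program)
  bindir = File.join(prefix, 'bin')
  FileUtils.mkdir_p([libdir, bindir])
  FileUtils.cp_r(%w[lib external bin].map { |d| File.join(src, d) }, libdir)
  stub = File.join(bindir, program)
  File.write(stub, stub_for(program, libdir))
  FileUtils.chmod(0o755, stub)
  stub
end

# In a Rakefile these might be wired up as, for example:
#   task(:build) { build(Dir.pwd) }
#   task(install: :build) { install(Dir.pwd, '/usr', 'myprogram') }
```

Keeping every library under the program's own directory, and only ever reaching it through the stub's $LOAD_PATH, is what stops these frozen copies from interfering with anything else installed on the system.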
In addition, because I don't want to have to fix the source code of all the gems I'm using (usually around 10-15), these loader stubs actually load a fake Rubygems library. The library just ensures that the gem command does nothing, and stops me having to worry about changes to the libraries' code.
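The fake library can be tiny. A minimal sketch, assuming the stub puts a hypothetical fake_rubygems/ directory at the front of $LOAD_PATH so that require 'rubygems' finds this file first. That only works where RubyGems isn't already loaded at startup (Ruby 1.8, or ruby --disable-gems on 1.9 and later), and libraries that touch other parts of RubyGems, such as the Gem module, would need more stubbing.

```ruby
# fake_rubygems/rubygems.rb (hypothetical path), loaded instead of the real one.
# In the loader stub, before anything else:
#   $LOAD_PATH.unshift(File.join(LIBDIR, 'fake_rubygems'))
#
# Any `gem 'name', '>= x.y'` call in the frozen libraries becomes a no-op, so
# the plain `require 'name'` that follows is resolved against $LOAD_PATH.
module Kernel
  private

  def gem(_name, *_requirements)
    true
  end
end
```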
The down sides are pretty well understood: my 2000 lines of code, which would have been a tiny, architecture-independent package, have to be built as one large package, once for each architecture (we need two at Bytemark). If I wanted to distribute it any further I would probably need a wider variety of packages, for a few more distributions. And I have all the usual down sides of managing my own packages: checking for security bugs and doing my own code updates, much larger & slower-to-install packages, and so on. But the major up side: I can use Debian and apt-get to install and maintain it reliably on hundreds of servers.
Option 2: Repackage each library
For the long term, Patrick is working out how to gently modify the source of around 50 popular Ruby libraries so that they form normal Debian packages without needing Rubygems installed. That will help me kick all this duplicated library code out of individual projects and back into packages where they belong.
This is semi-automated but still has many manual elements that Pat is working through. The DebGem folk had given up on doing this in the general case, so we're focussing on just the set of Gems we need to run all our code, and trying to integrate our work with Debian's. I also notice that an Ubuntu team has got a bunch of reasonably new packages into Ubuntu's universe package list; I hope this will make it back into Debian, or that we can use them for Bytemark's systems.
I can even see scope for going all the way back to the start, and making a small fork of the core Ruby language & interpreter to fix the packaging problems - that is an extreme possibility, but with at least two new Ruby implementations becoming more relevant lately, it might not be the MRI (Matz's Ruby Implementation, the original one) that stays relevant in the long run, and leadership on packaging could easily change.
But right now this approach is going against the grain of what almost all Ruby authors are doing; surgery is needed to library authors' code, which is the cause of all this rot. But unfortunately that's the price that we have to pay to go back to a simpler, working library system.
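Part of that surgery can be scripted. A deliberately crude, line-based sketch, not Pat's actual tooling: it comments out require 'rubygems' and gem lines so that a library falls back to plain require. A begin/rescue wrapper like the six-line idiom earlier in this post stays valid Ruby afterwards, because an empty begin block is allowed; calls spread over several lines or sharing a line with other code would still need a human.

```ruby
# Hypothetical patch step: lines that only make sense under RubyGems.
GEMS_LINE = /\A\s*(?:require\s*\(?\s*['"]rubygems['"]|gem\s*\(?\s*['"])/

# Comment out RubyGems-specific lines, keeping indentation and all other lines.
def strip_gem_calls(source)
  patched = source.each_line.map do |line|
    line.match?(GEMS_LINE) ? line.sub(/\A(\s*)/, '\1# removed for packaging: ') : line
  end
  patched.join
end
```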
Option 3: Wait... about five years?
I think that the worst mistake of Rubygems is being undone: from the next major Ruby version, the gem command is no longer necessary in the common case. So if you require 'hoopystuff' in Ruby 1.9, the require statement will implicitly look for a gem called 'hoopystuff'. This might seem like a trivial timesaver for library and application authors; after all, it was only six lines I was complaining about. But but but... it means that the sanctioned way of including libraries is back to how it used to be: just one require statement, one namespace.
That means a lot less work in repackaging gems, but only once library authors have got the message and Ruby 1.8 installations cease to be relevant, which is where my five-year estimate comes from.
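You can see the change from Ruby itself: on 1.9 and later the interpreter loads RubyGems before your script runs, unless told not to with --disable-gems. A small probe, assuming RbConfig.ruby points at the interpreter you care about; RUBYOPT is cleared in the child so that options such as -rubygems can't skew the result.

```ruby
require 'rbconfig'

# Script for a fresh interpreter: does RubyGems' Gem module already exist
# before the script has required anything?
PROBE = 'print(Object.const_defined?(:Gem) ? "yes" : "no")'

def rubygems_preloaded?(*flags)
  IO.popen({ 'RUBYOPT' => nil }, [RbConfig.ruby, *flags, '-e', PROBE], &:read) == 'yes'
end

rubygems_preloaded?                   # true on 1.9+: a bare require can find gems
rubygems_preloaded?('--disable-gems') # false: only $LOAD_PATH is searched
```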
Start now, avoid the gem traps
If you're a Ruby author who cares about distributing your software to more than just other programmers' laptops, you only need to follow some simple advice that the Debian folk wrote years ago to make your existing Gems compatible. I'd add the following to those tips, though:
- don't use the gem command in your main code at all, use a loader program that pulls it in if you need it. In almost all cases it is going away in 1.9, and good riddance;
- if you provide a Ruby library called foobar, make sure your gem is also called foobar, and preferably only provides a single module called Foobar;
- don't use capital letters in your gem name - amazingly there are already some gems in the namespace that differ only by case!
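The naming tips can be checked mechanically against a gemspec. A rough sketch: naming_warnings is a hypothetical helper, and it approximates "a single module called Foobar" by looking at which top-level files the gem puts in lib/, since those are the names a plain require can see.

```ruby
require 'rubygems'

# Hypothetical lint for the naming tips; returns a list of warnings.
def naming_warnings(spec)
  warnings = []
  warnings << "gem name #{spec.name.inspect} contains capital letters" if spec.name.match?(/[A-Z]/)

  # Top-level .rb files in lib/ are the names that `require` will find.
  top_level = spec.files.grep(%r{\Alib/[^/]+\.rb\z}).map { |f| File.basename(f, '.rb') }
  unless top_level.include?(spec.name)
    warnings << "no lib/#{spec.name}.rb, so require '#{spec.name}' will not find the library"
  end
  extra = top_level - [spec.name]
  warnings << "extra top-level files: #{extra.join(', ')}" unless extra.empty?
  warnings
end

good = Gem::Specification.new do |s|
  s.name    = 'foobar'
  s.version = '1.0.0'
  s.files   = %w[lib/foobar.rb lib/foobar/version.rb]
end

bad = Gem::Specification.new do |s|
  s.name    = 'FooBar'
  s.version = '1.0.0'
  s.files   = %w[lib/foo_bar.rb lib/helpers.rb]
end

naming_warnings(good) # => []
naming_warnings(bad)  # capitals, no lib/FooBar.rb, and extra top-level files
```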
And finally, try building a native package for your favourite system! Debian at least is quite easy. It will take you a few more hours, but your library will simply be easier for system administrators to manage when you're done. Let's not wait years for these design mistakes to atrophy away: fix your libraries and help make Ruby a first-class component of every OS.



Even though rubygems has its problems it’s definitely convenient. When I work in other languages I miss it.
I think Rip solves some of your problems: http://hellorip.com/
P.S. As a Ruby web app developer who hosts popular commercial projects on Bytemark, I’d like to see more posts like this!
Thanks for your post. I’ve often thought of this myself as FreeBSD has ports for a number of rubygems, and sometimes I just think “why do we even have rubygems”.
http://www.FreeBSD.org/cgi/ports.cgi?query=rubygem&stype=all&sektion=all
The advantages I see with rubygems is that it allows instant and easy access to the latest ruby libraries without waiting for the distro port/package to be updated. It also doesn’t require someone using each distro to maintain a port/package for the particular Ruby library.
Really, it makes one wonder if the problem isn’t that each distro/os has a different way of installing and maintaining software/libraries!
“sudo gem install rip”? shudder
rip wouldn’t solve *my* problems although looking over the summary, I can see it’s a lighter touch than Gems, and has no pretensions to managing anything system-wide. No resolving dependencies at run-time, hurrah. But mix & match package formats? Direct links to source control snapshots across lots of servers? This is not a recipe for stable, repeatable package installation on sensitive production systems. Those deps.rip files will rot within six months, though that’s about 10 years in Ruby programming-land :-)
No, developing a stable packaging environment does not suit the current strengths of the Ruby community. It needs strong leadership, conservative practices and looking to emulate rather stodgy work that has been done better in the past, none of which is very fashionable.
If the current library authors can just avoid throwing any more spanners in the works like Rubygems, they will find hundreds of potential volunteer packagers from Debian and other projects willing to preserve and stabilise their work for them.
Ruby application authors also need to spot that if they’re relying on bleeding-edge stuff from github, there is no sensible way for them to reference them as libraries - just copy them as they find them, and take them out again as the library stabilises. At present, there’s a significant chance they will end up as de-facto maintainers anyway.
Thanks a lot, it’s a very good article.
A lot of the things you say about RubyGems are outdated. Since RubyGems 1.0 it’s no longer necessary to call the “gem” method. You just need to require ‘rubygems’ once. Then whenever you require ‘foo’ and foo isn’t available, RubyGems will look in the installed gems automatically.
Basically this means you just need to alias the ‘ruby’ command with ‘ruby -rubygems’, and everything will work without manual app changes. So your example really becomes this:
require 'rubygems'   # Not necessary if you run
                     # ruby with the -rubygems option.
require 'hoopystuff' # Will use the distro-installed library
                     # if possible, and use RubyGems if not
                     # installed through the distro.
I don’t know why you think a manual “gem” call is necessary; my guess is that you’re using Debian’s packaged RubyGems version - 0.8.something. That’s a very, very old version of RubyGems and back then you did need to manually call “gem”. The most recent version of RubyGems is 1.3.5 and Rails doesn’t even work with older versions of RubyGems anymore.
As for DebGem, you’re misunderstanding something. The reason why it’s been silent is not because we have to manually tune every library or app. Manual tuning is required for some packages, but not for the reasons you mentioned: the manual tuning we do involves checking which package is Ruby 1.8 or 1.9-only or Windows only or whether the package format is too old to be supported by RubyGems 1.3, as well as correctly mapping RubyGem names to Debian names in case Debian packages the same gem, and stuff like that. No, the reason why it’s been silent is because of the lack of a sustainable business environment as well as lack of resources on our side. In other words, few people are willing to pay for the service, and if that’s so then we think we have better things to do with our time. Plus the fact that it’s a hassle to rebuild the entire repository from scratch for every possible distribution. Technically speaking however DebGem is entirely feasible.
Good to read a blog post about deploying Ruby in real-life production environments. So much of what is written about Ruby doesn’t scale to complex environments. As a fairly large ruby shop we are currently trying to get our head around best-practice for ruby deployments too.
Our preference was to use deb packages too. Mostly because everything else we were installing was a deb. We tried wrapping gems in deb packages, but got bitten by Ruby’s assumption that gems are used.
I’m still not clear though on what really is wrong with using Rubygems for ruby applications and libraries. We didn’t because:
A. All our non-ruby dependencies required debs. So it seemed nice to keep things to one package system
B. Familiarity with deb for the ppl doing the deployments
Are those the only problems you see too? Or are there others. Can you talk a little about why you don’t/can’t use Rubygems?
Worth pointing out that you don’t have to require gems like this. See Ryan Tomayko’s article on this: http://tomayko.com/writings/require-rubygems-antipattern
You can run ruby -rubygems yourscript.rb or set the RUBYOPT environment variable.
Hi Hongli, good to hear from you. As I said, the hosting offer is still open if that makes the business case easier for you. I did think at the time that even if you did find a bunch of people to pay, the service would be “too useful” to not inspire a duplication of effort.
I understand that it’s simple from an architecture point of view to say “that’s outdated practice now, zing!”, and I noted it’s been set in stone for Ruby 1.9. But you’ll know more than anyone that there are lots of gems on Rubyforge where nobody appears to realise this yet, and that’s the problem. It’s not the small minority of Ruby programmers who are completely au fait with packaging issues, but the ones who aren’t (like me) and the code that has already been written.
Aisha, we don’t like using Rubygems because we maintain a large number of systems and libraries through cfengine & apt-get. We have installed “enough” Rails apps on a single host to make conflicting gem invocations something we need to deal with more often than we’d like. However for now we’re taking the same route most other people do, which is making sure we can reimage our production hosts with a quick recipe, gems and all. If ongoing maintenance gets too complicated, we just blow them away and start again.
We have been a bit slow to realise the latter approach, so I wouldn’t say we are a great example of a production Rails host ourselves, but we do manage a small cluster, and most of this is “best practice” work in anticipation of some more ambitious projects.
By the by it’s fair to say that Debian’s own attitude to packaging web applications is still stuck in 1990s Apache configurations - it wouldn’t do to have your sshd, bind, and ntp server all running in the same security context, but every web app on Debian still runs as www-data! I think I moaned about that a while ago though.
Things aren’t quite as rosy in the CPAN garden as you seem to think. There are at least three different sets of tools used for building and installing perl modules, several different ways of specifying version numbers which conflict in entertaining ways, metadata is somewhat lacking, and it’s far from being a package management system as there’s no way of uninstalling stuff.
You should not be requiring rubygems in reality. You should just require ‘crap’ and make sure rubygems is what is being used in your ruby environment.