Heroku's erosion resistance

Heroku is doing several things right. The Twelve-Factor App is a succinct, well-considered set of guidelines for building robust, scalable web applications, and Heroku has done a pretty good job of building a platform based on these guidelines. Of particular interest is Heroku's emphasis on erosion resistance and explicit contracts. Heroku has certainly done a better job of meeting these ideals than I would likely do in cobbling together a web app deployment system as part of my work. I am therefore seriously thinking about using Heroku at work.

However, having spent some time looking at the platform, I can think of some ways to significantly improve Heroku's erosion resistance. So I offer this post as constructive criticism in the hope that it will lead to a better platform.

The basic problem is that the GNU/Linux system underlying Heroku's current Celadon Cedar stack is made of an apparently haphazard collection of packages. Heroku does not appear to have paid close attention to its stated goals of erosion resistance and explicit contracts. As if it already recognizes this shortcoming, Heroku has not updated this GNU/Linux system since October 31, 2011.

What evidence do I have that Heroku paid insufficient attention to erosion resistance and explicit contracts while selecting packages for Cedar? Here are a few examples:

  • Cedar's base system includes OpenSSL 0.9.8, which did not have a stable ABI. My evidence that the ABI was not stable is that the Linux Standard Base project rejected OpenSSL. It seems to me that this will force Heroku to stick with OpenSSL 0.9.8, or patches based on it, for as long as Cedar is supported. What will happen when end-of-life is declared for the 0.9.8 branch?

  • Cedar's base system includes Ruby 1.9.2p290. I don't know much about Ruby, but I've seen evidence that a seemingly minor update to the Ruby implementation can introduce unexpected breakage. So I would not be comfortable incorporating Ruby into the bedrock of an erosion-resistant platform. Will Heroku need to continue including Ruby 1.9.2p290, or a carefully patched derivative thereof, in Cedar as long as that stack is supported? I notice that Heroku's Ruby buildpack provides a way to bundle a specific version of Ruby with the application slug, though this isn't yet the default behavior.

  • Cedar's base system includes ImageMagick, and not just the libraries, but the command-line utilities. This makes it all too easy to violate the twelve-factor methodology's rule against invoking command-line utilities that are not bundled with the application build. Ironically, ImageMagick is one of the utilities mentioned in factor II (Dependencies).

I believe I've now demonstrated that Heroku didn't pay enough attention to erosion resistance and explicit contracts while selecting packages for Cedar's base system. I suggested earlier that Heroku might already be aware of this, because it has not updated Cedar's base system since October 31, 2011. Incidentally, this was long before Heroku declared Cedar ready for general use. What are the consequences of this early freezing of the base system?

It seems to me that the consequence that truly matters is a complete lack of security updates for the packages in the base system. For example, Ubuntu has issued 4 updates to OpenSSL for Ubuntu 10.04 (on which Cedar is based) since Heroku froze the Cedar base system, and all of these updates are security-related. OpenSSL is quite well-known for requiring frequent security updates. So if I were in Heroku's position, I don't think I would incorporate OpenSSL into the bedrock of my platform unless I was sure that I could back it up with frequent security updates for as long as I supported the platform. Given the unstable ABI of OpenSSL, at least version 0.9.8, I would be doubly cautious about including OpenSSL in the base system of an erosion-resistant platform.

So what would I do? Well, I'm afraid that my proposed changes would require a new Heroku stack, and Heroku has stated that it has no plans to replace Cedar. So in the unlikely event that Heroku will have any interest in implementing my suggestions, I guess this statement might give Heroku a bit of a PR problem. Still, it wouldn't be right for me to criticize something without offering my ideas for improving it. Incidentally, I'm not imaginative enough to suggest a name for Cedar+1.

First, I would radically pare down the runtime base system. Let's start with the shared libraries. I would only provide Debian stable builds of these libraries: glibc (technically eglibc), libgcc_s, libstdc++, and NSS. All of these are in the Linux Standard Base, though they're a small subset of what the LSB offers. The first four form the minimum set of libraries needed to build C and C++ programs for GNU/Linux and have been ABI-stable for years. NSS handles the important and sensitive task of implementing cryptography, and we app developers really should not be responsible for crypto any more than necessary. A quick look at the search results for libnss3 at packages.ubuntu.com suggests that NSS has been ABI-stable since Ubuntu 8.04; this, the LSB project's decision to include NSS, and the Fedora project's decision to standardize on NSS are good enough for me. Why use Debian stable? Debian is well-known for being among the most conservative of major GNU/Linux distributions, and this is surely a good thing for the bedrock of an erosion-resistant platform. To keep the runtime base system lean and to avoid bringing in extraneous components which muddy the contract between platform and app, the runtime base system would only include the runtime libraries, not the corresponding development packages.

What about the shell and command-line utilities? I would provide BusyBox and nothing more. Heroku's dynos are based on LXC, and the process with PID 1 (currently called ps-run) seems to live outside the LXC container. So the base system for a Heroku dyno doesn't need all the components that are necessary to boot a full-blown GNU/Linux machine, whether physical or virtual. This minimalism will effectively enforce the aforementioned rule against invoking non-essential command-line utilities without bundling them in the app build. Besides that, it simply provides a more comprehensive contract between the platform and the application.

Of course, this runtime base system would be inadequate for running build tools such as Heroku's slug compiler and Vulcan. So I would provide a separate system image to use at build time. This one would include the full Debian stable base system (using the same stable version of Debian from which I got the runtime libraries), and at least these packages: build-essential and libnss3-dev. I'd probably throw in some build-time niceties such as the Autotools suite, Perl, curl, Git, and even Ruby and Python. But please note that I would be careful about which -dev packages are in the build-time system, to help ensure that the resulting build can be run on the runtime system. Web dynos and worker dynos would always use the runtime system. The runtime system would also be the default for one-off dynos, but I would also offer the build-time system for those, to enable build tools such as Vulcan.

To summarize, by being very explicit about what goes in the base system and what stays out, I believe it would be possible to create a much more erosion-resistant platform. I would be happy if any current or aspiring platform-as-a-service provider took these ideas and ran with them, but I would be most pleased if Heroku saw fit to do this itself. As I said at the outset, I think Heroku has some great ideas and is doing a pretty good job of implementing them. I look forward to using Heroku to relieve myself and my successors of some administrative chores as we build more robust web applications.