Matt's Musings

What?

Base system (BusyBox, shared libraries, and toolchain): 86.08 MB
Runtime system (BusyBox and shared libraries, no toolchain): 1.965 MB
Static runtime system (just BusyBox, no shared libraries or toolchain): 345.8 kB
nginx (with OpenSSL): 3.034 MB
PostgreSQL: 15.82 MB

What, exactly?

These are minimal images for Docker, based on musl libc and BusyBox. They don't use a conventional GNU/Linux distribution or even a package manager. Instead, they are built entirely from source. Their top-level source trees bring in components through Git submodules. Their build processes owe a lot to Linux From Scratch, Dragora (the upcoming musl-based version), and Sabotage Linux.

Why?

Why bother building images this way? After all, Solomon Hykes has said that Docker isn't intended to replace existing tools, such as package managers and their accompanying repositories, but to complement them. Sure, the typical base images have a hundred megabytes or more of stuff that most containers don't need at run time. But they don't waste much memory at run time, and disk space is cheap, especially on the scale of mere hundreds of megabytes. Better to just accept the waste, which isn't really waste, since it helps us developers get stuff done, right? Most Docker users don't seem to care how much unneeded stuff is in their images, so why should I?

It might be good enough to answer that I'm doing this because I want to. After all, I'm doing it in my spare time; nobody is paying me to do it. Still, it's reasonable to ask why, of all the things I could do in my spare time, I would bother with this. So I'll explain my reasons.

First, part of the philosophy of Docker, as I understand it, is that the whole userland software stack for an application -- everything above the kernel -- should be the developer's responsibility. As Mr. Hykes said in the previously referenced talk, "the system is part of the application. Your choice of distribution, your choice of system libraries, all of those choices, even if you didn't make them consciously -- maybe you just used the system that was lying around there -- that is affecting the behavior of the application, and if you swap it out, things will change." So if all of that, including things like libc and OpenSSL, will be my responsibility as a developer, I want to minimize that responsibility in any way I can.

In light of that, let's take a look at some of the bloat that we get in the widely used Docker images which are based on popular GNU/Linux distributions. Here are some packages that we get in the debian:wheezy image whether we really need them or not:

ext2 filesystem utilities: We don't mount or fsck inside a container.
SysV init and init scripts: We don't run init inside a Docker container.
Pluggable Authentication Modules: Plain /etc/passwd is enough inside a typical container.
Berkeley DB: This is used by a PAM module we don't need.
SELinux libraries and utilities: This might be useful on the host system, but it's rarely if ever used inside a container.
libusb: Inside a container we're not working directly with hardware, USB or otherwise.
ncurses, terminfo, and readline: These are nice to have in interactive sessions, but not needed in a production container.
bash: This is about 10 times as much as we usually need in a shell.
ping and other network utilities
OpenSSL: The only thing in the base system that needs this is ping6.

Granted, a lot of this isn't even loaded into memory when running a typical Debian-based Docker image. But it's there, on disk at least. As another example, let's look at some of the libraries referenced by the main postgres process in the Orchard PostgreSQL image:

libxml2: This is used by some XML-related features that most PostgreSQL users will never need.
PAM: All connections to a PostgreSQL database inside a container are over the network, so this is never needed.
Kerberos, GSSAPI, and LDAP: Most of us aren't using a Docker container like this in a legacy corporate network.
SQLite: It's amusing to me that this build of PostgreSQL depends on SQLite; in fact, this dependency is courtesy of the main Kerberos library.

None of the above is necessary for a working build of PostgreSQL, as my image demonstrates.
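A dependency list like the one above can be gathered by parsing the output of `ldd`. The Python sketch below is only an illustration: the helper names `parse_ldd` and `shared_libraries` are hypothetical, and the sample output is made up for the example rather than taken from the Orchard image.

```python
import re
import subprocess

# Matches lines like "\tlibxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f...)".
# Lines without "=>" (the dynamic loader, the vDSO in this sample) are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def parse_ldd(output):
    """Return a sorted list of library names found in ldd-style output."""
    names = set()
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def shared_libraries(binary_path):
    """Run ldd on a dynamically linked binary and list what it references."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)


# Hypothetical sample output, for illustration only.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd0a5f2000)
\tlibxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f1c2a000000)
\tlibpam.so.0 => /lib/x86_64-linux-gnu/libpam.so.0 (0x00007f1c29e00000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29a00000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)
"""

if __name__ == "__main__":
    print(parse_ldd(SAMPLE))
```

Note that `ldd` only reports direct and transitive link-time dependencies; libraries loaded later with `dlopen` would not appear in this list.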

Does any of this matter? Maybe not; I might be obsessing over these gratuitous dependencies for no good reason. Still, I felt that this one-size-fits-all approach to packaging was leading to some undesirable waste, and decided to try to eliminate that waste by choosing different tradeoffs.

In particular, I believe glibc has a lot of cruft that isn't necessary in many environments, including most Docker containers. For example:

Name Service Switch: Did you even know this exists? A GNU/Linux system uses it whenever you log in or do a DNS lookup.
iconv implementation: This implementation of conversion between character encodings is overkill for most modern applications. It adds several megabytes of shared libraries to the base system, and requires a locale definition file just to handle UTF-8 correctly.
ONC RPC: Most modern applications, especially the kind that one typically runs inside Docker containers, don't use NFS or other ONC RPC-based services.

And there's probably more that I don't know about or have forgotten. In short, musl libc is much lighter than glibc, yet musl has everything that most modern applications need. musl also emphasizes correctness and robustness more than glibc historically has. So I want to use musl.

I also think the GNU core utilities are overkill for most Docker containers. The coreutils package in Debian Wheezy is about 13 MB. The cp utility alone is 128 kB. By contrast, my whole BusyBox build is 332 kB, statically linked. My BusyBox configuration is spartan (no editor, pager, wget, or command-line editing in the shell), but it's enough for building and running containers.

My final motivation for this project is that in my opinion, downloading packages from the Internet, as most Dockerfiles do, doesn't lead to very trustworthy builds, because these builds aren't deterministic. To me, a trustworthy build should be a pure function of the input source tree and the base image. Ideally, networking should be disabled inside the build-time containers. To achieve this, each top-level source tree needs to incorporate all of its dependencies directly. As far as I know, Git submodules are the best way to do this.
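To make the "pure function" idea concrete, here is a minimal Python sketch that derives a single key from a source tree and a base image identifier. It is an assumption-laden illustration, not how these images are actually built: the function names `tree_digest` and `build_key` are hypothetical, and the sketch ignores file modes, symlink targets, and empty directories, all of which a real build would probably need to consider.

```python
import hashlib
import os


def tree_digest(root):
    """Hash every file under root (relative path plus contents) in sorted order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                data = handle.read()
            # Prefix the length so file boundaries cannot be confused.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def build_key(source_root, base_image_id):
    """Combine the source tree digest with the base image identifier."""
    combined = f"{base_image_id}\n{tree_digest(source_root)}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()
```

If a build really depends only on these two inputs, then two builds with the same key should produce the same image, and any change to a file or to the base image yields a different key.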

How?

You can learn everything there is to know about the build processes by browsing the Git repositories:

Feel free to ask me questions if anything is unclear.

What needs work?

Some scripts are more or less duplicated between the repositories. I should factor them out into a common repository which all of the others can pull in as another submodule.

The biggest problem is that due to limitations in the Docker build system, the build process for each image can't be done entirely inside one Dockerfile. This means that these images can't be trusted builds. I have some ideas about how this might be rectified, which I've previously raised on the docker-dev group. I plan to contribute a solution.

Finally, I haven't yet built images for any applications this way, only bits of infrastructure.

Conclusion

I doubt that many Docker users will use my images, or join me in building their own images this way, but I think it was a worthwhile experiment, for my own education if nothing else. I hope someone finds this work useful.

Discuss on Hacker News

Heroku is doing several things right. The Twelve-Factor App is a succinct, well-considered set of guidelines for building robust, scalable web applications, and Heroku has done a pretty good job of building a platform based on these guidelines. Of particular interest is Heroku's emphasis on erosion resistance and explicit contracts. Heroku has certainly done a better job of meeting these ideals than I would likely do in cobbling together a web app deployment system as part of my work. I am therefore seriously thinking about using Heroku at work.

However, having spent some time looking at the platform, I can think of some ways to significantly improve Heroku's erosion resistance. So I offer this post as constructive criticism in the hope that it will lead to a better platform.

The basic problem is that the GNU/Linux system underlying Heroku's current Celadon Cedar stack is made of an apparently haphazard collection of packages. Heroku does not appear to have paid close attention to its stated goals of erosion resistance and explicit contracts. As if it already recognizes this shortcoming, Heroku has not updated this GNU/Linux system since October 31, 2011.

What evidence do I have that Heroku paid insufficient attention to erosion resistance and explicit contracts while selecting packages for Cedar? Here are a few examples:

Cedar's base system includes OpenSSL 0.9.8, which did not have a stable ABI. My evidence that the ABI was not stable is that the Linux Standard Base project rejected OpenSSL. It seems to me that this will force Heroku to stick with OpenSSL 0.9.8, or patches based on it, for as long as Cedar is supported. What will happen when end-of-life is declared for the 0.9.8 branch?
Cedar's base system includes Ruby 1.9.2p290. I don't know much about Ruby, but I've seen evidence that a seemingly minor update to the Ruby implementation can introduce unexpected breakage. So I would not be comfortable incorporating Ruby into the bedrock of an erosion-resistant platform. Will Heroku need to continue including Ruby 1.9.2p290, or a carefully patched derivative thereof, in Cedar as long as that stack is supported? I notice that Heroku's Ruby buildpack provides a way to bundle a specific version of Ruby with the application slug, though this isn't yet the default behavior.
Cedar's base system includes ImageMagick, and not just the libraries, but the command-line utilities. This makes it all too easy to violate the twelve-factor methodology's rule against invoking command-line utilities that are not bundled with the application build. Ironically, ImageMagick is one of the utilities mentioned in factor II (Dependencies).

I believe I've now demonstrated that Heroku didn't pay enough attention to erosion resistance and explicit contracts while selecting packages for Cedar's base system. I suggested earlier that Heroku might already be aware of this, because it has not updated Cedar's base system since October 31, 2011. Incidentally, this was long before Heroku declared Cedar ready for general use. What are the consequences of this early freezing of the base system?

It seems to me that the consequence that truly matters is a complete lack of security updates for the packages in the base system. For example, Ubuntu has issued 4 updates to OpenSSL for Ubuntu 10.04 (on which Cedar is based) since Heroku froze the Cedar base system, and all of these updates are security-related. OpenSSL is quite well-known for requiring frequent security updates. So if I were in Heroku's position, I don't think I would incorporate OpenSSL into the bedrock of my platform unless I was sure that I could back it up with frequent security updates for as long as I supported the platform. Given the unstable ABI of OpenSSL, at least version 0.9.8, I would be doubly cautious about including OpenSSL in the base system of an erosion-resistant platform.

So what would I do? Well, I'm afraid that my proposed changes would require a new Heroku stack, and Heroku has stated that it has no plans to replace Cedar. So in the unlikely event that Heroku will have any interest in implementing my suggestions, I guess this statement might give Heroku a bit of a PR problem. Still, it wouldn't be right for me to criticize something without offering my ideas for improving it. Incidentally, I'm not imaginative enough to suggest a name for Cedar+1.

First, I would radically pare down the runtime base system. Let's start with the shared libraries. I would provide only Debian stable builds of these libraries: glibc (technically eglibc), libgcc_s, libstdc++, and NSS. All of these are in the Linux Standard Base, though they're a small subset of what the LSB offers. The first three form the minimum set of libraries needed to build C and C++ programs for GNU/Linux and have been ABI-stable for years. NSS handles the important and sensitive task of implementing cryptography, and we app developers really should not be responsible for crypto any more than necessary. A quick look at the search results for libnss3 at packages.ubuntu.com suggests that NSS has been ABI-stable since Ubuntu 8.04; this, the LSB project's decision to include NSS, and the Fedora project's decision to standardize on NSS are good enough for me. Why use Debian stable? Debian is well-known for being among the most conservative of major GNU/Linux distributions, and this is surely a good thing for the bedrock of an erosion-resistant platform. To keep the runtime base system lean and to avoid bringing in extraneous components which muddy the contract between platform and app, the runtime base system would only include the runtime libraries, not the corresponding development packages.
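A contract this small is easy to check mechanically. The Python sketch below flags any library a build needs that falls outside the proposed runtime set. The allowlist is an illustrative and incomplete assumption (for example, it omits the dynamic loader and any libraries that NSS itself depends on), and `outside_contract` is a hypothetical helper name.

```python
# Illustrative, incomplete allowlist of library name prefixes that the
# proposed runtime base system would provide. The exact names are assumptions.
ALLOWED_PREFIXES = (
    # glibc
    "libc.so.", "libm.so.", "libdl.so.", "libpthread.so.", "librt.so.",
    # GCC runtime and C++ standard library
    "libgcc_s.so.", "libstdc++.so.",
    # NSS
    "libnss3.so", "libnssutil3.so", "libsmime3.so", "libssl3.so",
)


def outside_contract(needed):
    """Return the libraries in `needed` that the runtime system would not provide."""
    return sorted(
        name for name in needed
        if not name.startswith(ALLOWED_PREFIXES)  # startswith accepts a tuple
    )


if __name__ == "__main__":
    # Hypothetical dependency list for an app build.
    needed = ["libc.so.6", "libssl.so.0.9.8", "libstdc++.so.6", "libMagickCore.so.3"]
    for name in outside_contract(needed):
        print("must be bundled with the app:", name)
```

Anything this check reports would have to be bundled in the app build rather than assumed to exist on the platform.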

What about the shell and command-line utilities? I would provide BusyBox and nothing more. Heroku's dynos are based on LXC, and the process with PID 1 (currently called ps-run) seems to live outside the LXC container. So the base system for a Heroku dyno doesn't need all the components that are necessary to boot a full-blown GNU/Linux machine, whether physical or virtual. This minimalism will effectively enforce the aforementioned rule against invoking non-essential command-line utilities without bundling them in the app build. Besides that, it simply provides a more comprehensible contract between the platform and the application.

Of course, this runtime base system would be inadequate for running build tools such as Heroku's slug compiler and Vulcan. So I would provide a separate system image to use at build time. This one would include the full Debian stable base system (using the same stable version of Debian from which I got the runtime libraries), and at least these packages: build-essential and libnss3-dev. I'd probably throw in some build-time niceties such as the Autotools suite, Perl, curl, Git, and even Ruby and Python. But please note that I would be careful about which -dev packages are in the build-time system, to help ensure that the resulting build can be run on the runtime system. Web dynos and worker dynos would always use the runtime system. The runtime system would also be the default for one-off dynos, but I would also offer the build-time system for those, to enable build tools such as Vulcan.
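The dyno-to-image rule described above is simple enough to write down as a small policy function. This Python sketch is hypothetical: the names `select_image`, `RUNTIME`, and `BUILD` are invented for illustration, and rejecting a build-time request for a web or worker dyno (rather than silently ignoring it) is one possible design choice among several.

```python
RUNTIME = "runtime"
BUILD = "build"


def select_image(dyno_type, requested=None):
    """Pick the system image for a dyno under the proposed policy."""
    if dyno_type in ("web", "worker"):
        # Web and worker dynos always use the runtime system.
        if requested not in (None, RUNTIME):
            raise ValueError(f"{dyno_type} dynos always use the runtime system")
        return RUNTIME
    if dyno_type == "one-off":
        # One-off dynos default to the runtime system but may opt in
        # to the build-time system, e.g. to run build tools.
        if requested is None:
            return RUNTIME
        if requested in (RUNTIME, BUILD):
            return requested
        raise ValueError(f"unknown system image: {requested!r}")
    raise ValueError(f"unknown dyno type: {dyno_type!r}")
```

For example, `select_image("one-off", BUILD)` selects the build-time system, while `select_image("web")` can only ever yield the runtime system.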

To summarize, by being very explicit about what goes in the base system and what stays out, I believe it would be possible to create a much more erosion-resistant platform. I would be happy if any current or aspiring platform-as-a-service provider took these ideas and ran with them, but I would be most pleased if Heroku saw fit to do this itself. As I said at the outset, I think Heroku has some great ideas and is doing a pretty good job of implementing them. I look forward to using Heroku to relieve myself and my successors of some administrative chores as we build more robust web applications.