Tiny Docker images with musl libc and no package manager

What?

What, exactly?

These are minimal images for Docker, based on musl libc and BusyBox. They don't use a conventional GNU/Linux distribution or even a package manager. Instead, they are built entirely from source. Their top-level source trees bring in components through Git submodules. Their build processes owe a lot to Linux From Scratch, Dragora (the upcoming musl-based version), and Sabotage Linux.

Why?

Why bother building images this way? After all, Solomon Hykes has said that Docker isn't intended to replace existing tools, such as package managers and their accompanying repositories, but to complement them. Sure, the typical base images have a hundred megabytes or more of stuff that most containers don't need at run time. But they don't waste much memory at run time, and disk space is cheap, especially on the scale of mere hundreds of megabytes. Better to just accept the waste, which isn't really waste, since it helps us developers get stuff done, right? Most Docker users don't seem to care how much unneeded stuff is in their images, so why should I?

It might be good enough to answer that I'm doing this because I want to. After all, I'm doing it in my spare time; nobody is paying me to do it. Still, it's reasonable to ask why, of all the things I could do in my spare time, I would bother with this. So I'll explain my reasons.

First, part of the philosophy of Docker, as I understand it, is that the whole userland software stack for an application -- everything above the kernel -- should be the developer's responsibility. As Mr. Hykes said in the previously referenced talk, "the system is part of the application. Your choice of distribution, your choice of system libraries, all of those choices, even if you didn't make them consciously -- maybe you just used the system that was lying around there -- that is affecting the behavior of the application, and if you swap it out, things will change." So if all of that, including things like libc and OpenSSL, will be my responsibility as a developer, I want to minimize that responsibility in any way I can.

In light of that, let's take a look at some of the bloat that we get in the widely used Docker images which are based on popular GNU/Linux distributions. Here are some packages that we get in the debian:wheezy image whether we really need them or not:

  • ext2 filesystem utilities: We don't mount or fsck inside a container.
  • SysV init and init scripts: We don't run init inside a Docker container.
  • Pluggable Authentication Modules: Plain /etc/passwd is enough inside a typical container.
  • Berkeley DB: This is used by a PAM module we don't need.
  • SELinux libraries and utilities: This might be useful on the host system, but it's rarely if ever used inside a container.
  • libusb: Inside a container we're not working directly with hardware, USB or otherwise.
  • ncurses, terminfo, and readline: These are nice to have in interactive sessions, but not needed in a production container.
  • bash: This is about 10 times as much as we usually need in a shell.
  • ping and other network utilities
  • OpenSSL: The only thing in the base system that needs this is ping6.

Granted, a lot of this isn't even loaded into memory when running a typical Debian-based Docker image. But it's there, on disk at least. As another example, let's look at some of the libraries referenced by the main postgres process in the Orchard PostgreSQL image:

  • libxml2: This is used by some XML-related features that most PostgreSQL users will never need.
  • PAM: All connections to a PostgreSQL database inside a container are over the network, so this is never needed.
  • Kerberos, GSSAPI, and LDAP: Most of us aren't using a Docker container like this in a legacy corporate network.
  • SQLite: It's amusing to me that this build of PostgreSQL depends on SQLite; in fact, this dependency is courtesy of the main Kerberos library.

None of the above is necessary for a working build of PostgreSQL, as my image demonstrates.
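One way to reproduce this kind of inspection is to run ldd on a binary and pull out the library names it resolves. The following is a minimal Python sketch of the parsing step only. The function name linked_libraries and the sample text are hypothetical, chosen for illustration; the sample is not captured from the actual Orchard image.

```python
import re

# Matches ldd lines such as:
#   libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f0000000000)
# Requiring the right-hand side to start with "/" skips the vdso line
# and "not found" entries, which have no resolved path.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def linked_libraries(ldd_output):
    """Return the library names on the left of '=>' in ldd-style output."""
    names = []
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Illustrative input only (hypothetical paths and addresses).
SAMPLE = (
    "\tlinux-vdso.so.1 =>  (0x00007fff00000000)\n"
    "\tlibxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f0000000000)\n"
    "\tlibpam.so.0 => /lib/libpam.so.0 (0x00007f0000100000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f0000200000)\n"
)

for name in linked_libraries(SAMPLE):
    print(name)
```

In practice the input would come from running ldd against the postgres binary inside the container. Note that ldd reports transitive dependencies too, which is how a library like SQLite can show up through Kerberos.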

Does any of this matter? Maybe not; I might be obsessing over these gratuitous dependencies for no good reason. Still, I felt that this one-size-fits-all approach to packaging was leading to some undesirable waste, and decided to try to eliminate that waste by choosing different tradeoffs.

In particular, I believe glibc has a lot of cruft that isn't necessary in many environments, including most Docker containers. For example:

  • Name Service Switch: Did you even know this exists? A GNU/Linux system uses it whenever you log in or do a DNS lookup.
  • iconv implementation: This implementation of conversion between character encodings is overkill for most modern applications. It adds several megabytes of shared libraries to the base system, and requires a locale definition file just to handle UTF-8 correctly.
  • ONC RPC: Most modern applications, especially the kind that one typically runs inside Docker containers, don't use NFS or other ONC RPC-based services.

And there's probably more that I don't know about or have forgotten. In short, musl libc is much lighter than glibc, yet musl has everything that most modern applications need. musl also emphasizes correctness and robustness more than glibc historically has. So I want to use musl.

I also think the GNU core utilities are overkill for most Docker containers. The coreutils package in Debian Wheezy is about 13 MB. The cp utility alone is 128 kB. By contrast, my whole BusyBox build is 332 kB, statically linked. My BusyBox configuration is spartan (no editor, pager, wget, or command-line editing in the shell), but it's enough for building and running containers.

My final motivation for this project is that in my opinion, downloading packages from the Internet, as most Dockerfiles do, doesn't lead to very trustworthy builds, because these builds aren't deterministic. To me, a trustworthy build should be a pure function of the input source tree and the base image. Ideally, networking should be disabled inside the build-time containers. To achieve this, each top-level source tree needs to incorporate all of its dependencies directly. As far as I know, Git submodules are the best way to do this.
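To make the "pure function of the input source tree" idea concrete, here is a minimal Python sketch that computes a single digest over a source tree, so that two checkouts with identical contents produce the same value. The function name tree_digest and the hashing scheme are hypothetical, chosen for illustration. The sketch ignores symlinks and file modes, and it does not skip Git metadata, so it is a rough check rather than a complete definition of build input.

```python
import hashlib
import os


def tree_digest(root):
    """Return a SHA-256 hex digest over the relative paths and contents
    of all regular files under root, visited in a fixed order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so traversal order does not depend on the filesystem.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # symlinks and special files are ignored in this sketch
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                data = handle.read()
            # Include the path and length so file boundaries are unambiguous.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(str(len(data)).encode("ascii") + b"\0")
            digest.update(data)
    return digest.hexdigest()
```

If the digest of the source tree and the identifier of the base image are both unchanged, a deterministic build with networking disabled should yield the same image; a changed digest means the inputs really did change.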

How?

You can learn everything there is to know about the build processes by browsing the Git repositories.

Feel free to ask me questions if anything is unclear.

What needs work?

Some scripts are more or less duplicated between the repositories. I should factor them out into a common repository which all of the others can pull in as another submodule.

The biggest problem is that due to limitations in the Docker build system, the build process for each image can't be done entirely inside one Dockerfile. This means that these images can't be trusted builds. I have some ideas about how this might be rectified, which I've previously raised on the docker-dev group. I plan to contribute a solution.

Finally, I haven't yet built images for any applications this way, only bits of infrastructure.

Conclusion

I doubt that many Docker users will use my images, or join me in building their own images this way, but I think it was a worthwhile experiment, for my own education if nothing else. I hope someone finds this work useful.

Discuss on Hacker News

Heroku's erosion resistance

Heroku is doing several things right. The Twelve-Factor App is a succinct, well-considered set of guidelines for building robust, scalable web applications, and Heroku has done a pretty good job of building a platform based on these guidelines. Of particular interest is Heroku's emphasis on erosion resistance and explicit contracts. Heroku has certainly done a better job of meeting these ideals than I would likely do in cobbling together a web app deployment system as part of my work. I am therefore seriously thinking about using Heroku at work.

However, having spent some time looking at the platform, I can think of some ways to significantly improve Heroku's erosion resistance. So I offer this post as constructive criticism in the hope that it will lead to a better platform.

The basic problem is that the GNU/Linux system underlying Heroku's current Celadon Cedar stack is made of an apparently haphazard collection of packages. Heroku does not appear to have paid close attention to its stated goals of erosion resistance and explicit contracts. As if it already recognizes this shortcoming, Heroku has not updated this GNU/Linux system since October 31, 2011.

What evidence do I have that Heroku paid insufficient attention to erosion resistance and explicit contracts while selecting packages for Cedar? Here are a few examples:

  • Cedar's base system includes OpenSSL 0.9.8, which did not have a stable ABI. My evidence that the ABI was not stable is that the Linux Standard Base project rejected OpenSSL. It seems to me that this will force Heroku to stick with OpenSSL 0.9.8, or patches based on it, for as long as Cedar is supported. What will happen when end-of-life is declared for the 0.9.8 branch?

  • Cedar's base system includes Ruby 1.9.2p290. I don't know much about Ruby, but I've seen evidence that a seemingly minor update to the Ruby implementation can introduce unexpected breakage. So I would not be comfortable incorporating Ruby into the bedrock of an erosion-resistant platform. Will Heroku need to continue including Ruby 1.9.2p290, or a carefully patched derivative thereof, in Cedar as long as that stack is supported? I notice that Heroku's Ruby buildpack provides a way to bundle a specific version of Ruby with the application slug, though this isn't yet the default behavior.

  • Cedar's base system includes ImageMagick, and not just the libraries, but the command-line utilities. This makes it all too easy to violate the twelve-factor methodology's rule against invoking command-line utilities that are not bundled with the application build. Ironically, ImageMagick is one of the utilities mentioned in factor II (Dependencies).
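As an illustration of the rule mentioned in the ImageMagick example, an application could refuse to shell out to any tool that is not bundled with its own build. Below is a minimal Python sketch under that assumption. The function name bundled_tool and the idea of a single app-owned bin directory are hypothetical, not something Heroku or the twelve-factor document prescribes.

```python
import os
import shutil


def bundled_tool(name, app_bin_dir):
    """Return the full path to `name` inside app_bin_dir, or None if the
    tool is not bundled there. The system-wide PATH is never consulted."""
    if os.path.dirname(name):
        # shutil.which checks names containing a directory part directly,
        # which would bypass the restriction below.
        raise ValueError("expected a bare command name, got %r" % name)
    # Passing path= restricts the lookup to the given directory.
    return shutil.which(name, path=app_bin_dir)
```

An application following this approach would call bundled_tool("convert", its_own_bin_dir) and fail loudly on None, rather than silently picking up whatever ImageMagick the base system happens to provide.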

I believe I've now demonstrated that Heroku didn't pay enough attention to erosion resistance and explicit contracts while selecting packages for Cedar's base system. I suggested earlier that Heroku might already be aware of this, because it has not updated Cedar's base system since October 31, 2011. Incidentally, this was long before Heroku declared Cedar ready for general use. What are the consequences of this early freezing of the base system?

It seems to me that the consequence that truly matters is a complete lack of security updates for the packages in the base system. For example, Ubuntu has issued 4 updates to OpenSSL for Ubuntu 10.04 (on which Cedar is based) since Heroku froze the Cedar base system, and all of these updates are security-related. OpenSSL is quite well-known for requiring frequent security updates. So if I were in Heroku's position, I don't think I would incorporate OpenSSL into the bedrock of my platform unless I was sure that I could back it up with frequent security updates for as long as I supported the platform. Given the unstable ABI of OpenSSL, at least version 0.9.8, I would be doubly cautious about including OpenSSL in the base system of an erosion-resistant platform.

So what would I do? Well, I'm afraid that my proposed changes would require a new Heroku stack, and Heroku has stated that it has no plans to replace Cedar. So in the unlikely event that Heroku will have any interest in implementing my suggestions, I guess this statement might give Heroku a bit of a PR problem. Still, it wouldn't be right for me to criticize something without offering my ideas for improving it. Incidentally, I'm not imaginative enough to suggest a name for Cedar+1.

First, I would radically pare down the runtime base system. Let's start with the shared libraries. I would only provide Debian stable builds of these libraries: glibc (technically eglibc), libgcc_s, libstdc++, and NSS. All of these are in the Linux Standard Base, though they're a small subset of what the LSB offers. The first three form the minimum set of libraries needed to build C and C++ programs for GNU/Linux and have been ABI-stable for years. NSS handles the important and sensitive task of implementing cryptography, and we app developers really should not be responsible for crypto any more than necessary. A quick look at the search results for libnss3 at packages.ubuntu.com suggests that NSS has been ABI-stable since Ubuntu 8.04; this, the LSB project's decision to include NSS, and the Fedora project's decision to standardize on NSS are good enough for me. Why use Debian stable? Debian is well-known for being among the most conservative of major GNU/Linux distributions, and this is surely a good thing for the bedrock of an erosion-resistant platform. To keep the runtime base system lean and to avoid bringing in extraneous components which muddy the contract between platform and app, the runtime base system would only include the runtime libraries, not the corresponding development packages.

What about the shell and command-line utilities? I would provide BusyBox and nothing more. Heroku's dynos are based on LXC, and the process with PID 1 (currently called ps-run) seems to live outside the LXC container. So the base system for a Heroku dyno doesn't need all the components that are necessary to boot a full-blown GNU/Linux machine, whether physical or virtual. This minimalism will effectively enforce the aforementioned rule against invoking non-essential command-line utilities without bundling them in the app build. Besides that, it simply provides a more comprehensible contract between the platform and the application.

Of course, this runtime base system would be inadequate for running build tools such as Heroku's slug compiler and Vulcan. So I would provide a separate system image to use at build time. This one would include the full Debian stable base system (using the same stable version of Debian from which I got the runtime libraries), and at least these packages: build-essential and libnss3-dev. I'd probably throw in some build-time niceties such as the Autotools suite, Perl, curl, Git, and even Ruby and Python. But please note that I would be careful about which -dev packages are in the build-time system, to help ensure that the resulting build can be run on the runtime system. Web dynos and worker dynos would always use the runtime system. The runtime system would also be the default for one-off dynos, but I would also offer the build-time system for those, to enable build tools such as Vulcan.
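The contract between the build-time and runtime systems could be checked mechanically at the end of a build. Here is a minimal Python sketch that compares the shared libraries a build needs against an allowlist for the runtime system. The names RUNTIME_SONAMES and undeclared_libraries are hypothetical, and the allowlist is an illustrative, incomplete guess at the sonames behind glibc, libgcc_s, libstdc++, and NSS, not a definitive list.

```python
# Illustrative allowlist only: some sonames shipped by glibc, libgcc,
# libstdc++, and NSS. A real platform would publish the exact set.
RUNTIME_SONAMES = frozenset([
    "libc.so.6",
    "libm.so.6",
    "libdl.so.2",
    "libpthread.so.0",
    "librt.so.1",
    "libgcc_s.so.1",
    "libstdc++.so.6",
    "libnss3.so",
    "libnssutil3.so",
    "libssl3.so",
    "libsmime3.so",
])


def undeclared_libraries(needed, allowed=RUNTIME_SONAMES):
    """Return, in sorted order, the needed sonames that the runtime
    system does not promise to provide."""
    return sorted(set(needed) - set(allowed))


# A build that links against the system OpenSSL would be flagged.
problems = undeclared_libraries(["libc.so.6", "libssl.so.0.9.8"])
if problems:
    print("must be bundled with the app:", ", ".join(problems))
```

Anything the check reports would either have to be bundled in the slug or dropped as a dependency, which keeps the boundary between platform and app explicit.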

To summarize, by being very explicit about what goes in the base system and what stays out, I believe it would be possible to create a much more erosion-resistant platform. I would be happy if any current or aspiring platform-as-a-service provider took these ideas and ran with them, but I would be most pleased if Heroku saw fit to do this itself. As I said at the outset, I think Heroku has some great ideas and is doing a pretty good job of implementing them. I look forward to using Heroku to relieve myself and my successors of some administrative chores as we build more robust web applications.

Why I reject Christianity; what I now believe

(Edited and expanded on September 7, 2013)

As my immediate family and closest friends already know, I decided last year to reject the Christian world view. My reason is simple: not enough evidence. The Bible is not consistent with itself, let alone with what we can observe of reality, including history, science, and the reality of suffering all over the world. There is no strong evidence that Jesus is who the Bible claims he is, or that he rose from the dead. Some Christians point to a personal experience of positive change in their lives as evidence. But that doesn't lend any weight to Christianity, because people of all religions claim to have experiences which they interpret as validation of their religion. And faith, which is belief without sufficient evidence, is a cop-out.

Of course, these are just assertions, and my rejection of Christianity would not be rational if I didn't back those assertions up. The nice thing about the Web is that I can simply link to some more detailed treatments of these subjects. So here are a few links to get you started. I may not agree with all of these authors on every single point, but I'm in substantial agreement with them.

Yes, I'm unabashedly linking to the writings of vocal atheists, and I've read many more such writings than I've linked to here (along with Christian apologetics). My Christian parents and other family members have expressed their concern that I've opened my mind to the influence of the devil, and have asked that I refrain from doing so. I agreed to refrain for a time last year, but not anymore. I believe that the very idea of Satanic influence is simply a way that Christians silence skepticism, so that they can feel all right about not questioning their own beliefs, and so that the Christians who claim a position of leadership (priests, ministers, pastors, etc.) can retain their power to tell the rest of us how to live our lives. If Christianity were true, then my belief should have withstood my study of arguments from both sides.

Some Christians may claim that I am rejecting Christianity because I want to live a selfish, immoral life, or because I don't want to be accountable to my creator. That is emphatically not the case. I'm not perfect, but I want to live an upright, moral life. In fact, if you ever notice that I'm using my new beliefs to justify an act that is clearly immoral or overly selfish, then please call me out on it. Furthermore, if there is evidence that we humans were created by some intelligent being, that this God is still alive, and that this God cares how we live our lives, then I'd love to receive guidance from this God. But I don't know of any compelling evidence for such a God. Instead, we have many contradictory claims that people have made about God; not only do we have multiple religions, but Christianity itself is divided on numerous points, and the canonical scriptures of Christianity are contradictory. So it seems foolish to organize my life around the belief that the Bible is true, whatever that means.

So what do I believe instead?

At the most basic level, I believe that beliefs should be backed by logic and evidence. In other words, beliefs should be internally coherent and should be consistent with observable reality. What are my logic and evidence for this belief? Science, which is based on logic and evidence, has improved our lives dramatically over the past several centuries; we know that it works. And we use logic and evidence to determine the truth or falsehood of ordinary factual claims. So why should we grant privileged status to a set of writings that are supposedly the word of God, as Christians say we should? Why should we automatically accept these writings as truth? Why not discover whether these writings are true, based on logic and evidence, as we would for anything else? To do otherwise is inconsistent. The way I see it, the Christian exhortation to accept the Bible as truth, overriding reason, science, and everything else, is just another way of silencing skepticism.

I believe that morality is all about maximizing overall well-being for all conscious beings, most notably humans but also including many animals. No human is smart enough to really pull this off, so for the most part, we have to rely on rules to approximate this goal. On a more personal level, I believe it's fine to pursue one's personal goals, but only to a point; we need to be considerate of others. In particular, we need to avoid doing to others what we would not want done to ourselves. Many moral guidelines can be derived from these principles. Some moral choices are pretty clear-cut; some are vexing. And it seems to me that the Bible is of little value as a guide to morality. Of course, I'm not setting myself up as some kind of authority on morality; this is just my current understanding. And I'm still working on practicing these principles in my own life.

The topic of origins has been notoriously contentious, but I don't believe that it's as crucial as many make it out to be. I'll grant that there might have been a creator, who set up the universe, tuned it such that life could develop in some parts of it, and got the process of evolution started on Earth. Maybe the creator even intervened in the case of humans, to give us consciousness and intelligence, while not caring that our bodies are suboptimal in many ways because of evolution. But it doesn't automatically follow that the creator is the Christian God, and that I should therefore go running back to my old religion, automatically accepting everything the Bible says is true. If the Christian God is real, I need reasons to believe that, not just reasons to believe that there must have been a creator.

We can't know for sure what we will experience, if anything, after we die. However, there is overwhelming evidence that intelligence, memory, and personality are all dependent on the brain. All of these things can be seriously damaged when a brain is injured. Therefore, it's reasonable to believe that when a person's brain stops functioning altogether at death, that person ceases to exist. So heaven is probably just man-made wishful thinking. And hell, a place that some people I love are now worried that I will eventually go, is just something that someone thought up to scare people into submission. That strategy has proven to be an effective way of spreading more than one religion, but it's irrational to decide what to believe on the basis of fear. For starters, I don't think it's possible to "decide to believe" something unless one is actually convinced that it's true; at best, I could pretend to believe, and a God who knows my heart and decides my eternal destiny based on what I believe would see right through all pretense. Besides, more than one religion claims eternal damnation in hell for those who don't believe, so which one am I supposed to believe? The one I was raised in? No; I insist that logic and evidence are the only reasonable foundation for our beliefs. So both the promise of an eternal reward and the threat of eternal punishment are irrelevant to me. Though I can't be certain, I think it's most likely that when I die, I will cease to exist. What I believe in the meantime will be based on logic and evidence, as I understand them, not on hope or fear of an afterlife.

I believe that, in the absence of evidence for a god or an afterlife, we should live as though we're on our own and this life is all there is. This, I think, is where my beliefs collide most directly with Christianity. Christians believe that this world is ultimately doomed, and that it's most important to prepare for eternity, meaning the world and the life after this one. Frankly, I'm afraid that if the first part of that belief is taken to heart, it may become a self-fulfilling prophecy. A belief that God will eventually set things right also seems to encourage apathy about the state of this world; at least it did that in me for a time. I now believe that this world and this life are all we can be certain that we have, and when we see something wrong with this world, we should consider what we can do, if anything, to make it better. In particular, in a country like the US that has at least a semblance of democratic government, it really matters whom we vote for and what causes we support. We should all try to make this world a better one for all of us. It sounds trite, but I truly believe it's the best goal to which we can aspire.
