
Why Docker makes sense for startups

Published Oct 11, 2017 · Last updated Apr 08, 2018

Mandatory container metaphor is mandatory. Credit: chuttersnap

I’ll go out on a limb and guess you’ve probably heard of Docker by now. Regardless of the answer, we can pretty safely acknowledge it’s steadily crawling its way up to become the de facto standard to develop and run containerized applications.

But while this brilliant little piece of technology may have long made sense to sysadmins and PaaS providers, we’ve personally heard rather little from startups about their Docker adoption; particularly the 1-to-10-employee-strong ones. An impression that somewhat correlates with Datadog’s recent research:

Docker adoption over time, per Datadog. …guess this story could have been more timely written in 2015.

So in case you’re still unsure whether it’s worth the trouble, we thought we’d throw in our two cents on the matter and convey just how much adopting a container-friendly architecture has helped our startup, and why you should probably take Docker for a spin if you haven’t yet.

(TL;DR take the red pill.)

Development experience

If you work at a small, two-pizza startup, there’s a high chance the people on your team are a highly multidisciplinary lot. But as soon as projects stop being siloed into things like one-person front ends and back ends, chances are you’ll be given a warm welcome into development environment hell.

Consider the simple scenario of a front-end engineer needing a not-yet-in-production API from a back end. Typically you’d overcome this by making do with mocked data, or by setting up staging environments (which are great and all, mind you), but oftentimes nothing beats the agility of running integrations against the back-end code itself.

Tools like docker-compose did wonders for us here. All a newcomer has to do is install a single thing (albeit a large thing), and one invocation of docker-compose up will have Docker set everything up so you can jump straight back into coding. And to top it all off, the declarative nature of these tools provides a dead-simple description of how runtime components talk to each other, making it all the easier to reason about your top-level architecture.

version: "3"
services:
  redis:
    image: redis:alpine
    ports:
      - "6379"
    networks:
      - frontend
  db:
    image: postgres:9.4
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend
  vote:
    image: dockersamples/examplevotingapp_vote:before
    ports:
      - 5000:80
    networks:
      - frontend
    depends_on:
      - redis
  result:
    image: dockersamples/examplevotingapp_result:before
    ports:
      - 5001:80
    networks:
      - backend
    depends_on:
      - db
  worker:
    image: dockersamples/examplevotingapp_worker
    networks:
      - frontend
      - backend
networks:
  frontend:
  backend:
volumes:
  db-data:

Simplified version of Docker's example-voting-app compose file
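For the curious, a typical first session against a stack like this looks something like the following (all standard docker-compose subcommands; the service names match the file above):

docker-compose up -d          # build and start every service in the background
docker-compose ps             # check that vote, result, worker, redis and db are running
docker-compose logs -f vote   # tail the logs of a single service
docker-compose down           # tear the whole stack down when you're done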

Portability

Far from being useful in development only, Docker also brought us untold simplicity when packaging our code for production, and for a very simple reason: it makes development and production environments all the more symmetric; a point much more eloquently put by 12factor’s dev/prod parity. And while we’ve long had great language-specific tools like rbenv and nvm to safeguard us from things like runtime version mismatches, you’d typically outpace their capabilities should your code depend on some obscure native binaries or a particular file system structure. Containers go the extra mile here, allowing us to package our applications together with exactly the environment they need.
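To make that concrete, here’s a minimal sketch of what such packaging might look like for a hypothetical Ruby service (the base image, port and native packages are illustrative, not our actual setup):

FROM ruby:2.4-alpine
# Bake in the "obscure native binaries" your gems need at build time
RUN apk add --no-cache build-base postgresql-dev
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
EXPOSE 5000
CMD ["bundle", "exec", "puma", "-p", "5000"]

The exact same image that ran on a laptop is what ships to production, native dependencies and all.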

And this same portability also shines in hybrid-cloud setups, a point best made by simply telling the story of our own cloud migration.

Unhappy with our cloud provider’s poor reliability and support at the time, we decided to make the switch to the undisputed king of IaaS-and-beyond, AWS. Having foreseen this migration would happen sooner rather than later, we’d been quietly moving our applications onto Docker for a few months by then. So when the time came to say farewell to our old cloud, the whole transition took little more than a couple of days. Granted, such a drastic transition could well be considered a rare event, but I’ve personally never found erring on the side of flexibility to be a problem.

It’s worth noting it’s not all about apps though: cross-cutting concerns such as monitoring and logging, while oh-so-easy to solve with hosted turnkey solutions, can easily be replaced with containerized open-source solutions that are becoming easier than ever to set up, leaving you in a much better position to avoid cloud jail.

Orchestration

Whether you need an orchestration system is frankly not the right question; the question is whether you want it to be self-managed, or whether you want to be the human orchestrator fixing downtime manually at 3 AM.

The somewhat cute analogy here is having to care for a lot of moving parts: as software systems become more complex and fragmented at runtime, they also become increasingly fragile in the face of network partitioning. Now, containers on their own don’t solve this problem; quite the opposite, in fact. Their intrinsically ephemeral nature makes your system ever more dynamic, making it difficult to set dependencies in stone at deploy time. Scale out to a clustered infrastructure and the situation worsens, to the point of never being certain where your processes might end up running, making locating and addressing them all the more difficult. But it is precisely the need to embrace this nature that gives way to a whole host of solutions.

Having tried several clustering systems (you might have heard of Google’s Kubernetes, Mesosphere’s Marathon, HashiCorp’s Nomad, etc.), we ultimately settled on Docker’s own Docker Swarm for most of our deployments, using the brilliantly simple Docker for AWS CloudFormation template. After you declaratively express the desired state of your system in terms of the services it should run, Swarm constantly monitors the actual state of your containers, reconciling toward the desired state by rescheduling workloads onto other nodes in the event of a node failure, and self-healing the cluster by provisioning new servers should a node become unrecoverable.
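To give you a taste of what “declaratively expressing the desired state” looks like, here’s a sketch of a deploy: section you could add to the vote service from the earlier compose file (the replica count and policies are made up for illustration):

  vote:
    image: dockersamples/examplevotingapp_vote:before
    deploy:
      replicas: 3                # desired state: three copies, spread across nodes
      restart_policy:
        condition: on-failure    # reschedule a task whenever its container dies
      update_config:
        parallelism: 1           # roll updates out one replica at a time

A single docker stack deploy -c docker-compose.yml vote hands the file over to Swarm, which runs the reconciliation loop from there.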

And while provisioning your own container cluster may well exceed your needs, new Containers-as-a-Service platforms are popping up left and right, often at no additional cost over your underlying infrastructure usage.


Who needs kittens when you’ve got cartoon whales.

Between service discovery, load balancing, software-defined networks, persistent storage, task scheduling, and Raft consensus, a scary but fun ride through a whirlpool of cool-sounding jargon is guaranteed.

Cutting down your infrastructure bill

Surely you don’t need yet another article on “how we shaved our server costs by {{ rand_amount }} after switching to {{ rand_language }}”, so I’ll try and come up with something different.

Since microservices are all the rage these days, we’ve come to split our applications into several different services here at Beta Labs. This approach allows us to mix and match different languages and frameworks so we’re free to work with the best tool for the job, every time (please bear with me, trying to make a case for microservices in 10 words or less over here.)

But under this scenario, sticking by 12factor’s “one codebase, many deploys” means each service should get deployed as its own application in PaaS parlance; which funnily enough happens to be precisely how most PaaS pricing models scale.

Let’s throw some numbers at it. Running a highly available setup for a Ruby app on Heroku means running at least two Standard 1X web dynos, setting you back $50 per month for a single application whose dynos are each constrained to 512MB of memory. That’s $50 per GB of RAM per month for front-end services. Add one worker dyno for simple background processing, and that’s a further $25/mo. Say you also have a couple of lightweight back-end services (e.g., a piece of middleware or a broker with custom logic) that could make do with just one instance each, and you’ve sailed past the $100/mo mark with ease.

Oh, and that’s before we start talking add-ons: add a further $30/mo for a basic Redis and PostgreSQL instance. Should you wish to retain your logs for a bit longer (Heroku’s Logplex is designed for streaming only), you’ll also be adding a logging service that can hopefully be shared across apps.
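Tallying up that hypothetical bill (assuming Standard 1X dynos for the back-end services too):

2 × Standard 1X web dynos (HA front end)      $50/mo
1 × Standard 1X worker dyno                   $25/mo
2 × single-dyno back-end services             $50/mo
Basic Redis + PostgreSQL add-ons              $30/mo
Total                                        ~$155/mo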

Let’s see how we could do better.


One VM per (monolithic) service vs. multiple (micro) services per VM. Credit: Martin Fowler

Borrowing from Martin Fowler’s description of microservices, the combined use of containers and a clustering system provides a beautifully fitting platform for dynamically scaling your services. Our containers get intelligently placed on the nodes with the most available resources, and because all nodes share an internal software-defined network, services get to talk to each other without ever leaving the cluster.


A 3-node-strong Swarm cluster running the example-voting-app

Going back to our earlier example, such a system would comfortably fit on a 3-node, t2.micro-based Docker Swarm cluster, which clocks in at roughly $50/mo for a total of 3GB of memory, and you’d even be left with extra headroom to run your own containerized Redis instances, should you feel so daring.
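Back-of-the-envelope, and assuming on-demand us-east-1 prices at the time of writing (the exact figure will shift with your region and the extras the template provisions):

3 × t2.micro (1 vCPU, 1GB RAM each)          ~$26/mo
Load balancer provisioned by the template    ~$18/mo
EBS volumes                                   a few dollars more
Total                                        ~$50/mo for 3GB of memory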

Granted, Heroku’s dynos are a lot more gifted in the CPU department (8 virtual cores against 1), but unless you’re running a language with native threads, a multi-process-per-dyno setup can make 512MB of memory feel insufficient rather quickly. Not that it makes much of a difference if your workload is mostly I/O intensive anyway.

Don’t get me wrong: as far as making DevOps a non-issue goes, it really doesn’t get much better than Heroku, in my humble opinion. And of course I’m not suggesting you or anyone on your team should go it alone and spend their nights learning how to get high-availability setups right in PostgreSQL; we’d be comparing apples to oranges here. But I do feel it’s important to point out that you’re paying extra for all that reliability and ease of use, and with that in mind you can judge for yourself what’s actually worth the price and what you can probably get done yourself.

Oh, and while we’re at it, don’t forget you can run your Docker containers on Heroku.
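The container registry flow is refreshingly short; assuming the standard Heroku CLI, deploying a Dockerfile like the one sketched earlier would look roughly like this:

heroku container:login          # authenticate against Heroku's container registry
heroku container:push web       # build the local Dockerfile and push the image
heroku container:release web    # release it as the app's web process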

Inherent security

While this argument won’t hold much water when comparing the Docker platform to a PaaS, compared to your good ol’ Ubuntu box running multiple apps, you’ll find the risk of certain vulnerabilities to be largely reduced.

Why is it any different? Enter Linux containers. To many of us they were an intriguing concept, once subtly presented by the likes of Heroku in their guides; now they sit at the very core of Docker. And with them comes a much-appreciated security feature: isolation.

Take the worst-case scenario of someone executing code remotely inside your server. (Sounding too far-fetched? Check out ImageTragick.) As applications tend to have a one-to-one relationship with containers, you should at least be able to isolate the damage to that application’s domain, keeping whatever else you choose to run on your hosts safe(-ish).

VMs have offered a similar characteristic for quite some time now; it’s just that, given their more rigid nature (longer boot-up and provisioning times, full operating systems to run), one could be forgiven for giving them longer lifecycles and treating them more like pets than cattle, running more apps per box and thus potentially leaving more secrets exposed to compromise.

Of course, this is not to say you’ll be immune to developer malpractice (you certainly wouldn’t want access to the host’s Docker daemon to be compromised), but containerized environments do help reduce your attack surface as an organization.
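And if you want to lean further into that isolation, the Docker CLI ships a few cheap hardening flags; a sketch (the image name is a stand-in):

# Immutable root filesystem, no Linux capabilities, no privilege escalation,
# and hard resource limits: a compromised app stays in its box
docker run --read-only --cap-drop ALL \
  --security-opt no-new-privileges \
  --memory 512m --pids-limit 100 \
  my-org/my-api:latest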

Just be cautious and don’t keep your images public (cheap shot, I know).

You feel like it

Okay, this one might be completely biased by what our inner geeks find motivating, but…

While we can’t say we haven’t had to work around some rough edges early on, and while I may have to admit to being drawn to hipster tools rather easily, one should also be able to unapologetically add new tools to their arsenal if they feel it will contribute to their happiness as an engineer — wasn’t that part of the whole selling point for startups in the first place?

And in the event that you should decide to go Docker-less, you’ll almost certainly find being a little container-savvy to be handy in years to come.

Conclusion

So, was it a silky-smooth road to containerized paradise? Hell, no. Could we have settled with more stable tools until Docker’s rough edges were fully polished? Probably. Would we have completely failed as a startup if we hadn’t adopted Docker? Most definitely not.

Would we invest in adopting containers again? A resounding yes is in order.

The points we just rehearsed are far from being exclusive to startups; I’d even go as far as to say company size is nearly irrelevant (rest assured, my endorsement won’t jeopardize Docker’s reputation among the more corporate kind either way.)

Nor are we advocating Docker as the only way to solve these timeless problems. And admittedly, we haven’t talked much about its shortcomings. But for now, it remains the closest thing to a one-stop shop for all of the arguably commonplace problems we presented above, and that’s saying something.

All in all, it’s pretty safe to say containers are here to stay — oh wait, did you hear about this whole serverless thing? Come to think of it, containers are so old-fashioned...
