After taking a look at companies using GraphQL last week, we decided to dive into the container orchestration phenomenon.
The rampant success of Kubernetes likely stems from the fact that, unlike most open source projects, it had the benefit of being refined for years in production at one of the world's largest tech companies. Before Kubernetes came Borg, Google's own internal "container-oriented cluster-management system". While there are differences between the two, Kubernetes is essentially the result of the lessons Google learned building clustered, containerized applications with Borg.
Since its release in 2014, over 1700 people have contributed to the Kubernetes project, and it's been added to over 2000 stacks on StackShare. This week, we're taking a look at some of the companies using Kubernetes in production and why.
Buffer
Buffer is a social media management platform with several fantastic blogs on engineering, company culture, and marketing.
After six years of pushing code to a monolith with a distributed team, Buffer decided to move to a service-oriented architecture in 2016. The team had already begun using Docker for their development environments, and they wanted a solution that worked well with containers.
After also evaluating Mesosphere and Amazon ECS, the Buffer crew settled on Kubernetes. They were drawn in by its relative maturity, large community, and cloud-provider-agnosticism.
They began the migration process by rebuilding a piece of the monolith's functionality into its own, self-contained service. The goal of the project was to set a shining example for other microservices at Buffer moving forward. After the success of this project, they looked towards breaking out other pieces of the monolith.
Two years later, Buffer is running three k8s clusters with over 140 services. They've faced some challenges with deployments along the way, which they're solving by moving to Helm, a package manager for Kubernetes that lets developers define Charts that describe k8s configuration.
GitHub
GitHub also began their move to Kubernetes in 2016, and as of August 2017, all of their web and API requests were being served via k8s clusters.
Before the move, they were on an architecture that hadn't changed much in their eight years of existence: a Rails app running on Unicorn. This architecture worked fine in the early days with a small team and limited traffic, but as both grew, so did their need for something more scalable.
GitHub had already begun to break out some of their services based on the teams managing them. However, these individual services required a lot of human support from SRE to maintain and provision. This became a major bottleneck in deploying new services, and it was clear they needed a self-service platform that product engineers could use.
Just like Buffer, GitHub evaluated the container-orchestration options and walked away feeling like Kubernetes had the most vibrant community. They were also impressed by the fact that their first small cluster and application only took a few hours to deploy. After some more small experiments, they were ready to hop on board the k8s-express.
Rather than starting with a smaller piece of the application like authentication, GitHub decided to go full-steam with Kubernetes and migrate their core application first. Their reasoning? They wanted to make sure their work was suitable for large applications - not just microservices. The engineers also knew that migrating the core app would lead to greater adoption of Kubernetes throughout GitHub.
The end result of the migration is that product engineers can now deploy their own applications to Kubernetes clusters. Site-reliability engineers at GitHub no longer need to provide configuration management and provisioning support for these deployments, and the whole organization can move faster.
Pokémon Go
One of the biggest testaments to the benefits of Kubernetes is the massive success of Pokémon Go. At its peak, the mobile game was supporting 25 million users, which was 50x their initial target. Although they did suffer from some performance issues at launch, the team was able to quickly scale up to accommodate the massive influx of traffic.
Unlike some of the other applications mentioned in this article, Pokémon Go was built from the ground up on Kubernetes and Google Container Engine (GKE, now Google Kubernetes Engine). At the time, it was the largest Kubernetes cluster ever deployed on Google Cloud.
One of the most interesting things about Pokémon Go's use of k8s is their bold decision to upgrade to a newer version of GKE ahead of their launch in Japan. The switch would enable them to provision over a thousand additional nodes for their cluster, but they likened it to "swapping out the plane's engine mid-flight". Ultimately, this turned out to be a great idea when Japanese players signed up in numbers three times larger than in the US.
OpenAI
As its name suggests, OpenAI is a non-profit AI research company dedicated to "safe artificial general intelligence". Basically, they want to prevent a Terminator scenario.
The work they do is meant to be shared and distributed. Unlike many of the other companies we've mentioned who are running applications on k8s clusters, OpenAI is running deep learning experiments at a large scale. They primarily make use of Kubernetes for batch scheduling and autoscaling their experiments with low latency.
The team at OpenAI began using Kubernetes in 2016 after seeking a low-cost, highly portable solution. It allows them to run GPU-intensive experiments on-premises, CPU-intensive experiments in the cloud, and some experiments on whichever cluster has enough capacity.
Today, they've scaled their largest Kubernetes cluster to over 2500 nodes on Azure. Experiments that previously took months to deploy can now be done in weeks.
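To make the placement idea above a little more concrete, here is a minimal Python sketch of that kind of routing: GPU-heavy work goes on-premises, CPU-heavy work goes to the cloud, and everything else lands wherever there is room. This is purely illustrative and not OpenAI's actual scheduler; the `Cluster` class, the `"gpu"`/`"cpu"`/`"any"` labels, the `free_slots` field, and the cluster names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    location: str    # hypothetical label: "on_prem" or "cloud"
    free_slots: int  # hypothetical measure of spare capacity


def pick_cluster(kind: str, clusters: List[Cluster]) -> Optional[Cluster]:
    """Pick a cluster for an experiment.

    kind is "gpu", "cpu", or "any" (hypothetical categories).
    Returns None when no eligible cluster has spare capacity.
    """
    # GPU work prefers on-prem, CPU work prefers cloud, anything else is flexible.
    preferred = {"gpu": "on_prem", "cpu": "cloud"}.get(kind)
    candidates = [
        c for c in clusters
        if c.free_slots > 0 and (preferred is None or c.location == preferred)
    ]
    # Among eligible clusters, take the one with the most spare capacity.
    return max(candidates, key=lambda c: c.free_slots, default=None)


clusters = [
    Cluster("datacenter-gpu", "on_prem", free_slots=4),
    Cluster("cloud-cpu", "cloud", free_slots=10),
]

for kind in ("gpu", "cpu", "any"):
    chosen = pick_cluster(kind, clusters)
    print(kind, "->", chosen.name if chosen else "no capacity")
```

In a real Kubernetes setup, this kind of decision would more likely be expressed through the scheduler and node labels rather than application code, but the sketch shows the shape of the policy.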
The New York Times
Just a few years ago, The New York Times was still running everything in its own private data centers. As their number of cloud services grew, they looked to the future and began experimenting with Kubernetes running on GKE.
In 2017, they launched their first production Kubernetes application: the nytimes.com mobile site. Since dipping their toes in with this small experiment, they've completely dived into the k8s pool. Today the majority of their customer-facing applications are running on Kubernetes.
The biggest impact has been on deployment speed and productivity. Legacy deployments that took up to 45 minutes are now pushed in just a few minutes. It's also given developers more freedom and fewer bottlenecks. The Times has gone from a ticket-based system for requesting resources and weekly deploy schedules to allowing developers to push updates independently.
On an organizational level, the move to Kubernetes has also been part of a larger effort to get developers on the same page. Many teams were previously building custom deployment solutions and many architectures required storytelling to understand.
Box
Box began splitting their monolithic PHP application into microservices before Kubernetes even existed. Their process at that time for deploying a new service involved first requesting dedicated hardware (yes, hardware) from the operations team, getting that set up over weeks or months, writing Puppet configs for a few weeks, and then eventually launching the application in production.
The process was so tedious that some teams didn't even bother setting up a staging environment, because much of the work had to be replicated for that.
When they started the hunt for an internal PaaS, Kubernetes was in its infancy and not an obvious choice. The Box team was already sold on Docker, though, and eventually the options narrowed to Kubernetes and Mesos. Kubernetes eventually won due to some shared philosophical views of what container orchestration should look like and the fact that the Box team really trusted the Kubernetes team.
They decided to start small and initially deployed the "Box API checker" that monitors uptime on Kubernetes. After the success of that project, they incrementally deployed more and more of their services onto the Kubernetes cluster, starting with small processes and daemons and moving up to larger services. Since they were such an early adopter, the engineers sometimes had to wait for new features to be released before further migrations were even viable.
The bet on Kubernetes ultimately catapulted Box into the future. The workflow to launch a new service that previously took up to 6 months now takes less than a week.