What is Docker Swarm and what are its top alternatives?
Docker Swarm is a container orchestration tool that allows users to manage a cluster of Docker hosts. Key features include easy scalability, automatic load balancing, service discovery, and rolling updates. However, Docker Swarm has limitations such as limited support for advanced networking features and lack of built-in health checks.
- Kubernetes: Kubernetes is one of the most popular container orchestration tools. Key features include automatic scaling, service discovery, self-healing, and declarative configuration management. Pros compared to Docker Swarm include a larger community, more advanced networking options, and support for multiple container runtimes. However, Kubernetes has a steeper learning curve compared to Docker Swarm.
- Apache Mesos: Apache Mesos is a distributed systems kernel that provides resource isolation and sharing across distributed applications. Key features include fault tolerance, scalability, and support for multiple frameworks such as Marathon and Chronos. Pros compared to Docker Swarm include better support for large-scale deployments and diverse workloads. However, Mesos has a more complex architecture than Docker Swarm.
- Amazon ECS: Amazon Elastic Container Service (ECS) is a managed container orchestration service provided by AWS. Key features include integration with other AWS services, scalability, and support for Docker containers. Pros compared to Docker Swarm include seamless integration with AWS services and easy deployment on AWS infrastructure. However, ECS ties users to the AWS ecosystem.
- Nomad: Nomad is a flexible and easy-to-use orchestration tool by HashiCorp. Key features include support for multiple workload types, dynamic scheduling, and scalable architecture. Pros compared to Docker Swarm include support for multiple datacenter deployments and a simpler architecture. However, Nomad may lack some advanced features compared to Kubernetes.
- Rancher: Rancher is a complete platform for container management that supports Kubernetes, Docker Swarm, and other orchestrators. Key features include multi-cluster management, monitoring, and centralized authentication. Pros compared to Docker Swarm include support for multiple orchestrators and a user-friendly UI. However, Rancher may introduce additional complexity in managing multiple orchestrators.
Top Alternatives to Docker Swarm
- Docker Compose
With Compose, you define a multi-container application in a single file, then spin your application up in a single command which does everything that needs to be done to get it running. ...
- Rancher
Rancher is an open source container management platform that includes full distributions of Kubernetes, Apache Mesos and Docker Swarm, and makes it simple to operate container clusters on any cloud or infrastructure platform. ...
- Ansible
Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. Ansible’s goals are foremost those of simplicity and maximum ease of use. ...
- Apache Mesos
Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. ...
- CoreOS
It is designed for security, consistency, and reliability. Instead of installing packages via yum or apt, it uses Linux containers to manage your services at a higher level of abstraction. A single service's code and all dependencies are packaged within a container that can be run on one or many machines. ...
- Kubernetes
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the users declared intentions. ...
- Compose
Compose makes it easy to spin up multiple open source databases with just one click. Deploy MongoDB for production, take Redis out for a performance test drive, or spin up RethinkDB in development before rolling it out to production. ...
- Docker Cloud
Docker Cloud is the best way to deploy and manage Dockerized applications. Docker Cloud makes it easy for new Docker users to manage and deploy the full spectrum of applications, from single container apps to distributed microservices stacks, to any cloud or on-premises infrastructure. ...
Docker Swarm alternatives & related posts
- Multi-container descriptor123
- Fast development environment setup110
- Easy linking of containers79
- Simple yaml configuration68
- Easy setup60
- Yml or yaml format16
- Use Standard Docker API12
- Open source8
- Go from template to application in minutes5
- Can choose Discovery Backend5
- Scalable4
- Easy configuration4
- Kubernetes integration4
- Quick and easy3
- Tied to single machine9
- Still very volatile, changing syntax often5
related Docker Compose posts
Our whole DevOps stack consists of the following tools:
- GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) for collaborative review and code management tool
- Respectively Git as revision control system
- SourceTree as Git GUI
- Visual Studio Code as IDE
- CircleCI for continuous integration (automatize development process)
- Prettier / TSLint / ESLint as code linter
- SonarQube as quality gate
- Docker as container management (incl. Docker Compose for multi-container application management)
- VirtualBox for operating system simulation tests
- Kubernetes as cluster management for docker containers
- Heroku for deploying in test environments
- nginx as web server (preferably used as facade server in production environment)
- SSLMate (using OpenSSL) for certificate management
- Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
- PostgreSQL as preferred database system
- Redis as preferred in-memory database/store (great for caching)
The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:
- Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
- Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
- Functionality: Kubernetes as a complex installation and setup process, but it not as limited as Docker Swarm.
- Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
- Scalability: All-in-one framework for distributed systems.
- Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), huge community among container orchestration tools, it is an open source and modular tool that works with any OS.
Recently I have been working on an open source stack to help people consolidate their personal health data in a single database so that AI and analytics apps can be run against it to find personalized treatments. We chose to go with a #containerized approach leveraging Docker #containers with a local development environment setup with Docker Compose and nginx for container routing. For the production environment we chose to pull code from GitHub and build/push images using Jenkins and using Kubernetes to deploy to Amazon EC2.
We also implemented a dashboard app to handle user authentication/authorization, as well as a custom SSO server that runs on Heroku which allows experts to easily visit more than one instance without having to login repeatedly. The #Backend was implemented using my favorite #Stack which consists of FeathersJS on top of Node.js and ExpressJS with PostgreSQL as the main database. The #Frontend was implemented using React, Redux.js, Semantic UI React and the FeathersJS client. Though testing was light on this project, we chose to use AVA as well as ESLint to keep the codebase clean and consistent.
- Easy to use103
- Open source and totally free79
- Multi-host docker-compose support63
- Load balancing and health check included58
- Simple58
- Rolling upgrades, green/blue upgrades feature44
- Dns and service discovery out-of-the-box42
- Only requires docker37
- Multitenant and permission management34
- Easy to use and feature rich29
- Cross cloud compatible11
- Does everything needed for a docker infrastructure11
- Simple and powerful8
- Next-gen platform8
- Very Docker-friendly7
- Support Kubernetes and Swarm6
- Application catalogs with stack templates (wizards)6
- Supports Apache Mesos, Docker Swarm, and Kubernetes6
- Rolling and blue/green upgrades deployments6
- High Availability service: keeps your app up 24/76
- Easy to use service catalog5
- Very intuitive UI4
- IaaS-vendor independent, supports hybrid/multi-cloud4
- Awesome support4
- Scalable3
- Requires less infrastructure requirements2
- Hosting Rancher can be complicated10
related Rancher posts
Ansible
- Agentless284
- Great configuration210
- Simple199
- Powerful176
- Easy to learn155
- Flexible69
- Doesn't get in the way of getting s--- done55
- Makes sense35
- Super efficient and flexible30
- Powerful27
- Dynamic Inventory11
- Backed by Red Hat9
- Works with AWS7
- Cloud Oriented6
- Easy to maintain6
- Vagrant provisioner4
- Simple and powerful4
- Multi language4
- Simple4
- Because SSH4
- Procedural or declarative, or both4
- Easy4
- Consistency3
- Well-documented2
- Masterless2
- Debugging is simple2
- Merge hash to get final configuration similar to hiera2
- Fast as hell2
- Manage any OS1
- Work on windows, but difficult to manage1
- Certified Content1
- Dangerous8
- Hard to install5
- Doesn't Run on Windows3
- Bloated3
- Backward compatibility3
- No immutable infrastructure2
related Ansible posts
Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same every single time, I've decided to write it up and share with the world this way, and send people to read it instead ;). I will explain it on "live-example" of how the Rome got built, basing that current methodology exists only of readme.md and wishes of good luck (as it usually is ;)).
It always starts with an app, whatever it may be and reading the readmes available while Vagrant and VirtualBox is installing and updating. Following that is the first hurdle to go over - convert all the instruction/scripts into Ansible playbook(s), and only stopping when doing a clear vagrant up
or vagrant reload
we will have a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too lose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing dev environment to produce a proper, production-grade product.
I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy develop, shy of few debugging-friendly setting. This way you avoid the discrepancy between how production work vs how development works, which almost always causes major pains in the back of the neck, and with use of proper tools should mean no more work for the developers. That's why we start with Vagrant as developer boxes should be as easy as vagrant up
, but the meat of our product lies in Ansible which will do meat of the work and can be applied to almost anything: AWS, bare metal, docker, LXC, in open net, behind vpn - you name it.
We must also give proper consideration to monitoring and logging hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which as I've mentioned earlier, are at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.
If we are happy with the state of the Ansible it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do most the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like quality REST API which comes built-in with TeamCity). It also comes with all the common-handy plugins like Slack or Apache Maven integration.
The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it: 1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around. 2. All security credentials besides development environment must be sources from individual Vault instances. Keys to those containers should exist only on the CI/CD box and accessible by a few people (the less the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that appropriate security must be present. TeamCity shines in this department with excellent secrets-management. 3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all developer has to do it is grab appropriate artifacts to reproduce the issue locally. 4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging the author (nothing like automated regression testing!).
Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but also constantly peeking at the loads and do we get the value of what we are paying for. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and into bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied in to use cloud providers and getting out is expensive. Here to embrace bare-metal hosting all you need is a help of some container-based self-hosting software, my personal preference is with Proxmox and LXC. Following that all you must write are ansible scripts to manage hardware of Proxmox, similar way as you do for Amazon EC2 (ansible supports both greatly) and you are good to go. One does not exclude another, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
Heroku was a decent choice to start a business, but at some point our platform was too big, too complex & too heterogenic, so Heroku started to be a constraint, not a benefit. First, we've started containerizing our apps with Docker to eliminate "works in my machine" syndrome & uniformize the environment setup. The first orchestration was composed with Docker Compose , but at some point it made sense to move it to Kubernetes. Fortunately, we've made a very good technical decision when starting our work with containers - all the container configuration & provisions HAD (since the beginning) to be done in code (Infrastructure as Code) - we've used Terraform & Ansible for that (correspondingly). This general trend of containerisation was accompanied by another, parallel & equally big project: migrating environments from Heroku to AWS: using Amazon EC2 , Amazon EKS, Amazon S3 & Amazon RDS.
- Easy scaling21
- Web UI6
- Fault-Tolerant2
- Elastic Distributed System1
- High-Available1
- Not for long term1
- Depends on Zookeeper1
related Apache Mesos posts
Docker containers on Mesos run their microservices with consistent configurations at scale, along with Aurora for long-running services and cron jobs.
- Container management20
- Lightweight15
- Systemd9
- End-of-lifed3
related CoreOS posts
As the basis of our new infrastructure, we formerly used CoreOS (and transitioned towards Fedora CoreOS as CoreOS was reaching its EOL) as a reliable solution for our docker-server-instances. We plan to deploy all our servers as individual docker containers to make use of the extensive possibilties offered in terms of isolation, resource-managemant (cgroups) and scalability.
The additional abstraction through containers allows us to adhere very closely to the "Cattle not Pets" best practice. Serverless was also an option that we considered, but as running Minecraft-Server requires quite unique resource profiles, that are usually not covered at most cloud providers, we settled with CoreOS for the time being and will reevaluate our options in the years to come.
We only use Ansible for some limited cluster-management, irregular maintenance tasks and low-level docker debugging and re-configuration on the individual servers, as we chose CoreOS (Fedora CoreOS) as our operating system and setup is done with an ignition-configuration. That is why we don't need to have a playbook for setting up servers or individual services. The servers boot up, completely initialized and ready to use.
Kubernetes
- Leading docker container management solution164
- Simple and powerful128
- Open source106
- Backed by google76
- The right abstractions58
- Scale services25
- Replication controller20
- Permission managment11
- Supports autoscaling9
- Cheap8
- Simple8
- Self-healing6
- No cloud platform lock-in5
- Promotes modern/good infrascture practice5
- Open, powerful, stable5
- Reliable5
- Scalable4
- Quick cloud setup4
- Cloud Agnostic3
- Captain of Container Ship3
- A self healing environment with rich metadata3
- Runs on azure3
- Backed by Red Hat3
- Custom and extensibility3
- Sfg2
- Gke2
- Everything of CaaS2
- Golang2
- Easy setup2
- Expandable2
- Steep learning curve16
- Poor workflow for development15
- Orchestrates only infrastructure8
- High resource requirements for on-prem clusters4
- Too heavy for simple systems2
- Additional vendor lock-in (Docker)1
- More moving parts to secure1
- Additional Technology Overhead1
related Kubernetes posts
How Uber developed the open source, end-to-end distributed tracing Jaeger , now a CNCF project:
Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.
Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:
https://eng.uber.com/distributed-tracing/
(GitHub Pages : https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)
Bindings/Operator: Python Java Node.js Go C++ Kubernetes JavaScript OpenShift C# Apache Spark
To provide employees with the critical need of interactive querying, we’ve worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest’s scale has involved resolving quite a few challenges like, supporting deeply nested and huge thrift schemas, slow/ bad worker detection and remediation, auto-scaling cluster, graceful cluster shutdown and impersonation support for ldap authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters are comprised of a fleet of 450 r4.8xl EC2 instances. Presto clusters together have over 100 TBs of memory and 14K vcpu cores. Within Pinterest, we have close to more than 1,000 monthly active users (out of total 1,600+ Pinterest employees) using Presto, who run about 400K queries on these clusters per month.
Each query submitted to Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query submitted events without corresponding query finished events. These events enable us to capture the effect of cluster crashes over time.
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. Kubernetes platform provides us with the capability to add and remove workers from a Presto cluster very quickly. The best-case latency on bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Some other advantages of deploying on Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.
#BigData #AWS #DataScience #DataEngineering
Compose
- Simple to set up42
- One-click mongodb32
- Automated Backups29
- Designed to scale23
- Easy interface21
- Fast and Simple13
- Real-Time Monitoring10
- Fastest MongoDB Available7
- Great Design6
- REST API6
- Easy to set up4
- Free for testing3
- Geospatial support3
- Elasticsearch2
- Heroku Add-on2
- Automated Health Checks1
- Email Support1
- Query Logs1
related Compose posts
We went with MongoDB , almost by mistake. I had never used it before, but I knew I wanted the *EAN part of the MEAN stack, so why not go all in. I come from a background of SQL (first MySQL , then PostgreSQL ), so I definitely abused Mongo at first... by trying to turn it into something more relational than it should be. But hey, data is supposed to be relational, so there wasn't really any way to get around that.
There's a lot I love about MongoDB, and a lot I hate. I still don't know if we made the right decision. We've been able to build much quicker, but we also have had some growing pains. We host our databases on MongoDB Atlas , and I can't say enough good things about it. We had tried MongoLab and Compose before it, and with MongoDB Atlas I finally feel like things are in a good place. I don't know if I'd use it for a one-off small project, but for a large product Atlas has given us a ton more control, stability and trust.
- Easy to use9
- Seamless transition from docker compose2