Ensuring a smooth Kubernetes Dockershim Deprecation with Chaos Engineering
Kubernetes 1.20 is scheduled to be released next week, and this version contains a number of amazing enhancements, including graceful node shutdown, more visibility into resource requests, and volume snapshots. But the change generating the most buzz is the deprecation of Docker as a container runtime.
Much of the discussion revolves around a misunderstanding of what the Docker deprecation entails and how it affects both Kubernetes administrators and application developers.
Let’s dive into what this deprecation means for you and how you can use Chaos Engineering to ensure a smooth transition off the Docker runtime.
Docker isn't just a container runtime
They say that naming things is one of the hardest problems in technology, and this is certainly true of Docker. When people talk about Docker, it’s often unclear whether they mean the container images or the application that runs those containers. The Kubernetes deprecation applies to the application that runs your containers. But don’t worry: this doesn’t mean your Kubernetes cluster will stop running your Docker containers!
Docker, like most applications, is actually a collection of smaller applications. There are sub-applications for the UI, an API, and many other things, including the container runtime. In 2016, Docker spun off its container runtime into a new, more modular runtime project called containerd. Kubernetes fully supports containerd, and containerd runs the same images that Docker does, so your Docker containers are fully supported by Kubernetes as well.
What’s being deprecated is dockershim, the integration layer that lets the kubelet talk to the full Docker Engine and all of the other sub-applications that come with it, since Docker Engine does not implement the Kubernetes Container Runtime Interface (CRI). Maintaining dockershim takes additional work that is largely unnecessary, because most Kubernetes users only need containerd, not the extra Docker features. Deprecating dockershim will reduce the workload on the Kubernetes maintainers and make Kubernetes less complex.
Updating your container runtime
One of the most powerful features of Kubernetes is its modularity. This makes it easy to change your container runtime. The process to change the default container runtime on a node is straightforward:
- Install your new container runtime.
- Update your kubelet configuration with the extra arguments <span class="code-class-custom">--container-runtime=remote</span> and <span class="code-class-custom">--container-runtime-endpoint=unix:///run/containerd/containerd.sock</span> (or <span class="code-class-custom">unix:///var/run/crio/crio.sock</span> if you’re using CRI-O).
- Drain the node to reschedule any Kubernetes workloads to other nodes.
- Restart the kubelet service.
- Uncordon the node to allow Kubernetes to schedule workloads on the updated node.
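The steps above can be sketched as a small Python helper that builds, but does not run, the commands for a single node. This is a minimal sketch under several assumptions: the helper names are hypothetical, the kubelet reads extra arguments from a <span class="code-class-custom">KUBELET_EXTRA_ARGS</span> line in <span class="code-class-custom">/etc/default/kubelet</span> (as on Debian-family kubeadm installs; other setups use a different file or a kubelet config file), and installing the runtime itself is left out because it depends on your distribution. The socket paths are the ones listed above, and <span class="code-class-custom">--ignore-daemonsets</span> is added because <span class="code-class-custom">kubectl drain</span> will not proceed past DaemonSet-managed Pods without it.

```python
from __future__ import annotations

import shlex

# CRI socket paths from the kubelet flags described above.
CRI_ENDPOINTS = {
    "containerd": "unix:///run/containerd/containerd.sock",
    "crio": "unix:///var/run/crio/crio.sock",
}


def kubelet_runtime_args(runtime: str) -> list[str]:
    """Return the extra kubelet arguments that point it at a CRI runtime."""
    if runtime not in CRI_ENDPOINTS:
        raise ValueError(f"unsupported runtime: {runtime!r}")
    return [
        "--container-runtime=remote",
        f"--container-runtime-endpoint={CRI_ENDPOINTS[runtime]}",
    ]


def migration_plan(node: str, runtime: str) -> list[tuple[str, list[str]]]:
    """Build, but do not run, the commands for steps 2 to 5 on one node.

    Each entry is (where, argv): "node" commands run on the node itself
    (for example over SSH), "cluster" commands run wherever kubectl works.
    Step 1, installing the runtime, is distribution-specific and omitted.
    """
    env_line = "KUBELET_EXTRA_ARGS=" + " ".join(kubelet_runtime_args(runtime))
    return [
        # Assumption: a kubeadm-style install whose kubelet unit reads
        # KUBELET_EXTRA_ARGS from /etc/default/kubelet (this overwrites it).
        ("node", ["sh", "-c", f"echo {shlex.quote(env_line)} > /etc/default/kubelet"]),
        # Evict workloads; DaemonSet Pods are skipped rather than evicted.
        ("cluster", ["kubectl", "drain", node, "--ignore-daemonsets"]),
        # Restart the kubelet so it picks up the new runtime flags.
        ("node", ["systemctl", "restart", "kubelet"]),
        # Allow the scheduler to place Pods on the node again.
        ("cluster", ["kubectl", "uncordon", node]),
    ]


if __name__ == "__main__":
    for where, argv in migration_plan("worker-1", "containerd"):
        print(f"[{where}] {shlex.join(argv)}")
```

Keeping the plan as data rather than executing it lets you review the commands for each node, or feed them to whatever automation you already use, before touching a live cluster.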
For specific steps, see the CRI-O documentation or the containerd documentation. For more information about runtimes and transitioning from the Docker runtime to containerd, I highly recommend Ana Calin’s session from KubeCon EU 2019.
Ensuring a smooth migration with Chaos Engineering
While changing the runtime is a simple process in theory, updating an entire Kubernetes cluster can be a bit more complex and requires some planning. One important aspect to consider is whether your cluster has enough resources to support all of your workloads while a node (or multiple nodes if upgrading several at the same time) is being updated.
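Before running any experiments, a rough capacity check can tell you whether a drain is even plausible. The sketch below is a hypothetical Python helper, not part of Gremlin or Kubernetes: it takes each node's allocatable CPU and memory plus the resource requests of the Pods on it, then tries to place every Pod from the nodes being drained onto the remaining nodes using a first-fit-decreasing heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_m: int  # allocatable CPU, in millicores
    mem_mi: int  # allocatable memory, in MiB
    # Resource requests of the Pods currently on this node: (cpu_m, mem_mi).
    pods: list[tuple[int, int]] = field(default_factory=list)


def fits_after_drain(nodes: list[Node], draining: set[str]) -> bool:
    """Return True if every Pod on the draining nodes can be placed on the
    remaining nodes by first-fit decreasing, using requests only."""
    # Free capacity (allocatable minus requested) on nodes that stay schedulable.
    free = {
        n.name: [n.cpu_m - sum(c for c, _ in n.pods), n.mem_mi - sum(m for _, m in n.pods)]
        for n in nodes
        if n.name not in draining
    }
    # Pods that must move, largest CPU request first.
    moving = sorted((p for n in nodes if n.name in draining for p in n.pods), reverse=True)
    for cpu, mem in moving:
        for cap in free.values():
            if cap[0] >= cpu and cap[1] >= mem:
                cap[0] -= cpu
                cap[1] -= mem
                break
        else:
            return False  # no remaining node has room for this Pod
    return True


if __name__ == "__main__":
    cluster = [
        Node("a", 4000, 8192, [(1000, 2048), (500, 1024)]),
        Node("b", 4000, 8192, [(2000, 4096)]),
        Node("c", 4000, 8192, [(1500, 2048)]),
    ]
    print(fits_after_drain(cluster, {"a"}))       # True
    print(fits_after_drain(cluster, {"a", "b"}))  # False
```

A False result only means this heuristic found no placement. The check also ignores taints, affinity rules, PodDisruptionBudgets, actual usage above requests, and the fact that DaemonSet Pods are not rescheduled, which is why validating under real resource pressure is still worthwhile.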
Chaos Engineering can help. Using Gremlin’s CPU, Memory, and IO attacks, you can simulate additional resource pressure on your cluster to validate that it has enough headroom. You can use Gremlin's Shutdown attack to halt a Pod and verify that it gets rescheduled onto another node. As you expand your testing, you can use Gremlin's Blackhole attack to simulate the loss of a node without actually removing it from the cluster. Using Chaos Engineering to prepare for a container runtime change has the additional benefit of ensuring your Kubernetes cluster is reliable for unplanned node outages as well!
Gremlin supports containerd and CRI-O and will automatically detect your runtime during installation. When using our Helm chart, you can also specify your runtime by adding <span class="code-class-custom">--set gremlin.container.driver=crio-runc</span> or <span class="code-class-custom">--set gremlin.container.driver=containerd-runc</span> to your <span class="code-class-custom">helm install</span> command. To learn more, please see the Gremlin documentation.
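If you want to confirm which runtime each node reports before or after migrating, the kubelet publishes it in <span class="code-class-custom">status.nodeInfo.containerRuntimeVersion</span> as a string of the form <span class="code-class-custom">&lt;runtime&gt;://&lt;version&gt;</span>. The sketch below, with hypothetical helper names, parses the output of <span class="code-class-custom">kubectl get nodes -o json</span>, lists any nodes still on Docker, and picks the matching <span class="code-class-custom">--set</span> flag from the two driver values above. It only maps those two values; any other runtime is treated as unmapped rather than guessed.

```python
from __future__ import annotations

import json

# Gremlin driver values from the --set flags above; other runtimes are unmapped.
GREMLIN_DRIVERS = {"containerd": "containerd-runc", "cri-o": "crio-runc"}


def node_runtimes(nodes_json: str) -> dict[str, str]:
    """Map node name to runtime name from `kubectl get nodes -o json` output.

    containerRuntimeVersion has the form "<runtime>://<version>",
    e.g. "containerd://1.4.1" or "docker://19.3.13".
    """
    return {
        item["metadata"]["name"]: item["status"]["nodeInfo"]["containerRuntimeVersion"].split("://", 1)[0]
        for item in json.loads(nodes_json)["items"]
    }


def nodes_still_on_docker(runtimes: dict[str, str]) -> list[str]:
    """Names of nodes that have not been migrated off the Docker runtime yet."""
    return sorted(name for name, runtime in runtimes.items() if runtime == "docker")


def gremlin_driver_flag(runtimes: dict[str, str]) -> str:
    """Return the Helm --set flag when every node uses one mapped runtime."""
    distinct = set(runtimes.values())
    if len(distinct) != 1:
        raise ValueError(f"expected one runtime, found {sorted(distinct)}")
    runtime = distinct.pop()
    if runtime not in GREMLIN_DRIVERS:
        raise ValueError(f"no driver mapping for runtime {runtime!r}")
    return f"--set gremlin.container.driver={GREMLIN_DRIVERS[runtime]}"
```

In practice you might pipe <span class="code-class-custom">kubectl get nodes -o json</span> into a script that passes its standard input to <span class="code-class-custom">node_runtimes</span>. Since Gremlin auto-detects the runtime during installation, the explicit flag is only needed when you want to pin it; raising on a mixed cluster is a deliberate choice so a half-migrated cluster is surfaced rather than silently configured for one runtime.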
To learn more about Chaos Engineering for Kubernetes, check out our guide to running chaos experiments on Kubernetes.