How to ensure your Kubernetes Pods have enough memory

Memory (or RAM, short for random-access memory) is a finite and critical computing resource. The amount of RAM in a system dictates the number and complexity of processes that can run on the system, and running out of RAM can cause significant problems, including:

System-wide lockups
Terminated processes
Increased disk activity
Lower service reliability and throughput

This problem can be mitigated using clustered platforms like Kubernetes, where you can add or remove RAM capacity by adding or removing nodes on demand. But even on platforms like these, where memory capacity is dynamic, resource management is still an important task that DevOps teams need to stay on top of.

In this blog, we'll discuss the importance of setting memory requests on Kubernetes deployments, how Kubernetes allocates RAM to pods, and what the risks are if you don't set requests. More importantly, we'll show you how Gremlin detects missing memory requests and helps you resolve them.

Looking for more Kubernetes risks lurking in your system? Grab a copy of our comprehensive ebook, “Kubernetes Reliability at Scale.”

What are memory requests and why are they important?

A memory request specifies how much RAM should be reserved for a pod's container. When you deploy a pod that needs a minimum amount of memory, such as 512 MB or 1 GB, you can define that in your pod's manifest using spec.containers[].resources.requests.memory. Kubernetes then uses that information to determine where to deploy the pod so it has at least the amount of memory requested. You can enter any value here as long as it follows Kubernetes' units syntax.

When you deploy a pod without a memory request, Kubernetes has to make a best-guess decision about where to place it. If the pod lands on a node with little free memory remaining and gradually consumes more memory over time, it could trigger an out-of-memory (OOM) event that terminates the pod. If the container keeps getting killed and restarted, the pod ends up in the dreaded CrashLoopBackOff status.

How do I measure and implement memory requests?

Before you can define a memory request, the important question to ask is: how much memory should I request? This will vary depending on the requirements of the application or service that you're deploying, how many instances you plan to deploy, and how much you expect it to grow. A simple web server may only need 30Mi, whereas a database might need 500Mi or more.

One way to estimate your memory requirements is to install the Kubernetes Metrics Server, deploy your application without memory requests, then observe how much memory it actually uses. For example, you can run kubectl top pod --all-namespaces to get a list of your pods with their CPU and memory usage:

BASH


NAMESPACE            NAME                                                CPU(cores)   MEMORY(bytes)
bank-of-anthos       accounts-db-0                                       4m           45Mi
bank-of-anthos       balancereader-c6bff755b-t7v2j                       8m           203Mi
bank-of-anthos       contacts-6df47656c8-b2rv5                           8m           80Mi
bank-of-anthos       frontend-6f7b5f7f88-v6vdw                           41m          64Mi
ingress-nginx        ingress-nginx-controller-77945d74f8-nw7n9           1m           24Mi

From here, we can assume that our pods require at least this much. Let's look at the balancereader pod: it's currently using 203Mi of memory, so let's add a request for 250Mi to allow for a bit of overhead:

YAML


..
spec:
  containers:
  - name: balancereader
    image: balancereader:latest
    resources:
      requests:
        memory: "250Mi"
    ..
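
As a rough illustration of how a figure like 250Mi could be derived from the observed 203Mi, the sketch below adds a percentage of headroom and rounds up. The function name suggest_request_mi, the 20% default, and the rounding to a multiple of 10 are assumptions made for this example, not rules from Kubernetes or Gremlin.

```shell
# Suggest a memory request (in Mi) from an observed usage figure (in Mi).
# Assumptions: 20% headroom unless a second argument is given, and the
# result is rounded up to the next multiple of 10.
suggest_request_mi() {
  usage_mi="$1"
  headroom_pct="${2:-20}"
  # Integer ceiling of usage * (100 + headroom) / 100
  with_headroom=$(( ($usage_mi * (100 + $headroom_pct) + 99) / 100 ))
  # Round up to the next multiple of 10
  echo $(( ($with_headroom + 9) / 10 * 10 ))
}

suggest_request_mi 203   # prints 250
```

The right amount of headroom depends on how spiky your workload is, so treat the percentage as a starting point and adjust it after watching real usage over time.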

You can define the memory amount as a plain integer (measured in bytes), or using one of the Kubernetes quantity suffixes: decimal (k, M, G, and so on) or binary (Ki, Mi, Gi, and so on). The same goes for memory limits, but we'll cover those in a future blog. Once you're ready, deploy the manifest using kubectl or a similar tool, and Kubernetes will re-deploy the pod with the new request.
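
To make the difference between the suffixes concrete, here is a minimal sketch that converts a memory quantity to bytes. The helper name to_bytes is hypothetical, and it only handles whole numbers with the k, M, G, Ki, Mi, and Gi suffixes; Kubernetes itself accepts more forms than this.

```shell
# Convert a memory quantity such as 250Mi, 250M, or 1048576 to bytes.
# Illustrative helper: whole numbers only, limited set of suffixes.
to_bytes() {
  qty="$1"
  case "$qty" in
    *Ki) echo $(( ${qty%Ki} * 1024 )) ;;
    *Mi) echo $(( ${qty%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${qty%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${qty%k} * 1000 )) ;;
    *M)  echo $(( ${qty%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${qty%G} * 1000 * 1000 * 1000 )) ;;
    *[!0-9]*|'') echo "unsupported quantity: $qty" >&2; return 1 ;;
    *)   echo "$qty" ;;   # plain integer, already in bytes
  esac
}

to_bytes 250Mi   # prints 262144000
to_bytes 250M    # prints 250000000
```

Note that 250Mi is about 5% larger than 250M, which is why mixing the two families of suffixes can leave a pod with less memory than you intended.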

How do I validate that my memory requests are in place?

Once you've defined and deployed your memory request, there are a few ways you can test it to make sure it's working as expected. The first, of course, is to run kubectl get pod <pod name> -o yaml (or kubectl describe pod <pod name>) and check that the request is part of the pod's definition. This is how Gremlin's Detected Risks feature detects missing memory requests automatically. As soon as you deploy the Gremlin agent, it determines which pods are missing a memory request definition and surfaces this to you in the Gremlin web app.

Using fault injection to validate your fix

Gremlin also lets you run Chaos Engineering experiments to test this more directly. For instance, if we deploy a pod with a memory request of 250Mi and the pod only uses 100Mi, that leaves us 150Mi of headroom to expand into. Or, at least, it should. We can prove this by using a memory experiment to consume just enough memory to push our pod past the 250Mi mark. Then, we simply need to observe it to make sure it keeps running.

To test this using the Bank of Anthos example:

Log into the Gremlin web app at app.gremlin.com.
Select Experiments in the left-hand menu and select New Experiment.
Select Kubernetes, then select the balancereader Pod.
Expand Choose a Gremlin, select the Resource category, then select the Memory experiment.
For the Allocation Strategy, select "Bring system to amount" from the drop-down. This adjusts the amount of memory consumed to bring it to a certain amount: in this case, 250Mi.
Change the Memory Amount from GB to MB and increase the amount to 265. We need to use a slightly higher value here because of the conversion from MB (megabytes, 1,000,000 bytes each) to MiB (mebibytes, 1,048,576 bytes each).
Click Run Experiment to start the experiment.
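
The MB-to-MiB adjustment in the steps above can be checked with a little integer arithmetic. The helper names mib_to_mb_ceil and mb_to_mib_floor are made up for this sketch; they are not part of Gremlin or kubectl.

```shell
# Smallest whole number of MB (10^6 bytes) that covers a given number of MiB (2^20 bytes).
mib_to_mb_ceil() {
  bytes=$(( $1 * 1024 * 1024 ))
  echo $(( ($bytes + 999999) / 1000000 ))
}

# Whole MiB contained in a given number of MB (rounded down).
mb_to_mib_floor() {
  bytes=$(( $1 * 1000 * 1000 ))
  echo $(( $bytes / (1024 * 1024) ))
}

mib_to_mb_ceil 250    # prints 263
mb_to_mib_floor 265   # prints 252
```

In other words, at least 263 MB is needed to reach 250Mi, so entering 265 MB puts the target a couple of mebibytes above the request.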

Now, if we watch our balancereader pod, we should see its memory usage spike:

BASH


kubectl top pod -n bank-of-anthos

NAME                                 CPU(cores)   MEMORY(bytes)
balancereader-c6bff755b-t7v2j        2m           265Mi

What similar Kubernetes resource risks should I be looking for?

We've already covered CPU requests in a previous blog. In addition to setting memory and CPU requests, consider setting limits. Limits, as the name implies, cap how much of either resource a pod can consume. For example, a memory limit of 4 GiB means a pod cannot use more than 4 GiB of memory. If it tries to, the host terminates the container with an out-of-memory (OOM) error.

To put it concisely: a request is the minimum amount of CPU/RAM that a pod needs to run, and a limit is the maximum amount of CPU/RAM that the host should give the pod.

If you'd rather not spend hours determining which Kubernetes risks are most pressing, download a copy of our ebook, “Kubernetes Reliability at Scale” for a comprehensive overview of system risks. Then use Gremlin’s Detected Risks feature to automatically find these risks and build a more resilient system.
