Getting started with DNS attacks
Whenever an online service goes down, you're likely to hear three words: "it was DNS!" Blaming DNS might be a running joke among network admins and engineers, but it's one rooted in experience. DNS problems are known for causing massive, Internet-wide outages such as the 2021 Akamai outage that temporarily made the websites for Delta Air Lines, American Express, Airbnb, and others unreachable. Since DNS is a critical component of modern networks, outages can have a huge impact, so teams must design their systems to be capable of withstanding and recovering from DNS problems.
In this blog post, we'll take a deep dive into Gremlin's DNS attack. We'll look at how it works, how to use it, and how it can help you build responsive, fault-tolerant applications and systems.
How does a DNS attack work?
DNS—short for the Domain Name System—is a distributed system used to identify networked resources by name. It's most commonly used to map IP addresses to human-friendly names. For example, DNS is how you can access the Gremlin website by typing gremlin.com into your browser instead of an IP address. This abstraction also lets you map multiple systems or resources to a single DNS name for load balancing requests, proxying and routing requests, and assigning static names to systems with dynamic IP addresses.
The Gremlin DNS attack works by blocking all outgoing DNS traffic over the standard DNS port (port 53). This is effectively the same as running a Blackhole attack on port 53, except that the DNS attack includes a built-in exception for <span class="code-class-custom">api.gremlin.com</span>. This exception ensures that the Gremlin agent can keep communicating with the Gremlin Control Plane while the attack is running. Without it, the agent would lose its connection, which would in turn trigger a failsafe and automatically halt the attack. Like the other Gremlin network attacks, this attack uses traffic management tools built into the operating system and doesn't modify firewall or iptables rules.
By default, the attack drops all DNS traffic from the target. You can configure the attack to only impact traffic to specific DNS servers (by IP address), network devices, network protocols (TCP, UDP, or ICMP), and service providers (such as Amazon Route 53). The attack supports these parameters:
- <span class="code-class-custom">Length</span>: How long the attack runs for.
- <span class="code-class-custom">IP Addresses</span>: Restricts the attack to specific IP address(es). This field supports CIDR values (e.g. 10.0.0.0/24).
- <span class="code-class-custom">Device</span>: The network interface to impact traffic on. If left blank, Gremlin will target all network interfaces.
- <span class="code-class-custom">Protocol</span>: Which network protocol(s) to impact. The options are TCP, UDP, ICMP, or all.
- <span class="code-class-custom">Providers</span>: Which external service provider(s) to impact, if any.
- <span class="code-class-custom">Tags</span>: If one or more tags are selected, the attack will only impact traffic to the targets associated with those tags.
An example of using tags would be if you had dedicated DNS servers. After installing the Gremlin agent on each server, you could add a tag with the name <span class="code-class-custom">service</span> and the value <span class="code-class-custom">dns</span> to its <span class="code-class-custom">/etc/gremlin/config.yaml</span> file. Because every DNS server then shares the same tag, you can target all of them at once.
The above parameters make up what's called the magnitude of the attack. As with all Gremlin attacks, you can run a DNS attack on multiple hosts simultaneously. The number of targets affected is called the blast radius. You can also run a DNS attack on containers, Kubernetes resources, and Services.
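To make the IP Addresses parameter more concrete, here is a minimal Python sketch of which DNS servers a given set of entries would cover, assuming ordinary CIDR containment. The function name `servers_in_scope` and the sample addresses are hypothetical and are not part of Gremlin; the sketch only helps you reason about magnitude before running an attack.

```python
import ipaddress

def servers_in_scope(dns_servers, ip_addresses_field):
    """Return the DNS servers covered by a list of single IPs or CIDR ranges."""
    # A bare IP such as "192.168.122.1" is treated as a one-address network.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in ip_addresses_field]
    return [
        server
        for server in dns_servers
        if any(ipaddress.ip_address(server) in net for net in networks)
    ]

# Hypothetical DNS servers configured on a target host.
configured = ["10.0.0.2", "10.0.1.2", "192.168.122.1"]

print(servers_in_scope(configured, ["10.0.0.0/24"]))
# → ['10.0.0.2']
print(servers_in_scope(configured, ["192.168.122.1", "10.0.1.0/24"]))
# → ['10.0.1.2', '192.168.122.1']
```

A narrow entry, such as a single address, keeps the magnitude small, while a broad CIDR range can cover every DNS server the host knows about.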
When running your first DNS attack, start small by reducing the magnitude and blast radius to a single non-production host and a single DNS server. Keep in mind that a host may have multiple DNS servers configured, and you can target one or more of these servers individually by adding them to the IP Addresses field. If you're not sure which DNS server(s) your target device is using, you can check its network configuration as follows.
For Windows, run <span class="code-class-custom">ipconfig /all</span>, scroll down to your active network adapter, then look for the line starting with <span class="code-class-custom">DNS Servers</span>.
For Mac and Linux, run <span class="code-class-custom">cat /etc/resolv.conf</span> and look for the lines starting with <span class="code-class-custom">nameserver</span>.
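On Linux, the configured DNS servers appear as <span class="code-class-custom">nameserver</span> lines in <span class="code-class-custom">/etc/resolv.conf</span>. As a rough sketch, the following Python snippet extracts those entries from resolv.conf-style text. The function name `parse_nameservers` and the sample contents are made up for illustration.

```python
def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Hypothetical file contents, for illustration only.
sample = """\
# Generated by the network manager
nameserver 192.168.122.1
nameserver 1.1.1.1
search example.internal
"""

print(parse_nameservers(sample))
# → ['192.168.122.1', '1.1.1.1']
```

On a real host you could pass in the result of `open("/etc/resolv.conf").read()`. The first entry is typically the primary server, and the remaining entries are the fallbacks.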
Try running a DNS attack against one of these servers by adding its IP address to the IP Addresses field. While the attack is running, try sending a network request from the target to a domain name (such as example.com). If the request is successful, that indicates your system successfully failed over to the secondary DNS server. If not, you might not have your secondary configured correctly, or the secondary is unavailable. In either case, try changing to a different DNS server (such as Cloudflare's <span class="code-class-custom">1.1.1.1</span>), restart the attack, then repeat your test to see if that addresses the problem.
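The failover behavior this experiment tests can be modeled in a few lines of Python. This is a simplified simulation, not how any particular operating system resolver or Gremlin itself is implemented: `resolve_with_failover` and `make_query` are hypothetical helpers, and the answer `192.0.2.10` is a placeholder from the documentation address range.

```python
def resolve_with_failover(hostname, servers, query):
    """Try each DNS server in order and return (server, answer) from the first that responds."""
    for server in servers:
        try:
            return server, query(server, hostname)
        except TimeoutError:
            continue  # server unreachable, e.g. its traffic is being dropped
    raise TimeoutError(f"no DNS server answered for {hostname}")

def make_query(blocked):
    """Build a fake query function that times out for every server in `blocked`."""
    def query(server, hostname):
        if server in blocked:
            raise TimeoutError(server)
        return "192.0.2.10"  # placeholder answer
    return query

servers = ["192.168.122.1", "1.1.1.1"]  # primary first, then secondary

# Simulate an attack that only targets the primary server.
print(resolve_with_failover("example.com", servers, make_query(blocked={"192.168.122.1"})))
# → ('1.1.1.1', '192.0.2.10')
```

If every server in the list is blocked, or no secondary is configured, the function raises `TimeoutError`, which corresponds to the failed request described above.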
As you run these experiments, remember to record your observations, discuss the outcomes with your team, and track any changes or improvements made to your systems as a result. This way, you can demonstrate the value of the experiments you’ve run to your team and to the rest of the organization.
Why should you run DNS attacks?
If "it's always DNS" as the old adage goes, how can running DNS attacks help mitigate DNS-related issues? First, let's consider how DNS can fail (here's a quick introduction to the different types of DNS servers):
- A recursive resolver is down, causing DNS queries to time out or return errors.
- Your DNS provider's nameserver is down, preventing customers from resolving your website's address.
- Network saturation (or worse, a DDoS attack) is slowing down DNS queries or causing them to drop.
- A misconfigured Quality of Service (QoS) rule is causing the network to de-prioritize DNS traffic.
There are different ways of mitigating, avoiding, and recovering from DNS-related issues, such as:
- Configuring your systems with fallback DNS servers.
- Using multiple DNS providers.
- If you're using a cloud architecture, rerouting traffic to a different availability zone, region, or virtual private cloud (VPC).
Running DNS attacks lets you verify that these methods are successful at preventing outages.
Get started with DNS attacks
Now that you know how DNS attacks work, try running one yourself:
- Log into your Gremlin account (or sign up for a free trial).
- Create a new attack and select a host to target. Start with a single host to limit your blast radius.
- Under Choose a Gremlin, select the Network category, then select DNS.
- Set the Length of the attack.
- Optionally, enter the IP Addresses to drop traffic to, the network Device to impact, and the Protocol to impact (TCP, UDP, or ICMP). For convenience, you can also select an external service Provider to target, or target specific hosts by Tags. If you leave all of these options at their default values, Gremlin will block all DNS traffic on the target.
- Click Unleash Gremlin to start the attack.
Measuring and observing the outcome of a DNS attack is pretty straightforward. While the attack is running, try making a network request from the target to another system using its DNS name. Even basic command-line tools like <span class="code-class-custom">ping</span> or <span class="code-class-custom">curl</span> will work for this.
If we run such a command before the attack, the hostname resolves and the request completes normally.
If we run it during the attack, the command appears to hang for a few moments before failing with a "Temporary failure in name resolution" error.
This confirms that the attack has blocked all DNS traffic, which is what we'd expect. Now, let's test falling back to a secondary DNS server. First, we need to know the IP address of our primary server, which we can find by running <span class="code-class-custom">cat /etc/resolv.conf</span> or by using the <span class="code-class-custom">nslookup</span> command. In the <span class="code-class-custom">nslookup</span> output, the <span class="code-class-custom">Server</span> and <span class="code-class-custom">Address</span> fields contain the IP address of the DNS server.
Now that we know our DNS server is at <span class="code-class-custom">192.168.122.1</span>, let's re-run the attack, only this time we'll put our DNS server's IP address in the IP Addresses field.
Now if we run the attack and run <span class="code-class-custom">nslookup</span>, we get the "Temporary failure in name resolution" message again. This means our secondary DNS server (assuming we had one configured) did not work. As a result, this simulated failure of our primary DNS server resulted in our target not being able to resolve DNS queries at all, which is a big problem unless all of our outbound network traffic uses IP addresses and not hostnames (which is extremely unlikely).
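The manual check used in this walkthrough can also be scripted. Below is a minimal Python sketch that asks the system resolver for a hostname and reports whether the lookup succeeded and how long it took. The helper name `check_resolution` is hypothetical. The `resolver` parameter exists only so the example can be exercised with a stand-in that fails the way a blocked lookup would, without touching the network.

```python
import socket
import time

def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Return (resolved, seconds_elapsed) for a single hostname lookup."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        resolved = True
    except socket.gaierror:
        # Raised for resolution errors such as "Temporary failure in name resolution".
        resolved = False
    return resolved, time.monotonic() - start

def blocked_resolver(host, port):
    """Stand-in resolver that fails the way a lookup does when DNS is unreachable."""
    raise socket.gaierror(-3, "Temporary failure in name resolution")

resolved, elapsed = check_resolution("example.com", resolver=blocked_resolver)
print(resolved)
# → False
```

On the target host, calling `check_resolution("example.com")` with the default resolver before and during the attack gives a simple pass or fail signal, and the elapsed time shows how long the lookup hung before giving up.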
Depending on how you've configured your DNS settings, you may want to try different test cases or scenarios. For example, if your systems cache DNS entries locally, only add your external DNS servers to the IP Addresses field. If you've configured your local cache correctly, your systems should still be able to resolve hostnames even while the attack is running.
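To illustrate why a local cache can keep hostnames resolving during the attack, here is a simplified Python model of a TTL-based cache sitting in front of an upstream DNS server. `CachingResolver` and the fake `upstream` function are hypothetical and much simpler than a real caching resolver; the address `192.0.2.10` is a placeholder.

```python
import time

class CachingResolver:
    """Serve answers from a local TTL cache and only ask upstream on a miss."""

    def __init__(self, upstream, ttl_seconds=300, clock=time.monotonic):
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # hostname -> (answer, expiry time)

    def resolve(self, hostname):
        entry = self._cache.get(hostname)
        if entry and entry[1] > self._clock():
            return entry[0]  # fresh cache hit: no upstream traffic needed
        answer = self._upstream(hostname)  # raises if upstream is unreachable
        self._cache[hostname] = (answer, self._clock() + self._ttl)
        return answer

state = {"blocked": False}

def upstream(hostname):
    """Fake external DNS server that stops answering once it is blocked."""
    if state["blocked"]:
        raise TimeoutError("upstream DNS unreachable")
    return "192.0.2.10"  # placeholder answer

resolver = CachingResolver(upstream, ttl_seconds=300)
resolver.resolve("example.com")         # warm the cache before the attack
state["blocked"] = True                 # the attack starts
print(resolver.resolve("example.com"))  # still answered from the cache
# → 192.0.2.10
```

In this model, names that were never cached, or whose entries expire while the upstream server is still blocked, fail to resolve. When you design the experiment, compare the attack Length with the TTLs your cache actually uses.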
Once you feel comfortable running DNS attacks on a single host or service, increase the blast radius by selecting more targets. Gremlin also makes it easy to run DNS attacks targeting specific cloud DNS services, like Amazon Route 53. While configuring the attack, use the Providers drop-down to select the Route 53 service and region that you want to impact traffic to.
Now that you've run the attack, try using a Scenario. Scenarios allow you to run multiple attacks sequentially, as well as monitor the availability of the target system(s) using Health Checks. A Health Check periodically contacts a monitor that you provide before, during, and after a Scenario. If the monitor returns a failed state, or fails to respond successfully within a set window of time, the Scenario automatically halts. You can use this to set an upper bound on the impact of an experiment, for example to prevent latency from increasing too much. Gremlin also includes a Recommended Scenario for testing DNS outages in a Kubernetes cluster, which you can find in the Gremlin web app.
This is an availability Scenario for Kubernetes that causes a DNS outage. We expect that the application will still be able to serve user traffic and operate as expected due to DNS failover. If DNS failover is not set up correctly, we expect an outage to occur.