Today, Gremlin is introducing Gremlin for AWS, a suite of tools that makes it easier to find and fix the reliability risks that cause downtime on AWS.

The cloud opens up a range of reliability challenges that didn’t exist before, especially for customers running distributed, mission-critical workloads. Teams experience the pain of failed migrations, frequent incidents, and reliability toil, but often struggle to modernize their approach to reliability as they modernize their infrastructure.

That’s where Gremlin for AWS can help. 

Gremlin for AWS enables engineering teams on AWS to prevent incidents, monitor and test systems for known causes of failure, and gain visibility into the reliability posture of their applications.

Of course, Gremlin already supports AWS, and the majority of our customers run on AWS. But with the new capabilities, organizations can capture the benefits of a modern reliability practice with 90% less effort. 

So what’s included? Read on for details. 

Monitor service health with Gremlin Intelligent Health Checks

Gremlin’s new Intelligent Health Checks enable teams to understand if a service is healthy or at risk without relying on third-party tools. 

Health Checks are how Gremlin tracks the health of your services (or workloads, in AWS terminology) before, during, and after reliability testing. Normally, Health Checks require a pre-existing monitoring or observability tool that is set up and linked to a service in Gremlin. With Intelligent Health Checks, you check a box and Gremlin creates and integrates the Health Checks for you.

Intelligent Health Checks can be enabled for any service behind an AWS Elastic Load Balancing (ELB) load balancer. Gremlin identifies three key metrics for the service (throughput, latency, and error rate) and monitors them to learn the service's baseline performance. While a reliability test runs, Gremlin continuously compares each metric's current value against its baseline to determine whether the service is healthy. If any metric deviates significantly from its baseline, Gremlin halts the test and returns your service to normal operation.
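To make the baseline comparison concrete, here is a minimal Python sketch of the general idea. It is not Gremlin's implementation: it assumes a baseline can be summarized as the mean and standard deviation of pre-test samples, and that "significantly different" means more than three standard deviations away in either direction. The metric names, the `max_sigma` threshold, and all helper functions are hypothetical.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical summary of one metric's normal behavior."""
    center: float
    spread: float


def build_baseline(samples: list[float]) -> Baseline:
    # Summarize pre-test samples; statistics.stdev needs at least two values.
    return Baseline(statistics.mean(samples), statistics.stdev(samples))


def is_healthy(current: dict[str, float], baselines: dict[str, Baseline],
               max_sigma: float = 3.0) -> bool:
    # Assumption: a metric is unhealthy if it sits more than max_sigma
    # standard deviations from its baseline mean, in either direction.
    for name, base in baselines.items():
        spread = max(base.spread, 1e-9)  # avoid dividing by zero for flat metrics
        if abs(current[name] - base.center) / spread > max_sigma:
            return False
    return True


def first_unhealthy_reading(readings: list[dict[str, float]],
                            baselines: dict[str, Baseline],
                            max_sigma: float = 3.0) -> int | None:
    # Return the index at which a test would be halted, or None if every
    # reading stays within its baseline.
    for i, reading in enumerate(readings):
        if not is_healthy(reading, baselines, max_sigma):
            return i
    return None


baselines = {
    "throughput_rps": build_baseline([100, 102, 98, 101, 99]),
    "latency_ms": build_baseline([50, 52, 48, 51, 49]),
    "error_rate": build_baseline([0.010, 0.012, 0.009, 0.011, 0.010]),
}
readings = [
    {"throughput_rps": 101, "latency_ms": 51, "error_rate": 0.011},
    {"throughput_rps": 99, "latency_ms": 80, "error_rate": 0.011},  # latency spike
]
print(first_unhealthy_reading(readings, baselines))  # 1
```

A fixed sigma threshold is the simplest choice for a sketch; a production system would likely account for seasonality, metric direction (a drop in error rate is not a problem), and noise before halting a test.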

Intelligent Health Checks make it easy to start testing the reliability of services quickly and accurately, and to extend testing to environments where observability tools aren't already deployed, such as pre-production. They can be used on their own or alongside your existing observability tools.

Test against cloud best practices with the Well-Architected Cloud Test Suite

Cloud providers often publish guidance on how to build applications that take full advantage of the platform’s features. Whether published by AWS, Azure, or Google Cloud, these “Well-Architected Frameworks” offer recommendations, but sometimes lack concrete steps for engineers to take and ways to prove systems behave as intended.

To help teams hold their services to well-architected reliability principles and AWS best practices, we're introducing the Well-Architected Cloud Test Suite. This suite builds on Gremlin's Recommended Tests, which cover redundancy, scalability, and dependency failures, and adds two brand-new tests: I/O, which tests whether your services can handle a spike in input/output operations per second (IOPS); and DNS, which tests whether your services can successfully fail over to a secondary DNS provider.

Available for all users on AWS and other cloud platforms, the Well-Architected Cloud Test Suite comes ready out of the box to help teams know how their services stand up against cloud best practices and quickly spot issues that are likely to cause downtime in the future. And like all of Gremlin’s test suites, it can be modified to best suit your needs. 

Uncover AWS-specific issues with new Detected Risks

Building on our introduction of Detected Risks for Kubernetes last year, Gremlin now monitors AWS load balancer configurations for some of the most common causes of incidents and outages on AWS infrastructure. 

We’ve added three new AWS-specific Detected Risks to help ensure that your AWS workloads are redundant and accident-resistant:

  • Availability zone redundancy checks if an Application, Network, or Gateway load balancer is mapped to multiple availability zones.
  • Cross-zone load balancing checks whether cross-zone load balancing is enabled for the service's load balancer, which improves your application's ability to handle the loss of one or more instances.
  • Deletion protection checks that your load balancer has the “deletion protection” flag enabled to avoid accidental deletion.

Detected Risks are enabled by default and quickly show teams risks introduced unintentionally through configuration drift or deviations from best practices, without running any fault injection tests.
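Because these three Detected Risks are configuration checks, the kind of logic involved can be illustrated without any fault injection. The Python sketch below is not Gremlin's implementation. It takes dictionaries shaped like the responses of the boto3 `elbv2` client's `describe_load_balancers` (one entry of `LoadBalancers`) and `describe_load_balancer_attributes` (the `Attributes` list), so it runs without AWS credentials. The risk identifiers, the load balancer name, and the choice not to flag cross-zone load balancing when the attribute is absent are assumptions for illustration.

```python
from __future__ import annotations


def attribute_map(attributes: list[dict[str, str]]) -> dict[str, str]:
    # Flatten ELBv2-style [{"Key": ..., "Value": ...}] pairs into a dict.
    return {item["Key"]: item["Value"] for item in attributes}


def detect_risks(load_balancer: dict, attributes: list[dict[str, str]]) -> list[str]:
    """Return hypothetical risk identifiers for one ELBv2 load balancer."""
    risks = []

    # Availability zone redundancy: the load balancer should map to 2+ zones.
    zones = {az["ZoneName"] for az in load_balancer.get("AvailabilityZones", [])}
    if len(zones) < 2:
        risks.append("availability-zone-redundancy")

    attrs = attribute_map(attributes)

    # Cross-zone load balancing: flag only when explicitly disabled
    # (assumption: an unreported attribute is not flagged).
    if attrs.get("load_balancing.cross_zone.enabled") == "false":
        risks.append("cross-zone-load-balancing")

    # Deletion protection: flag unless explicitly enabled.
    if attrs.get("deletion_protection.enabled") != "true":
        risks.append("deletion-protection")

    return risks


single_zone_nlb = {
    "LoadBalancerName": "checkout-nlb",  # hypothetical name
    "Type": "network",
    "AvailabilityZones": [{"ZoneName": "us-east-1a"}],
}
nlb_attributes = [
    {"Key": "load_balancing.cross_zone.enabled", "Value": "false"},
    {"Key": "deletion_protection.enabled", "Value": "false"},
]
print(detect_risks(single_zone_nlb, nlb_attributes))
# ['availability-zone-redundancy', 'cross-zone-load-balancing', 'deletion-protection']
```

Against a live account, the same inputs would come from `boto3.client("elbv2")` calls to `describe_load_balancers()` and `describe_load_balancer_attributes(LoadBalancerArn=...)`.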

Effortlessly onboard your AWS-based services

Gremlin for AWS introduces a new workflow to onboard services into Gremlin: we now automatically discover services running on AWS so you can onboard new teams and services painlessly. 

Just deploy the Gremlin agent, grant Gremlin IAM access, and choose the services you want to onboard in a few clicks. You don't need to define services manually or add annotations to your manifests when using EKS. You'll be able to start running reliability tests in minutes.

The result: every team can build resilience on AWS

Modern reliability practices that prevent incidents and outages, including resilience testing and Chaos Engineering, are often seen as expensive, time-consuming, and out of reach for many organizations. Gremlin for AWS changes that by putting the tools of advanced teams into the hands of every engineer and automating much of the heavy lifting previously required. Gremlin for AWS helps teams onboard quickly, know what to test, see whether tests pass or fail, remediate issues and get instant feedback, and watch the reliability posture of their services, teams, and overall organization improve.

Try it today

Gremlin for AWS is available to all users today. Try it free for 30 days at or in AWS Marketplace. You can also connect with our team for a personalized demo. 

Ryan Detwiller
Director of Product Marketing