The Chaos Engineering Platform for AWS

Everything you need to safely, securely, and simply run Chaos Engineering experiments on AWS.
Charter Communications, Grubhub, NAB, SAS, Shipt, Target, Twilio, Walmart, Workiva
You can’t consider your workload to be resilient until you hypothesize how your workload will react to failures, inject those failures to test your design, and then compare your hypothesis to the testing results.
AWS Reliability Pillar Announcement
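
The hypothesize, inject, compare loop described in the quote above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Gremlin's or AWS's API: Hypothesis, run_experiment, and ToyService are hypothetical names, and the "failure" is simulated in memory rather than injected into real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    """What we expect to remain true while the failure is active."""
    description: str
    max_error_rate: float  # hypothetical steady-state threshold


def run_experiment(
    hypothesis: Hypothesis,
    inject_failure: Callable[[], None],
    remove_failure: Callable[[], None],
    measure_error_rate: Callable[[], float],
) -> bool:
    """Inject a failure, measure, roll back, and compare to the hypothesis."""
    inject_failure()
    try:
        observed = measure_error_rate()
    finally:
        remove_failure()  # always roll the fault back, even if measuring fails
    return observed <= hypothesis.max_error_rate


class ToyService:
    """In-memory stand-in for a replicated workload (illustration only)."""

    def __init__(self, replicas: int) -> None:
        self.replicas = replicas
        self.down = 0

    def kill_one(self) -> None:
        self.down = min(self.down + 1, self.replicas)

    def restore(self) -> None:
        self.down = 0

    def error_rate(self) -> float:
        # Requests fail only when no replica is left to serve them.
        return 1.0 if self.down >= self.replicas else 0.0
```

With a two-replica ToyService, passing kill_one, restore, and error_rate to run_experiment confirms the hypothesis that losing one replica keeps errors under the threshold; with a single replica the same experiment refutes it, which is the kind of single point of failure the comparison step is meant to surface.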

Continuous Validation of the AWS Well-Architected Framework with Chaos Engineering

Applying Chaos Engineering to AWS workloads

Build infrastructure that conforms to the Well-Architected Framework more easily and quickly using Gremlin

Compute and containers

  • Ensure your EC2 instances are right-sized and autoscale at the appropriate time to save money and meet demand
  • Prepare for networking issues and error handling in your Lambda functions
  • Test for common Kubernetes failure modes prior to migration

Storage, database, and analytics

  • Prevent single points of failure for your databases and S3 buckets
  • Ensure your caches, such as DAX, are properly configured to be the first stop for data and are prepared to fail over to the database
  • Confirm your distributed databases meet ACID properties under poor network conditions and node failure
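
The cache behavior described in the list above (cache first, database as the fallback) might look like the following sketch. It is illustrative only and does not use the DAX client: CacheUnavailable, read_with_fallback, and the callables passed in are hypothetical names standing in for whatever cache and database clients a workload actually uses.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


def read_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], Optional[str]],
    cache_put: Callable[[str, Optional[str]], None],
) -> Optional[str]:
    """Try the cache first; fall back to the database on a miss or outage."""
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        # Cache is down: serve straight from the database.
        return db_get(key)

    if cached is not None:
        return cached

    # Cache miss: read from the database and try to repopulate the cache.
    value = db_get(key)
    try:
        cache_put(key, value)
    except CacheUnavailable:
        pass  # a failed cache write should not fail the read
    return value
```

A chaos experiment against this path would make the cache unreachable and check that reads still succeed from the database at an acceptable latency.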


  • Meet disaster recovery testing standards by experimenting with Availability Zone and Region degradation or unavailability
  • Prepare Route 53 to act as your primary or secondary DNS provider in the event of a DNS outage
  • Test your Direct Connect failover to prevent a single point of failure

Management, governance, and integration

  • Tune your CloudWatch monitoring to reduce the time to detection
  • Train your teams to leverage CloudWatch and X-Ray to triage and fix issues faster
  • Correlate common failures with their end-user impact using EventBridge and CloudWatch Synthetics

Discover more about Gremlin for AWS

AWS Advanced Technology Partner