Simulate real-world scenarios that can impact performance, uptime, and customer experience. Run pre-built Scenarios based on actual outages to confirm that your system is resilient to common cloud failures.
Verify that your autoscaling works
Prepare for host failure
Handle a slow, unreliable dependency
Perform zone and region evacuations
Validate your capacity plan
Build and share your own Scenarios
Configure Scenarios based on common outages.
Chain attacks together
Scale the impact magnitude
Increase the blast radius
Safely scale the impact of your experiments
Scenarios let you divide your attacks into incremental steps, which mitigates the risk of running complex experiments.
Dial up the blast radius over time
Increase the magnitude
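The incremental approach above can be sketched in a few lines of Python. This is a minimal illustration, not Gremlin's API: `Step`, `build_steps`, `run_scenario`, and the `run_step` and `is_healthy` callbacks are all hypothetical names, and the percentages are made-up values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Step:
    """One stage of a hypothetical scenario (all names are illustrative)."""
    blast_radius_pct: int  # share of targets affected
    magnitude_pct: int     # intensity of the attack on each target


def build_steps(radii: Iterable[int], magnitudes: Iterable[int]) -> List[Step]:
    """Raise the magnitude fully at each blast radius before widening it."""
    magnitudes = list(magnitudes)
    return [Step(r, m) for r in radii for m in magnitudes]


def run_scenario(steps: List[Step],
                 run_step: Callable[[Step], None],
                 is_healthy: Callable[[], bool]) -> List[Step]:
    """Run steps in order and halt after the first one that leaves
    the system unhealthy. Returns the steps that were executed."""
    executed = []
    for step in steps:
        run_step(step)
        executed.append(step)
        if not is_healthy():
            break  # stop before the impact grows any further
    return executed


if __name__ == "__main__":
    plan = build_steps(radii=[10, 50, 100], magnitudes=[25, 50])
    done = run_scenario(plan, run_step=print, is_healthy=lambda: True)
    print(f"{len(done)} of {len(plan)} steps executed")
```

Ordering the steps from smallest to largest impact means a failed health check stops the experiment at the lowest level that exposes the weakness.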
Hypothesize and observe
Record your hypothesis, observe the experiment as it runs, and record the results so you can take action and improve the reliability of your system.
Track, share, and schedule experiments
Follow how your experiments perform over time to prevent drift into failure. Status Checks prevent scheduled experiments from running when the system is not in a steady state.
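As a rough sketch of the gating idea behind Status Checks, the Python below runs an experiment only when every check passes. It does not use Gremlin's API: `run_if_steady`, the check names, and the metric values and thresholds are assumptions made for illustration.

```python
from typing import Callable, Dict


def run_if_steady(checks: Dict[str, Callable[[], bool]],
                  experiment: Callable[[], None]) -> str:
    """Run `experiment` only when every check passes; otherwise skip it
    and report the first check that failed."""
    for name, check in checks.items():
        if not check():
            return f"skipped: {name}"
    experiment()
    return "ran"


if __name__ == "__main__":
    # Hypothetical readings; in practice these would come from monitoring.
    error_rate = 0.002
    p99_latency_ms = 180
    checks = {
        "error rate under 1%": lambda: error_rate < 0.01,
        "p99 latency under 250 ms": lambda: p99_latency_ms < 250,
    }
    print(run_if_steady(checks, lambda: print("experiment started")))
```

Evaluating the checks immediately before each scheduled run keeps an experiment from adding failure to a system that is already degraded.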
Chaos Engineering on
Gain confidence in the reliability of your Kubernetes clusters and train your team.
Choose objects to target
1. Choose a cluster
2. Choose a namespace
Be confident in the reliability of your Kubernetes clusters
Filter and control access by cluster and namespace to easily find and harden specific Kubernetes objects
Prevent noisy Pods from bringing down your application
Ensure you can withstand common Kubernetes failure modes including CPU throttling, DNS issues, and Blackholes
Confidently operate Kubernetes in production and prevent downtime
Validate your self-healing and orchestration
Be sure your app autoscales as expected
Find out what happens when you unexpectedly lose Pods: are your customers negatively impacted?
Develop quickly and safely using Kubernetes
Verify your Kubernetes migration is regression-free
Identify critical bugs lurking within your clusters before they cause an outage
Share what you learn with the rest of your organization