Today, we are excited to announce the availability of two new features of our platform:
- Chaos Charting to visualize the effect of Gremlin’s experiments as they happen
- On-Demand Demos for Gremlin Free Edition to practice Chaos Engineering in a free sandbox environment
In this post, we’ll review the value these new capabilities bring to new and existing users.
Chaos Charting allows all Gremlin users to visualize the impact of Shutdown and CPU attacks on selected targets as they happen from within the Gremlin UI. This helps us understand how turbulent conditions affect the customer experience of our applications.
Until today, teams had to switch to a separate monitoring tool to see what was happening under the covers during an attack. Chaos Charting eliminates that extra step and setup by automatically visualizing the attack’s impact within the Gremlin UI, so the effect is immediately easy to understand.
Customers can, of course, opt out of the attack visualizations if they have any data privacy concerns.
On-Demand Demos are a great option for first-time users who want to start their journey with Chaos Engineering. When users log in to Gremlin Free for the first time, we’ll spin up a safe demo sandbox environment where they can run chaos experiments and observe the results right away. Using Chaos Charting, they will be able to see the impact of experiments and visualize how the system responds.
On-Demand Demo environments are free Linux hosts that we spin up automatically in AWS EC2 with the Gremlin agent pre-installed. The first time a new user logs in to the Gremlin Webapp, they can watch the icon in the header to know when the hosts are ready to attack.
As soon as your On-Demand Demo environment is ready, you can use Gremlin’s Recommended Scenarios to simulate a real outage. With just a couple of clicks, you can run your first chaos experiment and observe a system’s response to failure in real time.
Gremlin Free users have access to three common failure Scenarios: Validate Auto-Scaling, Prepare for Host Failure, and Unavailable Dependency. Validate Auto-Scaling adds load to a host to make sure the system scales properly by spinning up a new host. Prepare for Host Failure shuts down a percentage of your hosts to make sure your application can handle this common situation, for example after a cloud migration. Unavailable Dependency simulates a service becoming unresponsive, showing how this affects downstream services or the whole application.
Simply click “Run with Gremlin Host”, select the hosts you’d like to attack, and run the Scenario to see how the demo hosts respond.
We wanted Free users to have immediate access to Linux hosts running the Gremlin agent. To achieve this, we built a demo host management service in Go named Vivarium.
On first login, the Gremlin Webapp alerts Vivarium that demo hosts are needed. Vivarium uses Terraform to spin up a pair of temporary Amazon EC2 instances in a single-purpose AWS account. The demo hosts install Gremlin and authenticate to our control plane.
Two hours later, Vivarium’s reaper will mark these instances as expired and destroy them with Terraform. One particularly cool engineering note is that Vivarium grew out of a Gremlin engineering offsite hackathon earlier this year!
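To make the expiry step more concrete, here is a minimal Go sketch of how a time-based reaper along these lines could work. It is not Vivarium’s actual code: the `DemoEnv` type, the `Reap` function, and the `destroy` callback are hypothetical names chosen for illustration, and the Terraform teardown is replaced by a stand-in function. Only the two-hour lifetime and the mark-then-destroy order come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// demoHostTTL is the demo host lifetime mentioned in the post: two hours.
const demoHostTTL = 2 * time.Hour

// DemoEnv is a hypothetical record of one user's demo environment.
type DemoEnv struct {
	ID        string
	CreatedAt time.Time
	Expired   bool // marked by the reaper once the TTL has passed
	Destroyed bool // set only after teardown succeeds
}

// Reap marks environments at or past demoHostTTL as expired, then calls
// destroy on each expired environment. It returns the IDs torn down in
// this pass. If destroy fails, the environment stays marked as expired,
// so a later pass retries it.
func Reap(envs []*DemoEnv, now time.Time, destroy func(*DemoEnv) error) []string {
	var reaped []string
	for _, env := range envs {
		if env.Destroyed {
			continue
		}
		if now.Sub(env.CreatedAt) >= demoHostTTL {
			env.Expired = true
		}
		if !env.Expired {
			continue
		}
		if err := destroy(env); err != nil {
			continue // still expired; retried on the next pass
		}
		env.Destroyed = true
		reaped = append(reaped, env.ID)
	}
	return reaped
}

func main() {
	now := time.Now()
	envs := []*DemoEnv{
		{ID: "demo-old", CreatedAt: now.Add(-3 * time.Hour)},
		{ID: "demo-new", CreatedAt: now.Add(-10 * time.Minute)},
	}
	// Stand-in for the Terraform teardown described in the post.
	destroy := func(env *DemoEnv) error {
		fmt.Println("destroying", env.ID)
		return nil
	}
	fmt.Println("reaped:", Reap(envs, now, destroy))
}
```

Keeping "expired" and "destroyed" as separate flags in this sketch means a failed teardown is not lost: the environment stays expired and is picked up again on the next pass.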