Linux host redundancy
Description
Test resilience to host failures by shutting down a randomly selected Linux host. Verify that your platform automatically restarts or replaces it.
What this Scenario does
This Scenario shuts down a randomly selected Linux host, simulating an unexpected host failure. This forces your infrastructure to detect the failure and initiate recovery, whether through auto-scaling groups, load balancer health checks, or manual failover processes.
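As a rough illustration of the failure-and-recovery cycle described above, the following toy simulation picks one running host at random, marks it as stopped, and then lets a stand-in for an auto-scaling group restore capacity. It is a sketch only: the `fleet` dictionary, the host names, and the `shut_down_random_host` and `reconcile` functions are hypothetical and do not reflect how the Scenario or any particular cloud platform is implemented.

```python
import random


def shut_down_random_host(fleet, rng):
    """Mark one randomly chosen running host as stopped and return its name."""
    running = sorted(name for name, state in fleet.items() if state == "running")
    target = rng.choice(running)
    fleet[target] = "stopped"
    return target


def reconcile(fleet, desired_running):
    """Toy stand-in for an auto-scaling group (hypothetical, not a real API).

    Launches replacement hosts until the number of running hosts
    matches the desired capacity, and returns the names it launched.
    """
    launched = []
    running = sum(1 for state in fleet.values() if state == "running")
    while running < desired_running:
        name = f"replacement-{len(fleet) + 1}"
        fleet[name] = "running"
        launched.append(name)
        running += 1
    return launched


# Hypothetical three-host fleet.
fleet = {"host-a": "running", "host-b": "running", "host-c": "running"}

victim = shut_down_random_host(fleet, random.Random())
replacements = reconcile(fleet, desired_running=3)

print(f"stopped: {victim}, launched: {replacements}")
```

In a real environment the reconcile step is performed by the platform itself; the point of the experiment is to confirm that it actually happens.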
Why run this Scenario?
This Scenario uses the same principle as Chaos Monkey: if a host or container shuts down unexpectedly, the underlying platform should detect this and automatically restart or replace it.
- Validate that Linux instances restart within your target recovery time and that workloads successfully migrate to healthy hosts.
- Verify that load balancers automatically route traffic away from the failed Linux host.
- Test that losing a critical node (such as a Kafka broker or database primary) doesn't cause a split-brain scenario.
- Build confidence that recovery happens automatically, without an operator having to step in.
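One way to check the first two points above is to poll a health signal after the shutdown and record how long recovery takes. The sketch below assumes you can supply your own `is_healthy` callable (for example, a request to a load balancer endpoint); the function name `wait_for_recovery`, its parameters, and the `FakeClock` helper are hypothetical and exist only for this illustration.

```python
import time


def wait_for_recovery(is_healthy, timeout_s, interval_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_healthy() until it returns True or timeout_s elapses.

    Returns the elapsed seconds on recovery, or None on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


class FakeClock:
    """Simulated clock so the demo runs instantly (illustration only)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Simulated service that reports healthy on the fourth poll.
polls = {"count": 0}


def fake_check():
    polls["count"] += 1
    return polls["count"] >= 4


fake = FakeClock()
elapsed = wait_for_recovery(fake_check, timeout_s=10, interval_s=2.0,
                            clock=fake.time, sleep=fake.sleep)
print(f"recovered after {elapsed} simulated seconds")

# A service that never recovers hits the timeout and yields None.
fake2 = FakeClock()
timed_out = wait_for_recovery(lambda: False, timeout_s=5, interval_s=2.0,
                              clock=fake2.time, sleep=fake2.sleep)
```

Comparing the returned duration against your own recovery target turns "a reasonable timeframe" into a pass or fail result for the experiment.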
Expected outcome
When a Linux host fails, the cloud platform or infrastructure automatically restarts or replaces it, and workloads migrate to healthy instances.