Container redundancy

Description

Test resilience to container failures by shutting down a randomly selected container. Verify that your container runtime automatically restarts or replaces it.

What this Scenario does

This Scenario shuts down a randomly selected container, simulating an unexpected container failure. This tests whether your container runtime's restart policies and health check mechanisms detect the failure and recover automatically.

‍

Why run this Scenario?

This Scenario uses the same principle as Chaos Monkey: if a host or container shuts down unexpectedly, the underlying platform should detect this and automatically restart or replace it.

Validate that container restart policies are configured correctly and recover failed containers within acceptable timeframes.
Verify that load balancers or service meshes route traffic away from the failed container during recovery.
Test that container health checks detect the failure and trigger restarts appropriately.
Confirm that container-level failures don't cascade into broader application outages.

‍

Expected outcome

When a container fails, the container runtime or orchestrator automatically restarts or replaces it, and traffic is routed to healthy replicas.