Gremlin Scenario: Windows host redundancy

Description

Test resilience to host failures by shutting down a randomly selected Windows host. Verify that your platform automatically restarts or replaces it.

What this Scenario does

This Scenario shuts down a randomly selected Windows Server host, simulating an unexpected host failure. This forces your infrastructure to detect the failure and recover from it, whether through Windows failover clustering, cloud auto-scaling, or load balancer health checks that route traffic away from the failed host.
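
To make the detection step concrete, here is a minimal sketch of how a health check might take a failed host out of rotation. It is a toy model, not the algorithm of any particular load balancer: the HealthChecker class, the probe callback, the three-failure threshold, and the host names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HealthChecker:
    """Toy model of load balancer health checks (hypothetical, not a vendor API)."""

    probe: Callable[[str], bool]      # returns True if the host answers the check
    unhealthy_threshold: int = 3      # assumed: consecutive failures before removal
    failures: Dict[str, int] = field(default_factory=dict)

    def check(self, hosts: List[str]) -> List[str]:
        """Probe every host once and return the hosts still eligible for traffic."""
        in_rotation = []
        for host in hosts:
            if self.probe(host):
                # A successful probe resets the failure counter.
                self.failures[host] = 0
            else:
                self.failures[host] = self.failures.get(host, 0) + 1
            if self.failures[host] < self.unhealthy_threshold:
                in_rotation.append(host)
        return in_rotation


if __name__ == "__main__":
    down = {"win-02"}  # hypothetical host that the Scenario shut down
    checker = HealthChecker(probe=lambda host: host not in down)
    hosts = ["win-01", "win-02", "win-03"]
    for _ in range(3):
        rotation = checker.check(hosts)
    print(rotation)  # ['win-01', 'win-03']
```

The threshold is the trade-off to watch during the experiment: a low value removes the failed host quickly but may also eject hosts that are only briefly slow.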

Why run this Scenario?

This Scenario uses the same principle as Chaos Monkey: if a host or container shuts down unexpectedly, the underlying platform should detect this and automatically restart or replace it.

Validate that Windows Server instances restart and rejoin the cluster within acceptable timeframes.
Verify that Windows services exit gracefully and restart cleanly after an unexpected shutdown.
Test that load balancers automatically route traffic away from the failed Windows host.
Confirm that Windows failover clustering promotes a secondary node when the primary fails.
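
One way to check the "acceptable timeframes" part of these goals is to poll a health signal while the Scenario runs and record how long recovery takes. The sketch below assumes you supply your own is_healthy function (for example, one that requests a health endpoint on your service); that function, the wait_for_recovery name, and the default timeout and interval are hypothetical, not part of Gremlin.

```python
import time
from typing import Callable, Optional


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,     # assumed upper bound on acceptable recovery time
    interval_s: float = 5.0,      # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until is_healthy() returns True.

    Returns the elapsed seconds at the first healthy result,
    or None if the timeout passes without one.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Start polling as soon as the shutdown begins, then compare the returned duration against your recovery objective. A None result means the service did not recover within the timeout. The clock and sleep parameters exist only so the function can be exercised without real waiting.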

Expected outcome

When a Windows host fails, the cloud platform or infrastructure automatically restarts or replaces it, and workloads migrate to healthy instances.