Systems fail, sometimes publicly and at great cost. Airlines have suffered system-wide ticketing outages that caused hundreds of flight cancellations and significant inconvenience to customers. Retailers have seen their websites crash on the busiest shopping days of the year, costing millions in lost revenue and customer goodwill.
It is vital to understand both DevOps and SRE and the roles they play in preventing such outages.
Can we prevent outages in an era of such great velocity? We have gone from annual software releases to daily releases, from running software as a monolith to running hundreds of microservices, and from on-premises hosting on hundreds of physical hosts to Kubernetes, containers, and cloud hosts that sometimes number in the hundreds of thousands.