Interactive Assessment

Are You Building a Best-in-Class Reliability Program?

Based on Gremlin’s work with reliability leaders across the Fortune 100, we’ve identified 18 traits, grouped into four pillars, that best-in-class reliability programs share. Use the checklist below to see how your program stacks up.

Pillar 1

Leadership & Strategy

Best-in-class reliability programs move beyond ad hoc testing by building a defined strategy grounded in clear goals, executive buy-in, and company timelines.

Define clear, specific missions and goals

You should know what you're working towards, which services you're targeting, milestone timelines, and the reliability levels you aspire to.

Identify initial timelines or mission-critical dates

You should have reliability policies and compliance standards mapped to key business events, like peak traffic events, product launches, or migrations.

Focus on proactive goals rather than reactive incident response

Successful reliability programs should be built around anticipating and preventing incidents, and should be kept distinct from incident response efforts.

Define clear accountability stakes and secure visible interest from leadership

Everyone involved should understand how they’ll be held accountable for their part in the program, and they should be able to see leadership’s interest in the program.

Establish milestone-based review and celebration

Reliability programs should regularly track progress against reliability targets and timeline goals, and when those goals are hit, they should be celebrated.

Pillar 2

Ownership & Accountability

A reliability program is only effective if it drives action, which requires clear ownership and ownership handoffs, including who addresses the risks that are uncovered.

Identify your program owner

Reliability programs should have a single owner who takes responsibility for the program and is empowered to make sure it happens.

Centralize ownership for baselines, testing, and reporting

You should have a central owner who can define baselines, enable testing, and manage reporting for consistent standards across your organization.

Decentralize ownership for system improvements

Individual teams know their systems best and should take ownership of making sure any risks are addressed, verifying fixes, and reporting back.

Create ownership handoff processes

Reliability programs should be able to adapt to organizational changes, so they need clear processes that minimize disruption when service or program ownership changes.

Pillar 3

Measurement & Metrics

A successful reliability program needs metrics that establish a baseline tied to business value and show changes in reliability, and it needs to share those metrics with the broader organization.

Define the background behind the program

You should have a document that quantifies the impact of downtime, and every stakeholder should have reviewed this data so you all know the stakes and are on the same page.

Set up consistent and regular reliability measurement and normalized scoring

Reliability programs should be built around standardized metrics, such as a reliability score, that measure and track reliability changes across your organization.
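
As a minimal sketch of what a normalized score could look like, the example below turns a set of pass/fail checks into a 0 to 100 score per service. The `CheckResult` structure, the check names, and the equal default weighting are all assumptions made for illustration; this is not Gremlin's scoring formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    """One reliability check for a service (hypothetical structure)."""
    name: str
    passed: bool
    weight: float = 1.0  # assumed weighting; every check counts equally by default


def reliability_score(results: list[CheckResult]) -> float:
    """Return a 0-100 score: the weighted share of checks that passed."""
    total = sum(r.weight for r in results)
    if total == 0:
        return 0.0  # no checks recorded yet
    passed = sum(r.weight for r in results if r.passed)
    return round(100 * passed / total, 1)


# Illustrative checks for a hypothetical "checkout" service.
checkout = [
    CheckResult("survives zone outage", True),
    CheckResult("handles dependency latency", False),
    CheckResult("scales under CPU load", True),
    CheckResult("recovers from host restart", True),
]
print(reliability_score(checkout))  # 75.0
```

Because every service is scored on the same scale, scores can be compared across teams regardless of how many checks each service runs.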

Record your progress against your goals

All of your reliability metrics should be stored and tracked over time so everyone, from leadership to individual teams, can see trends and make informed decisions.

Tie high-value golden signals to business metrics

The signals used to determine pass/fail for reliability metrics should be tied directly to mission-critical aspects of your company, such as storefront downtime or failed transactions.

Pillar 4

Process & Policies

Reliability isn’t a one-time switch. It requires the right processes and policies, paired with clear ownership, to create the accountability and clarity that drive results.

Build a catalog of services, their owners, and the impact of disruption

You should have an updated catalog of all relevant services, including their criticality and owners, to prevent surprises and ensure nothing falls between the cracks.
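
One way to picture such a catalog is as a list of records with an owner, a criticality level, and a note on the impact of disruption. The sketch below is only an illustration: the field names, the three criticality levels, and the sample services are assumptions, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ServiceEntry:
    """One row in a hypothetical service catalog."""
    name: str
    owner: str                # owning team; an empty string means unassigned
    criticality: Criticality
    disruption_impact: str    # what breaks for the business if this service fails


def unowned_services(catalog: list[ServiceEntry]) -> list[str]:
    """List services with no owner, the ones most likely to fall between the cracks."""
    return [s.name for s in catalog if not s.owner.strip()]


def by_criticality(catalog: list[ServiceEntry]) -> list[str]:
    """Service names ordered from most to least critical."""
    ranked = sorted(catalog, key=lambda s: s.criticality.value, reverse=True)
    return [s.name for s in ranked]


# Sample entries, invented for illustration.
catalog = [
    ServiceEntry("search", "", Criticality.MEDIUM, "customers cannot find products"),
    ServiceEntry("checkout", "payments-team", Criticality.HIGH, "failed transactions"),
    ServiceEntry("newsletter", "growth-team", Criticality.LOW, "delayed marketing email"),
]
print(unowned_services(catalog))  # ['search']
print(by_criticality(catalog))    # ['checkout', 'search', 'newsletter']
```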

Establish regular progress reviews

You should set regular meetings, such as biweekly reviews, with program owners, leadership, and service owners to review recent changes and progress toward reliability goals.

Document new service onboarding

You should have a clear, repeatable onboarding process for new services that prevents coverage gaps and incidents from untracked services.

Define response to services falling out of compliance

Reliability programs should have a clear workflow for detecting, responding to, and correcting services that drop below set reliability levels.
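
The detection step of that workflow can be sketched as a comparison between two score snapshots. The function below is a hedged illustration: the score dictionaries, the target of 80, and the service names are hypothetical, and a real program would feed the results into its own alerting and review process.

```python
from __future__ import annotations


def compliance_changes(
    previous: dict[str, float],
    current: dict[str, float],
    target: float,
) -> tuple[list[str], list[str]]:
    """Compare two snapshots and return (fell_out, came_in) service names.

    A service is compliant when its score is at or above `target`.
    """
    fell_out, came_in = [], []
    for name, score in current.items():
        before = previous.get(name)
        if before is None:
            continue  # new service with no history; left to the onboarding process
        if before >= target > score:
            fell_out.append(name)   # was compliant, now below target
        elif before < target <= score:
            came_in.append(name)    # was below target, now compliant
    return sorted(fell_out), sorted(came_in)


# Illustrative snapshots from two consecutive reviews.
last_review = {"checkout": 92.0, "search": 70.0, "auth": 88.0}
this_review = {"checkout": 78.0, "search": 85.0, "auth": 90.0}

fell_out, came_in = compliance_changes(last_review, this_review, target=80.0)
print(fell_out)  # ['checkout']
print(came_in)   # ['search']
```

Services in the first list need a response from their owners, while services in the second list have newly reached the target.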

Define response to services coming into compliance

Teams should be visibly celebrated when their services hit reliability compliance goals to keep them engaged and encourage other teams.

Talk to a reliability expert

Your Reliability Maturity Score

0 of 18

Leadership & Strategy

0/5

Not started

Ownership & Accountability

0/4

Not started

Measurement & Metrics

0/4

Not started

Process & Policies

0/5

Not started
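
If you want to track this tally outside the page, the sketch below reproduces the arithmetic: one point per checked trait, summed per pillar and overall. The trait counts per pillar come from the checklist above; the function names are assumptions, and "Not started" is the only status label this page shows, so any other state is reported as a plain count.

```python
from __future__ import annotations

# Number of traits in each pillar, taken from the checklist above.
PILLAR_TRAITS = {
    "Leadership & Strategy": 5,
    "Ownership & Accountability": 4,
    "Measurement & Metrics": 4,
    "Process & Policies": 5,
}


def maturity_score(checked: dict[str, int]) -> tuple[int, int]:
    """Return (traits checked, total traits) across all pillars."""
    for pillar, count in checked.items():
        if pillar not in PILLAR_TRAITS:
            raise ValueError(f"unknown pillar: {pillar}")
        if not 0 <= count <= PILLAR_TRAITS[pillar]:
            raise ValueError(f"invalid count for {pillar}: {count}")
    return sum(checked.values()), sum(PILLAR_TRAITS.values())


def pillar_status(pillar: str, count: int) -> str:
    """Describe one pillar; only 'Not started' is a label taken from the page."""
    total = PILLAR_TRAITS[pillar]
    return "Not started" if count == 0 else f"{count}/{total}"


# Example: a program that has checked three leadership traits and one process trait.
answers = {"Leadership & Strategy": 3, "Process & Policies": 1}
print(maturity_score(answers))                          # (4, 18)
print(pillar_status("Leadership & Strategy", 3))        # 3/5
print(pillar_status("Ownership & Accountability", 0))   # Not started
```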
