
Gremlin’s unofficial Microsoft Ignite 2025 reliability track
Microsoft Ignite is next week in San Francisco! The 4-day conference includes over 1,000 different sessions covering everything Microsoft and Azure. Obviously you can’t attend all the talks, but to help narrow your options we put together our Unofficial Ignite 2025 Reliability Track!
Each of these talks digs into how to build resiliency into your systems. Don’t forget to stop by Gremlin at Booth #1629 to see how to pair what you learn with reliability testing to build true confidence in your systems.
Using Azure tools to improve reliability
Start, Get and Stay Resilient with Azure
Tue, Nov 18 | 4:30 PM - 5:45 PM PST | LAB520
About: Understand the Start, Get, and Stay Resilient journey. Get equipped with tools & insights to architect mission critical applications with Azure’s Resiliency and Configuration experiences. Assess your resiliency posture, apply recommendations, validate your posture and orchestrate recovery. With the Essentials Machine Management bundle from Azure, manage and maintain the state of your resources, enforce configurations across devices and ensure resilience is not a one-time goal but an ongoing state.
Why we care: Misconfigurations, such as autoscaling or multi-zone replication being disabled, are among the biggest causes of outages for cloud-based applications. It’s always a good idea to keep track of the latest tools and recommendations to make sure your applications are resilient.
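To make the misconfiguration point concrete, here is a minimal Python sketch of the kind of check a team might run against its own service inventory. The `ServiceConfig` type, its fields, and the two-zone minimum are all hypothetical and chosen for illustration; this is not an Azure or Gremlin API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceConfig:
    """Hypothetical description of a deployed service (illustration only)."""
    name: str
    autoscaling_enabled: bool
    zones: List[str]


def find_misconfigurations(cfg: ServiceConfig, min_zones: int = 2) -> List[str]:
    """Return human-readable findings for two common reliability gaps."""
    findings = []
    # Gap 1: autoscaling switched off.
    if not cfg.autoscaling_enabled:
        findings.append(f"{cfg.name}: autoscaling is disabled")
    # Gap 2: not spread across enough distinct zones (assumed minimum: 2).
    zone_count = len(set(cfg.zones))
    if zone_count < min_zones:
        findings.append(
            f"{cfg.name}: runs in {zone_count} zone(s), expected at least {min_zones}"
        )
    return findings
```

A check like this only flags what the configuration says; it does not prove the service actually survives a zone failure, which is where testing comes in.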
—
Resilience by design: Secure, scalable, AI-ready cloud with Azure
Thu, Nov 20 | 1:00 PM - 1:45 PM PST | BRK217
About: Resiliency is foundational. Explore how resiliency on Azure enables secure, scalable, AI-ready cloud architectures. Learn to set resilience goals, simulate failures, and orchestrate recovery. See live demos and discover how shared responsibility empowers teams to deliver trusted, resilient outcomes.
Why we care: AI applications are quickly becoming crucial parts of modern systems, and they usually represent substantial investments of time and money. In some ways, keeping them reliable is like keeping any other application reliable, but in other ways they’re very much their own beast, with unique hurdles and best practices.
—
Resiliency and recovery with Azure Backup and Site Recovery
Fri, Nov 21 | 9:00 AM - 9:45 AM PST | BRK146
About: Learn how to protect, detect, and rapidly recover your most critical workloads—across Azure VMs, databases, file shares, and cloud native applications like AKS. Modernize backups with secure by default protection and immutable storage, use intelligent signals, and orchestrate clean, at scale restores. We’ll cover threat aware backups, container aware protection, and one click DR to help you recover with confidence while meeting your RPO/RTO goals and compliance needs.
Why we care: With outages, it’s really not a matter of if they’ll happen, but when. Every application owner needs to make sure their disaster recovery plans are in place. And remember: it’s not really a backup unless you test it!
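As a rough illustration of "test your backups," the Python sketch below restores a backup file into a scratch directory and compares its checksum to the original. The function names are hypothetical, and the plain file copy is only a stand-in for whatever restore step your backup tool actually provides.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_matches_source(source: Path, backup: Path) -> bool:
    """Restore `backup` into a scratch directory and compare it to `source`."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / source.name
        # Stand-in for a real restore step (assumption for this sketch).
        shutil.copy2(backup, restored)
        return checksum(restored) == checksum(source)
```

Running a check like this on a schedule turns "we have backups" into "we have backups we know we can restore."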
Reliability architecture best practices
Architecting for resiliency on Azure Infrastructure
Thu, Nov 20 | 1:00 PM - 1:45 PM PST | BRK178
About: Discover how to build resilient cloud solutions on Azure by leveraging availability zones, multi-region deployments, and fungible products. This session explores architectural patterns, platform capabilities, and best practices to ensure high availability, fault tolerance, and business continuity for growth of mission-critical workloads in dynamic and distributed environments.
Why we care: Reliability should be designed into applications from the beginning, so we’re all about learning more best practices to do that. And once the applications are built, you can use reliability test suites to make sure applications comply with your architectural standards.
—
Architect resilient apps with Azure backup and reliability features
Thu, Nov 20 | 3:30 PM - 4:15 PM PST | BRK148
About: Learn to use self-serve tools to strengthen zonal resiliency for critical workloads. Assess and validate resilience across VMs, DBs, and containers. Explore enhanced data and cyber resiliency with immutability and threat detection to guard against ransomware. Discover expanded workload coverage and real-time insights to proactively protect your applications and infrastructure.
Why we care: Recent outages have definitely highlighted the importance of multi-zone redundancy and other resiliency best practices. Led by experts in Disaster Recovery, this session promises plenty of important best practices to help make your systems more resilient.
Tales of building reliable systems
Windows in the Cloud: Resilience Meets Productivity
Thu, Nov 20 | 4:00 PM - 4:30 PM PST | THR787
About: Disruptions happen—whether from device failures, cyberattacks, or unexpected outages—but downtime doesn’t have to. Join us to learn how Windows in the cloud delivers end-to-end resiliency, and discover how Windows 365 Reserve helps organizations rapidly recover, keep employees productive, and maintain business continuity—even in the face of disruptions.
Why we care: Windows is an essential part of business across the world, which means companies need to be able to rely on it. This talk promises insights into how they minimize downtime even in the face of disruptions, and some tips for how you can, too.
—
Azure Innovations Elevate Healthcare Resiliency and Performance
Tue, Nov 18 | 2:00 PM - 2:30 PM PST | THR796
About: Discover how Azure’s latest innovations in compute and storage help healthcare organizations reduce downtime, accelerate refresh cycles, and unlock new performance & scalability tiers for mission-critical workloads like EHR systems and more.
Why we care: Healthcare is an industry loaded with high-uptime, mission-critical workloads with some very specialized use cases. Reliability is non-negotiable for these applications, so there are some great lessons and best practices to be learned!
——
Looking forward to seeing all of you at the Moscone Center! Want to set up time to chat? Drop us a line and our team will reach out!
Gremlin's automated reliability platform empowers you to find and fix availability risks before they impact your users. Start finding hidden risks in your systems with a free 30 day trial.