WEBINAR

Improving Resilience for GenAI Workloads on AWS

Join AWS and Gremlin for a roundtable discussion showing you best practices for improving the reliability of GenAI applications. Using real architectures from AWS customers, we'll walk you through potential reliability issues, as well as the best practices and tests you can use to make sure your AI applications are reliable, resilient, and available.

On-Demand

GenAI can do incredible things, but like any technology, its success depends on how we implement and use it. Without proper implementation, GenAI failures can pose significant risks to your organization's reputation and customer trust, leading to real financial impact. And like any other application, regulatory rules, SLAs, and reliability standards still apply to GenAI.

With more companies integrating GenAI into their systems and products, it’s essential to make sure GenAI workloads and applications are highly available to deliver an exceptional user experience.

But how do you actually do that? How is it different from standard resilience efforts, and how is it the same?

In this webinar with AWS and Gremlin, we go over how customers are using GenAI workloads on AWS, how the best practices from the Reliability pillar of the AWS Well-Architected Framework apply, and what you can do to improve the resilience and uptime of your GenAI-related workloads.

The webinar includes a live demo of how to test the resilience of GenAI workloads with Gremlin.

Key Takeaways:

– Common GenAI architectures
– Reliability best practices specific to GenAI workloads, both managed and unmanaged
– Standard reliability practices that you can apply beyond GenAI

About the speakers

Andre Newman

Sr. Reliability Specialist
Gremlin

At Gremlin, Andre promotes the benefits of Chaos Engineering and reliability testing to engineering teams around the world, including at some of the largest enterprise organizations. Prior to Gremlin, he created technical content explaining Kubernetes and containerization, the shift to cloud computing, DevOps, observability, and more. His work has been featured in The New Stack, DZone, Software Engineering Daily, TechBeacon, and StatusCode Weekly.

Dylan Souvage

Partner Solutions Architect
AWS

Dylan is a Partner Solutions Architect at Amazon Web Services (AWS) based in Austin, TX. Dylan has spent his career working with ISV and startup customers to understand their business needs and enable them in their cloud journey. In his spare time, he enjoys going out in nature, going on long road trips, and traveling to warm, sunny places.

Shivam Patel

Partner Solutions Architect
AWS

Shivam is a Partner Solutions Architect at Amazon Web Services (AWS) based in New York, NY. He is a passionate technologist with experience supporting enterprise and SMB customers. Shivam combines his background in R&D with his business experience to meet customer needs using AWS. Outside of work, Shivam is an avid mountaineer and enjoys running, weightlifting, and reading about psychology and philosophy.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.
