The Art of Embracing Failure at Scale

Adrian Hornsby

Amazon Web Services

@adhorn

Abstract

Mistakes. Bad judgment. Errors. Failures. They are all part of our engineering lives. While many think of them as undesirable aspects of engineering, failures are very important, and even beneficial. One thing is certain: failures will happen, and they will come in many forms, some expected and some unexpected. It's therefore important to embrace failure. The question is how to limit its blast radius. In this talk, I will discuss a range of blast-radius reduction design techniques used at AWS and by our customers, including isolation, bulkheads, cells, and sharding. I will also discuss how embracing failure influences our operational practices.

Bio

Adrian is a principal technical evangelist at Amazon Web Services, based in the Nordics. He has over 15 years of experience in the IT industry, having worked as a software and systems engineer; a backend, web, and mobile developer; and a member of DevOps teams, where his focus has been on cloud infrastructure and site reliability, writing application software, deploying servers, and managing large-scale architectures. The truth is that Adrian loves breaking stuff: controlled chaos and resiliency are his thing. Adrian frequently speaks at conferences and community meetups and blogs at https://medium.com/@adhorn.

Join our mailing list:

Be the first to know all the current REdeploy happenings!
