Abstract
Incident reviews in software tend to rely on seemingly satisfying yet shallow and misleading oversimplifications, hiding from us that what we think is ‘the meat of it’ is really an empty bite.
Resilience Engineering warns us about boiling down complex situations to simple explanations. Software has built up a popular culture that is particularly vulnerable to this, with little patience for anything not easily attributed to measurable outcomes.
Right now, organizations operating at the edge of the envelope want to grow their capabilities, but aren’t sure where to invest in learning the messy details of how to outmaneuver the complexity penalties.
Responders to incidents in software are riding that cloud of instability. In fact, they make it look easy. Really easy. The problem is, the sky is cloudy with a chance of meatballs.
Bio
Ryan Kitchens is a senior engineer on the CORE team at Netflix, where he works on building capacity across the organization to ensure its availability and reliability.