16th & 17th
Thursday, 9:55 AM
Collaboration, Coordination, Co-Design CoEvolution: Challenges to Resilience in Concurrent SocioTechnical System Design
Abstract Much has been made of the requirement for teams to be collaborative and we all know teams need to coordinate, whether it be during sprint planning or incident response. But are "collaboration" and "coordination" the same thing?
And just how do these two activities fit into the evolution of our work systems, both the technical aspects and the social structures? How do we reason about and influence this over longer timespans? Can we? This talk introduces the ideas of co-design and co-evolution, both of which shed light onto the answer.
Bio Jabe Bloom has been working to align design, innovation, development, and operational excellence in organizations for more than 20 years. He is an experienced executive leader of software and product development companies, serving in numerous executive roles including: Chief Architect, Principal Technical Director, Chief Technical Officer, Chief Executive Officer, and Chief SocioTechnical Officer.
As an academic and international consultant, Jabe teaches students and clients design, strategy, innovation and flow thinking through a series of lectures, workshops, classes, and coaching.
Currently pursuing a Ph.D. in Design Studies at Carnegie Mellon University (PA), his research focuses on understanding how temporality can inform the field of Transition Design. His academic research informs an ongoing exploration of the practice of design and strategy with a select group of international clients. He is a founder of and the Chief SocioTechnical Officer at PraxisFlow, a consultancy focused on applying scientific and design research methodologies to enable exploration, increase flow, improve software engineering and create operational excellence.
Wednesday, 1:30 PM
Document Yourself: A Framework for Career Advancement
Abstract The goal of this workshop is to document yourself the way you would document code. You wouldn’t expect someone who wants to use the program you built to read every line of code. Instead, they’re relying on the design documents and doc strings to know how it works. The same is true with your career. This workshop is about making it easy for you to provide overwhelming evidence of your value to the company. When you can show your ROI, it’s much easier to secure that promotion, raise or new job that you deserve.
This workshop consists of 3 parts: writing your daily accomplishments in the form of success statements, putting them together into a brag sheet, and finally using them to create your elevator pitch. Using this framework makes it easy to make a habit of documenting, the same way a style guide helps you document your code. Once you have documented yourself, you will be amazed at how much you have accomplished. You will walk out of the workshop with the confidence and plan to take your next step.
Bio Michelle is currently a Senior Backend Engineer in food tech, helping restaurants thrive. She has 9+ years experience, from the front lines of support to managing a team. While spending most of her career in entertainment tech, she worked tirelessly to help movies and television get made faster and cheaper.
A Philadelphia native, she is an art school graduate and a self-taught Python developer. She also runs a tech podcast called From the Source, interviewing working professionals with a focus on underrepresented voices, to answer the question of what tech jobs are really like. Michelle works to promote diversity and inclusion in tech through conference speaking and organizing, mentoring, board membership and making sure everyone knows they belong here.
Wednesday, 9:10 AM
A Few Observations on the Marvelous Resilience of Bone and Resilience Engineering
Abstract Dramatic and mundane examples of resilience have encouraged a search for resilience engineering. The possibility of deliberately exploiting or enhancing resilience is tantalizing. But what exactly is resilience engineering? There are at least two possible forms: first, engineering that exploits what we know about resilient systems and, second, engineering that shapes resilience itself. Bone is a useful example of resilience and of both types of resilience engineering. The example shows how resilience and resilience engineering are related and also what resilience engineering in other settings might entail.
Bio Richard Cook is a physician, researcher, and educator. He is a co-author of “Operating at the Sharp End: The Complexity of Human Error”, “Adapting to New Technology in the Operating Room”, A Tale of Two Stories: Contrasting Views of Patient Safety, “Gaps in the Continuity of Care and Progress on Patient Safety”, and “‘Going Solid’: A Model of System Dynamics and Consequences for Patient Safety” as well as numerous book chapters and other writings. He is the recipient of the 1999 Peter Kiewit Memorial Award of The Annenberg Foundation and the 2001 McGovern Medal for Medical Writing of the American Medical Writers Association.
Wednesday, 10:50 AM
Getting Comfortable With Being Underwater
Abstract When we talk about resiliency, we often overlook the fact that a good amount of risk in the environments we operate in is an active choice instead of an unfortunate byproduct of the systems we build. If you dig into the failure modes of technical diving and other high-risk/high-consequence fields, it would be a fair conclusion that no one should do them because it’s impossible to make it perfectly safe to do so. And yet we collectively do continue pursuing these activities, just as we continue developing in complex systems instead of halting new development to focus purely on making them safer. Why? Because finding the sweet spot of risk versus resilience allows us to achieve more ambitious goals, learn more, and keep boredom and complacency at bay.
This talk will address the external and internal pressures that teams are under which contribute to them being in uncertain or high-risk environments and the benefits and risks of operating teams in that state by drawing parallels from deep sea technical diving.
Bio Ronnie Chen is an engineering manager at Twitter. She is a deep sea technical diver and was also the sous chef of a Michelin-starred restaurant in a previous life.
Wednesday, 11:30 AM
The Practice of Practice: Teamwork in Complexity
Abstract This session will unravel the methodology around how we humans come together and operate complex software systems by taking a closer look at intuition through the eyes of performing in a music ensemble. It will introduce the concept of Fundamental Common Ground Breakdown and how it interrupts our efforts to collaborate and respond to events and incidents. A Chaos Engineering Game Day walkthrough will show that intuition is not an act of instinct, but a developed ability based on careful analysis and practice. By relating my own direct experiences in both performing music and running distributed systems, I will show how being inspired by working together in tech is a thing, just like playing in a band.
Improvising musicians develop a deep intuition built around internalizing the materials and form of their genre – like scales, chord changes, or rhythmic structures. It can be directly compared to the mental map that engineers develop when writing software and understanding complexities. Each member of an ensemble has their subjective view on relevant (but overlapping) parts of the system and is challenged when relating each other’s substrate to theirs. Musicians are prime examples that the more we come together and share our perspectives to further understand a complex system, the better we know how to bolster its resilience to uncertainty.
Because systems become more complex as they grow, shrinking the capacity of any one person to comprehend the whole thing, we depend heavily on shared and discovered knowledge. When joint activities in complexity fail due to assumptions that participants share the same knowledge, Fundamental Common Ground Breakdown rears its dragon-like head, making it difficult to move the activity forward. Whether it be during an incident or improvising jazz, part of the game is learning how to harmonize these separate threads of experience, with the emphasis that what goes right in a complex system is just as valuable as what goes wrong.
Bio Matt has a passion for exploring the relationships between the artistic mind and operating distributed computer architectures, with experience in a wide variety of fields including data center operations, storage, distributed data, and site reliability. In addition to embracing complex systems and chaos engineering, Matt creates music with DIY synthesizers and spins eclectic all vinyl DJ sets. He writes at sounding about music and technology.
Wednesday, 2:15 PM
Resilience Engineering Mythbusting
Abstract How confident are you in your prod servers staying up without your help? Too often in tech we mistakenly interchange three important concepts when describing our socio-technical systems: how resilient they are, the reliability they exhibit in day to day work, and how robust they are under duress. Though interrelated, they are not equivalent.
How can we successfully gain insights in post incident reviews, execute chaos engineering experiments, and build scalable infrastructure if we’re misinterpreting our approaches? By separating out these core concepts, we can isolate better approaches in adapting to unforeseen circumstances. We’ll look at common misconceptions when describing our systems as resilient and focus on proven methods to help us improve our understanding of our systems.
Bio Will Gallego is a systems engineer with 15+ years of experience in the web development field, currently as a Senior Engineer at Fastly. Comfortable with several parts of the stack, he focuses now on building scalable, distributed backend systems and tools to help engineers grow. He believes in a free and open internet, blame aware retrospectives, and pronouncing gif with a soft “G”.
Thursday, 11:30 AM
Approaching Overload: Automation as Fellow Responders
Abstract Software systems operate at an unprecedented scale today, requiring extensive automation to develop and maintain services. The systems are designed to regularly adapt to dynamic load, though anomalies inevitably challenge Site Reliability Engineers and incident responders’ ability to mitigate and manage saturation. As the systems scale and complexity grows, it becomes more difficult to observe, model, and track how the systems function and malfunction. Black box artificial intelligence and machine learning technologies promise to solve future issues, but in isolation, will fail. Can we still work together with this evolving generation of tools to manage overload?
This talk will explore cases showing how people and machines can act as cognitive agents in resilient, joint systems. A different framing can help us design tools and artificial agents smarter.
Bio Marisa Grayson is a Cognitive Systems Engineer at Mile Two LLC, where she manages and envisions the design of human-machine systems. Marisa recently received her Master of Science in Industrial and Systems Engineering from the Ohio State University. She is a member of the SNAFU Catchers Consortium and an alumna of the Cognitive Systems Engineering Lab. She has performed research in healthcare, defense intelligence, and distributed software systems. Her work spans user-experience design, data analysis, and complex resilient systems. In 2019, Marisa won the HFES Healthcare App Design Competition and won awards at the 2019 Swarm and Search AI Challenge.
Thursday, 10:50 AM
Abstract In the aftermath of John Allspaw’s influential Blameless Post-Mortems, blamelessness has become a shibboleth for modern production operations teams: Is our culture blameless? Are our incident reviews blameless? But it seems that something has been lost in translation.
Organizations that try to implement blamelessness without understanding blame end up making behaviors and activities taboo because they can lead to blame even when they can lead to learning instead. When we throw the baby of learning out with the bathwater of blaming, we miss out on vital opportunities to become more resilient.
Bio Rein is a strange sort of software developer who spends more time thinking about systems made with people than systems made with computers. He believes that most technical problems are really people problems, and that people problems can be solved by listening, caring, and empowering others. Talking about himself in the third person makes him uncomfortable, but he is working on it. He also wrote a database in Haskell once, so he has that going for him, which is nice.
Wednesday, 9:55 AM
The Art of Embracing Failure at Scale
Abstract Mistakes. Bad judgment. Errors. Failures. They are all part of our engineering lives. While many think of them as being undesirable aspects of engineering, failures are very important, and even beneficial. One thing that is sure is that failures will happen and will come in many forms, some expected, and some unexpected. It’s therefore important to embrace failure. The question is how to limit its blast radius. In this talk, I will discuss a range of blast radius reduction design techniques used at AWS and by our customers, including isolation, bulkheads, cells, and sharding. I will also discuss how embracing failure impacts our operational practices.
Bio Adrian is a principal technical evangelist at Amazon Web Services and is based in the Nordics. He has over 15 years of experience in the IT industry, having worked as a software and systems engineer; a backend, web, and mobile developer; and part of DevOps teams where his focus has been on cloud infrastructure and site reliability, writing application software, deploying servers, and managing large-scale architectures. The truth is that Adrian loves breaking stuff—controlled chaos and resiliency is his thing. Adrian frequently speaks at conferences and community meetups and blogs at https://medium.com/@adhorn.
Wednesday, 3:15 PM
The Meat of It
Abstract Incident Reviews in Software have a tendency to rely on seemingly satisfying yet shallow and misleading oversimplifications, hiding from us that what we think is ‘the meat of it’ is rather an empty bite.
Resilience Engineering warns us about boiling down complex situations to simple explanations. Software has built up a popular culture that is particularly vulnerable to this, one where there is little patience for anything not easily attributed to measurable outcomes.
Right now, organizations operating at the edge of the envelope want to grow their capabilities, but aren’t sure where to invest in learning the messy details of how to outmaneuver the complexity penalties.
Responders to incidents in software are riding that cloud of instability. In fact, they make it look easy. Really easy. Problem is–the sky is cloudy with a chance of meatballs.
Bio Ryan Kitchens is a senior engineer on the CORE team at Netflix, where he works on building capacity across the organization to ensure its availability and reliability.
Thursday, 1:30 PM
Rule #2: Double Tap. An Elasticsearch Journey of Resiliency and Eliminating Zombies and Split Brain
Abstract At Elastic, our goal is to continuously improve upon the resiliency of Elasticsearch and our other open source software. Each new feature or improvement has brought a new set of resiliency challenges and unintended consequences, including creating zombies and split brain applications/clusters. We invite you to listen to Elastic’s journey of building resiliency in a complex distributed system and to learn how to avoid our previous mistakes.
Bio George Kobar is a veteran technologist and a Community Advocate at Elastic. He believes that technology is one of the greatest enablers of humanity. He loves to speak, write and demo technology that can be used to improve the world around us.
Thursday, 2:15 PM
Cognitive Linguistics at the Sharp End: Prototypes, Metaphors, and Resilience
Abstract Our ability to reason about and respond to unanticipated situations “in the moment” is both enabled—and constrained—by our cognitive systems. Developing a deeper understanding of how these cognitive systems work can help us learn and respond more effectively. In this talk, we’ll look at how some results from cognitive linguistics reveal those systems, and how we can apply that to improve our response to operational surprises.
Bio Michael is a Technical Lead at Ubisoft, where he works on platform tools for game operations. He has previously worked on software in the financial and health care sectors, and earned an MA in English Linguistics while taking a break from IT. He is currently interested in ways we can improve the human and organizational side of software delivery and operation.
Thursday, 3:15 PM
A Talk For Right Now: Resilience in the Face of Change
Abstract I thought I was going to be a Park Ranger for my whole life, I thought I was going to grow old with my husband by my side, I thought I would be best friends with my mom forever. As it turns out, life has a funny way of being unpredictable.
I want to give fun talks, memorable and humorous talks, talks about how bears, moose, and ravens are like coding because those are the talks I would want to go to.
But in the last 5 years, I’ve become a Software Developer, my husband has become my wife, and my mom and I are thinking about being friends again. During these past years, these changes have caused bouts of depression, thoughts that I could never be myself again, and thoughts that I don’t belong anywhere. I’m still working on overcoming these thoughts and feelings, and I’m not sure they’ll ever go away completely, but I’ve learned ways to reframe the change and try to figure out what I want and try to influence the change I want in my life. And what is this, if not a form of personal resilience?
So, I guess, if I’m honest with myself and with you, this is really the talk I want to give right at this point in my life.
Bio Kate is a Park Ranger and Environmental Educator turned Software Developer, whose boring fact is that she may have matched socks on today. Kate has a Master’s Degree from Griffith University in Brisbane and a Certificate of Completion from Ada Developers Academy in Seattle. Her experience coming to tech from an unconventional background has given her the opportunity to embrace life changes and overcome a myriad of challenges to be the person and developer she is today. She is passionate about technology, about people, and about making connections that lift others up. She also has bear stories, a light-up, Bluetooth enabled skirt, and loves soldering!
Thursday, 9:10 AM
Trajectory of Chaos
Abstract Almost five years ago I published a manifesto of sorts at https://principlesofchaos.org to define a new discipline in software engineering called Chaos Engineering. It wasn’t about creating chaos, but rather identifying the chaos inherent in a complex system. The other practices that commonly address availability (incident management, alerting, monitoring, disaster recovery, etc.) are all reactive: they focus on time to detect, and time to remediate. Chaos Engineering on the other hand is proactive: finding systemic vulnerabilities before they affect customers. Now that Chaos Engineering has high adoption at big tech companies and non-digital native organizations alike, we can look at how the practice is maturing. Our knowledge of systemic properties of complex systems is improving and leading us into a new era of Continuous Verification.
Bio CEO and Cofounder of Verica.io. As an Executive Manager and Senior Architect, Casey manages teams to tackle complex systems, architect solutions to difficult problems, and train others to do the same. He seeks opportunities to leverage his experience with distributed systems, artificial intelligence, translating novel algorithms and academia into working models, and selling a vision of the possible to clients and colleagues alike. His superpower is transforming misaligned teams into high performance teams, and his personal mission is to help people see that something different, something better, is possible. For fun, Casey models human behavior using personality profiles in Ruby, Erlang, Elixir, Prolog, and Scala.