Talk: Incidents are new normal - Kasia Balcerzak

February 5th, 2021

Lightning Talk: Incidents are new normal - Kasia Balcerzak

 

 

blameless incident review

(1) gather facts

facts

events

everything related to the incident

- logs

 

how our org behaved before, during, after the incident
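
As a rough illustration of the fact-gathering step, the notes above could be sketched as a small timeline builder. This is only a sketch under assumptions: the names `Fact`, `phase_of`, and `build_timeline`, the `source` labels, and the idea of a fixed incident window (`started`, `resolved`) are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    """One gathered fact: an event, a log line, or an observation (hypothetical shape)."""
    timestamp: datetime
    source: str        # assumed labels, e.g. "logs", "chat", "interview"
    description: str


def phase_of(fact, started, resolved):
    """Classify a fact as before, during, or after the incident window."""
    if fact.timestamp < started:
        return "before"
    if fact.timestamp <= resolved:
        return "during"
    return "after"


def build_timeline(facts, started, resolved):
    """Return facts in chronological order, grouped by incident phase."""
    timeline = {"before": [], "during": [], "after": []}
    for fact in sorted(facts, key=lambda f: f.timestamp):
        timeline[phase_of(fact, started, resolved)].append(fact)
    return timeline
```

Grouping by phase keeps the "before" and "after" behaviour of the organisation visible next to what happened during the incident itself, rather than only the log lines from the outage window.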

 

not: "When did you discover..."

yes: "How did you discover..."

 

causal factors: anything that led to the incident or made it worse

 

causal factors by themselves are not a problem

 

incident = when causal factors are combined

 

e.g. "not enough time for testing"

 

Root Causes

anything that allowed causal factors to happen and be ignored

 

 

serious incidents are a combination of causal factors

 

Don't prevent things from going wrong!

Try to make things go right!

 

risk management: "Can we handle this when it fails?"

 

 

building a learning org is the only way to be proactive

 

failing things are normal

broken production is not normal

 

never waste a good incident: use it to improve your org/product
