Video: When Is SRE Right For You? - Stephen Thorne

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.png ./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.1.png

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.2.png ./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.3.png
  • release
  • capacity planning
  • migrate to another cluster
  • provisioning
  • installing
  • operating

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.4.png

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.5.png

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.6.png best people to do emergency response

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.7.png
  • human cost (fewer people or more systems)
  • resource cost (fewer VMs)

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/screenshot.8.png Model for when SRE fits: is the system mission critical, operable, mutable?

Examples rated in that model:
  • ads
  • network hardware
  • hosted software
  • HR website
  • web shop
  • data pipeline
  • Kubernetes cluster
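The three-question model can be sketched as a simple checklist. This is a minimal sketch under loud assumptions: the talk's actual rating scale is not captured in these notes, so each criterion is reduced to a yes/no answer, and "SRE fits only when all three hold" is an assumed rule. `SystemProfile`, `sre_fit` and `missing_criteria` are hypothetical names, and the example system is made up, not one of the ratings from the talk.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical yes/no answers to the three questions of the model."""
    name: str
    mission_critical: bool  # does an outage seriously hurt the business?
    operable: bool          # can an SRE team actually run and observe it?
    mutable: bool           # can the team change it (code, config, deploys)?


def missing_criteria(system: SystemProfile) -> list[str]:
    """Return the names of the criteria the system fails, in model order."""
    checks = {
        "mission critical": system.mission_critical,
        "operable": system.operable,
        "mutable": system.mutable,
    }
    return [name for name, ok in checks.items() if not ok]


def sre_fit(system: SystemProfile) -> bool:
    # Assumption: treat SRE as a good fit only when no criterion is missing.
    return not missing_criteria(system)


# Hypothetical system, not a rating from the talk.
example = SystemProfile("example-service", mission_critical=True, operable=True, mutable=False)
print(sre_fit(example))
print(missing_criteria(example))
```

Listing the failed criteria, rather than returning only a verdict, points at what would have to change (for example, gaining the ability to modify the system) before SRE engagement makes sense.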

./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/unknown_filename.png when it’s not suitable

lower expectations / lower target SLAs
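Lowering the target directly widens how much failure is acceptable. A minimal sketch of that arithmetic, assuming a simple time-based availability target over a 30-day window; `allowed_downtime_minutes` and the window length are illustrative choices, not something from the talk. For example, 99.9% over 30 days (43,200 minutes) permits about 43.2 minutes of downtime, while 99% permits about 432 minutes.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full downtime a time-based availability target permits in the window."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - target) * window_minutes


# Each step down in the target multiplies the tolerated downtime.
for target in (0.999, 0.99, 0.95):
    print(f"{target:.1%} -> {allowed_downtime_minutes(target):.1f} min per {30} days")
```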

production readiness review

when you have unreliable dependencies ./resources/video-when-is-sre-right-for-you-stephen-thorne.resources/unknown_filename.1.png

Q: How do you convince company leadership that SRE is needed?
A: Figure out where it will provide value to your business:

  • decrease costs
  • more reliability

Q: Would you start with just one system?
A: Start with one system, or one stack (e.g. app + cluster + dependency), then subdivide and iterate.
“You want to have people in place that can actually defend the customer experience. If they’re not empowered and enabled to do that, both short term and long term, then they are applying their energy to the wrong place.”

Q: What about taking operational responsibility away from the devs?
A: Devs tend to be really happy with that, even too happy. We need to leave the devs doing a little of the running themselves, so they stay partially responsible:
  • one on-call rotation per month
  • doing the releases (or half of the releases)

Q: At what level of scale is hiring SREs worth it?
A: I’ve seen SRE done successfully at businesses of 40-50 people. As you scale down, you apply the practices but not the staffing:
  • error budgets
  • no full-time SRE people

Q: What is the difference between SRE and a DevOps specialist engineer?
A: SRE is a job role: take a bunch of devs and tell them “your responsibility is to do nothing but the production platform”. A DevOps specialist engineer works in the same space but has a different heritage. SRE (inside Google) and DevOps (outside Google) evolved in parallel and ended up as the same role with the same goals.

Q: SRE vs CRE?
A: CRE (Customer Reliability Engineering) is a team of SREs, generally with an SRE focus, that looks outward toward the customers who use us as a platform. They enable those customers to succeed on top of our platform and interface with the customers’ own SRE teams.

Q: How do a dev team and an SRE team work on the same code?
A: Inside the error budget, everything goes; outside the budget, the SRE team works together with the dev team. It’s reactive:
  • reliable enough: remove brakes, remove friction, increase velocity
  • not reliable enough: the SRE team is empowered to do something about it (e.g. no features for a while, we need to address reliability)
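The "inside budget / outside budget" rule can be sketched as a tiny policy check. This assumes a request-based SLO (fraction of good events over a window); `ErrorBudget` and `next_priority` are hypothetical names, and the two outcomes simply mirror the answer above rather than any specific team's policy.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    slo: float         # target fraction of good events, e.g. 0.999
    good_events: int   # successful requests observed in the window
    total_events: int  # all requests observed in the window

    def allowed_failures(self) -> float:
        """Number of bad events the SLO tolerates in this window."""
        return (1.0 - self.slo) * self.total_events

    def remaining(self) -> float:
        """Fraction of the budget still unspent (negative when overspent)."""
        failures = self.total_events - self.good_events
        allowed = self.allowed_failures()
        if allowed == 0:
            # A 100% SLO (or no traffic) leaves no budget at all.
            return 0.0 if failures == 0 else float("-inf")
        return 1.0 - failures / allowed


def next_priority(budget: ErrorBudget) -> str:
    # Hypothetical policy: budget not overspent -> everything goes;
    # overspent -> pause features and work on reliability with the devs.
    if budget.remaining() >= 0:
        return "ship features"
    return "reliability work"


healthy = ErrorBudget(slo=0.999, good_events=999_500, total_events=1_000_000)
overspent = ErrorBudget(slo=0.999, good_events=998_000, total_events=1_000_000)
print(next_priority(healthy))
print(next_priority(overspent))
```

With a 99.9% SLO over a million requests, 1,000 failures are tolerated: 500 failures leaves half the budget, while 2,000 failures overspends it and flips the priority to reliability work.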