Welcome to the Google Cloud DevOps and SREs course. This course is the second in the Google Professional Cloud DevOps Engineer certification path. If you’re coming from the traditional DevOps world, or even from the general computing world, you’re likely not familiar with the abbreviation SRE. SRE stands for Site Reliability Engineering, and it’s Google’s method for realizing DevOps or, in more formal software speak, "class SRE implements DevOps."
Besides SRE, this field introduces a metric ton of abbreviations: SLI, SLO, SLA — not to mention some weird-sounding phrases such as "error budget" and "toil." During this course, we’ll explain what each of these terms means, how they interconnect, and how they relate to the concept of DevOps.
The SRE approach is quite quantitative. But don’t worry — we’ll explore the exact formulas you’ll need to calculate baseline values for each of the key criteria. We’ll help you see how Google maximizes the engineering velocity of developer teams while keeping products reliable.
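To give a rough feel for the kind of arithmetic involved, here is a minimal sketch of how an SLI and an error budget are commonly computed. It is not the course's own formula set: the function names and the sample numbers are made up for illustration, and it assumes a simple request-based SLI (good events divided by total events) measured against an SLO target expressed as a fraction.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service level indicator as the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the measurement window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Error budget left after subtracting the bad events already observed."""
    return error_budget(slo_target, total_events) - bad_events


if __name__ == "__main__":
    # Hypothetical window: one million requests, 500 failures, 99.9% SLO.
    total, bad, target = 1_000_000, 500, 0.999
    print("SLI:", sli(total - bad, total))
    print("Error budget (events):", error_budget(target, total))
    print("Budget remaining (events):", budget_remaining(target, total, bad))
```

A positive remaining budget suggests there is room to keep shipping changes, while a negative one suggests that reliability work should take priority. How the budget is actually defined and enforced depends on the policy each team agrees on.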
In order to balance development and operations, you need to keep an eagle eye on operations. We’ll dive into the various SRE strategies for monitoring reliability with special attention to alerting capabilities. Critically, we’ll spend a good amount of time exploring the best way to handle the inevitable issues and incidents that are part of any service lifecycle.
And it’s not just me here to help you out. My colleague, Mattias Andersson, will stop by at the end of every section for a quick recap and perhaps a slightly different perspective on the topics covered.
We recommend you have an Associate Cloud Engineer-level certification before taking this course.
If the world of DevOps in general or Site Reliability Engineering specifically is new to you – whether or not you’re on the certification path – be sure to take this course before diving into our development and operations offerings. It’s designed to lay the foundation you’ll need before you get hands-on.