Google Cloud DevOps and SREs (GCP DevOps Engineer Track Part 2)

By Joseph Lowery
By Mattias Andersson (ACG)

Dive into SRE (Site Reliability Engineering) on your way to becoming a certified Google Professional Cloud DevOps Engineer.

2.5 hours
  • 21 Lessons

About the course

Welcome to the Google Cloud DevOps and SREs course. This course is the second in the Google Professional Cloud DevOps Engineer certification path. If you’re coming from the traditional DevOps world, or even from the general computing world, you may not be familiar with the abbreviation SRE. SRE stands for Site Reliability Engineering, and it’s Google’s approach to putting DevOps into practice or, in software terms, "class SRE implements DevOps."

Besides SRE, this field introduces a metric ton of abbreviations, such as SLI (service level indicator), SLO (service level objective), and SLA (service level agreement), not to mention some weird-sounding phrases such as "error budget" and "toil." During this course, we’ll explain what each of these terms means, how they interconnect, and how they relate to the concept of DevOps.

The SRE approach is quite quantitative. But don’t worry — we’ll explore the exact formulas you’ll need to calculate baseline values for each of the key criteria. We’ll help you see how Google maximizes the engineering velocity of developer teams while keeping products reliable.
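
To give a feel for the kind of arithmetic involved, here is a minimal sketch in Python. It is not taken from the course; it assumes the commonly used definitions of an availability SLI (good events divided by total events) and an error budget (1 minus the SLO target). The function names and the sample numbers are hypothetical.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Availability SLI: the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float) -> float:
    """Error budget: the fraction of events allowed to fail under the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = error_budget(slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: 10,000 requests, 5 failures, 99.9% SLO target.
    print(availability_sli(9_995, 10_000))
    print(budget_remaining(9_995, 10_000, 0.999))
```

With a 99.9% target, about 10 of the 10,000 hypothetical requests may fail, so 5 failures spend roughly half of the budget.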

To balance development and operations, you need to keep an eagle eye on operations. We’ll dive into the various SRE strategies for monitoring reliability, with special attention to alerting capabilities. Critically, we’ll spend a good amount of time exploring the best way to handle the inevitable issues and incidents that are part of any service lifecycle.

And it’s not just me here to help you out. My colleague, Mattias Andersson, will stop by at the end of every section for a quick recap and perhaps a slightly different perspective on the topics covered.

We recommend you have an Associate Cloud Engineer level certification before taking this course.

If the world of DevOps in general or Site Reliability Engineering specifically is new to you – whether or not you’re on the certification path – be sure to take this course before diving into our development and operations offerings. It’s designed to lay the foundation you’ll need before you get hands-on.

  • Chapter 1: Introduction (4 Lessons, 11:16)
    An Important Note About A Cloud Guru and Linux Academy Courses (1:19)
    About the Course and Learning Path (3:28)
    About the Training Architects (1:35)
    Milestone: Getting Started... (4:54)
  • Chapter 2: Balancing Change, Velocity, and Service Reliability with SREs (5 Lessons, 47:09)
    Big Picture: What Is Site Reliability Engineering? (13:07)
    Understanding SLIs (12:43)
    Understanding SLOs (7:00)
    Understanding SLAs (8:11)
    Milestone: Oh My! (6:08)
  • Chapter 3: Making the Most of Risk (3 Lessons, 23:45)
    Setting Error Budgets (9:01)
    Defining and Reducing Toil (8:06)
    Milestone: Risky Business (6:38)
  • Chapter 4: Generating SRE Metrics (4 Lessons, 26:24)
    Monitoring Reliability (6:56)
    Alerting Principles (8:41)
    Investigating SRE Tools (4:02)
    Milestone: I See You! (6:45)
  • Chapter 5: Reacting to Incidents (4 Lessons, 28:28)
    Handling Incident Response (9:45)
    Managing Service Lifecycle (6:21)
    Ensuring Healthy Operations Collaboration (6:40)
    Milestone: Incidents R Us (5:42)
  • Chapter 6: Next Steps (1 Lesson, 2:29)
    Milestone and Continuity (2:29)

What you will need

  • Associate Cloud Engineer certification or equivalent

