Advanced Prometheus Queries

1 hour
  • 2 Learning Objectives

About this Hands-on Lab

The Prometheus Query Language (PromQL) provides a variety of tools that enable you to transform your raw metric data into useful and actionable information. In this lab, you will explore some advanced features of Prometheus queries as you build queries to solve moderately complex problems.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Write a Query to Determine Which Instances Have High CPU Utilization
  1. Access the Prometheus expression browser in your web browser:

    http://<PROMETHEUS_SERVER_PUBLIC_IP>:9090
  2. Run a query that adds the CPU seconds spent in the system and user modes for each instance, then filters the results to only those where the combined total is more than 10000:

    (node_cpu_seconds_total{mode="system"} + ignoring(mode) node_cpu_seconds_total{mode="user"}) > 10000
  3. Log in to the Prometheus server and open the output file:

    vi /home/cloud_user/report.md
  4. Save the instance names of the instances returned by the query to the file.
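
To make the matching behavior of the query above more concrete, here is a minimal Python sketch of what `ignoring(mode)` and `> 10000` do conceptually. It is not how Prometheus is implemented; the sample series, host names, values, and helper functions (`match_key`, `select`, `add_ignoring_mode`, `filter_greater`) are all hypothetical and exist only for illustration. Note that node_cpu_seconds_total normally also carries a `cpu` label, so the addition pairs series per CPU core on each instance.

```python
# Hypothetical sample series as (labels, value) pairs. All values are made up.
samples = [
    ({"instance": "web01:9100", "cpu": "0", "mode": "system"}, 4200.0),
    ({"instance": "web01:9100", "cpu": "0", "mode": "user"}, 7100.0),
    ({"instance": "web02:9100", "cpu": "0", "mode": "system"}, 900.0),
    ({"instance": "web02:9100", "cpu": "0", "mode": "user"}, 2500.0),
    ({"instance": "web01:9100", "cpu": "0", "mode": "idle"}, 99999.0),
]


def match_key(labels, ignored=("mode",)):
    """Return the label set without the ignored labels, as a hashable key."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in ignored))


def select(series, **matchers):
    """Keep series whose labels equal every matcher, indexed by match key."""
    return {
        match_key(labels): value
        for labels, value in series
        if all(labels.get(name) == want for name, want in matchers.items())
    }


def add_ignoring_mode(left, right):
    """One-to-one addition: only keys present on both sides produce a result."""
    return {key: left[key] + right[key] for key in left if key in right}


def filter_greater(vector, threshold):
    """Comparison filter: drop every element that is not above the threshold."""
    return {key: value for key, value in vector.items() if value > threshold}


system = select(samples, mode="system")
user = select(samples, mode="user")
combined = add_ignoring_mode(system, user)
high = filter_greater(combined, 10000)

# Collect the distinct instance names, which is what the report asks for.
high_instances = sorted({dict(key)["instance"] for key in high})
print(high_instances)
```

Without `ignoring(mode)`, the two sides would never match, because their `mode` labels differ ("system" versus "user"); removing that label from the comparison is what lets each pair line up.
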
Get the Per-Second Rate of Increase in HTTP Request Duration for All Instances
  1. Run a query from the Prometheus expression browser:

    rate(http_request_duration_seconds_sum[5m])
  2. Copy the output, including the element data and values.

  3. On the Prometheus server, open the output file in a terminal:

    vi /home/cloud_user/report.md
  4. Paste the data returned by the query at the end of the file.
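
As a rough illustration of what `rate(...[5m])` reports, the following Python sketch computes a per-second increase from the samples inside a trailing window. It is a simplification under stated assumptions: it accounts for counter resets but skips the extrapolation to the window boundaries that Prometheus performs, so its numbers can differ slightly from the real function. The function name `simple_rate` and the sample points are hypothetical.

```python
def simple_rate(points, window_seconds, now):
    """Approximate per-second increase of a counter over a trailing window.

    points: list of (timestamp_seconds, value) pairs sorted by timestamp.
    Simplified: no extrapolation to the window edges.
    """
    in_window = [(t, v) for t, v in points if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return None  # at least two samples are needed to compute a rate

    increase = 0.0
    previous = in_window[0][1]
    for _, value in in_window[1:]:
        if value >= previous:
            increase += value - previous
        else:
            # A drop means the counter was reset, so it restarted from zero.
            increase += value
        previous = value

    elapsed = in_window[-1][0] - in_window[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Hypothetical samples scraped every 60 seconds; values are made up.
points = [(0, 10.0), (60, 13.0), (120, 16.0), (180, 19.0), (240, 22.0), (300, 25.0)]
print(simple_rate(points, window_seconds=300, now=300))
```

Because http_request_duration_seconds_sum is a counter of total seconds spent serving requests, the result reads as seconds of request time accumulated per second of wall-clock time.
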

Additional Resources

Your company, LimeDrop, has a Prometheus instance that monitors a variety of applications. User reports indicate there may be performance issues somewhere in the company's infrastructure, but the reports do not include enough information to pinpoint the problem.

You have been asked to collect some specific data points from Prometheus in order to help locate the issue. Write and execute queries to obtain the requested data, then record it in a file on the Prometheus server located at /home/cloud_user/report.md.

  • The team performing the troubleshooting needs to know which instances have high CPU usage. Write a query that will add together CPU usage in seconds in both the system and user modes on each instance. Then, add a filter to the query that will return only records where the combined system and user seconds exceed 10000. You can use the node_cpu_seconds_total metric to get this information. Save the instance names of the instances exhibiting high usage to the output file.
  • The team wants to be able to determine how quickly HTTP request duration is currently increasing for all instances. Using the http_request_duration_seconds_sum metric, write a query to determine the rate of increase in HTTP request duration over the last five minutes. Save the resulting value to the output file.
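
If you would rather extract the instance names programmatically than copy them by hand, one possible approach is to parse the JSON that the Prometheus HTTP API returns for an instant query (`/api/v1/query`). The sketch below is not part of the lab steps. It assumes the standard response shape for a vector result; the sample response, host names, and the helpers `instances_from_response` and `report_lines` are hypothetical.

```python
import json

# Hypothetical response body for an instant query; hosts and values are made up.
sample_response = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"cpu": "0", "instance": "web01:9100", "job": "node"},
       "value": [1700000000.0, "11300"]},
      {"metric": {"cpu": "1", "instance": "web01:9100", "job": "node"},
       "value": [1700000000.0, "10450"]},
      {"metric": {"cpu": "0", "instance": "db01:9100", "job": "node"},
       "value": [1700000000.0, "15020"]}
    ]
  }
}
"""


def instances_from_response(text):
    """Return the sorted, de-duplicated instance labels from a vector result."""
    payload = json.loads(text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    results = payload["data"]["result"]
    return sorted(
        {item["metric"]["instance"] for item in results if "instance" in item["metric"]}
    )


def report_lines(instances):
    """Format instance names as list items suitable for pasting into report.md."""
    return [f"- {name}" for name in instances]


instances = instances_from_response(sample_response)
lines = report_lines(instances)
print("\n".join(lines))
```

De-duplicating matters here because a host with several CPU cores can appear more than once in the query result, while the report only needs each instance name once.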

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments let you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
