Monitoring a Batch Job with Prometheus Pushgateway

45 minutes
  • 2 Learning Objectives

About this Hands-on Lab

Prometheus Pushgateway lets you supply metrics to Prometheus using a push-based model, which is particularly useful for monitoring short-lived jobs. In this lab, you will work with the Pushgateway API by pushing metrics to it. You will modify a simple job so that it pushes metrics to Pushgateway every time it runs.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Modify the Cleanup Job to Push a Metric to Pushgateway to Signal When It Runs
  1. Log in to the Job server.

  2. Edit the cleanup job script:

    sudo vi /etc/jobs/cleanup.sh
  3. Implement a call to the Pushgateway API at the end of the script to signal that the job ran:

    num_files=$(rm -vrif /etc/debug_data/* | wc -l)
    
    cat << EOF | curl --data-binary @- http://prometheus:9091/metrics/job/debug_cleanup/instance/10.0.1.102
      # TYPE job_executed_successful gauge
      job_executed_successful 1
    EOF
  4. Access the Prometheus expression browser (http://<PROMETHEUS_SERVER_PUBLIC_IP>:9090/graph), and run a query to verify you can see metric data pushed to Pushgateway by the cleanup job script. Note you may have to wait a minute or so for the job to execute so you can see your changes.

    job_executed_successful[5m]
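
For reference, the Pushgateway URL used in the script follows the pattern /metrics/job/<job>/instance/<instance>, where the path segments after /metrics become grouping labels on the pushed metrics. The following is a minimal sketch of how that URL could be assembled in Bash. The function name pushgateway_url is hypothetical and is not part of the lab's cleanup script; the host, job, and instance values are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the lab's cleanup.sh): build the Pushgateway
# push URL from a host:port, a job name, and an instance label value.
pushgateway_url() {
  printf 'http://%s/metrics/job/%s/instance/%s\n' "$1" "$2" "$3"
}

# Values taken from the lab steps above.
pushgateway_url prometheus:9091 debug_cleanup 10.0.1.102
# prints http://prometheus:9091/metrics/job/debug_cleanup/instance/10.0.1.102
```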
Modify the Cleanup Job to Push a Metric to Pushgateway Representing the Number of Files Deleted in Each Execution
  1. Edit the cleanup job script again:

    sudo vi /etc/jobs/cleanup.sh
  2. Update the call to the Pushgateway API at the end of the script so it also pushes the number of files deleted:

    num_files=$(rm -vrif /etc/debug_data/* | wc -l)
    
    cat << EOF | curl --data-binary @- http://prometheus:9091/metrics/job/debug_cleanup/instance/10.0.1.102
      # TYPE job_executed_successful gauge
      job_executed_successful 1
      # TYPE job_num_files_deleted gauge
      job_num_files_deleted $num_files
    EOF
  3. Access the expression browser again and verify you can see the new metric in Prometheus. Note you may have to wait a minute or so for the job to execute so you can see your changes.

    job_num_files_deleted[5m]
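
The payload sent in the script is plain text in the Prometheus exposition format: a # TYPE line followed by a sample line for each metric. As a rough sketch, the payload could be generated by a small function so it can be inspected before it is piped to curl. The name build_payload is hypothetical, and the count of 3 is an example value, not output from the lab.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the lab's cleanup.sh): print the metrics
# payload for a given number of deleted files ($1). Nothing is sent.
build_payload() {
  cat << EOF
# TYPE job_executed_successful gauge
job_executed_successful 1
# TYPE job_num_files_deleted gauge
job_num_files_deleted $1
EOF
}

# Example: show the payload for a run that deleted 3 files.
build_payload 3
```

To push the result, the output could be piped to curl in the same way as in the script, for example: build_payload "$num_files" | curl --data-binary @- http://prometheus:9091/metrics/job/debug_cleanup/instance/10.0.1.102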

Additional Resources

Your company, LimeDrop, is using Prometheus to monitor its infrastructure. There is a simple Bash cleanup script that removes unneeded files periodically, but this job is not monitored. Since this job is a short-lived process, it can use Pushgateway to send metrics to Prometheus using a push model.

Implement monitoring for the job by modifying the cleanup script to send some metrics to Pushgateway.

Some additional details:

  • The cleanup script is located on the Job server at /etc/jobs/cleanup.sh. You can implement monitoring by making changes to this script.
  • The script is already set up to run approximately once every minute.
  • You can reach Pushgateway from the Job server at prometheus:9091.
  • Implement a simple gauge metric for the job called job_executed_successful. Push this metric with a value of 1 every time the script completes execution.
  • Implement a gauge metric for the job called job_num_files_deleted. Every time the job executes, push this metric with a value representing the number of files cleaned up by the script. The script already contains a variable called num_files you can use to get the number of files.
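
The num_files variable relies on rm -v printing one line per removed entry, which wc -l then counts. A minimal sketch of that counting technique, run against a throwaway directory, might look like the following. The function name count_deleted and the temporary directory are assumptions for illustration; the lab's script targets /etc/debug_data instead. The -i flag from the lab's command is left out because the -f that follows it in the flag group overrides it. Note that with -r, removed subdirectories also produce a line and are included in the count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete everything inside directory $1 and print how
# many entries rm reported removing (one verbose line per entry).
count_deleted() {
  rm -vrf "$1"/* | wc -l | tr -d ' '
}

# Demo against a temporary directory holding three empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b.log" "$demo_dir/c.log"
count_deleted "$demo_dir"   # prints 3
rmdir "$demo_dir"
```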

