Building a Highly Available Prometheus Setup

45 minutes
  • 3 Learning Objectives

About this Hands-on Lab

A single Prometheus server can provide a great deal of value in terms of monitoring. However, it can also become a single point of failure if something goes wrong. Luckily, the simple architecture employed by Prometheus makes it relatively easy to set up additional Prometheus servers in a highly available configuration. In this lab, you will see what this looks like as you build your own highly available, multi-instance Prometheus setup.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Copy the Prometheus configuration from the existing Prometheus server to the new instance.
  1. Log in to Prometheus Server 1.

  2. Copy prometheus.yml to Prometheus Server 2.

scp /etc/prometheus/prometheus.yml cloud_user@
  3. Log in to Prometheus Server 2.

  4. Copy prometheus.yml from your home directory to the Prometheus configuration directory.

sudo cp ~/prometheus.yml /etc/prometheus/prometheus.yml
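
As an optional check that is not part of the lab steps, you may want to confirm that the file now in /etc/prometheus is identical to the one you transferred. The following is a minimal sketch using the standard cmp utility; the function name files_match is hypothetical and chosen only for illustration.

```shell
# Hypothetical helper (not part of the lab): report whether two files
# are byte-for-byte identical. Uses only the POSIX cmp utility.
files_match() {
  if cmp -s "$1" "$2"; then
    echo "match"
  else
    echo "differ"
    return 1
  fi
}

# On Prometheus Server 2, after the copy, you could run:
#   files_match ~/prometheus.yml /etc/prometheus/prometheus.yml
```

If the two files differ, repeating the copy from Prometheus Server 1 is the simplest fix.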
Copy the rules configuration from the existing Prometheus server to the new instance.
  1. On Prometheus Server 1, view prometheus.yml to determine the rules location(s) under rule_files:
cat /etc/prometheus/prometheus.yml
  2. Copy files from the rules directory to Prometheus Server 2:
scp /etc/prometheus/rules/* cloud_user@
  3. On Prometheus Server 2, create the rules directory, then copy the rules file to the appropriate location:
sudo mkdir -p /etc/prometheus/rules

sudo cp ~/limedrop-alerts.yml /etc/prometheus/rules
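
The rule_files lookup in the first step above can also be done without reading the whole file. The sketch below prints only the entries listed under rule_files. It assumes the common layout where rule_files is a top-level key followed by one "- path" entry per line; the function name list_rule_files is hypothetical, and trailing comments or inline lists are not handled.

```shell
# Hypothetical helper (not part of the lab): print each entry listed
# under the top-level rule_files key of a Prometheus config file.
list_rule_files() {
  awk -v q="'" '
    # The rule_files key starts the block we care about.
    /^rule_files:/ { in_block = 1; next }
    # Any other top-level key ends the block.
    in_block && /^[^[:space:]#-]/ { in_block = 0 }
    # Inside the block, print list items without the dash or quotes.
    in_block && /^[[:space:]]*-/ {
      sub(/^[[:space:]]*-[[:space:]]*/, "")
      gsub("[\"" q "]", "")
      print
    }
  ' "$1"
}

# On Prometheus Server 1, you could run:
#   list_rule_files /etc/prometheus/prometheus.yml
```

Each line of output is a path or glob that Prometheus Server 2 will also need once it uses the same prometheus.yml.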
Start the new Prometheus instance and verify that everything is working.
  1. On Prometheus Server 2, start and enable Prometheus:
sudo systemctl enable prometheus

sudo systemctl start prometheus
  2. Access Prometheus Server 2 in a browser at http://<Prometheus Server 2 Public IP>:9090. Run a query to verify that it is scraping metrics from limedrop-web.
  3. You can also click Alerts to verify that the WebServerDown alert appears.
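
The same checks can be made from a terminal through the Prometheus HTTP API instead of the browser. The sketch below only builds the instant-query URL; the function name prom_query_url is hypothetical, and the up metric is an assumed example query, since the lab text does not show the query it uses.

```shell
# Hypothetical helper (not part of the lab): print the instant-query URL
# for a Prometheus host listening on port 9090. The expression is passed
# through unchanged, so it must already be URL-safe (a bare metric name
# such as "up" is).
prom_query_url() {
  printf 'http://%s:9090/api/v1/query?query=%s\n' "$1" "$2"
}

# Example use (replace the placeholder with the real address first):
#   curl -s "$(prom_query_url '<Prometheus Server 2 Public IP>' up)"
#   curl -s "http://<Prometheus Server 2 Public IP>:9090/api/v1/alerts"
```

In the JSON returned by the query endpoint, a series for the limedrop-web target suggests that the new instance is scraping it. The /api/v1/alerts endpoint lists only pending or firing alerts, so the Alerts page in the browser remains the place to confirm that the WebServerDown rule has loaded.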

Additional Resources

Your company, LimeDrop, is using Prometheus to monitor a variety of applications and servers. Recently, a major outage occurred which caused the Prometheus server itself to go down, severely impacting the team's ability to discover what went wrong since they could not access any metric data.

The company would like to ensure that Prometheus is more highly available by setting up a second Prometheus instance. This way, if one goes down, it will be more likely that the team can still obtain metric data.

Prometheus Server 1 is an existing server that is already running. Prometheus Server 2 has Prometheus installed, but it is not configured or running. Your task is to configure Prometheus Server 2 to serve as an additional instance alongside Prometheus Server 1 so that monitoring services are more highly available.
