Prometheus and Grafana

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects and stores time-series metrics from systems, applications, and services. Grafana is a visualisation and analytics platform that works with Prometheus and other data sources. It lets users build interactive dashboards, set up alerts, and analyse performance trends in near real time. Together they form a complete monitoring stack: Prometheus handles data collection and storage, while Grafana provides visual insights and alerting.

Prometheus

Prometheus will be installed on the head node (node1). To install it, first create a dedicated system user:

sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
id prometheus

Next, download the Prometheus release tarball, extract it to /opt, and link the binaries into /usr/local/bin so they can be run from anywhere in the terminal:

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz

sudo tar -xvf /tmp/prometheus-3.5.0.linux-amd64.tar.gz -C /opt/

sudo ln -s /opt/prometheus-3.5.0.linux-amd64/prometheus /usr/local/bin/prometheus
sudo ln -s /opt/prometheus-3.5.0.linux-amd64/promtool /usr/local/bin/promtool
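
The downloaded tarball can also be checked against a published checksum, ideally right after wget and before tar. The following is a minimal sketch, assuming the release page offers a checksum file in the format written by sha256sum (Prometheus releases normally include one named sha256sums.txt, but confirm this on the release page). The helper name verify_sha256 is hypothetical.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest matches the entry
# for its file name in SUMS_FILE (lines of the form "<digest>  <filename>").
verify_sha256() {
    file=$1
    sums_file=$2
    name=$(basename "$file")
    # Look up the expected digest by file name.
    expected=$(awk -v n="$name" '$2 == n { print $1 }' "$sums_file")
    # Compute the digest of the file that was actually downloaded.
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $name"
    else
        echo "MISMATCH: $name" >&2
        return 1
    fi
}
```

With the checksum file saved as /tmp/sha256sums.txt (an assumed location), the check would be run as verify_sha256 /tmp/prometheus-3.5.0.linux-amd64.tar.gz /tmp/sha256sums.txt. The same helper applies to the Node Exporter tarball.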

You can verify the installation by checking the versions:

prometheus --version
promtool --version

Then create the necessary directories and set the appropriate permissions:

sudo mkdir -p /etc/prometheus
sudo mkdir -p /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

Next, create the Prometheus configuration file at /etc/prometheus/prometheus.yml:

global:
    scrape_interval: 15s

scrape_configs:
    - job_name: 'node_exporters'
      static_configs:
            - targets: ['node1:9100', 'node2:9100', 'node3:9100', 'node4:9100', 'node5:9100', 'node6:9100', 'node7:9100']

  • scrape_interval: Sets how often Prometheus collects (or scrapes) metrics from all configured targets.

  • job_name: Labels this scrape job (node_exporters). This is helpful for organising and filtering metrics later.

  • static_configs: Lists fixed, manually defined targets.

  • targets: Specifies the hosts and ports to scrape. Each target here is a node running Node Exporter, a Prometheus agent that exposes system metrics such as CPU, memory, and disk usage on port 9100.
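
YAML is sensitive to indentation, so it can help to validate the configuration file before starting the service. The sketch below wraps promtool check config, which is provided by the promtool binary linked earlier. The function name check_prometheus_config is a hypothetical helper, not part of Prometheus.

```shell
# Validate a Prometheus configuration file before (re)starting the service.
# Returns promtool's exit status, which is 0 when the file is valid.
check_prometheus_config() {
    config=${1:-/etc/prometheus/prometheus.yml}
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    promtool check config "$config"
}
```

Calling check_prometheus_config without arguments checks /etc/prometheus/prometheus.yml. Any syntax or indentation error is reported by promtool.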

Now set the ownership and permissions for the configuration file:

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
sudo chmod 644 /etc/prometheus/prometheus.yml

Then create a systemd service file for Prometheus, /etc/systemd/system/prometheus.service, with the following content:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus

ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.listen-address=:9090

Restart=on-failure

[Install]
WantedBy=multi-user.target

  • Description: A short description of the service (shown when you run systemctl status prometheus).

  • Wants: Asks systemd to bring up network-online.target alongside Prometheus. This is a weak dependency: Prometheus is still started if that target fails.

  • After: Orders startup so that Prometheus is launched only after network-online.target has been reached, that is, once the network is ready.

Finally, reload the systemd daemon and start the Prometheus service:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus
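
Beyond systemctl status, the running server can be checked over HTTP. The sketch below uses two endpoints of the Prometheus HTTP interface, /-/healthy and /api/v1/targets, and assumes the default address http://localhost:9090 from the unit file above. The function names are hypothetical, and counting matches with grep is a rough shortcut that relies on the compact JSON the API returns, not a proper JSON parser.

```shell
# Succeed if the Prometheus server answers its health endpoint.
prometheus_is_healthy() {
    base_url=${1:-http://localhost:9090}
    # -f: fail on HTTP errors, -s: no progress output.
    curl -fs --max-time 5 "$base_url/-/healthy" >/dev/null
}

# Print how many active scrape targets currently report health "up".
count_targets_up() {
    base_url=${1:-http://localhost:9090}
    curl -fs --max-time 5 "$base_url/api/v1/targets?state=active" |
        grep -o '"health":"up"' | wc -l
}
```

Run prometheus_is_healthy && count_targets_up on the head node. With the seven targets configured above, the count should reach 7 once Node Exporter is running on every listed node; until then it stays lower.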

Node Exporter

Node Exporter is a lightweight agent used with Prometheus to collect detailed hardware and OS-level metrics from Linux systems. It runs on each node and exposes system information such as CPU usage, memory consumption, disk I/O, filesystem statistics, and network performance via an HTTP endpoint (usually :9100/metrics). In a monitoring setup, Prometheus periodically scrapes these metrics from Node Exporter, helping administrators track system health, detect performance issues, and analyse resource utilisation across nodes in a cluster or HPC environment.

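Node Exporter has to be installed on the login node (node2), the compute nodes (node3, node4, node5), and the storage nodes (node6, node7), so the following steps must be repeated on each of them. The scrape configuration above also lists node1:9100; to collect metrics from the head node as well, repeat the steps there, otherwise that target will be reported as down.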
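
Repeating the same commands on six nodes by hand is error-prone, so a small loop can help. This is only a sketch: it assumes key-based SSH access from the current host to every node, and that any sudo commands you pass can run without a password prompt. NODES and run_on_nodes are hypothetical names.

```shell
# Nodes that need Node Exporter, taken from the paragraph above.
NODES="node2 node3 node4 node5 node6 node7"

# Run the given command on every node in turn.
# Set DRY_RUN=1 to print the ssh commands instead of executing them.
run_on_nodes() {
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $node $*"
        else
            ssh "$node" "$@" || echo "FAILED on $node" >&2
        fi
    done
}
```

For example, run_on_nodes id node_exporter confirms that the dedicated user exists everywhere. Setting DRY_RUN=1 first shows what would be executed without touching the nodes.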

First, create a dedicated user for Node Exporter:

sudo useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter
id node_exporter

Then download and install Node Exporter:

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.10.2/node_exporter-1.10.2.linux-amd64.tar.gz

sudo tar -xvf /tmp/node_exporter-1.10.2.linux-amd64.tar.gz -C /opt/

sudo ln -s /opt/node_exporter-1.10.2.linux-amd64/node_exporter /usr/local/bin/node_exporter

Verify the installation by checking the version:

/usr/local/bin/node_exporter --version

Then create a systemd service file /etc/systemd/system/node_exporter.service for Node Exporter:

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

This unit file runs Node Exporter as a background service that starts at boot. It is ordered after network.target, runs as the unprivileged node_exporter user, and executes the Node Exporter binary, which exposes system metrics on port 9100 by default. Because of Restart=always, systemd restarts the service whenever it exits, waiting 5 seconds (RestartSec=5s) between attempts, so the metrics endpoint stays available for Prometheus.

Finally, reload the systemd daemon and start the Node Exporter service:

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter
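
Once the service is running, the metrics endpoint can be tested directly, either locally or from the head node. The sketch below assumes the default port 9100 and that the node names resolve from the machine running the check. Both function names are hypothetical, and node_load1, node_load5 and node_load15 are the load-average metrics Node Exporter exposes on Linux.

```shell
# Report, for each host given, whether its Node Exporter endpoint answers.
check_exporters() {
    for node in "$@"; do
        if curl -fs --max-time 5 "http://$node:9100/metrics" >/dev/null; then
            echo "$node: reachable"
        else
            echo "$node: NOT reachable"
        fi
    done
}

# Print the load-average samples exposed by one host (default: localhost).
sample_metrics() {
    host=${1:-localhost}
    curl -fs --max-time 5 "http://$host:9100/metrics" | grep '^node_load'
}
```

From the head node, check_exporters node2 node3 node4 node5 node6 node7 gives a quick overview. If a node is not reachable although the service is active, a host firewall blocking port 9100 is one possible cause.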

Grafana

Grafana is installed on the head node (node1), alongside Prometheus. First, import the Grafana GPG key:

cd /tmp
wget -q -O gpg.key https://rpm.grafana.com/gpg.key
sudo rpm --import gpg.key

Then create the Grafana repository file /etc/yum.repos.d/grafana.repo:

[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Once the repository is set up, install Grafana:

sudo dnf install grafana -y
grafana-server -v

Then start and enable the Grafana service:

sudo systemctl enable --now grafana-server
sudo systemctl status grafana-server

Dashboard

Now that everything is set up, open the Grafana web interface by navigating to the following URL in a web browser on your local system:

http://<ip-of-management-node>:3000

../_images/0_login.png

The default username and password are both admin; Grafana prompts you to change the password on first login. After logging in, you will see the following landing page.

../_images/1_dash.png

The next step is to add a data source for Grafana. In our case, the data source is Prometheus.

../_images/2_datasource.png

Within the Prometheus data source, specify the URL of the Prometheus server. In our case, Grafana and Prometheus are both installed on node1, so we can point to the Prometheus server using localhost and the port configured earlier: http://localhost:9090.

../_images/3_localhost.png

Once this is set, save the data source.

../_images/4_save.png
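
As an optional alternative to the web interface, Grafana can also load data sources from provisioning files at startup. The sketch below prints such a file for the Prometheus server used here. It assumes the default provisioning directory of package installs, /etc/grafana/provisioning/datasources, and the function name and the file name prometheus.yaml are hypothetical choices.

```shell
# Print a Grafana data source provisioning file that points at the local
# Prometheus server (port 9090, as configured in the Prometheus unit file).
prometheus_datasource_yaml() {
    cat <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

To use it, pipe the output into the provisioning directory with prometheus_datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml, then restart Grafana with sudo systemctl restart grafana-server. Use either this file or the manual steps above, not both, to avoid two data sources with the same name.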

Now that the data source is defined, we can create a dashboard.

../_images/5_dash.png

To do this, first select the metric you want to display, along with the visualisation style.

../_images/6_select.png

Once selected, run the query using the Run queries button; this renders a panel for that metric.

../_images/7_visual.png
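
Instead of picking a raw metric, the query field also accepts PromQL expressions. As an illustration that is not part of the original setup, the sketch below defines an expression for per-node CPU utilisation based on node_cpu_seconds_total, a metric exposed by Node Exporter, and a hypothetical helper prom_query that runs it against the Prometheus HTTP API, which is handy for testing a query before building a panel. It assumes Prometheus listens on http://localhost:9090.

```shell
# Illustrative PromQL: percentage of CPU time spent non-idle, per instance,
# averaged over the last 5 minutes.
CPU_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Run an instant query against the Prometheus HTTP API and print the raw JSON.
prom_query() {
    query=$1
    base_url=${2:-http://localhost:9090}
    # -G sends the URL-encoded query as a GET parameter.
    curl -fs --max-time 5 -G "$base_url/api/v1/query" \
        --data-urlencode "query=$query"
}
```

Running prom_query "$CPU_QUERY" on the head node returns one result per scraped instance. The same expression can be pasted into the Grafana query editor in code mode.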

You can repeat this step for different metrics to build a dashboard with multiple panels, or create several dashboards.