Posted on 9 February 2023, updated on 28 May 2024.
Do you have a Virtual Machine running Linux that you want to monitor? Do you have several Docker containers that need monitoring, but you don’t know where to start? With Ansible, we will see how to set up a full monitoring stack using Prometheus and Grafana to monitor VMs and/or Docker containers.
Our objectives
Let’s say that you have a machine running Virtual Machines (VMs) that themselves are running Docker. You can also have another real machine (or more!) that you want to monitor.
This stack is flexible, and that is the whole point. What is important is that you have something to monitor, whether it is VMs, real machines or both does not matter.
For today, we will use three VMs on a single machine, and we will monitor only the VMs.
Ok, so now that we all agree on what we have, let’s talk about what we want. One simple word: monitoring. For once on this blog, we will not use Kubernetes. Why? Because we want a simple way to start, and Kubernetes is not that simple for beginners.
So what do we want? We want all our VMs and real machines monitored, with a simple dashboard to visualize the data.
We also want it to answer these criteria:
- I can easily add a new machine or VM to monitor
- I can store my configuration as code using GitHub
- I can safely store my secrets on GitHub without revealing them to everyone who has access to the repo
I have the perfect solution to achieve these goals: Ansible.
How to achieve our objectives?
First of all, I will not cover what Ansible is or how to install it; others in the community have done that many times already.
However, I will talk about the pattern that we will use to monitor our infrastructure: the observer and the targets.
It is quite simple actually: we have a machine that monitors all the others. On this observer, we will install Prometheus and Grafana. On the targets, we will install the agents that will report the data of our VMs and their Docker containers: node-exporter and cAdvisor.
node-exporter will report metrics on the hardware and the OS (things like CPU consumption, for instance), whereas cAdvisor will focus solely on Docker-related metrics. One can work without the other, like on VM 2 in our schema: it is useless to install cAdvisor on VMs that do not run Docker.
Now that you have the theory, let’s dive deeper into our architecture. Our folders will be structured like this:
├── inventories    -> hosts inventory files
│   └── hosts.yml  -> describes the different hosts
├── playbooks      -> ansible playbooks
└── roles          -> ansible roles
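For reference, here is roughly how that tree fills out with the files we will create in this article (your exact layout may differ):
├── inventories
│   └── hosts.yml
├── playbooks
│   └── monitoring.yml
└── roles
    ├── observer
    │   ├── defaults
    │   │   └── main.yml
    │   ├── files
    │   │   ├── grafana/
    │   │   ├── prometheus_main.yml
    │   │   └── prometheus_alerts_rules.yml
    │   ├── tasks
    │   │   └── main.yml
    │   └── templates
    │       └── alertmanager.j2
    └── target
        └── tasks
            └── main.yml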
And in our hosts.yml, you will find… the hosts! Whether it is an observer or a target, every machine in the monitoring stack is referenced here.
all:
  children:
    observer:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
    target:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
        padok-target-1:
          ansible_host: 192.168.0.10
        padok-target-2:
          ansible_host: 192.168.0.11
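Before going further, it is worth checking that Ansible can actually reach these hosts. A quick way to do so (assuming your SSH access is already configured) is the ping module:
ansible -i inventories/hosts.yml all -m ping -u TheUserToExecuteWith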
You might find something weird in this file: padok-observer is mentioned twice. This is because I lied to you earlier! The schema I gave you was missing something: self-monitoring. With it, we also get the first VM’s metrics. In fact, the observer is also a target. That’s why it is mentioned twice: once in the observer group, and once in the target one.
Now that everything is crystal-clear, we can start working on the observer.
Let’s set up the observer
For the observer, we will use three open-source tools:
- Prometheus is a free software application used for event monitoring. It collects real-time metrics over the network and records them in a time-series database.
- Grafana is an interactive visualization web application. It provides charts and graphs when connected to supported data sources such as the Prometheus server.
- Prometheus Alertmanager is a web application that handles alerts sent by client applications such as the Prometheus server. It lets us forward these alerts to a wide range of services such as Slack, OpsGenie, etc.
These three will interact to create a full monitoring stack.
The configuration of Prometheus, Grafana, and Alertmanager is not the main topic of this tutorial. But you will find the entire codebase in our GitHub.
Moving back to Ansible. In order to make everything work correctly, our Ansible role needs to:
- Create the configuration folders
- Create the configuration files
- Create the application containers
This gives us this roles/observer/tasks/main.yml:
- name: Create folder /srv/prometheus if it does not exist
  file:
    path: /srv/prometheus
    mode: 0755
    state: directory

- name: Create folder /srv/grafana if it does not exist
  file:
    path: /srv/grafana
    mode: 0755
    state: directory

- name: Create folder /srv/alertmanager if it does not exist
  file:
    path: /srv/alertmanager
    mode: 0755
    state: directory

- name: Create prometheus configuration file
  copy:
    dest: /srv/prometheus/prometheus.yml
    src: prometheus_main.yml
    mode: 0644

- name: Create prometheus alert configuration file
  copy:
    dest: /srv/prometheus/prometheus_alerts_rules.yml
    src: prometheus_alerts_rules.yml
    mode: 0644

- name: Create grafana configuration files
  copy:
    dest: /srv/
    src: grafana
    mode: 0644

- name: Create alertmanager configuration file
  template:
    dest: /srv/alertmanager/alertmanager.yml
    src: alertmanager/alertmanager.j2
    mode: 0644

- name: Create Prometheus container
  docker_container:
    name: prometheus
    restart_policy: always
    image: "prom/prometheus:{{ prometheus_version }}"
    volumes:
      - /srv/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /srv/prometheus/prometheus_alerts_rules.yml:/etc/prometheus/prometheus_alerts_rules.yml
      - prometheus_main_data:/prometheus
    command: >
      --config.file=/etc/prometheus/prometheus.yml
      --storage.tsdb.path=/prometheus
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    published_ports: "9090:9090"

- name: Create Grafana container
  docker_container:
    name: grafana
    restart_policy: always
    image: "grafana/grafana:{{ grafana_version }}"
    volumes:
      - grafana-data:/var/lib/grafana
      - /srv/grafana/provisioning:/etc/grafana/provisioning
      - /srv/grafana/dashboards:/var/lib/grafana/dashboards
    env:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    published_ports: "3000:3000"

- name: Create Alertmanager container
  docker_container:
    name: alertmanager
    restart_policy: always
    image: "prom/alertmanager:{{ alertmanager_version }}"
    volumes:
      - alertmanager-data:/data
      - /srv/alertmanager:/config
    command: >
      --config.file=/config/alertmanager.yml
      --log.level=debug
    published_ports: "9093:9093"
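One practical note: depending on your Ansible version, the docker_container module may live in the community.docker collection rather than in ansible-core. If the playbook complains about an unknown module, installing the collection should fix it:
ansible-galaxy collection install community.docker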
You might have questions about some particularities of this setup: for instance, how to handle secrets.
Using Ansible Vault and Jinja2 to handle a secret
In our monitoring configuration, we often have passwords or tokens. We call them secrets because that’s what they are: secrets. Remember our objectives: we want to be able to store our code and our secrets safely on GitHub (or any other repo).
But we don’t want our secrets to be readable by anyone! Well, worry no more: let me introduce you to Ansible Vault.
With a simple command, you can encrypt your secret. It will ask for an encryption password; do not forget it, as you will need it at every deployment:
ansible-vault encrypt_string "password" --ask-vault-pass
It will give you something like:
!vault |
          $ANSIBLE_VAULT;1.1;AES256
          64306663363562356132323065396635636630373031303739323666373262663961393132316333
          6135653763363566303331313639633030623530646239310a353236343035643132646230333466
          36336439376131333630346563323833313164353265313264643232373465633561663331396133
          3163303166373166390a396131303239356139653063616437363933333130393563646338663933
          3966
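A handy variant (assuming a reasonably recent ansible-vault) is --stdin-name, which reads the secret from standard input and prints a ready-to-paste "key: !vault |" block:
echo -n "password" | ansible-vault encrypt_string --stdin-name "alertmanager_smtp_password" --ask-vault-pass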
And voilà! Your secret is now encrypted. You can use it with Jinja2 via the template module (as we saw before) and your roles/observer/defaults/main.yml:
---
prometheus_version: v2.40.1
grafana_version: "9.2.5"
alertmanager_version: v0.24.0
alertmanager_smtp_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  64306663363562356132323065396635636630373031303739323666373262663961393132316333
  6135653763363566303331313639633030623530646239310a353236343035643132646230333466
  36336439376131333630346563323833313164353265313264643232373465633561663331396133
  3163303166373166390a396131303239356139653063616437363933333130393563646338663933
  3966
In roles/observer/templates/alertmanager.j2, you can reference alertmanager_smtp_password, which will be decrypted when the template is applied:
route:
  receiver: "mail"
  repeat_interval: 4h
  group_by: [ alertname ]

receivers:
  - name: "mail"
    email_configs:
      - smarthost: "outlook.office365.com:587"
        auth_username: "test@padok.fr"
        auth_password: "{{ alertmanager_smtp_password }}"
        from: "test@padok.fr"
        to: "test@padok.fr"
I want to deploy!
Enough talking, let’s deploy our observer! Thanks to the tags in our playbooks, we have the ability to deploy only the observer.
This is our playbook in playbooks/monitoring.yml:
- name: Install Observability stack (targets)
  hosts: target
  tags:
    - monitoring
    - target
  roles:
    - ../roles/target

- name: Install Observability stack (observer)
  hosts: observer
  tags:
    - monitoring
    - observer
  roles:
    - ../roles/observer
In our ansible-playbook command, we will specify that we want to execute only the plays with the observer tag.
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t observer --ask-vault-pass
That’s it! You now have a fully functional observer. You can access Prometheus on port 9090 and Grafana on port 3000. The only things missing now are the targets, and then you will have a full monitoring stack.
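If you want a quick sanity check before moving on, both services expose simple HTTP health endpoints (adjust the IP to your observer’s):
curl http://192.168.0.1:9090/-/healthy   # Prometheus liveness endpoint
curl http://192.168.0.1:3000/api/health  # Grafana health endpoint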
Let’s set up the targeted ones
For the targeted ones, the setup will depend on what is on your machine. We will use two main agents: node-exporter and cAdvisor. As I’ve said before, node-exporter will focus on hardware and OS metrics, whereas cAdvisor will report Docker-related metrics.
Here, we will install both. This gives us this Ansible role, roles/target/tasks/main.yml:
- name: Create NodeExporter
  docker_container:
    name: node-exporter
    restart_policy: always
    image: "prom/node-exporter:{{ node_exporter_version }}"  # version pinned in the role's defaults, as on the observer
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command: >
      --path.procfs=/host/proc
      --path.rootfs=/rootfs
      --path.sysfs=/host/sys
      --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    published_ports: "9100:9100"

- name: Create cAdvisor
  docker_container:
    name: cadvisor
    restart_policy: always
    image: "gcr.io/cadvisor/cadvisor:{{ cadvisor_version }}"  # version pinned in the role's defaults
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    published_ports: "9101:8080"
These two expose their metrics on a specific port (9100 for node-exporter; 8080 for cAdvisor, which we map to 9101 on the host) and serve an endpoint called /metrics for Prometheus to scrape.
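You can verify that an agent is up by fetching that endpoint yourself from any machine that can reach the target, for instance padok-target-2:
curl -s http://192.168.0.11:9100/metrics | head   # node-exporter
curl -s http://192.168.0.11:9101/metrics | head   # cAdvisor (host port 9101)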
There is only one thing missing for our monitoring stack to work: Prometheus needs to be aware of the targeted ones. For that, we will add them to the scrape_configs of the Prometheus config file (roles/observer/files/prometheus_main.yml):
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9100", "192.168.0.10:9100", "192.168.0.11:9100"]

  - job_name: cadvisor
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9101", "192.168.0.11:9101"]
You can now deploy the monitored ones. To do that, same as before, use the tags:
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t target --ask-vault-pass
Be careful: if you have modified the Prometheus config file, you will also need to redeploy the observer and reload Prometheus for the new configuration to take effect.
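Because we started Prometheus with --web.enable-lifecycle, you don’t have to restart the container by hand: a POST to its reload endpoint makes it re-read its configuration.
curl -X POST http://192.168.0.1:9090/-/reload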
Add a targeted machine
To add a targeted machine or VM, the steps are quite easy:
- Make sure that you can connect to it through SSH
- Add the IP of the targeted machine and its hostname under target in the inventory (hosts.yml)
- Add the IP of the targeted machine to the targets of the node-exporter job (and of cadvisor if you also want to monitor its containers) in the observer’s Prometheus config (ansible/roles/observer/files/prometheus_main.yml)
- Run the ansible-playbook command for both the observer and the targets
- That’s it!
We have now covered everything. The entire codebase is publicly available on our GitHub.
Here is what it looks like (this is the node-exporter dashboard):
The end?
In fact, it might not be the end.
Using Jinja2 templating, you could generate the Prometheus config file automatically from the hosts.yml inventory.
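As a sketch of that idea (hypothetical: it assumes you turn prometheus_main.yml into a template, say roles/observer/templates/prometheus_main.yml.j2, and deploy it with the template module instead of copy), the node-exporter job could be built from the inventory like this:
scrape_configs:
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
{% for host in groups['target'] %}
          - "{{ hostvars[host].ansible_host }}:9100"
{% endfor %}
With this in place, adding a machine to the target group in hosts.yml is enough for Prometheus to start scraping it on the next deployment.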
Also, remember when I told you that we would not use Kubernetes because it wasn’t simple enough? Well, I might have been wrong. With Kubernetes, the Prometheus Operator does everything I’ve shown you automatically.
The initial setup is longer, but it may be a better solution depending on the situation. It is up to you!