Let’s say that you have a machine running Virtual Machines (VMs) that themselves are running Docker. You can also have another real machine (or more!) that you want to monitor.
This stack is flexible, and that is the whole point. What is important is that you have something to monitor, whether it is VMs, real machines or both does not matter.
For today, we will use 3 VMs running on a single machine, and we will monitor only the VMs.
Ok, so now that we all agree on what we have, let’s talk about what we want. One simple word: monitoring. For once in this blog, we will not use Kubernetes. Why? Because we want a simple way to start, and Kubernetes is not that simple for beginners.
So what do we want? We want all our VMs and real machines monitored, with a simple dashboard to visualize the data.
We also want it to meet a few criteria, in particular being able to store everything, secrets included, safely in a Git repository.
I have the perfect solution to achieve these goals: Ansible.
First of all, I will not cover what Ansible is or how to install it here. Others in the community have done that many times already.
However, I will talk about the pattern that we will use to monitor our infrastructure: the observer and the targets.
It is quite simple actually: we have a machine that monitors all the others. On this observer, we will install Prometheus and Grafana. On the targets, we will install the agents that will report the data of our VMs and their Docker containers: node-exporter and cAdvisor.
node-exporter will report metrics on the hardware and the OS (things like CPU consumption, for instance), whereas cAdvisor will focus solely on Docker-related metrics. One can work without the other, like on VM 2 in our diagram. There is no point installing cAdvisor on VMs that do not run Docker.
Now that you have the theory, let’s dive deeper into our architecture. Our folders will be structured like this:
├── inventories    -> hosts inventory files
│   └── hosts.yml  -> describes the different hosts
├── playbooks      -> ansible playbooks
└── roles          -> ansible roles
And in our hosts.yml
, you will find… the hosts! Every machine in the monitoring stack, whether it is an observer or a target, is referenced here.
all:
  children:
    observer:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
    target:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
        padok-target-1:
          ansible_host: 192.168.0.10
        padok-target-2:
          ansible_host: 192.168.0.11
You might find something weird in this file: padok-observer
is mentioned twice. This is because I lied to you earlier! The diagram I gave you was missing something: self-monitoring.
This version is way better: now we also get the first VM’s metrics. In fact, the observer is also a target. That is why it is mentioned twice: once in the observer group, and once in the target group.
Now that everything is crystal-clear, we can start working on the observer.
For the observer, we will use three open-source tools: Prometheus, Grafana, and Alertmanager. Together, they form a full monitoring stack.
The configuration of Prometheus, Grafana, and Alertmanager is not the main topic of this tutorial, but you will find the entire codebase in our GitHub repository.
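To give you an idea of how they talk to each other, here is what a Grafana datasource provisioning file could look like. This is only an illustrative sketch, not necessarily the one from the repository: I assume the file lives in the grafana folder that the role below copies to /srv/grafana/provisioning, and I use the observer’s IP from our inventory.

apiVersion: 1
datasources:
  - name: Prometheus              # how the datasource appears in Grafana
    type: prometheus
    access: proxy                 # Grafana's backend queries Prometheus itself
    url: http://192.168.0.1:9090  # Prometheus is published on the observer's host port 9090
    isDefault: true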
Moving back to Ansible. In order to make everything work correctly, our Ansible role needs to: create the configuration folders, copy the configuration files (and templates), and start the Prometheus, Grafana, and Alertmanager containers.
This gives us this roles/observer/tasks/main.yml
:
- name: Create folder /srv/prometheus if it does not exist
  file:
    path: /srv/prometheus
    mode: 0755
    state: directory

- name: Create folder /srv/grafana if it does not exist
  file:
    path: /srv/grafana
    mode: 0755
    state: directory

- name: Create folder /srv/alertmanager if it does not exist
  file:
    path: /srv/alertmanager
    mode: 0755
    state: directory

- name: Create Prometheus configuration file
  copy:
    dest: /srv/prometheus/prometheus.yml
    src: prometheus_main.yml
    mode: 0644

- name: Create Prometheus alert configuration file
  copy:
    dest: /srv/prometheus/prometheus_alerts_rules.yml
    src: prometheus_alerts_rules.yml
    mode: 0644

- name: Create Grafana configuration files
  copy:
    dest: /srv/
    src: grafana
    mode: 0644

- name: Create Alertmanager configuration file
  template:
    dest: /srv/alertmanager/alertmanager.yml
    src: alertmanager/alertmanager.j2
    mode: 0644

- name: Create Prometheus container
  docker_container:
    name: prometheus
    restart_policy: always
    image: "prom/prometheus:{{ prometheus_version }}"
    volumes:
      - /srv/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /srv/prometheus/prometheus_alerts_rules.yml:/etc/prometheus/prometheus_alerts_rules.yml
      - prometheus_main_data:/prometheus
    command: >
      --config.file=/etc/prometheus/prometheus.yml
      --storage.tsdb.path=/prometheus
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    published_ports: "9090:9090"

- name: Create Grafana container
  docker_container:
    name: grafana
    restart_policy: always
    image: "grafana/grafana:{{ grafana_version }}"
    volumes:
      - grafana-data:/var/lib/grafana
      - /srv/grafana/provisioning:/etc/grafana/provisioning
      - /srv/grafana/dashboards:/var/lib/grafana/dashboards
    env:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    published_ports: "3000:3000"

- name: Create Alertmanager container
  docker_container:
    name: alertmanager
    restart_policy: always
    image: "prom/alertmanager:{{ alertmanager_version }}"
    volumes:
      - alertmanager-data:/data
      - /srv/alertmanager:/config
    command: >
      --config.file=/config/alertmanager.yml
      --log.level=debug
    published_ports: "9093:9093"
You might have some questions about a few particularities of this setup, for instance: how do we handle secrets?
In our monitoring configuration, we often have passwords or tokens. We call them secrets because that’s what they are: secrets. Remember our objectives: we want to be able to store our code and our secrets safely on GitHub (or any other repo).
But we don’t want our secrets to be readable by anyone! Well, worry no more: let me introduce you to Ansible Vault.
With a simple command, you can encrypt your secret. It will ask for an encryption password; do not forget it, you will need it at every deployment:
ansible-vault encrypt_string "password" --ask-vault-pass
It will give you something like:
!vault |
  $ANSIBLE_VAULT;1.1;AES256
  64306663363562356132323065396635636630373031303739323666373262663961393132316333
  6135653763363566303331313639633030623530646239310a353236343035643132646230333466
  36336439376131333630346563323833313164353265313264643232373465633561663331396133
  3163303166373166390a396131303239356139653063616437363933333130393563646338663933
  3966
And voilà! Your secret is now encrypted. You can use it with Jinja2 via the template module (as we saw before) and your roles/observer/defaults/main.yml:
---
prometheus_version: v2.40.1
grafana_version: "9.2.5"
alertmanager_version: v0.24.0
alertmanager_smtp_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  64306663363562356132323065396635636630373031303739323666373262663961393132316333
  6135653763363566303331313639633030623530646239310a353236343035643132646230333466
  36336439376131333630346563323833313164353265313264643232373465633561663331396133
  3163303166373166390a396131303239356139653063616437363933333130393563646338663933
  3966
In roles/observer/templates/alertmanager.j2, you can then reference alertmanager_smtp_password, which will be decrypted when the template is applied:
route:
  receiver: "mail"
  repeat_interval: 4h
  group_by: [ alertname ]

receivers:
  - name: "mail"
    email_configs:
      - smarthost: "outlook.office365.com:587"
        auth_username: "test@padok.fr"
        auth_password: "{{ alertmanager_smtp_password }}"
        from: "test@padok.fr"
        to: "test@padok.fr"
Enough talking, let’s deploy our observer! Thanks to the tags in our playbook, we can deploy only the observer.
This is our playbook in playbooks/monitoring.yml
:
- name: Install Observability stack (targets)
  hosts: target
  tags:
    - monitoring
    - target
  roles:
    - ../roles/target

- name: Install Observability stack (observer)
  hosts: observer
  tags:
    - monitoring
    - observer
  roles:
    - ../roles/observer
In our ansible-playbook command, we will specify that we only want to execute the plays tagged observer.
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t observer --ask-vault-pass
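If typing the vault password at every deployment becomes tedious, ansible-playbook also accepts --vault-password-file instead of --ask-vault-pass. The path below is just an example; make sure this file never ends up in your repository.

echo "your-vault-password" > ~/.vault_pass    # example path, keep it out of git
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t observer --vault-password-file ~/.vault_pass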
That’s it! You now have a fully functional observer. You can access Prometheus on port 9090 and Grafana on port 3000. The only things missing now are the targets, and then you will have a full monitoring stack.
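If you want a quick sanity check before moving on, both tools expose a health endpoint (using the observer’s IP from our inventory):

curl http://192.168.0.1:9090/-/healthy     # Prometheus liveness endpoint
curl http://192.168.0.1:3000/api/health    # Grafana health endpoint (returns a small JSON status)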
For the targets, the setup depends on what runs on each machine. We will use two agents: node-exporter and cAdvisor. As I said before, node-exporter focuses on hardware and OS metrics, whereas cAdvisor reports Docker-related metrics.
Here, we will install both. This gives us this Ansible role: roles/target/tasks/main.yml
:
- name: Create NodeExporter
  docker_container:
    name: node-exporter
    restart_policy: always
    image: prom/node-exporter:
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command: >
      --path.procfs=/host/proc
      --path.rootfs=/rootfs
      --path.sysfs=/host/sys
      --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    published_ports: "9100:9100"

- name: Create cAdvisor
  docker_container:
    name: cadvisor
    restart_policy: always
    image: gcr.io/cadvisor/cadvisor:
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    published_ports: "9101:8080"
Each of these agents exposes its metrics on a /metrics endpoint for Prometheus to scrape: node-exporter on port 9100, and cAdvisor on port 8080 inside its container, published as port 9101 on the host.
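You can check that a target reports correctly with a simple curl against its published ports (here padok-target-2, which runs both agents; the metric names are just well-known examples from each exporter):

curl -s http://192.168.0.11:9100/metrics | grep node_cpu_seconds_total              # node-exporter
curl -s http://192.168.0.11:9101/metrics | grep container_cpu_usage_seconds_total   # cAdvisor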
There is only one thing missing for our monitoring stack to work: Prometheus needs to be aware of the targets. For that, we will add them to the scrape_configs of the Prometheus config file (/roles/observer/files/prometheus_main.yml):
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9100", "192.168.0.10:9100", "192.168.0.11:9100"]

  - job_name: cadvisor
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9101", "192.168.0.11:9101"]
You can now deploy the targets. To do that, same as before, use the tags:
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t target --ask-vault-pass
Be careful: if you have modified the Prometheus config file, you will also need to redeploy the observer and restart (or reload) Prometheus to apply the new configuration.
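Since our Prometheus container runs with --web.enable-lifecycle, you can also trigger a configuration reload without restarting the container once the new file is in place, for instance:

curl -X POST http://192.168.0.1:9090/-/reload    # asks Prometheus to re-read its configuration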
To add a new machine or VM to monitor, the steps are quite easy:
1. Add it to the target group in the inventory (hosts.yml).
2. Add its address to the targets of the node-exporter job (and of the cadvisor job if you also want to monitor its containers) in the Prometheus config of the observer machine (ansible/roles/observer/files/prometheus_main.yml).
3. Run the ansible-playbook command for both the observer and the targets.

We have now covered everything. All the codebase is available publicly.
Here is what it looks like (this is the node-exporter dashboard):
In fact, this might not be the end. Using Jinja2 templating, you could generate the Prometheus config file automatically from the hosts.yml inventory, so that new targets are picked up without editing the scrape configuration by hand.
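As a sketch of that idea (assuming you rename the config to prometheus_main.yml.j2 and deploy it with the template module instead of copy; the file name is my choice), the node-exporter job could be generated from the inventory like this:

scrape_configs:
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
{% for host in groups['target'] %}
          - "{{ hostvars[host]['ansible_host'] }}:9100"
{% endfor %}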
Also, remember when I told you that we would not use Kubernetes because it wasn’t simple enough? Well, I might have been wrong. With Kube, the Prometheus operator does everything I’ve shown you automatically.
The initial setup is longer, but it may be a better solution depending on the situation. It is up to you!