Ansible: how to create roles and install Prometheus, Grafana and node-exporter

How to learn Ansible? Many people start infrastructure as code with it, and many companies adopt it.

Ansible is a great tool because you can do many things with it, and it is very easy to get started. Once you try it, you adopt it.

As a practical exercise, I suggest installing a monitoring stack made of Prometheus, Grafana and node-exporter. It lets you practice a range of standard modules, on a small or a large scale.

To begin, define an inventory

Before coding the Ansible roles, I prefer to begin with the inventory file. In this file, we describe our infrastructure.

In our case, we use a simple YAML format, in a file named 00_inventory.yaml:

all:
  children:
    monitor:
      hosts:
        172.17.0.2:
    others:
      hosts:
        172.17.0.5:
        172.17.0.4:
        172.17.0.3:

Easy? We have only two groups:

  • monitor: for our monitoring stack (Prometheus and Grafana)
  • others: for all the other servers
  • and of course the all group, which merges the monitor and others groups
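
A quick way to confirm that the inventory and its groups resolve as expected is a tiny connectivity playbook. This is only a sketch: the file name ping.yml is hypothetical, and it assumes SSH access to the four hosts is already configured.

```yaml
# ping.yml (hypothetical file name): checks that every inventory host is reachable
- name: check connectivity for every host of the inventory
  hosts: all
  gather_facts: false
  tasks:
    - name: ping the host through ansible
      ping:

    - name: show the groups this host belongs to
      debug:
        var: group_names
```

It could be run with ansible-playbook -i 00_inventory.yaml ping.yml. The host 172.17.0.2 should report the monitor group, the three others the others group.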

Now we can create a roles directory next to the inventory file:

mkdir -p roles

Create a role to install node-exporter

With any IaC or configuration management tool like Ansible, you need to learn one basic case: how to install a binary from a tarball. Node-exporter is a very good example.

First of all, the ansible-galaxy command helps us create the skeleton of a role:

ansible-galaxy init roles/node-exporter

Then we can set some default variables in the main.yml file of the defaults directory:

node_exporter_version: "1.1.2"
node_exporter_bin: /usr/local/bin/node_exporter
node_exporter_user: node-exporter
node_exporter_group: "{{ node_exporter_user }}"
node_exporter_dir_conf: /etc/node_exporter
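
Because these are only role defaults, they can be overridden without touching the role. As a sketch, assuming a group_vars directory next to the inventory file (the path and the version number below are illustrative and not part of the original setup):

```yaml
# group_vars/others.yml (hypothetical path): applies to the hosts of the "others" group
# The version is only an example value; pick a release that is published upstream.
node_exporter_version: "1.3.1"
```

Variables set in group_vars take precedence over role defaults, so the role would download this version for the hosts of that group only.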

And now the main.yml file of the tasks directory:

- name: check if node exporter exists
  stat:
    path: "{{ node_exporter_bin }}"
  register: __check_node_exporter_present
- name: create node exporter user
  user:
    name: "{{ node_exporter_user }}"
    append: true
    shell: /usr/sbin/nologin
    system: true
    create_home: false
- name: create node exporter config dir
  file:
    path: "{{ node_exporter_dir_conf }}"
    state: directory
    owner: "{{ node_exporter_user }}"
    group: "{{ node_exporter_group }}"
# the installed version is read back from the Description line of the unit file
- name: if node exporter exists get version
  shell: "cat /etc/systemd/system/node_exporter.service | grep Version | sed s/'.*Version '//g"
  when: __check_node_exporter_present.stat.exists == true
  changed_when: false
  register: __get_node_exporter_version

- name: download and unzip node exporter if not exist
  unarchive:
    src: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
    dest: /tmp/
    remote_src: yes
    validate_certs: no
# the binary is replaced only when it is missing or when the version differs
- name: move the binary to the final destination
  copy:
    src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
    dest: "{{ node_exporter_bin }}"
    owner: "{{ node_exporter_user }}"
    group: "{{ node_exporter_group }}"
    mode: 0755
    remote_src: yes
  when: __check_node_exporter_present.stat.exists == false or not __get_node_exporter_version.stdout == node_exporter_version
- name: clean
  file:
    path: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/"
    state: absent
- name: install service
  template:
    src: node_exporter.service.j2
    dest: /etc/systemd/system/node_exporter.service
    owner: root
    group: root
    mode: 0755
  notify: reload_daemon_and_restart_node_exporter
# run the notified handler now, before checking that the service is started
- meta: flush_handlers
- name: service always started
  systemd:
    name: node_exporter
    state: started
    enabled: yes
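
To check that the service really answers after the installation, a task like the following could be appended to the role. It is a sketch, not part of the original role: it assumes node-exporter listens on its default port 9100 (the port used later in the Prometheus configuration), and the register name is made up for the example.

```yaml
# Optional check, e.g. at the end of roles/node-exporter/tasks/main.yml
- name: wait until node exporter answers on its metrics endpoint
  uri:
    url: "http://127.0.0.1:9100/metrics"
    status_code: 200
  register: __node_exporter_metrics   # hypothetical variable name
  until: __node_exporter_metrics.status == 200
  retries: 10
  delay: 2
```

The retry loop gives the service a few seconds to start before the play fails.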

Of course, we need to create the node_exporter.service.j2 file in the templates directory:

[Unit]
Description=Node Exporter Version {{ node_exporter_version }}
After=network-online.target
[Service]
User={{ node_exporter_user }}
Group={{ node_exporter_group }}
Type=simple
ExecStart={{ node_exporter_bin }}
[Install]
WantedBy=multi-user.target

And finally the main.yml file in the handlers directory:

- name: reload_daemon_and_restart_node_exporter
  systemd:
    name: node_exporter
    state: restarted
    daemon_reload: yes
    enabled: yes

Now we can initialize a prometheus role:

ansible-galaxy init roles/prometheus

A role for Prometheus and its configuration

As with the node-exporter installation, I suggest starting with the default variables:

prometheus_dir_configuration: "/etc/prometheus"
prometheus_retention_time: "365d"
prometheus_scrape_interval: "30s"
prometheus_node_exporter: true
prometheus_node_exporter_group: "all"
prometheus_env: "production"
prometheus_var_config:
  global:
    scrape_interval: "{{ prometheus_scrape_interval }}"
    evaluation_interval: 5s
    external_labels:
      env: '{{ prometheus_env }}'
  scrape_configs:
    - job_name: prometheus
      scrape_interval: 5m
      static_configs:
        - targets: ['{{ inventory_hostname }}:9090']

As you can see, prometheus_var_config holds the first part of the Prometheus configuration: the global section and the job that scrapes Prometheus itself.

Now I create the tasks in the main.yml file of the tasks directory:

- name: update and install prometheus
  apt:
    name: prometheus
    state: latest
    update_cache: yes
    cache_valid_time: 3600
- name: prometheus args
  template:
    src: prometheus.j2
    dest: /etc/default/prometheus
    mode: 0644
    owner: root
    group: root
  notify: restart_prometheus
- name: prometheus configuration file
  template:
    src: prometheus.yml.j2
    dest: "{{ prometheus_dir_configuration }}/prometheus.yml"
    mode: 0755
    owner: prometheus
    group: prometheus
  notify: reload_prometheus
- name: start prometheus
  systemd:
    name: prometheus
    state: started
    enabled: yes
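
A broken configuration file would only be noticed when Prometheus reloads it. As a possible safeguard, the template module accepts a validate command that runs against the rendered file before it is installed. The variant below is a sketch that assumes the promtool binary is installed with the prometheus package and is on the PATH of the target host; it also uses mode 0644, since a configuration file does not need the execute bit.

```yaml
- name: prometheus configuration file
  template:
    src: prometheus.yml.j2
    dest: "{{ prometheus_dir_configuration }}/prometheus.yml"
    mode: 0644
    owner: prometheus
    group: prometheus
    # %s is replaced by the path of the temporary rendered file;
    # assumption: promtool is available on the target host
    validate: promtool check config %s
  notify: reload_prometheus
```

If the check fails, the task fails and the previous prometheus.yml stays in place.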

Then I create the prometheus.yml.j2 file in the templates directory:

#jinja2: lstrip_blocks: "True"
{{ prometheus_var_config | to_nice_yaml(indent=2) }}
{% if prometheus_node_exporter_group %}
- job_name: node_exporter
  scrape_interval: 15s
  static_configs:
  - targets:
{% for server in groups[prometheus_node_exporter_group] %}
    - {{ server }}:9100
{% endfor %}
{% endif %}
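
To make the template easier to follow, here is roughly what the rendered file could look like with the inventory and the defaults of this post. This is a hand-written approximation, not captured output: the key order, the blank lines and the exact indentation produced by to_nice_yaml may differ.

```yaml
# Approximate rendering of /etc/prometheus/prometheus.yml (illustration only)
global:
  evaluation_interval: 5s
  external_labels:
    env: production
  scrape_interval: 30s
scrape_configs:
- job_name: prometheus
  scrape_interval: 5m
  static_configs:
  - targets:
    - 172.17.0.2:9090
# appended by the loop over groups[prometheus_node_exporter_group]
- job_name: node_exporter
  scrape_interval: 15s
  static_configs:
  - targets:
    - 172.17.0.2:9100
    - 172.17.0.5:9100
    - 172.17.0.4:9100
    - 172.17.0.3:9100
```

The node_exporter job lands at the end of the file, which is why scrape_configs has to be the last key of prometheus_var_config once rendered.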

And the prometheus.j2 file, which holds the Prometheus command-line arguments:

ARGS="--web.enable-lifecycle --storage.tsdb.retention.time={{ prometheus_retention_time }} --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries"

And finally the handlers of this role:

- name: restart_prometheus
  systemd:
    name: prometheus
    state: restarted
    enabled: yes
    daemon_reload: yes
- name: reload_prometheus
  uri:
    url: http://localhost:9090/-/reload
    method: POST
    status_code: 200

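We have two handlers:

  • one for a restart through the systemd service
  • one for a configuration reload with an HTTP POST on the Prometheus API (the uri module does the job of curl here). This endpoint is only available because of the --web.enable-lifecycle flag set in the arguments file.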
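
After a restart, Prometheus can take a moment before it serves requests. If later tasks depend on it, a readiness check along these lines could be added to the role. It is a sketch: it relies on the /-/ready endpoint of the Prometheus management API, and the register name and retry values are arbitrary choices for the example.

```yaml
- name: wait until prometheus reports it is ready
  uri:
    url: "http://localhost:9090/-/ready"
    status_code: 200
  register: __prometheus_ready   # hypothetical variable name
  until: __prometheus_ready.status == 200
  retries: 30
  delay: 2
```

Placed after a meta: flush_handlers task, it would run once the notified handlers have been executed.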

The last role, for Grafana

Now we can create a last role to install the grafana package and start the grafana-server service.

Just edit the main.yml file in the tasks directory:

- name: install gpg
  apt:
    name: gnupg,software-properties-common
    state: present
    update_cache: yes
    cache_valid_time: 3600
- name: add gpg key
  apt_key:
    url: "https://packages.grafana.com/gpg.key"
    validate_certs: no
- name: add repository
  apt_repository:
    repo: "deb https://packages.grafana.com/oss/deb stable main"
    state: present
    validate_certs: no
- name: install grafana
  apt:
    name: grafana
    state: latest
    update_cache: yes
    cache_valid_time: 3600
- name: start service grafana-server
  systemd:
    name: grafana-server
    state: started
    enabled: yes
# poll the web interface until it answers, for up to 120 seconds
- name: wait for service up
  uri:
    url: "http://127.0.0.1:3000"
    status_code: 200
  register: __result
  until: __result.status == 200
  retries: 120
  delay: 1
- name: change admin password for grafana gui
  shell: "grafana-cli admin reset-admin-password {{ grafana_admin_password }}"
  register: __command_admin
  changed_when: __command_admin.rc != 0
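
The last task uses a grafana_admin_password variable that is not defined anywhere in this post, so it has to be provided. One option, shown here as a sketch with a placeholder value, is a default in the role that gets overridden per environment:

```yaml
# roles/grafana/defaults/main.yml (hypothetical content)
# Placeholder only: override it, ideally from a file encrypted with ansible-vault.
grafana_admin_password: "change-me"
```

Since the password appears on the command line of the task, adding no_log: true to that task would keep it out of the Ansible output.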

And that’s all for today.

You can run this playbook file, saved as playbook.yml:

- name: install monitoring stack
  hosts: monitor
  become: yes
  roles:
    - prometheus
    - grafana
- name: install node-exporter
  hosts: all
  become: yes
  roles:
    - node-exporter

with this command:

ansible-playbook -i 00_inventory.yaml playbook.yml

Subscribe to my channel and don’t miss the next video!

Microservices architecture and open source. I’m the maintainer of xavki (https://youtube.com/c/xavki-linux), a channel about open source. My blog: https://xavki.blog/