The first way to boost the quality of your code is to write as little code as possible. This is not only true for Ansible, but for any development project, really. If you don't write any code, you cannot make any mistakes, right?
The Ansible community is very active in writing roles for all kinds of use cases, so before writing a role yourself, keep in mind that whenever you want to automate something, chances are someone else has already done it.
Ansible Galaxy is the community registry for Ansible resources. It comes with a CLI, ansible-galaxy, which allows installing roles and collections like so:
# Install a role
ansible-galaxy role install role_name
# Install a collection of modules
ansible-galaxy collection install collection_name
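If you depend on several community roles or collections, a common pattern is to pin them in a requirements.yml file and install everything in one go. Here is a minimal sketch; the role name and version pins below are purely illustrative:
# requirements.yml
---
roles:
  - name: geerlingguy.postgresql
    version: 3.0.0
collections:
  - name: community.general
    version: ">=3.0.0"
# Install everything listed in the file
ansible-galaxy role install -r requirements.yml
ansible-galaxy collection install -r requirements.yml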
In case you do not find what you need online and you do have to write a role, this CLI can also help you bootstrap the project following the official guidelines.
# Create a role from scratch
ansible-galaxy role init role_name
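For reference, this is the skeleton the command generates (the exact layout may vary slightly between Ansible versions):
role_name/
├── README.md
├── defaults/main.yml
├── files/
├── handlers/main.yml
├── meta/main.yml
├── tasks/main.yml
├── templates/
├── tests/
└── vars/main.yml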
As your codebase grows, it will necessarily become more difficult to keep an eye on its quality. Are all parts of the code up to your team's standards? A good way to find out is to use static code analysis tools. Two tools are the community standard when it comes to Ansible: yamllint and ansible-lint.
Yamllint will spot mistakes in your YAML syntax, which is at the core of your Ansible code. It will also ensure your code styling is consistent across the codebase. This may not seem important, but a consistent style actually helps a lot, especially in a very large codebase, and it makes it much easier to onboard a new team member and share your knowledge of the code.
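yamllint reads its configuration from a .yamllint file at the root of your project. Here is a minimal sketch; the rule values are just an example of how you might adjust the defaults to your team's taste:
# .yamllint
---
extends: default
rules:
  line-length:
    max: 120
  indentation:
    spaces: 2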
The second tool, ansible-lint, will check the logic of your playbooks and roles against a database of proven practices to avoid the most common pitfalls. It will make recommendations to help keep your code as maintainable as possible. Most of the time, ansible-lint comes with yamllint included, but it will depend on the version you install, so keep that in mind.
# With "-p" option, you get a condensed output. For full output, remove
# the "-p" option. It allows me to keep the example concise in this article.
$ ansible-lint -p ./role_name
./tasks/main.yaml:146: [EANSIBLE0002] Trailing whitespace
./tasks/main.yaml:209: [EANSIBLE0002] Trailing whitespace
./tasks/exec.yaml:16: [EANSIBLE0012] Commands should not change things if nothing needs doing
./tasks/exec.yaml:23: [EANSIBLE0013] Use shell only when shell functionality is required
In this example, you see 3 of the 18 default rules. To get the full list, simply use ansible-lint -L.
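To give an idea of what acting on a finding looks like, here is a sketch of fixing the EANSIBLE0012 warning above. The task and paths are hypothetical; the point is that the creates argument makes the command idempotent:
# Before: ansible-lint complains because the command always reports "changed"
- name: Generate a self-signed certificate
  command: openssl req -x509 -newkey rsa:4096 -nodes -subj "/CN=example" -keyout /etc/ssl/key.pem -out /etc/ssl/cert.pem

# After: with "creates", the task is skipped when the file already exists
- name: Generate a self-signed certificate
  command: openssl req -x509 -newkey rsa:4096 -nodes -subj "/CN=example" -keyout /etc/ssl/key.pem -out /etc/ssl/cert.pem
  args:
    creates: /etc/ssl/cert.pem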
Both of these tools are very easy to set up, and when integrated into your git-flow, as part of your CI/CD, for example, they will be the guardians of your quality standards.
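As an illustration, a lint job could look like this hypothetical GitLab CI snippet; adapt it to whatever CI system you use:
# .gitlab-ci.yml
lint:
  image: python:3.9
  script:
    - pip install yamllint ansible-lint
    - yamllint .
    - ansible-lint -p .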
Unit testing plays a big part in avoiding regressions when developing new features. It is very commonly used in most programming languages, but did you know that there is such a thing as a testing framework for Ansible?
It is called Molecule, and it allows for testing your Ansible roles within Docker containers. It is configured using YAML, and in its default setup, a config file will look like this:
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: docker.io/pycontribs/centos:8
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
The default setup can be obtained by creating your role with molecule init role my-new-role --driver-name docker.
This command will bootstrap your role just like the ansible-galaxy role init command does, but it will add a default configuration for linting and testing.
In fact, molecule can be installed together with common linting tools, and you can then use it for linting as well!
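Assuming a standard Molecule 3 setup, the day-to-day commands look like this (the availability of molecule lint depends on your Molecule version):
# Run the full scenario: create the instance, apply the role, verify, destroy
molecule test
# During development, iterate faster with the individual steps:
molecule converge   # create the instance if needed and apply the role
molecule verify     # run the verifier against the instance
molecule lint       # run the configured linters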
If we look at this config file, we see that Molecule will use the Docker image docker.io/pycontribs/centos:8 to create an instance against which it will run tasks using a given provisioner (Ansible is the only supported provisioner as of today). It will then test the state of the instance using a verifier. In this example, the verifier is Ansible, which means that all test verifications will be done through Ansible tasks.
Here is another example from the default setup:
---
# This is an example playbook to execute Ansible tests.
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Example assertion
      assert:
        that: true
Ansible is not the only verifier available; one could also use Testinfra. Ansible is, however, the default and most common option.
Having tests like these, which run your playbooks or roles against a test instance and verify that the final state matches the expected one, is always a huge help in boosting the quality of your code. It means regressions are much easier to catch before they go to production, and it also means that the expected final state of your role is well documented within your tests.
Your README probably describes part of the expected state, like "have a working PostgreSQL server running", but this is only the tip of the iceberg. Knowing exactly how you expect your role to configure your instance will help your team collaborate better towards a common goal.
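To make this concrete, here is a sketch of what a less trivial verify playbook could look like for the PostgreSQL example above. The service name is illustrative and depends on your target system:
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect service facts
      service_facts:

    - name: Assert that PostgreSQL is running
      assert:
        that:
          - ansible_facts.services['postgresql.service'].state == 'running'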
Complex Ansible roles can become slow as their task list grows. However, this can probably be improved by identifying bottlenecks in your execution pipeline. How can this be done? Using a feature of Ansible called "Callback Plugins".
What are callback plugins, exactly? Let's see with an example: the profile_tasks plugin.
Enabling it is as simple as adding the callback_whitelist parameter to the defaults section of your ansible.cfg file.
A quick tip: if you want to share ansible.cfg settings at a project level, you can actually create an ansible.cfg file at the root of your repo, which will override the machine's settings.
Here is what it looks like:
[defaults]
callback_whitelist = profile_tasks
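Note that on ansible-core 2.11 and later, this setting has been renamed to callbacks_enabled; callback_whitelist still works for now but is deprecated.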
During your role's execution, you will now see start and end timestamps for each task, and when the execution is done, you will get a summary listing the 20 longest tasks in your role.
Here the callback plugin is added as a hook on task execution and will be called whenever a task is started.
Here is a sample output for a small role that installs a list of Homebrew packages on a Mac:
Tuesday 29 June 2021 08:56:32 +0200 (0:00:20.438) 0:03:49.384 **********
================================================================================
config_setup : Install homebrew packages ------------------------------- 180.29s
config_setup : Update homebrew ------------------------------------------ 23.26s
config_setup : homebrew cleanup ----------------------------------------- 20.44s
config_setup : Install homebrew cask packages ---------------------------- 1.82s
config_setup : Add custom brew taps -------------------------------------- 1.79s
Gathering Facts ---------------------------------------------------------- 1.32s
Gathering Facts ---------------------------------------------------------- 0.45s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
total ------------------------------------------------------------------ 229.37s
This is only an example; there are a couple of other callbacks I would recommend to measure the performance of your Ansible code:
- profile_tasks, to measure task performance within a role
- profile_roles, to measure the execution time of your roles
- timer, to measure a whole playbook's execution time

From this information, you can then work towards improving performance. Perhaps you have useless blocks that don't need to execute? Perhaps some tasks could be run in parallel? This will highly depend on your setup, but gathering the information is always the first part of an investigation.
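For instance, one common improvement (sketched here with a hypothetical variable name) is to pass a whole list to a package module instead of looping over it, so the module runs once instead of once per item:
# Slow: the package module runs once per item
- name: Install packages one by one
  package:
    name: "{{ item }}"
    state: present
  loop: "{{ my_packages }}"

# Faster: a single module invocation installs the whole list
- name: Install packages in one go
  package:
    name: "{{ my_packages }}"
    state: present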
I hope all these tips will help you improve the quality of your Ansible code and help your team work better together! I found most of these tips useful in my own experience, but since I am not all-knowing, there are probably other ways to improve your code as well. Do not hesitate to share them with us on Twitter or LinkedIn :)