Fixing Ansible Configuration Errors
Problems are part and parcel of every software system, no matter how robust or efficient it is. This article provides basic troubleshooting tips for some common Ansible configuration errors. Note that some error messages do not point directly to the root cause; such errors may need deeper investigation.
Overview
Ansible is a simple yet effective open-source tool for deploying, managing, and orchestrating multi-node software deployments. It also handles configuration management and the execution of changes across a system. Ansible is widely regarded as useful, and some would even say it is better than similar tools. Still, the unexpected happens and errors occur. Each configuration issue below is described by first identifying the root cause and then giving troubleshooting tips that may resolve it. Before that, let us look at the prerequisites for an Ansible setup.
Prerequisites:
These are the prerequisites for running Ansible properly. Knowing them helps when looking for solutions to the problems that may arise later.
- Ansible needs to be installed on only one machine, known as the Control Machine, from which it manages multiple machines over SSH.
- The Control Machine needs Python 2.6 or 2.7 installed. Supported operating systems include Debian, Red Hat, CentOS, OS X, and any of the BSDs. Windows is not supported as a Control Machine.
- The Control Machine communicates with the other machines via SSH. Usually, it transfers files with sftp. If sftp is not available, you can switch to scp in ansible.cfg.
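The scp fallback from the last prerequisite could be set up roughly as follows. This is only a sketch: it assumes the scp_if_ssh option in the [ssh_connection] section of ansible.cfg, which may differ between Ansible versions, so check the documentation for your release. The helper name enable_scp and the default path are made up for illustration.

```shell
# enable_scp: make sure scp_if_ssh is set under [ssh_connection].
# The helper name and the default path are illustrative assumptions.
enable_scp() {
    cfg="${1:-./ansible.cfg}"

    # Nothing to do if the option is already present (whatever its value)
    if grep -q '^scp_if_ssh' "$cfg" 2>/dev/null; then
        return 0
    fi

    if grep -q '^\[ssh_connection\]' "$cfg" 2>/dev/null; then
        # Section exists: add the option right below its header
        awk '{ print } /^\[ssh_connection\]/ { print "scp_if_ssh = True" }' "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        # No such section yet: append one (this also creates the file)
        printf '\n[ssh_connection]\nscp_if_ssh = True\n' >> "$cfg"
    fi
}

# Usage:
#   enable_scp ./ansible.cfg
```

Running the helper a second time leaves the file unchanged, so it is safe to keep in a setup script.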
Time to review common errors in Ansible configuration!
Fixing Issues with Ansible Configuration
Issue 1
Ansible reports a whitespace-related syntax error when you try to copy a script to the Oracle production servers and then register it with chkconfig (so that it starts automatically at boot). The error message could look like the one below:
ERROR: Syntax Error while loading YAML script, ppili.yml
Note: The error may actually appear before this position: line 4, column 12
- name: Copying ppili script
copy: src=/ds1/scripts/ppili dest=/etc/init.d/ppili owner=root group=rootuser
Root cause: As the error message suggests, the root cause is the indentation of the - name entry.
Troubleshooting: Correct the indentation in the - hosts section of the playbook, so that the play's keys line up under hosts and each task is nested under tasks. For example:
- hosts: oracle.com
  gather_facts: False
  su: yes
  su_user: rootuser
  tasks:
    - shell: russell
Alternatively, you can leave the su settings out of the play and pass them on the command line:
ansible-playbook --su --su-user=root --ask-su-pass playbook.yml
Issue 2
The system fails to find the required Python modules. In such a case, the error message could look something like this:
failed: [somehost] => {"failed": true, "parsed": false}
invalid output was: Error: ansible requires a json module, none found!
Root cause: The system was unable to locate the required Python modules.
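Troubleshooting: Install the missing Python module on the target with the help of the raw module, which runs a command over SSH without needing Python modules on the remote host. You could use the following command, where somehost is the host pattern from the error above:
ansible somehost -m raw -a "yum -y install python-simplejson"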
Issue 3
Hosts file issues or a machine that has been shut down. The error message in such a case could look like the one below:
fatal: [websrvr01] => {'msg': 'FAILED: [Errno -2] Name or service not known', 'failed': True}
Root cause: There may be a typo in the hosts file, or someone has shut the server down.
Troubleshooting: Verify that the server is running, and check that the host name in the hosts file is spelled correctly and resolves in DNS.
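A quick way to separate a naming problem from a server that is down is to test name resolution on its own. The sketch below assumes getent is available on the control machine (it usually is on Linux, but not everywhere); host_resolves is a made-up helper name.

```shell
# host_resolves: succeed if the name resolves through the system resolver
# (DNS or /etc/hosts). Relies on getent, which is an assumption about the platform.
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage, with the host name from the error above:
#   host_resolves websrvr01 || echo "websrvr01 does not resolve: check the hosts file and DNS"
```

If the name resolves but Ansible still cannot connect, the server itself is the more likely suspect.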
Issue 4
Login issues because of incorrect SSH keys
Root cause: This is a common issue. Either you are passing the wrong key, or the key you are passing has not been added to the SSH agent, if you are using one.
Troubleshooting: First find out which SSH keys have already been added to the SSH agent. To do that, run ssh-add -l, which produces output like the one below:
2049 06:c9:5c:14:de:83:00:94:ec:15:e5:c9:4e:86:4f:a6 /Usersroot/speters/devroot/projectsroot/devops/ansible/keys/ansible (RSA)
2044 dd:3b:b8:2e:85:04:06:e9:ab:ff:a8:0a:c0:04:6e:d6 /Usersroot/speters/.vagrant.d/insecure_private_key (RSA)
Tip: If there are keys that you use frequently, consider defining aliases for them in a custom .ssh/config so that they are automatically known to Ansible.
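Checking whether one particular key is loaded can be scripted by comparing fingerprints rather than reading the listing by eye. The sketch below is an illustration only: fingerprint_in_agent is a made-up helper, and the key path in the usage comment is hypothetical. It reads the ssh-add -l listing from standard input and compares the second column, which holds the fingerprint.

```shell
# fingerprint_in_agent: succeed if the given fingerprint appears in an
# `ssh-add -l` listing. Arguments: $1 = fingerprint; stdin = the listing.
fingerprint_in_agent() {
    # Column 2 of each listing line is the key fingerprint
    awk -v fp="$1" '$2 == fp { found = 1 } END { exit !found }'
}

# Usage (hypothetical key path):
#   fp=$(ssh-keygen -lf "$HOME/.ssh/ansible.pub" | awk '{ print $2 }')
#   ssh-add -l | fingerprint_in_agent "$fp" || ssh-add "$HOME/.ssh/ansible"
```

Matching on the fingerprint avoids depending on the comment column, which does not always contain the key's path.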
Issue 5
Login issues due to missing key.
Root cause: If a user wants to access the host with a key pair, the user's public key must be present on the server so that the connection can be authenticated. For SSH connections, the key must be listed in the user's .ssh/authorized_keys file. If it is not, the login fails.
Troubleshooting: To add the key to .ssh/authorized_keys, use the following commands on the server:
cd ~user
cat newkey.pub >> .ssh/authorized_keys
Alternatively, you can run ssh-copy-id from your local machine.
Note that adding keys manually is tedious and error-prone. Ideally, you should automate the task with an Ansible role.
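If you do add a key by hand, a small wrapper can make the append repeatable and keep the file permissions tight. This is a sketch, not a replacement for a proper role; add_authorized_key is a made-up helper name, and it assumes the public key file holds a single key on one line.

```shell
# add_authorized_key: append a public key to authorized_keys only if it is missing.
# Arguments: $1 = public key file, $2 = target .ssh directory (default: ~/.ssh)
add_authorized_key() {
    pubkey_file="$1"
    ssh_dir="${2:-$HOME/.ssh}"
    auth_file="$ssh_dir/authorized_keys"

    # sshd may ignore the file if these permissions are too open
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # -x: whole-line match, -F: literal strings, -f: take the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Usage on the server, as the target user:
#   add_authorized_key newkey.pub
```

Because the key is only appended when it is not already listed, running the helper twice does not create duplicate entries.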
Issue 6
SSH agent is not running. The cause can be hard to identify immediately because Ansible reports only a generic failure message.
Root cause: An SSH agent that is not running may be behind many generic error messages.
Troubleshooting: Verify that the SSH agent is running. To do that, use the following command and look for SSH_AGENT_PID and SSH_AUTH_SOCK in the output:
export | grep SSH
SSH_AGENT_PID=14543
SSH_AUTH_SOCK=/tmp/ssh-U4z3bbdQJiqx/agent.14543
SSH_CLIENT='192.168.10.26 59808 11'
SSH_CONNECTION='192.169.10.26 59112 10.0.32.108 22'
SSH_TTY=/dev/pts/0
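The environment variables only show that an agent was started at some point, so it can help to ask the agent directly. The sketch below uses a made-up helper name, agent_running, and relies on the exit status of ssh-add -l: 0 when keys are listed, 1 when the agent has no keys, and 2 when the agent cannot be contacted.

```shell
# agent_running: succeed only if an SSH agent is reachable from this shell.
agent_running() {
    # SSH_AUTH_SOCK must be set and point at an existing socket
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ] || return 1

    ssh-add -l >/dev/null 2>&1
    status=$?
    # 0 = keys listed, 1 = agent reachable but empty; anything else is a failure
    [ "$status" -eq 0 ] || [ "$status" -eq 1 ]
}

# Usage:
#   agent_running || eval "$(ssh-agent -s)"
```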
Issue 7
Unknown failures
Root cause: Ansible, for all its reputation as a robust system, can still run into unknown, unidentified errors.
Troubleshooting: You can use Ansible's debug logging, which is a very effective way to find hard-to-spot errors. Debug output shows you the users and the scripts that are being executed.
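One common way to get this kind of output is to raise the verbosity of ansible-playbook with -vvvv and keep a copy of what it prints. The wrapper below is a sketch under assumptions: run_verbose, the default log file name, and the ANSIBLE_PLAYBOOK_CMD override are all made up for illustration, and the override exists only so the wrapper can be tried without Ansible installed.

```shell
# run_verbose: run a playbook with maximum verbosity and keep a copy of the output.
# Arguments: $1 = playbook, $2 = log file (default: ./ansible-debug.log)
run_verbose() {
    playbook="$1"
    log_file="${2:-./ansible-debug.log}"
    # -vvvv adds connection-level detail to the normal task output
    "${ANSIBLE_PLAYBOOK_CMD:-ansible-playbook}" -vvvv "$playbook" 2>&1 | tee "$log_file"
}

# Usage:
#   run_verbose playbook.yml
```

Keeping the log in a file makes it easier to search for the user name and the command that were used on the failing host.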
Issue 8
Sudo failure
Root cause: You have changed the host name of the target host but have not updated its local hosts entry. Sudo looks up the host name and tries to match it against the entries in the hosts file. If they do not match, you get an error message.
Troubleshooting: If you have recently changed the host name of the target, verify that you have also updated the corresponding entry in /etc/hosts.
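The check described above can be scripted on the target host. This is a minimal sketch with a made-up helper name, hostname_in_hosts; it looks for the name among the host names and aliases in a hosts file and ignores comments.

```shell
# hostname_in_hosts: succeed if the given name appears as a host name or alias
# in a hosts file. Arguments: $1 = name, $2 = hosts file (default: /etc/hosts)
hostname_in_hosts() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields afterwards
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$hosts_file"
}

# Usage on the target host:
#   hostname_in_hosts "$(hostname)" || echo "add $(hostname) to /etc/hosts"
```

The loop starts at the second field because the first field of each line is the IP address.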
Issue 9
When you execute an Ansible playbook, the control machine reports that an Ansible for Junos OS module is not a legal parameter. The error message could look like the one below:
ERROR: junos_install_config is not a legal parameter in an Ansible task or handler
Root cause: The Ansible control machine is unable to find the Ansible for Junos OS modules.
Troubleshooting: First, download the Ansible for Junos OS modules from the Ansible Galaxy website. To do so, run the ansible-galaxy install command and specify Juniper.junos. The command could look like the one below:
[root@ansible-cm]# ansible-galaxy install Juniper.junos
To let the playbook access and reference the installed modules, include the Juniper.junos role in the playbook play. The play could look something like the one below:
---
- name: Get Device Facts
  hosts: hostname
  roles:
    - Juniper.junos
  connection: local
Conclusion:
Keep in mind that error messages can be ambiguous and even misleading. What an error message states may not always reflect what is actually wrong, although it is always a good starting point. Even the error messages described above may vary depending on many factors. The good thing about troubleshooting is that each issue you solve gives you experience that you can apply to the next one. You can also prevent many issues if you:
- keep the host name and the hosts entries in sync,
- have the required version of Python installed, and
- keep the SSH agent running and its keys up to date.
These three steps ensure that a lot of possible Ansible configuration issues can be prevented right from the start.
Author’s Bio:
Kaushik Pal has more than 16 years of experience as a technical architect and software consultant in enterprise application and product development. He is interested in new technology and innovation, along with technical writing. His main focus is web architecture, web technologies, Java/J2EE, open source, big data, cloud, and mobile technologies. You can find more of his work at www.techalpine.com and you can email him at techalpineit@gmail.com or kaushikkpal@gmail.com.