Ansible Slack Failure Handler

So you want a simple Slack failure handler for Ansible to ping your alerts channel whenever a deployment fails. Despite a thorough search I couldn’t find any examples that did this adequately, and even the solution here is a little constrained. The basic requirements are:

  1. A Slack notification on any task failure.
  2. The name of the task.
  3. The name of the host.
  4. The error debug message.

Sadly there isn’t one global failure handler configuration in Ansible. I investigated the native Slack callback plugin available in Ansible 2.x (not to be confused with the existing Slack module), but it seemed to be more of a generic passthrough that reports every play to Slack, failed or not, which isn’t what I wanted. After discarding the idea of a custom callback plugin (which could work, but felt overly complex for my purposes), I settled on a per-playbook failure role.

This is not an actual ‘handler’ in Ansible parlance, but I was coming from Chef, where I could have a global failure handler baked into the Chef-client config.

To make this work we need to use playbook blocks (available as of Ansible 2.0) and essentially enclose the entire playbook in a block/rescue. The main hassle here is that a block can only wrap tasks: it cannot wrap the play-level roles section (and so wouldn’t catch any failures therein), so I had to convert all of my role calls to tasks that use include_role instead.
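To illustrate the conversion, here is a rough before/after sketch. The rescue task is only a placeholder debug message for illustration, and install_app stands in for whatever roles your play actually uses.

```yaml
# Conventional form: roles listed at play level run outside any block,
# so a failure here would never reach a rescue section.
- hosts: "{{ target_host | default('127.0.0.1') }}"
  roles:
    - install_app

# Block-friendly form: the same role pulled in as a task via include_role.
- hosts: "{{ target_host | default('127.0.0.1') }}"
  tasks:
    - block:
        - include_role:
            name: install_app
      rescue:
        # Placeholder only; the real notification task goes here.
        - debug:
            msg: "install_app failed; a notification task would go here"
```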

A simple playbook example looks as follows:

playbooks/playbook.yml

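- hosts: "{{ target_host | default('127.0.0.1') }}"
  gather_facts: true

  tasks:
    # Everything that may fail lives inside the block.
    - block:
        - include_role:
            name: install_app
        - name: Greet the world
          shell: echo "hello world!"
        # Deliberate failure to exercise the rescue section.
        - fail:
            msg: "I've gone and failed the play!"
      # Runs only if a task in the block fails.
      rescue:
        - include_role:
            name: slack_handler
            tasks_from: failure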

And in my slack_handler role (for reusability):

roles/slack_handler/tasks/failure.yml

- name: Notify Slack of Playbook Failure
  slack:
    username: 'Ansible'
    color: danger
    # The module wants only the token part of the webhook URL.
    token: "{{ slack_webhook.split('https://hooks.slack.com/services/')[1] }}"
    channel: "#deployment-alerts"
    msg: "Ansible failed on *{{ ansible_hostname }} ({{ inventory_hostname }})* \n
      *Task*: {{ ansible_failed_task.name }} \n
      *Action*: {{ ansible_failed_task.action }} \n
      *Error Message*: \n ```{{ ansible_failed_result | to_nice_json }}``` "
  # Send from the control machine rather than from the failed host.
  delegate_to: localhost

ansible_failed_task and ansible_failed_result are two currently painfully undocumented (shout-out to Brian Coca for pointing me in the right direction) but delightfully detailed variables that are populated in the rescue section when a task in the block fails. ansible_failed_task is a map that contains a lot of data, so you may want to add extra debug output for your purposes. The raw error message is single-line JSON, so we prettify it with to_nice_json before sending it via the Slack module. The string split is a bit of an ugly hack to extract the token part from the full webhook URL, which is used elsewhere in my plays and passed in at deploy time. For some reason the Slack module requires only the token rather than the full URL, in contrast to a lot of other integrations that want the whole thing.
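If you want to see what these variables hold before wiring up the message, a couple of debug tasks are probably the quickest route. This is only a sketch: the task names are made up, and it assumes the tasks sit in roles/slack_handler/tasks/failure.yml ahead of the Slack notification, so they run inside the rescue section.

```yaml
# Hypothetical debugging tasks; remove them once the message looks right.
- name: Show which task failed
  debug:
    msg: "Failed task: {{ ansible_failed_task.name }} ({{ ansible_failed_task.action }})"

- name: Dump the full failure result
  debug:
    var: ansible_failed_result
```

Running the example playbook with these in place should print the name and action of the failing task, followed by the same result data that ends up in the Slack message.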

The remaining hassle is the need to implement this per playbook, but short of writing a custom callback plugin this seems like the simplest way to do it in Ansible currently.