Apr 01 2014

Capistrano is an invaluable automation tool, but simultaneously restarting services and hosts can play havoc with your monitoring and alerting. It’s therefore also a good idea to use Capistrano to control your monitoring. In this post I’m going to show how I do it with my Icinga installation.

The way scheduled downtime works in Icinga (these instructions apply to Nagios as well) is described in the Icinga documentation. Briefly, while a host and/or its services are in scheduled downtime, no alert notifications are sent out for them. Downtime is scheduled with a start time, an end time and a duration, and may be either fixed or flexible. Fixed downtime covers exactly the period between the start and end times. Flexible downtime only starts when the host or service goes down inside the scheduled period, and then lasts for the given duration.
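To make the fixed/flexible distinction concrete, here is a small illustrative model in Ruby. It is not Icinga's code: the method name effective_downtime and its keyword arguments are hypothetical, times are Unix epoch seconds, and it only models the window during which notifications would be suppressed.

```ruby
# Illustrative model of fixed vs flexible downtime (hypothetical helper,
# not part of Icinga). All times are Unix epoch seconds.
#
# Returns [from, to] for the period in which notifications are suppressed,
# or nil if a flexible downtime is never triggered.
def effective_downtime(start_time:, end_time:, fixed:, duration:, problem_at: nil)
  if fixed
    # Fixed downtime covers exactly the scheduled window.
    [start_time, end_time]
  elsif problem_at && problem_at >= start_time && problem_at <= end_time
    # Flexible downtime begins when the host/service goes down inside the
    # window, then lasts for `duration` seconds (possibly past end_time).
    [problem_at, problem_at + duration]
  end
end

# Example: a 5-minute flexible downtime triggered by an outage at t=1500.
# effective_downtime(start_time: 1000, end_time: 2000, fixed: false,
#                    duration: 300, problem_at: 1500)
```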

Icinga offers a REST API for controlling it from the command line, but to be honest it’s a little tricky to use, and I can’t work out how to schedule downtime with it rather than just disabling notifications. Instead, I find the command-file pipe a simpler solution: it requires access to the monitoring server itself, but that’s not really a problem.

The Icinga command-file pipe is a special file known as a FIFO (named pipe) that feeds directly into a process, in this case the Icinga daemon itself. Whatever is written to this file is therefore passed straight to the Icinga process. Its location is set by the “command_file” option in the main config file, /etc/icinga/icinga.cfg. For example:

/etc/icinga/icinga.cfg:

command_file=/var/spool/icinga/cmd/icinga.cmd
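If you would rather not hard-code the pipe location in your recipes, a minimal sketch like the one below could read it from the config text. The helper name command_file_from is hypothetical, and it assumes the simple key=value format shown above, with "#" starting a comment line.

```ruby
# Hypothetical helper: extract the command_file setting from the text of
# icinga.cfg. Returns nil if the option is not set.
def command_file_from(cfg_text)
  cfg_text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("#")   # skip blanks and comments
    key, value = line.split("=", 2)                 # split on the first "=" only
    return value.strip if key.strip == "command_file" && value
  end
  nil
end

# Usage on the monitoring server:
#   command_file_from(File.read("/etc/icinga/icinga.cfg"))
```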

This file is owned by the icinga user, and its permissions allow only the icinga user or members of the icingacmd group to write to it. You’ll therefore need to update /etc/sudoers to let your Capistrano user write to the command file as the icinga user. Something like this, using the “tee” command:

/etc/sudoers:

deploy ALL=(icinga) NOPASSWD: /usr/bin/tee -a /var/spool/icinga/cmd/icinga.cmd

Then all you need to do is write a correctly formatted line to the command pipe, and Icinga will execute the instruction. Each line takes the form [timestamp] COMMAND_NAME;argument;argument;..., and the Icinga and Nagios external-command documentation lists the order of the semicolon-separated fields for each command.
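As a sketch of that format, the Ruby method below builds a SCHEDULE_HOST_SVC_DOWNTIME line using the documented field order host_name;start_time;end_time;fixed;trigger_id;duration;author;comment. The method name and its defaults are hypothetical; it schedules fixed downtime (fixed=1) with no triggering downtime (trigger_id=0), matching the Capistrano helper later in this post.

```ruby
# Hypothetical helper: format a SCHEDULE_HOST_SVC_DOWNTIME external command.
# Field order: host_name;start_time;end_time;fixed;trigger_id;duration;author;comment
# Note: author and comment must not contain ";" or newlines.
def host_svc_downtime_line(host, minutes, now: Time.now.to_i,
                           author: "Capistrano", comment: "Automation")
  duration = Integer(minutes) * 60                 # downtime length in seconds
  fields = [host, now, now + duration, 1, 0, duration, author, comment]
  "[#{now}] SCHEDULE_HOST_SVC_DOWNTIME;#{fields.join(';')}"
end

# host_svc_downtime_line("backup", 7, now: 1396341638)
#   => "[1396341638] SCHEDULE_HOST_SVC_DOWNTIME;backup;1396341638;1396342058;1;0;420;Capistrano;Automation"
```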

Setting up a full Capistrano project is beyond the scope of this post, but assuming you have one, the files below will get things working.

This is a helper method that executes an Icinga command for a particular host, taking one further parameter: the number of minutes of downtime.

capistrano/recipes/helpers/icinga_commands.rb:

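# Writes one external command to the Icinga command pipe on the monitoring host,
# scheduling fixed downtime that starts now and lasts the given number of minutes.
# Fields: host;start;end;fixed(1);trigger_id(0);duration in seconds;author;comment
# CMDFILE must match command_file in icinga.cfg and the sudoers rule exactly.
def icinga_cli_cmd(icinga_command, host_name, minutes)
   run <<-CMD
      CMDFILE="/var/spool/icinga/cmd/icinga.cmd";
      DURATION=$((#{minutes} * 60));
      NOW=`date +%s`;
      STOP=$(($NOW + $DURATION));
      AUTHOR="Capistrano";
      COMMENT="Automation";
      CMDLINE="[$NOW] #{icinga_command};#{host_name};$NOW;$STOP;1;0;$DURATION;$AUTHOR;$COMMENT";
      echo "$CMDLINE" | sudo -u icinga /usr/bin/tee -a $CMDFILE
   CMD
end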
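If your automation happens to run on the monitoring server itself as a user that can write to the pipe (for example a member of the icingacmd group), a minimal sketch like this could write the line directly instead of going through sudo and tee. The method name submit_external_command is hypothetical, and the sketch assumes you already have a correctly formatted command line.

```ruby
# Hypothetical alternative to `sudo -u icinga tee -a`: append one external
# command line to the command pipe directly.
def submit_external_command(cmd_file, line)
  # Icinga reads one command per line, so refuse embedded newlines.
  raise ArgumentError, "command must be a single line" if line.include?("\n")
  # Append mode mirrors `tee -a`. On a FIFO, opening for writing blocks
  # until a reader (the running Icinga daemon) has the pipe open.
  File.open(cmd_file, "a") { |f| f.write(line + "\n") }
end

# submit_external_command("/var/spool/icinga/cmd/icinga.cmd", line)
```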

This is the Capistrano recipe to schedule downtime for a host and all its services.

capistrano/recipes/icinga.rb:

namespace "icinga" do
  task :config do
    close_sessions
    top.load(:string => "set :user, 'deploy'")
  end

  desc "Schedule Icinga Downtime of host and all services"
  task :downtime_host_svc, :roles => :monitor do
    config
    icinga_cli_cmd "SCHEDULE_HOST_SVC_DOWNTIME", "#{hostName}", "#{period}"
  end
end
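Because hostName and period are interpolated straight into a shell command on the monitoring server, it may be worth validating them first. The guard below is a hypothetical addition, not part of the original recipe; the accepted host-name characters are an assumption you may need to widen for your naming scheme.

```ruby
# Hypothetical guard: reject host names and periods that could break (or
# inject into) the shell command built by icinga_cli_cmd.
def validated_downtime_args(host_name, period)
  host = host_name.to_s
  unless host.match?(/\A[A-Za-z0-9][A-Za-z0-9._-]*\z/)
    raise ArgumentError, "invalid host name: #{host.inspect}"
  end
  minutes = Integer(period.to_s, 10)   # raises ArgumentError for non-integers
  raise ArgumentError, "period must be positive" unless minutes.positive?
  [host, minutes]
end

# Inside the task, before calling the helper:
#   host, minutes = validated_downtime_args(hostName, period)
#   icinga_cli_cmd "SCHEDULE_HOST_SVC_DOWNTIME", host, minutes
```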

And within the host definitions themselves, you'll need something like this:

capistrano/deploy/development.rb:

role :monitor, "monitor.example.com"

Your particular setup may involve some tweaking, and this is just an example that can easily be extended to better control Icinga. To invoke this Capistrano recipe from the command line, scheduling 7 minutes of downtime on the "backup" host, execute this command:

$ cap development icinga:downtime_host_svc -s hostName=backup -s period=7

This will send a string similar to this one to the Icinga daemon, and this should also be reflected in the icinga.log file.

[1396341638] SCHEDULE_HOST_SVC_DOWNTIME;backup;1396341638;1396342058;1;0;420;Capistrano;Automation
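When checking the result, it can help to split such a line into named fields and confirm that the end time and duration were actually filled in (an empty field usually means a shell variable was unset). This is a hypothetical sketch: it only looks for the command name followed by its arguments, so it should also cope with a log line carrying a prefix, though the exact icinga.log format may vary by version.

```ruby
# Hypothetical helper: split a SCHEDULE_HOST_SVC_DOWNTIME line into named fields.
DOWNTIME_FIELDS = %i[host_name start_time end_time fixed
                     trigger_id duration author comment].freeze

def parse_downtime_line(line)
  m = line.chomp.match(/SCHEDULE_HOST_SVC_DOWNTIME;(.*)\z/)
  return nil unless m
  # The limit keeps empty fields and lets the final comment contain ";".
  values = m[1].split(";", DOWNTIME_FIELDS.size)
  return nil unless values.size == DOWNTIME_FIELDS.size
  DOWNTIME_FIELDS.zip(values).to_h
end

# f = parse_downtime_line(line)
# f[:end_time].to_i - f[:start_time].to_i == f[:duration].to_i   # sanity check
```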

Check the effect by viewing the log file and checking the host display in the Icinga web UI.

I use this recipe in a Jenkins job that restores a database snapshot, so that alerts are suppressed while the database is being restarted. What's the point of being notified of what you already know?


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.
