Jun 15 2012

Nagios and Icinga, which for all intents and purposes are the same thing, are wonderful tools. They hold to the Unix tenet that everything is a file, so all of Icinga's configuration is readable text. One flaw, however, is that when things do not behave as expected, they can be baffling to debug. Both tools use layers of configuration files to generalise check execution, but neither logs the exact command line that gets run when a check executes, so it is hard to see why a check is failing. A wonderful script by Wolfgang Wagner called capture_plugin.pl admirably solves this problem.

Full instructions are provided on the script's website, but briefly, the steps are as follows.

Download capture_plugin.pl and install it in whichever directory you use for local scripts. The directory /usr/local/bin is suitable.

In the configuration file for the command you want to debug, insert a reference to the capture_plugin.pl Perl script at the beginning of the command_line entry:

command_line     /usr/local/bin/capture_plugin.pl   $USER1$/check_tcp .....

Essentially, this proxies the check command through the Perl script, which captures the command's output. According to the Perl script source, the default log file is:

my $LOG_FILE = "/tmp/captured-plugins.log";
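To make the proxying idea concrete, here is a minimal shell sketch of what a wrapper of this kind does. It is not the capture_plugin.pl source: the function name capture_plugin, the retcode= line, and the exact log layout are assumptions made for illustration, and only the default log path is taken from the script.

```shell
#!/bin/sh
# Hypothetical stand-in for capture_plugin.pl, for illustration only.
# The default path mirrors the script's $LOG_FILE; override via the environment.
LOG_FILE="${LOG_FILE:-/tmp/captured-plugins.log}"

capture_plugin() {
    # Run the real check with its arguments untouched, keeping stdout and stderr.
    output=$("$@" 2>&1)
    retcode=$?

    # Append the command line, its output and its exit status to the log.
    {
        printf '%s ------ debugging\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        printf 'cmd=[%s]\n' "$*"
        printf 'output=[%s]\n' "$output"
        printf 'retcode=%s\n' "$retcode"
    } >> "$LOG_FILE"

    # Hand the original output and exit status back to the caller (Icinga),
    # so the check result itself is unchanged.
    printf '%s\n' "$output"
    return "$retcode"
}
```

The important design point is that the wrapper is transparent: Icinga still receives the plugin's own output and exit status, so the service state does not change while you are debugging.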

To complete the change, restart Icinga (or Nagios):

  # service icinga restart

Example Nagios Debugging

As an example, I’ve set up on my Icinga host (monhost) a JMX monitoring plugin for Tomcat called check_jmx4perl. This plugin runs on the Icinga host and periodically polls an agent webapp hosted on my Tomcat server. So it’s attempting to connect to a webapp at http://client2.example.com:8080/jolokia, but in the Icinga web GUI, I’m only getting “UNKNOWN” from the plugin.

Checking the Icinga logs in /var/log/messages:

Jun 19 00:00:00 monhost icinga: CURRENT SERVICE STATE: 
client2;JVM Thread Count;UNKNOWN;HARD;1;UNKNOWN - 
Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

Which, let’s face it, isn’t very helpful at all. It shows the plugin output, but Icinga doesn’t report anywhere the full command and its arguments that were executed.

So I’ll change the definition of the check_jmx command in the Nagios configuration file to run through capture_plugin.pl, like this:

define command {
 command_name  check_jmx
 command_line  /usr/local/bin/capture_plugin.pl \
    $USER1$/check_jmx4perl -u http://$HOSTADDRESS$:8080/jolokia -m $ARG1$ -a $ARG2$ -p $ARG3$ -w $ARG4$ -c $ARG5$
}

Restart Icinga and check the capture_plugin.pl log file, /tmp/captured-plugins.log, where I find this:

 2012-5-21 16:3:31 ------ debugging
cmd=[/usr/lib64/nagios/plugins/check_jmx4perl '-u' 'http://client2.example.com:8080/jolokia' '-m' 'java.lang:type=Threading' '-a' 'ThreadCount' '-p' '' '-w' '70' '-c' '80']
output=[UNKNOWN - Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

500 Can't connect to client2.example.com:8080 (connect: timeout)

So this time, I can actually see what parameters Icinga is passing to the plugin script itself. This means I can then run it myself from the command line, tweak the parameters, and work out what’s going wrong where.
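As a convenience for that last step, the most recent captured command can be pulled out of the log with a short shell helper. This is a sketch under assumptions: the function name last_captured_cmd is hypothetical, and it relies on the log containing single-line cmd=[...] entries like the one shown above.

```shell
# Print the most recent captured command line from the capture log.
# The first argument is the log path; it defaults to the script's default.
last_captured_cmd() {
    log_file="${1:-/tmp/captured-plugins.log}"
    # Keep only lines of the form cmd=[...], strip the cmd=[ ] wrapper,
    # then take the last match.
    sed -n 's/^cmd=\[\(.*\)]$/\1/p' "$log_file" | tail -n 1
}
```

Running last_captured_cmd with no arguments prints the command to copy and run by hand. It is worth running it as the same user that Icinga runs as, since permissions and environment can differ from your own login.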

capture_plugin.pl isn’t a complete answer to fixing problems with Icinga and Nagios, but it is a very useful tool that can greatly speed up the debugging of monitoring issues.

Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.

  One Response to “Icinga and Nagios Command Debugging”

  1. Hi Matt,

    Thanks for the write-up. Any idea why (using icinga 1.1.16 on ubuntu12.04) the plugin returns 13 when invoked from within icinga, and doesn’t log anything to /tmp, but works fine from the command-line? 🙂
