Jun 15 2012
 

Nagios and Icinga – which for all intents and purposes are the same thing – are wonderful tools. They hold to the Unix tenet that everything is a file, so everything in Icinga is readable text. One slight flaw, however, is that when things do not behave as expected they can be baffling to debug: the configuration files generalise execution through macros, but nothing verbosely logs the exact command that gets run when a check executes, and therefore why it is failing. Happily, there is a wonderful script by Wolfgang Wagner called capture_plugin.pl that admirably solves this problem.

Full instructions are provided on the script’s website, but briefly, to use it, do the following.

First, download the capture_plugin.pl script and install it in your chosen directory for these kinds of things; /usr/local/bin is suitable.
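Something like this will do it (assuming the script has just been downloaded to your current directory):

  # cp capture_plugin.pl /usr/local/bin/
  # chmod 755 /usr/local/bin/capture_plugin.pl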

Next, in the configuration file for the command you want to debug, insert a reference to the capture_plugin.pl script at the beginning of the command_line entry:

command_line     /usr/local/bin/capture_plugin.pl   $USER1$/check_tcp .....

Essentially, this proxies the check command through the Perl script, which captures the full command line and its output. Looking at the script’s source, the default log file is:

my $LOG_FILE = "/tmp/captured-plugins.log";

To complete the change, restart Icinga (or Nagios):

  # service icinga restart
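It’s prudent to verify the configuration whenever you make a change like this, so a typo in the edited command_line doesn’t stop the daemon coming back up (the configuration path below is the common default; adjust for your install):

  # icinga -v /etc/icinga/icinga.cfg

Once the daemon is back up, you can watch the captures arrive as checks fire:

  # tail -f /tmp/captured-plugins.log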

Example Nagios Debugging

As an example, I’ve set up on my Icinga host (monhost) a JMX monitoring plugin for Tomcat called check_jmx4perl. This plugin runs on the Icinga host and periodically polls an agent webapp hosted on my Tomcat server, in this case at http://client2.example.com:8080/jolokia. But in the Icinga Web GUI, all I’m getting from the plugin is “UNKNOWN”.

Checking the Icinga logs in /var/log/messages:

Jun 19 00:00:00 monhost icinga: CURRENT SERVICE STATE: 
client2;JVM Thread Count;UNKNOWN;HARD;1;UNKNOWN - 
Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

Which, let’s face it, isn’t very helpful at all: it shows the plugin’s output, but nowhere does Icinga report the full command and arguments that were actually executed.

So this time, I’ll reconfigure the check_jmx command definition in the configuration file to run through the capture plugin, like this:

define command {
 command_name  check_jmx
 command_line  /usr/local/bin/capture_plugin.pl \
    $USER1$/check_jmx4perl -u http://$HOSTADDRESS$:8080/jolokia -m $ARG1$ -a $ARG2$ -p $ARG3$ -w $ARG4$ -c $ARG5$
}

Restart Icinga and check the capture_plugin.pl log file, /tmp/captured-plugins.log, where I find this:

-------
 2012-5-21 16:3:31 ------ debugging
cmd=[/usr/lib64/nagios/plugins/check_jmx4perl '-u' 'http://client2.example.com:8080/jolokia' '-m' 'java.lang:type=Threading' '-a' 'ThreadCount' '-p' '' '-w' '70' '-c' '80']
output=[UNKNOWN - Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

500 Can't connect to client2.example.com:8080 (connect: timeout)
]
retcode=3
-------

So now I can see exactly what parameters Icinga is passing to the plugin script itself. That means I can run the same command myself from the shell, tweak the parameters, and work out what’s going wrong where.
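For instance, lifting the cmd line straight out of a capture like the one above, I can replay the check interactively and experiment with the arguments (here I’ve dropped the empty -p argument, which is itself a clue that $ARG3$ was never set in the service definition):

  # /usr/lib64/nagios/plugins/check_jmx4perl -u http://client2.example.com:8080/jolokia \
      -m 'java.lang:type=Threading' -a ThreadCount -w 70 -c 80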

So this isn’t a complete answer to fixing problems with Icinga and Nagios, but capture_plugin.pl is a very useful tool that can greatly speed up the debugging of monitoring issues.



Jun 11 2012
 

As a sysadmin, I like to get things done as quickly as possible – that way I can start doing more things. I hate waiting for something to occur, particularly when this means having to constantly context-switch my attention back to the host, checking and rechecking a file or command.

An example of this is the time I was waiting for a DNS change to propagate down to a local server. I got to thinking, ‘why should I keep typing “dig” every ten minutes? This computer should do the work for me’.

Being inclined to do as little as possible, I wanted an easy single-line shell command, not an entire script. Here’s what I came up with: a piece of bash that regularly runs a command until a given outcome, then exits and emails me.

In this example, I wanted to be emailed when the IP address of www.example.com became “10.10.10.10” in DNS (obviously these examples have been sanitised of all real world corporate information).

Here it is, nominally one line (though separated with line breaks to show the logical components). Note that nohup can only run an actual command, not a shell keyword like until, so the loop is wrapped in bash -c; nohup then keeps it alive even if my login session drops.

# nohup bash -c 'until \
  dig www.example.com|grep -A1 "ANSWER SECTION"|tail -1|grep 10.10.10.10; \
  do sleep 300; done | mail -s "IP Changed" matt@email-address.com' &

So this simply queries the DNS with dig, pulls out the line after “ANSWER SECTION” and greps it for the expected new IP address. If the pattern matches, the until loop exits, and the matched line is piped into mail and sent to me. If it doesn’t, the loop sleeps for five minutes (300 seconds) before trying again.
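For what it’s worth, dig’s +short flag prints just the answer records, which makes the matching simpler. A sketch of a shorter variant (note the mail body now comes from an echo after the loop, rather than from the matched line):

  # nohup bash -c 'until dig +short www.example.com | grep -q "^10\.10\.10\.10$"; \
    do sleep 300; done; echo "www.example.com has changed" | \
    mail -s "IP Changed" matt@email-address.com' &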

I duly set and forgot this command and went about my other tasks. About 12 hours later, I got an email telling me the IP address had changed, without my having made a single manual check.

This sort of thing is a great technique to get your head around. Utilising tricks like this can hugely increase your productivity by taking the drudgery out of your hands, letting you get on with more important, thinking-heavy jobs.

Any improvements to the command, particularly ways to make it shorter? If so, I’d love to hear about them in the comments.



Jun 08 2012
 

When chaining together shell commands with the pipe ( | ), it’s easy to let it turn into a stream of consciousness: “this” produces “that”, which is the input for the next command. That’s fine as far as it goes, but it often breeds bad habits that lead to sloppiness when the same idioms are repeated in scripts.

A good example is when you want to find a string in a file and then print out a single field of the matching line. Say you want to find the user ID for a given username in the password file.

The stream of consciousness method leads one to think, “grep will give me the line that contains the pattern, and then awk will pick out the field I need”. So if we want to find the ID of user “ptolemy”, where the line in our password file looks like this:

ptolemy:x:497:1001:icinga:/home/ptolemy:/bin/bash

we might type this:

  # grep ptolemy /etc/passwd | awk -F: '{print $3}'   
(where -F specifies the field separator, a colon in the case of /etc/passwd)

But this isn’t really correct. If the string “ptolemy” matches anywhere else in the file (in another username, a home directory or a comment field), the output is ambiguous. Moreover, it’s just inelegant.
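To see the ambiguity concretely, suppose the password file also contained a second, purely hypothetical entry whose username merely contains the pattern:

ptolemy2:x:601:1001:backup:/home/ptolemy2:/bin/bash

  # grep ptolemy /etc/passwd | awk -F: '{print $3}'
  497
  601

Two IDs come back, and nothing tells you which one you actually wanted.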

AWK can already do pattern matching itself, and rather better than grep can, so the grep is redundant. You could do this:

  # awk -F: '/ptolemy/ {print $3}' /etc/passwd

Better yet, AWK can restrict its string comparison to an individual field, rather than the whole line. So a much better command would be this:

  # awk -F: '$1=="ptolemy" {print $3}' /etc/passwd

This precisely returns the third field (the numeric user ID) if and only if the first field, the username, is exactly “ptolemy”. And it uses only one command, rather than two.



Jun 07 2012
 

From the command line, what’s an easy way to test that a web page is up and displaying the content you expect? It could be that you need to write a monitoring script to check a remote site, or you just want to quickly ascertain whether you can access the internet from the command line, from wherever you’re logged in. wget is a great little tool for the job.

The wget tool is a wonderful Swiss-army knife for command line web browsing. It’s frequently used for recursively grabbing entire sites and their contents, but it can also be used as a very lightweight means of querying the web.

The -O switch specifies an output filename, and giving it “-” sends the contents of the page to standard output instead. From there, you can pipe wget into your favourite pattern-matching utility.

 # wget -O - http://www.example.com/statuspage.html | grep OK

Where “OK” could be any text pattern known to appear on the web page.
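In a monitoring script you generally care about the exit status rather than the matched text, so something like this sketch works well (-q silences wget’s progress chatter, and grep -q suppresses the matched line, leaving just a return code to branch on):

 # wget -q -O - http://www.example.com/statuspage.html | grep -q OK && echo "page OK" || echo "page BROKEN"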

Or, if it’s an HTTPS page with a certificate that won’t validate, skip the certificate check so you don’t get a pesky error:

 # wget -O - --no-check-certificate https://secure.example.com/status.html | grep OK

The extensive man page discusses further uses for this wonderful utility. Enjoy the read.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.