
Nov 03 2012
 
Sometimes – whether it be a new job or an inherited system or a colleague who’s taken time off to conceive a baby – you just don’t know what the heck is going on. There’s an application running with an unfamiliar name, there’s no man page, and the font of knowledge that is Google can only spit out cryptic snatches of email from mail-archive.com. I’m going to offer a few techniques and tips to illustrate how I gather information about unfamiliar software, its files, ports and command line arguments. It starts off simple and gets a little more interesting towards the end.

For the purpose of this post, I’m going to pretend I don’t know what Mongo DB is. It’s a NoSQL database, very good at horizontal scaling. But let’s pretend. I’m selecting it as an example because it’s not too untidy, and helps me make my point.

Hypothetical Situation

You’re in the deep end at a new job, you’re on your own, there’s no documentation, and the monitoring shows that a host is running at high load. You’ve never come across this program before, and have no idea where to start. The first thing to do is run top:

   # top

We can see that the mongod process is running pretty hot. We’re not concerned at this point why it’s having problems. I just want to know what it is, what it does, and where its files are.

Process listing

I’ll run ps with a full listing to find out what its command line arguments look like, and if it’s been invoked with a full path, which may indicate where it’s installed.

This shows that the executable resides at /usr/bin/mongod and that it takes a configuration file, /etc/mongod.conf. So not too perplexing.
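If the ps output is truncated or ambiguous, the same information can be read straight from /proc on Linux, which is where ps gets it. The sketch below is a minimal Python illustration, assuming a Linux host and enough privileges (usually root) to read another user’s processes: /proc/<pid>/cmdline holds the NUL-separated arguments, and the /proc/<pid>/exe symlink points at the binary that was actually executed, even when it was started without a full path. parse_cmdline and describe_process are hypothetical helper names, not part of any tool.

```python
import os


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the raw contents of /proc/<pid>/cmdline into an argv list."""
    if not raw:
        return []  # kernel threads have an empty cmdline
    # Each argument is NUL-terminated, so drop the final terminator before
    # splitting; this keeps genuinely empty arguments such as ''.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def describe_process(pid: int) -> dict:
    """Return the arguments and resolved executable path of a PID (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        argv = parse_cmdline(fh.read())
    # /proc/<pid>/exe is a symlink to the binary that was actually executed,
    # even if the process was started with a bare name found via $PATH.
    exe = os.readlink(f"/proc/{pid}/exe")
    return {"argv": argv, "exe": exe}
```

For example, describe_process() called with the mongod PID would return its argument list alongside the resolved path of the binary.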

Search on pattern

But let’s assume instead that we couldn’t obtain the full path of the mongod process. The next thing I like to do is run a big find command over the root directory and see what turns up, using “*mongo*” as our search pattern:

   # find / -name "*mongo*"

That’s actually pretty comprehensive (I’m liking mongo more and more) and logical. We can see that the log files are in the right place, and there are init scripts. But if these executables and logs weren’t named *mongo*, then this wouldn’t have been so straightforward.
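For what it’s worth, the same search can be sketched in Python with os.walk and fnmatch, which is handy if you want to post-process the matches in a script. This is only a rough equivalent of find’s -name test (a case-sensitive match on the base name of each file and directory), not a replacement for find; find_matching is a hypothetical helper name.

```python
import fnmatch
import os


def find_matching(root: str, pattern: str) -> list[str]:
    """Rough equivalent of `find ROOT -name PATTERN` (base-name match only)."""
    matches = []
    # os.walk skips directories it cannot list (onerror defaults to None)
    # and, like find, does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # fnmatchcase is case-sensitive, matching find's -name test.
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_matching("/", "*mongo*") walks the whole filesystem, /proc included, so expect it to be about as slow as the find command.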

Package Listing

So a better way to get the list of files associated with an application is to find out which package /usr/bin/mongod comes from. On RPM-based systems, use:

   # rpm -qf /usr/bin/mongod

Or on Debian-based systems, you’d use:

   # dpkg -S /usr/bin/mongod

It turns out the package is called mongo-10gen-server. Let’s query the contents of that package (and, in a slight jump, a related package) to find out what other files it installs:

This is the official list of files that were created at installation time.

At this point, it would probably be worth perusing the configuration file /etc/mongod.conf and the log file /var/log/mongod.log for clues and comments about what the program does, and how.

List all open filehandles

The tool lsof is fantastic for seeing everything a process has open – files, sockets and pipes. From the ps output, we can see that the process ID of mongod is 6569, so we invoke lsof with the PID as an argument:

   # lsof -p 6569

Again, we can see that mongod is holding open its logfile, /var/log/mongod.log. Also, the two lines with LISTEN show that the process is listening on TCP ports 27017 and 28017. This information could also have been obtained by typing:

   # netstat -nlp
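netstat gets its socket list from the kernel table in /proc/net/tcp, and reading that table directly can help on a box where netstat isn’t installed. Here is a minimal Python sketch, assuming IPv4 only and a little-endian (e.g. x86) host, since the kernel prints addresses as native-endian hex. The inode column is what netstat -p matches against the socket:[inode] links under /proc/<pid>/fd to find the owning process. decode_addr and listening_ports are hypothetical helper names.

```python
import socket
import struct

TCP_LISTEN = "0A"  # the kernel's code for the LISTEN state


def decode_addr(hex_addr: str) -> tuple[str, int]:
    """Decode an IPv4 'ADDR:PORT' field, e.g. '0100007F:0016' -> ('127.0.0.1', 22).

    Assumes the table came from a little-endian host such as x86, where the
    kernel prints the address as a native-endian 32-bit hex number.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def listening_ports(table: str) -> list[tuple[str, int, int]]:
    """Return (ip, port, inode) for each LISTEN row of /proc/net/tcp text."""
    result = []
    for line in table.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 10 or fields[3] != TCP_LISTEN:
            continue
        ip, port = decode_addr(fields[1])
        result.append((ip, port, int(fields[9])))
    return result
```

On the test box, listening_ports(open("/proc/net/tcp").read()) would be expected to include entries for ports 27017 and 28017.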

Application Network Traffic

We know from lsof and netstat that the process is listening on ports 27017 and 28017. Let’s just take port 27017 and see if there’s any traffic coming in on it, and from where.

Bearing in mind that my testbox has the IP address of 10.243.52.51, we can see that incoming traffic is emanating from 10.243.24.69. I’d be logging in to that host to find out more about it and what it thinks it’s doing talking to this mongo thing.
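When the capture is busy, it helps to tally who is talking rather than read every packet. Here is a minimal Python sketch, assuming IPv4 traffic and the output of tcpdump run with -n (numeric addresses, so lines look like "IP 10.243.24.69.51234 > 10.243.52.51.27017: ..."); talkers is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the IPv4 "src.port > dst.port:" part of a `tcpdump -n` line.
LINE_RE = re.compile(
    r"\bIP (\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):"
)


def talkers(lines, port: int) -> Counter:
    """Count packets per source IP whose destination port is `port`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and int(m.group(4)) == port:
            counts[m.group(1)] += 1
    return counts
```

Passing sys.stdin to talkers() in a small script fed by tcpdump -n -l port 27017 would give a running picture; -l makes tcpdump line-buffer its output so the pipe sees each line promptly.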

Process internals

So we’ve got this far, and we know how the process is invoked and where it writes to. I’ve often found that the tricky part about lsof is that it only shows files that are held open. If a configuration file is read once on startup and then closed, it won’t show up in lsof. It can be handy to know where a program’s inputs are coming from, and attaching the strace program to a process as it starts up can reveal all sorts of information. In this example, yes, it’s obvious that mongod has had the /etc/mongod.conf configuration file passed on the command line. But the point is that even if it hadn’t, strace would reveal that the file had been opened.

There are many options that can be passed to strace, but “-e open” narrows it to filehandles being opened only, which is a bit more manageable. By running it with the “-f” option, it will also drill down to any forked processes.
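strace output can be long, so a small filter helps. The following Python sketch pulls out the paths that were opened successfully, for instance from a trace saved with strace’s -o option. On newer systems the C library calls openat rather than open, so you may need -e trace=open,openat, and the pattern below accepts both; lines that -f splits into "<unfinished ...>" and "<... resumed>" halves are not handled. opened_files is a hypothetical helper name.

```python
import re

# Matches a completed open()/openat() call and captures the path and result,
# e.g. [pid  6570] open("/etc/mongod.conf", O_RDONLY) = 4
OPEN_RE = re.compile(
    r'\b(?:open|openat)\((?:AT_FDCWD, )?"((?:[^"\\]|\\.)*)",.*\)\s+=\s+(-?\d+)'
)


def opened_files(lines) -> list[str]:
    """Return paths opened successfully, in first-seen order, without duplicates."""
    found = {}
    for line in lines:
        m = OPEN_RE.search(line)
        # A negative return value (e.g. -1 ENOENT) means the open failed.
        if m and int(m.group(2)) >= 0:
            found.setdefault(m.group(1), None)
    return list(found)
```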

Reading embedded text in the application binary

Here’s one more trick that I like to use when I’m desperate. If you’ve got no manpage, a feeble “--help” and no “Usage”, then sometimes this may be of assistance. Run the strings command against the application binary, use a grep and a less for practicality, and see if you can extract anything useful – comments, expected arguments, anything:

In the mongod case, we get a few command line options (--replSet, etc). If I were really trying to work out how to use a program, some of these might be helpful. Again, not the best example, but it’s sometimes worth a try.
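The strings command itself is simple enough to sketch: it prints every run of at least four printable characters in a file. Below is a minimal Python approximation, assuming plain ASCII text (the real strings(1) also treats tab as printable and has options for other encodings), plus a filter for anything that looks like a double-dash option. ascii_strings and candidate_options are hypothetical helper names.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes, like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def candidate_options(data: bytes) -> list[str]:
    """Collect anything in the embedded text that looks like a --long-option."""
    opts = set()
    for text in ascii_strings(data):
        opts.update(re.findall(r"--[A-Za-z][\w-]*", text))
    return sorted(opts)
```

Reading the binary with open("/usr/bin/mongod", "rb").read() and passing the bytes to candidate_options() would list the option-like strings it contains.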

Of course, it goes without saying that you should try the man page, although for Mongo DB I only get this:

But you’ll find that the man pages for lsof, strace, tcpdump and find are extremely comprehensive and packed with great examples.

So feel free to share a few of your favorite debugging tips in the Comments. Bob knows, I could use them.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.

Jun 15 2012
 

Nagios and Icinga – which for all intents and purposes are the same thing – are wonderful tools. They follow the Unix tenet that everything is a file, so everything in Icinga is readable text. However, one slight flaw is that when things do not behave as expected, they can be baffling to debug. Specifically, although Nagios and Icinga use various configuration files to generalise how checks are executed, they do not log the exact command line that gets run when a check executes, and therefore give no clue as to why it is failing. But there is a wonderful script by Wolfgang Wagner called capture_plugin.pl that admirably solves this problem.

First, visit the website and download the script. Full instructions are provided on the website, but briefly, to use the script, do the following.

Install the capture_plugin.pl script in whichever directory you use for this kind of thing; /usr/local/bin is suitable. Because the command definition calls the script directly, it also needs to be executable (chmod +x).

In the configuration file for the command you want to debug, insert a reference to the capture_plugin.pl Perl script at the beginning of the command_line entry:

command_line     /usr/local/bin/capture_plugin.pl   $USER1$/check_tcp .....

Essentially, what this does is to proxy the check command through the Perl script and capture output. Checking the Perl script source, the default log file is:

my $LOG_FILE = "/tmp/captured-plugins.log";
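To make the idea concrete, here is a minimal Python sketch of the same proxying technique. It is not Wagner’s script, just an illustration assuming you want the plugin’s stdout, stderr and exit code appended to the same default log file: run the real plugin, record what happened, and hand the plugin’s output and exit code straight back so that Icinga sees no difference. capture is a hypothetical helper name.

```python
import shlex
import subprocess
import sys
import time

LOG_FILE = "/tmp/captured-plugins.log"  # same default path as capture_plugin.pl


def capture(argv: list[str], log_file: str = LOG_FILE) -> int:
    """Run a check command, log cmd/output/retcode, and pass the result through."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_file, "a") as log:
        log.write("-------\n")
        log.write(f" {stamp} ------ debugging\n")
        # shlex.join quotes each argument, so empty ones stay visible as ''.
        log.write(f"cmd=[{shlex.join(argv)}]\n")
        log.write(f"output=[{proc.stdout}{proc.stderr}]\n")
        log.write(f"retcode={proc.returncode}\n")
    # Nagios and Icinga read the first line of stdout plus the exit code,
    # so relay both unchanged to keep the wrapper transparent.
    sys.stdout.write(proc.stdout)
    return proc.returncode
```

A wrapper script built on this would end with sys.exit(capture(sys.argv[1:])), so that it can sit at the front of command_line in the same way as capture_plugin.pl.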

To complete the change, restart Icinga (or Nagios):

  # service icinga restart

Example Nagios Debugging

As an example, on my Icinga host (monhost) I’ve set up a JMX (for Tomcat) monitoring plugin called check_jmx4perl. This plugin runs on the Icinga host and periodically polls an agent webapp hosted on my Tomcat server, so it’s attempting to connect to a webapp at http://client2.example.com:8080/jolokia. But in the Icinga Web GUI, I’m only getting “UNKNOWN” from the plugin.

Checking the icinga logs in /var/log/messages:

Jun 19 00:00:00 monhost icinga: CURRENT SERVICE STATE: 
client2;JVM Thread Count;UNKNOWN;HARD;1;UNKNOWN - 
Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

Which, let’s face it, isn’t very helpful at all. It shows the plugin output, but Icinga doesn’t report anywhere the full command and its arguments that were executed.

So this time, I’ll reconfigure my definition for the check_jmx command in the Nagios configuration file to now run the capture_plugin, like this:

define command {
 command_name  check_jmx
 command_line  /usr/local/bin/capture_plugin.pl \
    $USER1$/check_jmx4perl -u http://$HOSTADDRESS$:8080/jolokia -m $ARG1$ -a $ARG2$ -p $ARG3$ -w $ARG4$ -c $ARG5$
}

Restart Icinga and check the capture_plugin.pl log file, /tmp/captured-plugins.log, where I find this:

-------
 2012-5-21 16:3:31 ------ debugging
cmd=[/usr/lib64/nagios/plugins/check_jmx4perl '-u' 'http://client2.example.com:8080/jolokia' '-m' 'java.lang:type=Threading' '-a' 'ThreadCount' '-p' '' '-w' '70' '-c' '80']
output=[UNKNOWN - Error: 500 Error while fetching http://client2.example.com:8080/jolokia/read/java.lang%3Atype%3DThreading/ThreadCount :

500 Can't connect to client2.example.com:8080 (connect: timeout)
]
retcode=3
-------

So this time, I can actually see what parameters Icinga is passing to the plugin script itself. This means I can then run it myself from the command line, tweak the parameters, and work out what’s going wrong where.

So this isn’t a complete answer to fixing problems with Icinga and Nagios, but capture_plugin.pl is a very useful tool which can rapidly speed up the debugging of monitoring issues.



May 21 2012
 

When you run the Apache HTTPD webserver with a lot of virtual hosts, the configuration can become quite unwieldy and difficult to debug. It can become an unfortunate process of trial and error to work out whether all of your virtual hosts are up and running, and which one is the default (for name-based virtual hosts on a given address, it’s the first one defined in the configuration unless specified; with files included from conf.d, that usually means the first one alphabetically). In fact, working out the name of your default virtual host is often the key to solving many Apache problems.

But there is a rather nice and quick debugging trick for getting Apache to dump out a list of all the virtual hosts in its configuration. The command line is simply this:

  # httpd -D DUMP_VHOSTS

Which outputs:

VirtualHost configuration:
10.241.53.10:443       secure.website.com (/etc/httpd/conf.d/secure_website.conf:4)
wildcard NameVirtualHosts and _default_ servers:
*:80                   is a NameVirtualHost
      default server alpha.website.example.com (/etc/httpd/conf.d/website.conf:1)
      port 80 namevhost beta.website.example.com (/etc/httpd/conf.d/website.conf:1)
      port 80 namevhost gamma.website.example.com (/etc/httpd/conf.d/website.conf:10)
      port 80 namevhost delta.website.example.com (/etc/httpd/conf.d/website.conf:19)
      port 80 namevhost website.example.com (/etc/httpd/conf.d/website.conf:29)

Hurrah! That’s a list of the names of all the virtual hosts, along with the conf.d file and line number where each one is configured.
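If you have dozens of virtual hosts, it can help to post-process this output. Here is a minimal Python sketch, assuming the layout in the listing above (a host name followed by "(file:line)", with the default flagged as "default server"); other Apache versions may format the listing differently, and parse_dump_vhosts is a hypothetical helper name.

```python
import re

# A host name followed by "(/path/to/file.conf:LINE)" at the end of the line.
VHOST_RE = re.compile(r"(\S+) \(([^()]+):(\d+)\)\s*$")


def parse_dump_vhosts(text: str) -> list[dict]:
    """Turn `httpd -D DUMP_VHOSTS` output into a list of virtual host records."""
    hosts = []
    for line in text.splitlines():
        m = VHOST_RE.search(line)
        if m:
            hosts.append({
                "name": m.group(1),
                "file": m.group(2),
                "line": int(m.group(3)),
                "default": "default server" in line,
            })
    return hosts
```

Filtering the result on the "default" key gives you the default virtual host for each address straight away.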

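And better yet, if you add the “-t” switch, httpd parses the configuration files, runs a syntax check and prints the same list, then exits without trying to start a server:

  # httpd -t -D DUMP_VHOSTS

So you can test your configuration before loading it. Note that neither form inspects the running server: both read the configuration files on disk, so the list shows what Apache will load next time, not necessarily what is running now. (httpd -S is a shorter way of asking for the same output.)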

Finally, when you’re satisfied that your changes are correct, apply the new configuration:

  # apachectl configtest
  # apachectl restart

Or, on Fedora:

  # apachectl configtest
  # service httpd reload

Special bonus Apache debug information!
If you run httpd on the command line with the “-L” switch, you get a list of all recognised directives, their descriptions, and the contexts in which they are allowed. It’s extremely handy to have this information to hand, rather than having to refer back to the manual every time.

  # httpd -L
<Directory (core.c)
  Container for directives affecting resources located in the specified directories
  Allowed in *.conf only outside <Directory>, <Files> or <Location>
<Location (core.c)
  Container for directives affecting resources accessed through the specified URL paths
  Allowed in *.conf only outside <Directory>, <Files> or <Location>
<VirtualHost (core.c)
  Container to map directives to a particular virtual host, takes one or more host addresses
  Allowed in *.conf only outside <Directory>, <Files> or <Location>
....



May 14 2012
 

The telnet service is never used these days as a means of accessing a host shell – it’s too insecure – but the telnet client is, for many sysadmins, still the tool of choice for testing whether one host can reach a network service on another. It’s a good, reliable, general-purpose tool, and the syntax is dead simple. So for example, if you wanted to test whether you could hit the Tomcat port (8080) on host “davros”, you’d run:

   # telnet davros 8080

But telnet isn’t really the “right” tool for this job. It will open a TCP connection to whatever port you nominate, but that flexibility is more of a side effect of its role as an interactive console than its real function. So from now on, you should be using netcat – which is executed as nc.

Netcat

Netcat is a Swiss army knife for network connections. As well as being able to initiate outgoing connections to TCP and UDP ports, it can act as a listener that accepts incoming connections, and it can be used to construct a basic proxy. Most importantly, because netcat is non-interactive (unlike telnet), it can easily be used in scripts: the connection is closed cleanly, rather than being cut off from the prompt with telnet’s “^]” escape.

Netcat is installed by default on several Linux distributions and is easily obtainable from the standard repositories on others.

Working Example

Here are a few basic examples which show how it’s used to test network connectivity. Note that the extent of the tests is just to check whether the port is listening, and not the nature of the daemon or whether it’s working. In every case, I’ll assume once again that our listening host is called “davros”.

Test that the Tomcat port (tcp/8080) is listening and accessible

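   # nc -zv davros 8080
Connection to davros 8080 port [tcp/http-alt] succeeded!

The -z flag tells nc just to check that something is listening, without sending any data, and -v makes it report the result.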
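The same check is easy to do from a script without shelling out. Here is a minimal Python sketch of what nc does when it is only checking a TCP port (its -z mode), assuming a simple yes/no answer is enough; it says nothing about whether the service behind the port is healthy. port_open is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or unresolvable
        return False
```

port_open("davros", 8080) returns True if Tomcat (or anything else) accepts the connection within the timeout.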

The man page has a lot of other good examples which are worth trying out. Netcat is a very versatile, yet very basic command, admirably suited to creating TCP and UDP sockets.

