shell

Nov 03 2012
 
Sometimes – whether it be a new job or an inherited system or a colleague who’s taken time off to conceive a baby – you just don’t know what the heck is going on. There’s an application running with an unfamiliar name, there’s no man page, and the font of knowledge that is Google can only spit out cryptic snatches of email from mail-archive.com. I’m going to offer a few techniques and tips to illustrate how I gather information about unfamiliar software, its files, ports and command line arguments. It starts off simple and gets a little more interesting towards the end.

For the purpose of this post, I’m going to pretend I don’t know what MongoDB is. It’s a NoSQL database, very good at horizontal scaling. But let’s pretend. I’m selecting it as an example because it’s not too untidy, and helps me make my point.

Hypothetical Situation

You’re in the deep end at a new job, you’re on your own, there’s no documentation, and the monitoring shows that a host is running at high load. You’ve never come across this program before, and have no idea where to start. First thing to do, top:

[Screenshot: good ol' top]

We can see that the mongod process is running pretty hot. We’re not concerned at this point why it’s having problems. I just want to know what it is, what it does, and where its files are.
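The screenshot isn't reproduced here, but the same information can be grabbed as plain text, which is handier for pasting into a ticket. This is only a sketch: busiest_procs is a name I've made up, and the --sort option assumes the procps version of ps that ships with most Linux distributions.

```shell
# Hypothetical helper: print the N busiest processes by CPU (default 5),
# plus the header line. Assumes procps ps; BSD ps has no --sort option.
busiest_procs() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

# Usage:
#   busiest_procs 5
# A one-shot, non-interactive run of top itself does a similar job:
#   top -b -n 1 | head -n 15
```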

Process listing

I’ll run ps with a full listing to find out what its command line arguments look like, and if it’s been invoked with a full path, which may indicate where it’s installed.

This shows that the executable resides at /usr/bin/mongod and that it takes a configuration file /etc/mongod.conf. So not too perplexing.
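For reference, a full listing along those lines can be produced as below. Treat it as a sketch rather than the exact command behind the output described above: proc_listing is a made-up name, and pgrep's -d option assumes the procps tools.

```shell
# Hypothetical helper: full-format listing (UID, PID, PPID, start time, CMD)
# for every process whose name matches exactly.
proc_listing() {
    # pgrep -x matches the whole process name; -d, joins the PIDs with commas.
    pids=$(pgrep -d, -x "$1") || { echo "no process named $1" >&2; return 1; }
    ps -fp "$pids"
}

# Usage:
#   proc_listing mongod
# The classic alternative; the [m] stops grep from matching itself:
#   ps -ef | grep '[m]ongod'
```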

Search on pattern

But let’s assume instead that we couldn’t obtain the full path of the mongod process. The next thing I like to do is just a big find command on the root directory and see what turns up, using “*mongo*” as our search pattern:

That’s actually pretty comprehensive (I’m liking mongo more and more) and logical. We can see that the log files are in the right place, and there are init scripts. But if these executables and logs weren’t named *mongo*, then this wouldn’t have been so straightforward.
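A search of that kind might look like the sketch below. The helper name find_by_name is my own, and pruning /proc is just a habit of mine to keep the noise down, not something the search strictly requires.

```shell
# Hypothetical helper: list every path whose name contains the given string.
# Starts at / unless a second argument names another directory.
find_by_name() {
    # Skip /proc, and hide "Permission denied" noise when not running as root.
    find "${2:-/}" -path /proc -prune -o -name "*$1*" -print 2>/dev/null
}

# Usage:
#   find_by_name mongo
#   find_by_name mongo /etc
```

Quoting the pattern matters: an unquoted *mongo* would be expanded by the shell against the current directory before find ever saw it.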

Package Listing

So a better way to get the list of files associated with an application is to find out which package /usr/bin/mongod comes from. On RPM-based systems, like this:

   # rpm -qf /usr/bin/mongod

Or on Debian-based systems, you’d use:

   # dpkg -S /usr/bin/mongod
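If you hop between distributions a lot, the two lookups can be folded into one small function. This is a sketch only, and owning_package is a hypothetical name.

```shell
# Hypothetical helper: report which package owns a file, using whichever
# package manager is present on the host.
owning_package() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -qf "$1"        # RPM-based: query the package owning the file
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg -S "$1"        # Debian-based: search installed packages for the path
    else
        echo "neither rpm nor dpkg found" >&2
        return 1
    fi
}

# Usage:
#   owning_package /usr/bin/mongod
```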

Turns out the software is called mongo-10gen-server. Let’s query the contents of that package (and, in a slight jump, a related package) to find out what other files are installed out of this box:

This is the official list of files that were created at installation time.
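The file listing itself comes from the package manager's "list" query. A minimal sketch, with package_files as a made-up name; both rpm -ql and dpkg -L accept more than one package name, so related packages can be listed in one go.

```shell
# Hypothetical helper: list the files installed by one or more packages.
package_files() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -ql "$@"        # RPM-based systems
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg -L "$@"        # Debian-based systems
    else
        echo "neither rpm nor dpkg found" >&2
        return 1
    fi
}

# Usage:
#   package_files mongo-10gen-server
```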

At this point, it would probably be worth perusing the configuration file /etc/mongod.conf and the log file /var/log/mongod.log for clues and comments about what the program does, and how.

List all open filehandles

The tool lsof is fantastic for seeing everything a process has open – files, sockets and pipes. Executing ps, we can see that the process ID of mongod is 6569, so we invoke lsof with the PID as an argument:

Again, we can see that mongod is holding open its logfile, /var/log/mongod.log. Also, the two lines with LISTEN show that the process is listening on TCP ports 27017 and 28017. This information could also have been obtained by typing:

   # netstat -nlp
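As a sketch of both approaches: plain lsof -p shows everything the process has open, while the made-up listening_ports helper below narrows it to listening TCP sockets. The PID 6569 is the one from this example; yours will differ.

```shell
# Hypothetical helper: show only the listening TCP sockets held by one PID.
listening_ports() {
    # -a ANDs the selections together; -nP skips host and port name lookups.
    lsof -nP -a -p "$1" -iTCP -sTCP:LISTEN
}

# Usage:
#   lsof -p 6569             # everything the process has open
#   listening_ports 6569     # just the LISTEN lines
# On hosts without netstat, ss from iproute2 gives a similar view:
#   ss -tlnp
```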

Application Network Traffic

We know from lsof and netstat that the process is listening on port 27017 and 28017. Let’s just take port 27017 and see if there’s any traffic coming in on this, and from where.

Bearing in mind that my testbox has the IP address of 10.243.52.51, we can see that incoming traffic is emanating from 10.243.24.69. I’d be logging in to that host to find out more about it and what it thinks it’s doing talking to this mongo thing.
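A capture like that can be done with tcpdump. The exact command used above isn't shown, so this is my own sketch: incoming_filter is a hypothetical helper that builds a capture filter for traffic arriving at a given port on the local address.

```shell
# Hypothetical helper: build a tcpdump filter matching traffic that arrives
# at a given TCP port on the given local IP address.
incoming_filter() {
    printf 'tcp dst port %s and dst host %s' "$1" "$2"
}

# Usage (needs root; -nn disables name lookups, -c 20 stops after 20 packets,
# and "-i any" is Linux-specific):
#   tcpdump -nn -i any -c 20 "$(incoming_filter 27017 10.243.52.51)"
```

The source address column of the output is what tells you who is talking to the port.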

Process internals

So we’ve got this far, and we know how the process is invoked and where it writes to. I’ve often found that the tricky part about lsof is that it only shows files that are held open. If a configuration file is read once on startup, then it won’t show up in lsof. It can be handy to know where a program’s inputs are coming from. Attaching the strace program to a process when it starts up can reveal all sorts of information. In this example, yes it’s obvious that mongod has had the /etc/mongod.conf configuration file passed on the command line. But the point is that even if it hadn’t, strace would reveal that the file had been opened.

There are many options that can be passed to strace, but “-e open” narrows it to filehandles being opened only, which is a bit more manageable. By running it with the “-f” option, it will also drill down to any forked processes.
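One wrinkle worth knowing: on many current Linux systems the C library implements open() with the openat system call, so tracing "open" alone can come up empty. The sketch below traces both and then boils the trace down to the files that were opened successfully. The opened_files helper and the /tmp/mongod.trace path are my own inventions, and the mongod invocation is an assumption about how it's started on your host.

```shell
# Hypothetical helper: from an strace log, print each path that was opened
# successfully, once.
opened_files() {
    grep -E 'open(at)?\(' "$1" |   # keep open() and openat() calls
        grep -v ' = -1 ' |         # drop failed attempts (ENOENT and friends)
        sed -E 's/^[^"]*"([^"]*)".*$/\1/' |   # keep only the quoted path
        sort -u
}

# Usage (on architectures with no open syscall, drop "open" from the list):
#   strace -f -e trace=open,openat -o /tmp/mongod.trace \
#       /usr/bin/mongod --config /etc/mongod.conf
#   opened_files /tmp/mongod.trace
```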

Reading embedded text in the application binary

Here’s one more trick that I like to use when I’m desperate. If you’ve got no manpage, a feeble “--help” and no “Usage”, then sometimes this may be of assistance. Run the strings command against the application binary, use a grep and a less for practicality, and see if you can extract anything useful – comments, expected arguments, anything:

In the mongod case, we get a few command line options (--replSet, etc.). If I was really trying to ascertain how to use a program, some of these may be helpful. Again, not the best example, but it’s sometimes worth a try.
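As a rough sketch of the idea, the made-up binary_options helper below pulls out anything in the binary that looks like a long option. The pattern is a guess at what an option looks like, so expect some false positives.

```shell
# Hypothetical helper: list strings in a binary that look like --long-options.
binary_options() {
    strings "$1" |
        grep -oE -e '--[A-Za-z][A-Za-z0-9_-]+' |   # -o prints only the match
        sort -u
}

# Usage:
#   binary_options /usr/bin/mongod | less
# Or browse everything that mentions a keyword:
#   strings /usr/bin/mongod | grep -i usage | less
```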

Of course, it goes without saying that you should try the man page, although for MongoDB I only get this:

But you’ll find that the man pages for lsof, strace, tcpdump and find are extremely comprehensive and packed with great examples.

So feel free to share a few of your favorite debugging tips in the Comments. Bob knows, I could use them.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.

Oct 26 2012
 
In the spirit of ad hoc, sloppy hackery, here’s a trick I’ve used in the past for providing myself with an escape route when performing risky remote maintenance without a console. There’s nothing worse than irredeemably disconnecting your remote shell and causing yourself the catastrophe of a visit to the data centre, when you should be in the pub. Here’s a little way of taking out some insurance against your fat fingers, using the “at” daemon.

[Image: broken ethernet cable]

Caveat: this is not best practice. The whole reason for adhering to sound, strict principles of design is to avoid performing risky procedures which might require this kind of mitigation. Still, it doesn’t hurt to know.

The basic principle of this contrivance is to use the “at” daemon to run a back-out script as a kind of “dead man’s handle”. That is, if your hands come off the controls, this failsafe kicks in and puts things back the way they were – not careening out of control around a sharp bend.

The “at” command

The command at is a relatively unused little tool, despite having been a part of Linux since the beginning, probably. Its more organised uncle, cron, is the one everyone’s familiar with for automating recurring tasks, ad infinitum. But at is for executing one-offs, and so isn’t really an automation tool as such, since you still have to actually type whatever you want it to execute. But it adds a latency to the actual execution of that command and does it later on, when it’s more timely to do so. So it doesn’t save you any work, it just does it later so you don’t have to. You’re down the pub.
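To make the dead man's handle concrete, here's a minimal sketch of arming one. The arm_backout name, the /root/backout.sh path and the job number 7 are all hypothetical; the script would contain whatever puts your configuration back the way it was.

```shell
# Hypothetical helper: schedule a back-out script to run N minutes from now
# (default 10). at reads the commands to run from standard input.
arm_backout() {
    echo "sh $1" | at now + "${2:-10}" minutes
}

# Usage:
#   arm_backout /root/backout.sh 10   # arm the failsafe before the risky change
#   atq                               # list pending jobs and their numbers
#   atrm 7                            # still connected? cancel job 7
```

If the change cuts you off, the job fires and restores things; if it doesn't, you remove the job and carry on.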



May 14 2012
 

The telnet service is never used these days as a means of accessing a host shell – it’s too insecure – but the telnet client is, for many sysadmins, still the tool of choice for testing whether a host has connectivity with a network service on another host. It’s a good, reliable, general tool, and the syntax is dead simple. So for example, if you wanted to just test whether you could hit the Tomcat port (8080) on host “davros”, you’d run:

   # telnet davros 8080

But telnet isn’t really the “right” tool for this job. By default it opens a TCP connection to a nominated port, but this flexible port argument is more of a side effect of its function as an interactive console, and not its real function. So from now on, you should be using netcat – which is executed as nc.

Netcat

Netcat is a Swiss army knife for network connections, and as well as being able to initiate outgoing calls to TCP and UDP ports, it can be made to function as a listener to read incoming connections. It can be used to construct a basic proxy. Most importantly, because netcat is non-interactive (unlike telnet) it can be easily used in scripts. The connection will be terminated gracefully, rather than left rudely cut off from the prompt with telnet’s “^]”.

Netcat is installed by default on several Linux distributions and is easily obtainable from the standard repositories on others.

Working Example

Here are a few basic examples which show how it’s used to test network connectivity. Note that the extent of the tests is just to check whether the port is listening, and not the nature of the daemon or whether it’s working. In every case, I’ll assume once again that our listening host is called “davros”.

Test that the Tomcat port (tcp/8080) is listening and accessible

   # nc -v davros 8080
Connection to davros 8080 port [tcp/http-alt] succeeded!

The man page has a lot of other good examples which are worth trying out. Netcat is a very versatile, yet very basic command, admirably suited to creating TCP and UDP sockets.
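Since the big win over telnet is scriptability, here's a sketch of a port check that can be dropped into a script. The port_check name is made up, and the -z (connect without sending data) and -w (timeout in seconds) options are assumed to be supported by your netcat variant; most are, but check the man page.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Arguments: host, port, optional timeout in seconds (default 3).
port_check() {
    if nc -z -w "${3:-3}" "$1" "$2" >/dev/null 2>&1; then
        echo "$1:$2 open"
    else
        echo "$1:$2 unreachable"
        return 1
    fi
}

# Usage:
#   port_check davros 8080
#   port_check davros 8080 10 || echo "Tomcat is not answering"
```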



May 02 2012
 

There’s an annoying and confusing error that can come up from time to time when performing a Puppet update from the client, particularly when running the update for the first time.

It looks like this:

# puppetd --test
err: Could not retrieve catalog from remote server: 
SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: 
certificate verify failed
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
 

This is saying that the verification check of the certificate against the keys has failed.

Solution

This could mean one of two things. The most common reason, particularly with a newly kickstarted host, is that the discrepancy between the time on the client and the time on the Puppet server is too large. Or, the certificate on the client just needs to be regenerated.

Check the Date

Simply confirm this with the date command on both hosts:

  # date
  Wed May 2 12:34:00 BST 2012

Then either set the time manually, or use the ntpdate command.
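Eyeballing two date outputs works, but the difference can also be measured in seconds. A minimal sketch, where clock_skew is a made-up name and puppet.example.com is a placeholder for your Puppet server:

```shell
# Hypothetical helper: absolute difference, in seconds, between two
# epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Usage (assumes ssh access to the server; the hostname is a placeholder):
#   clock_skew "$(date +%s)" "$(ssh puppet.example.com date +%s)"
```

A result of more than a few minutes is a good hint that the clock, not the certificate, is the problem.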

The second reason is that the certificate on the client doesn’t match that on the server. The easiest way to remedy this is to clear both certificates and start again like this:

Remove client certificate

Remove all SSL information from the Puppet client. Note that this deletes every file under /var/lib/puppet, not just the certificates:

  # find /var/lib/puppet -type f -print0 | xargs -0r rm

Clean the client certificate from the server

Where the fully-qualified domain name of the problematic client is “client.example.com”:

  # puppetca --clean client.example.com
 

Re-execute client Puppet run

Rerun the Puppet client update:

  # puppetd --test

If all goes well, the Puppet client should successfully verify its certificate and accept the updates, as it should.

