Nov 06 2012
 

It’s always nice to have a bunch of tricks for processing files easily and quickly. It’s fairly straightforward to remove duplicate lines by piping a file through sort and uniq (or just sort -u), but this has the drawback of leaving you with a file that’s now completely out of order. That would be fine if the file were a simple list, but if it’s a piece of code, it’s now totally useless. There’s a surprisingly quick and easy way to remove subsequent duplicate lines of text from a file without sorting.

Here it is:

awk '!x[$0]++' filename.txt

For example, take a file with this text:

# cat test.txt 
abc
def
ghi
abc
xyz
abc
ghi
plq
def

Run the awk command, and this happens:

# awk '!x[$0]++' test.txt 
abc
def
ghi
xyz
plq
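
The idiom works because x is an associative array keyed on the whole input line ($0). The first time a line appears, x[$0] is zero, so !x[$0] is true and awk performs its default action of printing the line; the post-increment then marks the line as seen, so every later copy evaluates false and gets skipped. The same one-liner written out longhand:

awk '{ if (x[$0]++ == 0) print }' filename.txt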

Make a note of this one, because it’s bound to come in handy sooner or later.



Nov 05 2012
 

The Puppet language is well documented here, but to really understand its idioms takes practice and a lot of good working examples. While Puppet supports arrays as a type, it doesn’t provide many operations for working with them. Many resources will accept arrays as parameter values, for example, but applying a class or resource to every element of an array individually is not so simple. A solution is presented below.

In this particular example, I’m also using the Puppet Firewall module, which you can get here. I wholeheartedly recommend using this module for managing iptables, as this is notoriously difficult to do with any granularity or flexibility on Puppet straight out of the box.

You’d configure it like this to permit connections to the Apache HTTPD daemon, listening on ports 80 and 443, from the client 10.10.10.10 only.

class iptables::httpd-server {
   firewall { "500 allow connection to httpd:80 and https:443 from 10.10.10.10":
      state  => ['NEW'],
      dport  => [ '80', '443' ],
      proto  => 'tcp',
      action => 'accept',
      source => '10.10.10.10',
   }
}

But if, for example, you wanted the firewall module to configure iptables for two or more source addresses, you can’t just submit an array value to the “source” parameter, as you can with the “dport” parameter, because the syntax of the firewall resource won’t permit it (this is actually because you can’t do it in plain old iptables either). You instead need to declare the firewall resource once for each source address. An ugly way would be like this:

class iptables::httpd-server {
   firewall { "500 allow connection to httpd:80 and https:443 from 10.10.10.10":
      state  => ['NEW'],
      dport  => [ '80', '443' ],
      proto  => 'tcp',
      action => 'accept',
      source => '10.10.10.10',
   }
   firewall { "500 allow connection to httpd:80 and https:443 from 10.10.10.11":
      state  => ['NEW'],
      dport  => [ '80', '443' ],
      proto  => 'tcp',
      action => 'accept',
      source => '10.10.10.11',
   }
}

Solution

But in the more likely scenario that you need to generalise this module, so that you can simply include the class iptables::httpd-server and have it use an array variable for the “source” parameter, this is perhaps a more effective way of doing it: using a “define”.

class iptables::httpd-server {
   define allow_http_client {
      firewall { "500 allow connection to httpd:80 from $name":
         state  => ['NEW'],
         dport  => ['80','443'],
         proto  => 'tcp',
         action => 'accept',
         source => $name,
      }
   }
   if $incomingAcl {
      $source = $incomingAcl
   } else {
      $source = '0.0.0.0/0'
   }
   allow_http_client { $source: }
}

In this case, “define allow_http_client” declares a defined type that wraps the firewall resource. The variable “$name” is a reserved variable: it is set to whatever title is passed to the define when it’s declared, in this case each element of “$source”. The variable “$incomingAcl” gets set somewhere outside, in the node definition perhaps.
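
For illustration, setting that variable from a node definition might look something like this (the hostname and addresses here are invented for the example):

node 'web01.example.com' {
   $incomingAcl = [ '10.10.10.10', '10.10.10.11' ]
   include iptables::httpd-server
}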

The if conditional simply sets the source address to “0.0.0.0/0” (anywhere) as a default if the $incomingAcl variable isn’t set.

This may also be a useful illustration for other similar cases in Puppet where an array is needed but cannot be used directly.



Nov 03 2012
 
Sometimes – whether it be a new job or an inherited system or a colleague who’s taken time off to conceive a baby – you just don’t know what the heck is going on. There’s an application running with an unfamiliar name, there’s no man page, and the font of knowledge that is Google can only spit out cryptic snatches of email from mail-archive.com. I’m going to offer a few techniques and tips to illustrate how I gather information about unfamiliar software, its files, ports and command line arguments. It starts off simple and gets a little more interesting towards the end.

For the purpose of this post, I’m going to pretend I don’t know what Mongo DB is. It’s a NoSQL database, very good at horizontal scaling. But let’s pretend. I’m selecting it as an example because it’s not too untidy, and helps me make my point.

Hypothetical Situation

You’re in the deep end at a new job, you’re on your own, there’s no documentation, and the monitoring shows that a host is running at high load. You’ve never come across this program before, and have no idea where to start. First thing to do, top:

Good ol' top
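
If you want a one-shot snapshot rather than the interactive display, perhaps to paste into a ticket, top's batch mode does much the same job:

   # top -b -n 1 | head -n 20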

We can see that the mongod process is running pretty hot. We’re not concerned at this point why it’s having problems. I just want to know what it is, what it does, and where its files are.

Process listing

I’ll run ps with a full listing to find out what its command line arguments look like, and if it’s been invoked with a full path, which may indicate where it’s installed.
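
Something along these lines does the trick (the square brackets in the grep pattern are just a dodge to stop grep from matching its own process):

   # ps -ef | grep '[m]ongod'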

This shows that the executable resides at /usr/bin/mongod and that it takes a configuration file, /etc/mongod.conf. So not too perplexing.

Search on pattern

But let’s assume instead that we couldn’t obtain the full path of the mongod process. The next thing I like to do is just a big find command on the root directory and see what turns up, using “*mongo*” as our search pattern:
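
   # find / -name '*mongo*' 2>/dev/null

Redirecting stderr to /dev/null just keeps the inevitable permission-denied noise out of the way.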

That’s actually pretty comprehensive (I’m liking mongo more and more) and logical. We can see that the log files are in the right place, and there are init scripts. But if these executables and logs weren’t named *mongo*, then this wouldn’t have been so straightforward.

Package Listing

So a better way to get the list of files associated with an application is to find out which package /usr/bin/mongod comes from. On RPM-based systems, rpm’s -qf query tells you which installed package owns a given file:
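
   # rpm -qf /usr/bin/mongod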

Or on Debian-based systems, you’d use:

   # dpkg -S /usr/bin/mongod

Turns out the software is called mongo-10gen-server. Let’s query the contents of that package (and, in a slight jump, a related package) to find out what other files were installed on this box:
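
   # rpm -ql mongo-10gen-server
   # rpm -ql mongo-10gen

(rpm -ql lists every file a package installed; mongo-10gen is my guess at the name of the companion client package from the same repository, so adjust it to whatever rpm -qa | grep mongo turns up.)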

This is the official list of files that were created at installation time.

At this point, it would probably be worth perusing the configuration file /etc/mongod.conf and the log file /var/log/mongod.log for clues and comments about what the program does, and how.

List all open filehandles

The tool lsof is fantastic for seeing everything a process has open – files, sockets and pipes. Executing ps, we can see that the process ID of mongod is 6569, so we invoke lsof with the PID as an argument:
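
   # lsof -p 6569

The -p flag restricts the listing to that single process ID.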

Again, we can see that mongod is holding open its logfile, /var/log/mongod.log. Also, the two lines with LISTEN show that the process is listening on TCP ports 27017 and 28017. This information could also have been obtained by typing:

   # netstat -nlp

Application Network Traffic

We know from lsof and netstat that the process is listening on ports 27017 and 28017. Let’s just take port 27017 and see if there’s any traffic coming in on it, and from where.
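
A quick tcpdump filtered on that port will show who is talking to it. I'm assuming here that the traffic arrives on eth0, so substitute your own interface:

   # tcpdump -nn -i eth0 port 27017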

Bearing in mind that my testbox has the IP address of 10.243.52.51, we can see that incoming traffic is emanating from 10.243.24.69. I’d be logging in to that host to find out more about it and what it thinks it’s doing talking to this mongo thing.

Process internals

So we’ve got this far, and we know how the process is invoked and where it writes to. I’ve often found that the tricky part about lsof is that it only shows files that are held open. If a configuration file is read once on startup, then it won’t show up in lsof. It can be handy to know where a program’s inputs are coming from. Attaching the strace program to a process when it starts up can reveal all sorts of information. In this example, yes, it’s obvious that mongod has had the /etc/mongod.conf configuration file passed on the command line. But the point is that even if it hadn’t, strace would reveal that the file had been opened.

There are many options that can be passed to strace, but “-e open” narrows the output to filehandles being opened, which is a bit more manageable. Running it with the “-f” option makes it drill down into any forked processes as well.
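
Putting that together, starting the daemon under strace might look like the following; the mongod arguments are just a guess at how the init script starts it, and if the process is already running you can attach to it with -p instead:

   # strace -f -e open -o /tmp/mongod.trace mongod -f /etc/mongod.conf
   # strace -f -e open -p 6569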

Reading embedded text in the application binary

Here’s one more trick that I like to use when I’m desperate. If you’ve got no manpage, a feeble “--help” and no “Usage”, then sometimes this may be of assistance. Run the strings command against the application binary, use a grep and a less for practicality, and see if you can extract anything useful – comments, expected arguments, anything:
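
   # strings /usr/bin/mongod | grep -i usage | less

What you grep for is a matter of taste: “usage”, “option” and “help” are decent starting points, and sometimes just paging through the lot turns up gold.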

In the mongod case, we get a few command line options (--replSet, etc). If I was really trying to ascertain how to use a program, some of these may be helpful. Again, not the best example, but it’s sometimes worth a try.

Of course, it goes without saying that you should try the man page, although for Mongo DB I only get this:

But you’ll find that the man pages for lsof, strace, tcpdump and find are extremely comprehensive and packed with great examples.

So feel free to share a few of your favorite debugging tips in the Comments. Bob knows, I could use them.



Nov 01 2012
 

I’m going to begin by saying you should never ever do what I’m about to describe, as it’s bad practice, sloppy and dangerous. But then, so is working late on a Friday night, so you might as well choose the lesser of two evils.

As a sysadmin my overriding concern is to do as little work as possible, as quickly as possible. If that’s at all possible. And by work, I mean those repetitive and tedious tasks. There is probably nothing worse than performing the same command, or series of commands, over and over again on multiple hosts. It’s the adult equivalent of writing out lines on a blackboard, and unfortunately this seems to crop up more often than it should.
[Image: Interior of Brisbane Technical College Signwriting class, ca. 1900]

The best practice way of addressing multiple updates is by implementing some form of configuration management system. This kind of software allows the administrator to centrally manage the files, packages and patches of all their hosts from a single central point. Puppet, Chef, Spacewalk and Satellite are all excellent products that approach this problem from different angles, with each enjoying differing degrees of favour among different factions of the Linux community.

But there are times when these tools either aren’t in place or aren’t complete, or when you’re just tired and desperate and a quicker, dirtier solution is what’s required. I’m going to describe the use of such a tool: one that will allow you to execute commands simultaneously in shells on multiple hosts. Its potential for wholesale destruction is enormous, and I can’t recommend you ever use it, but once it’s in your toolbox, you will.

So what if, rather than invoking commands from an SSH shell on hosts one after the other – that is, in serial – you could type the command once and have it execute simultaneously on all hosts – that is, in parallel? There are two tools that I know of that perform this task. One is called ClusterSSH, and the other is MultiSSH (or mssh). ClusterSSH, or cssh, is available from the EPEL repository for Fedora/CentOS/RedHat, and mssh can be obtained from the Ubuntu Software Manager. Both can be downloaded from SourceForge. For the purposes of this post I’ll be discussing mssh, but everything applies equally to ClusterSSH: their command line arguments and behaviour are identical.

Install the software, in the form that your distribution expects, or however you’re comfortable. Then, invoke mssh by passing it a list of hosts to connect to:

   # mssh perfdb01 perfdb02 perfdb03 perfdb04

Boom!

And “Boom!”, you’ll have a window subdivided into four separate shells – one for each host – and all operating in unison. Type a command in the top strip and it appears at each command line. Click on an individual pane and you’ll be typing in just that one shell. You can also click on the Servers menu and unselect hosts to disable input to them.

The first thing you’ll get is a login prompt, if you haven’t distributed your public SSH keys. If your password is the same on each host, or you’re using LDAP, logging in simultaneously is a breeze.

Now, you can sudo to whatever account you need, and hack away to your heart’s content. But remember, “measure twice, cut once”. It’s easy enough to think all your shells have the same current working directory at any particular time, but they may not. The possibilities for total system destruction are enormous, so for Pete’s sake, be careful. I’ll leave example use as an exercise for the reader.

Cautionary tricks when using MultiSSH

Here are a few tricks that I’ve picked up that can increase the power of MultiSSH.

Staggered execution timing

One of the difficulties of simultaneous execution is that if the command you’ve invoked is accessing a common resource, say a local YUM repository, then some of the commands will fail under the sudden I/O or CPU load. So I like to add a bit of a random pause before my command:

# sleep $(( $RANDOM/1000 )); yum -y update

The $RANDOM variable returns a random integer between 0 and 32767, and sleep takes its argument in seconds, so dividing by 1000 gives a pause of up to about 32 seconds, enough to stagger the load without making me wait hours.
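
If you'd rather put an explicit upper bound on the pause, the modulo operator reads a little more clearly; this sleeps for anything up to 30 seconds:

   # sleep $(( RANDOM % 30 )); yum -y update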

Prevention of wholesale deletion

There’s a cute way to stop the famous "rm -rf *". You can probably do it with SELinux, and probably should read the man page to work out how, but if you’ve turned SELinux off in mystified frustration, this is a quick hack to counter it. All you need to do is create an undeletable file named -i. Any expansion of * in an "rm -rf *" will then include "-i" as an argument, which rm treats as its interactive flag, putting a halt to things. How do you create a file that’s undeletable even by root? If you’re using an “ext” filesystem, you can use the "chattr" command, like this:

   # cd /
   # touch ./-i
   # chattr +i ./-i

This will create a file called "-i" that the filesystem itself (rather than the file permissions) will prevent you from deleting:

   # rm -f ./-i
   rm: cannot remove `-i': Operation not permitted

Read the chattr man page to find out how to undo the attribute change and delete the file.
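
For the record, lsattr will show you the attribute, and chattr -i removes the immutable bit so that the file can be deleted again:

   # lsattr ./-i
   # chattr -i ./-i
   # rm -f ./-i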

So that’s a quick rundown of MultiSSH. With the time that you’ll gain by using this tool, do some research on Puppet or Chef and implement one of those to manage all your files, packages and configurations across your entire estate. That’s a better practice. But in the meantime, this quick and dirty tool could really help you out.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.