May 22, 2013

I needed to maintain two separate SVN repositories that were ostensibly the same, but which had a few minor differences. Most of these differences were trivial, but each needed to be checked individually and resolved on a case-by-case basis. This could have been a complicated proposition, but fortunately the “diff” command can run recursively, and I came up with a little hack of a script to simplify the process further.

The “diff” command is a handy utility that reports the differences between files on a line-by-line basis. It has a lot of switches which control the amount of output, and the man page is a fine resource for these. Two useful ones are “-u”, which produces “unified” output, marking changed lines with “+” and “-” and showing a few lines of surrounding context (try it with and without to see the difference), and, as mentioned above, “-r”, which makes the command recursive, comparing files within subdirectories as well.
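As a quick illustration of those two switches, here is a minimal sketch that builds two throwaway directory trees and compares them. The directory and file names are made up for the demo and have nothing to do with my repositories.

```shell
#!/bin/sh
# Build two small throwaway trees (hypothetical names) that differ in one file.
work=$(mktemp -d)
mkdir -p "$work/a/sub" "$work/b/sub"
printf 'one\ntwo\n' > "$work/a/sub/file.txt"
printf 'one\n2\n'   > "$work/b/sub/file.txt"
printf 'same\n'     > "$work/a/same.txt"
printf 'same\n'     > "$work/b/same.txt"

# -r descends into sub/; -u prints unified hunks with +/- markers.
# diff exits with status 1 when it finds differences, so record the
# status instead of treating it as a failure.
status=0
out=$(cd "$work" && diff -ur a b) || status=$?
printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$work"
```

Only the file that differs is reported; the identical same.txt produces no output at all.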

The problem is that a recursive diff on two directories can produce massive amounts of output. If the aim of the exercise were to resolve the differences completely, you’d just do a straight copy, but more often than not it’s to root out a few differences or make some slight modifications. The trick is to separate the signal from the noise over successive refinements.

One way of doing this (rather than just a “grep -v” on the output, which would quickly become unwieldy) is to use the “-X” switch to supply the name of a file which contains filenames (or patterns of filenames) to exclude from the check. Better yet, write a script to execute the diff command, submitting the script itself as the argument to the “-X” switch. In the script, list the filename patterns after the “diff” invocation and an “exit”, like this:

Example script:

diff -X "$0" -ur dropbear/puppet/ bunyip/puppet/
exit

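The “$0” argument expands to the name of the script being executed, so the script hands itself to “-X” as the exclude file. The “exit” command means that nothing following it in the script will be executed, which leaves the rest of the file free to hold the exclusion list.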

To use this script, start with nothing after “exit”, then as you explore and resolve inconsistencies between files, exclude any false positives by listing each filename (or pattern) on its own line at the end of the script. Re-run the script, and repeat until the differences have been resolved or ignored.
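Here is a sketch of what the script might look like a few iterations in, wrapped in a small self-contained demo. It assumes GNU diff, where “-X” reads one pattern per line. The script name compare.sh, the file contents, and the two exclusion patterns (local.conf and *.bak) are invented for illustration; only the directory names come from my example above.

```shell
#!/bin/sh
# Scratch copies of the two trees, with hypothetical file contents.
work=$(mktemp -d)
mkdir -p "$work/dropbear/puppet" "$work/bunyip/puppet"
printf 'port 22\n'   > "$work/dropbear/puppet/sshd.conf"
printf 'port 2222\n' > "$work/bunyip/puppet/sshd.conf"
printf 'host a\n'    > "$work/dropbear/puppet/local.conf"
printf 'host b\n'    > "$work/bunyip/puppet/local.conf"

# The self-excluding script: everything after "exit" is never executed,
# but diff still reads those lines as exclusion patterns via -X "$0".
cat > "$work/compare.sh" <<'EOF'
#!/bin/sh
diff -X "$0" -ur dropbear/puppet/ bunyip/puppet/
exit

local.conf
*.bak
EOF

# The script exits with diff's status (1 when differences remain),
# so do not treat a non-zero status as an error here.
out=$(cd "$work" && sh ./compare.sh) || true
printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$work"
```

The difference in sshd.conf is still reported, while local.conf has dropped out of the output because it is listed after “exit”. The script's own command lines are also read as patterns, but they are harmless unless a file happens to have exactly that name.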

This is one of those long, drawn-out processes, but there are some things that just can’t be automated.

Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.
