It’s always nice to have a bunch of tricks for processing files easily and quickly. It’s fairly straightforward to remove duplicate lines by piping a file through sort and a unique filter, but this has the drawback of leaving you with a file that’s completely out of order. That would be fine if the file were a simple list, but if it’s a piece of code, it’s now useless. There’s a surprisingly quick and easy way to remove subsequent duplicate lines of text from a file without sorting it.
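For comparison, the conventional approach is something along these lines, which removes the duplicates but also reorders everything:

sort filename.txt | uniq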
Here’s the trick:
awk '!x[$0]++' filename.txt
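If the one-liner looks cryptic, here’s what’s going on. awk prints the current line whenever a pattern with no action evaluates to true. Here the pattern keeps an associative array (called x, though any name will do) keyed on the whole input line ($0), and the post-increment means the expression is true, and the line printed, only the first time that line appears. A longhand sketch of the same logic (the array name seen is just illustrative):

awk '{
    # print the line only on its first appearance
    if (seen[$0] == 0) {
        print
        seen[$0] = 1
    }
}' filename.txt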
For example, take a file with this text:
# cat test.txt
abc
def
ghi
abc
xyz
abc
ghi
plq
def
Run the awk command, and this happens:
# awk '!x[$0]++' test.txt
abc
def
ghi
xyz
plq
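One caveat if you want to update the original file: don’t redirect the output straight back to test.txt, because the shell truncates the file before awk gets a chance to read it. Write to a temporary file and move it into place instead (the temporary filename here is just an example):

awk '!x[$0]++' test.txt > test.txt.tmp && mv test.txt.tmp test.txt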
Make a note of this one, because it’s bound to come in handy sooner or later.
Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.
He lives and works in London.