Oct 13, 2012

It’s only a matter of time before a disk fills up. The longer-lived and busier a host is, the more files it creates and the larger they grow. Good practice dictates that scheduled jobs and scripts keep these files under control, but there are times when you suddenly need to find the disk hog and take it down. This post lists a few ways of doing this, in increasing order of obscurity.

List directories and files sorted by size

Determine the filesystem that’s filling up:

  # df -h

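Then change to the top-level directory of that filesystem, or to whichever directory you suspect is causing the problem, and list its files and folders by disk usage, smallest first. Note that the shell's * does not match hidden files (names beginning with a dot):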

  # du -sk * | sort -n
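As a rough sketch, the same idea can be wrapped in a small function that also picks up dotfiles and stays on a single filesystem. The name biggest_entries is made up for this example, and it assumes a find that supports -mindepth and -maxdepth (GNU find does).

```shell
# biggest_entries DIR [COUNT]
# Hypothetical helper: print the COUNT largest entries directly under DIR,
# smallest first, including hidden files.
biggest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # du -x: do not cross onto other filesystems
    # du -s: one total per entry, du -k: sizes in kilobytes
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -xsk {} + \
        | sort -n \
        | tail -n "$count"
}

# Example: biggest_entries /var 20
```

Run it as root if you want du to see into directories your own user cannot read.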

Locate large dynamic files and truncate them

The assumption behind this command is that if a filesystem is reported as filling up, there’s probably a large file in there that was written to recently. The find command locates files modified within the last day that are larger than 50000 blocks of 512 bytes (roughly 25 MB), and redirecting cat /dev/null into a file truncates it to zero length while keeping the file itself in place. We do this because if the file is currently held open by a process, deleting it can have adverse consequences: the space is not released until the process closes the file, the process keeps writing to a file that nobody can see any more (so future logs are lost), and it may even crash.

  # find / -mtime -1 -size +50000 -ls
  # cat /dev/null > file
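Here is a minimal sketch of the same two steps as functions, assuming GNU find (for the M size suffix). The names recent_big_files and empty_in_place are invented for this example.

```shell
# recent_big_files [DIR] [MIN_MB]
# Hypothetical helper: regular files under DIR, on the same filesystem,
# modified in the last 24 hours and larger than MIN_MB mebibytes.
recent_big_files() {
    dir=${1:-/}
    min_mb=${2:-25}
    find "$dir" -xdev -type f -mtime -1 -size +"${min_mb}M" -print
}

# empty_in_place FILE
# Truncate FILE to zero length. The file and its inode stay in place,
# so a process that has it open can carry on writing to it.
empty_in_place() {
    : > "$1"
}

# Example:
#   recent_big_files /var 100
#   empty_in_place /path/to/the/offending.log
```

The ": >" redirection has the same effect as "cat /dev/null >", without starting a cat process.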

Clear the YUM cache (Redhat/CentOS/Fedora)

The YUM installer caches a lot of files as packages are installed and upgraded. These all reside under YUM's cache directory. If you don't plan on rolling back any installed packages, you can generally clear this out:

  # yum clean all
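If you want to know how much space a cleanup actually bought you, one option is to compare df before and after. This is only a sketch: freed_kb is a made-up name, and the /var in the example is my assumption about which filesystem holds the cache on your host.

```shell
# freed_kb MOUNT_POINT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND and print how many kilobytes of free
# space MOUNT_POINT gained (negative if it lost space).
freed_kb() {
    mount_point=$1
    shift
    # df -P keeps each filesystem on one line; column 4 is available space
    before=$(df -Pk "$mount_point" | awk 'NR==2 {print $4}')
    # Send the command's own output to stderr so stdout carries only the number
    "$@" >&2
    after=$(df -Pk "$mount_point" | awk 'NR==2 {print $4}')
    echo $((after - before))
}

# Example (the mount point is an assumption): freed_kb /var yum clean all
```

On a busy host other processes are writing at the same time, so treat the figure as approximate.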

Find large installed packages

It’s amazing how many large packages can get installed on a host. Desktops especially tend to accumulate software that was installed for a quick test and then never removed, and numerous old kernels can hang around too. The following commands (one for Fedora and one for Ubuntu), while long and unwieldy, are very useful for listing every installed package along with the space it takes up once installed, smallest first.

  # rpm -qa --queryformat '%{SIZE} %{NAME} %{VENDOR}\n'  | sort -n

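Before uninstalling any old kernel versions, be absolutely sure you won’t need to roll back. Any unnecessary package can then be removed like this. Note that rpm -e removes only the named package: it does not clean up dependencies that were installed alongside it, and it will refuse if other installed packages still require it.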

  # rpm -e package-name
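When it comes to old kernels, the one thing you must not remove is the one you are running. A small filter can take it out of the candidate list. This is a sketch that assumes, as on Fedora-style systems, that the output of uname -r appears verbatim inside the kernel package name; other_kernels is a made-up name.

```shell
# other_kernels [RUNNING_VERSION]
# Hypothetical helper: read kernel package names on stdin and print those
# that do not contain the running kernel's version string.
other_kernels() {
    running=${1:-$(uname -r)}
    # -F: treat the version as a fixed string, not a regular expression
    grep -v -F -- "$running"
}

# Example: rpm -q kernel | other_kernels
```

Check the list by eye before passing anything to rpm -e.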

And the same thing in Debian/Ubuntu:

  # dpkg-query -W --showformat='${Installed-Size;10}\t${Package}\n' | sort -k1,1n

Uninstall with:

  # dpkg -r package-name
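Both listings above print raw numbers: rpm's %{SIZE} is in bytes, while dpkg's Installed-Size is in kibibytes. A small awk filter makes them easier on the eye. The function name sizes_to_mb is made up for this sketch, and it expects the size in the first column and the package name in the second.

```shell
# sizes_to_mb UNITS_PER_MB
# Hypothetical helper: read "SIZE NAME" lines on stdin and print the size
# in megabytes. UNITS_PER_MB is 1048576 for bytes and 1024 for kibibytes.
sizes_to_mb() {
    units_per_mb=$1
    awk -v d="$units_per_mb" '{ printf "%.1f MB\t%s\n", $1 / d, $2 }'
}

# Examples:
#   rpm -qa --queryformat '%{SIZE} %{NAME}\n' | sort -n | tail -n 20 | sizes_to_mb 1048576
#   dpkg-query -W --showformat='${Installed-Size}\t${Package}\n' | sort -n | tail -n 20 | sizes_to_mb 1024
```

Sorting happens before the conversion, so the numeric sort still works on the raw values.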

Suggestions about what software you do and don’t need installed are outside the scope of this particular post, but researching what each package is for makes for a very useful learning experience.

Any other tips for hunting the disk hog? Let me know in the Comments.

Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.

  2 Responses to “Freeing up Linux disks – hunting the space hog”

  1. Todd, I’ve been in the data recovery field for over … There is software I would recommend to best suit your needs:

     1. R-Studio
     2. File Scavenger
     3. Get Data Back NTFS

     Those are the three I would use. Some precautions have to be taken when attempting data recovery:

     1. Do not use the disk, and do not write anything to the hard drive (no Windows reinstallation, no data copying).
     2. Plug the hard drive in as a slave on another working computer.
     3. Do not attempt to save recovered data on the same hard drive.
     4. Ideally, make a full bit-level image of the drive with tools like Linux dd, WinHex or Acronis. R-Studio can also create an image and work from it, so if you make a mistake you can restart the entire procedure from the original hard drive.

     Hope it helps.

    • My name’s not Todd, but thanks for the tip. This is a Linux blog, and this post was more about deleting files rather than data recovery, but I’m definitely going to make a note of these tools. You never know when they may come in handy.

      Stay classy.
