Yearly Email Archiving

This time it took me a while longer, because I was busy with $reallife during the regular Winter Solstice celebrations, but earlier tonight I finished my traditional “end of the year” email archiving process. Messages that I posted in 2008, ‘cached’ copies of the replies I received, and any incoming message that listed one of my addresses in the recipient list have been moved to “mail.archive-2008”.

These messages were stashed safely away, and then deleted from my laptop’s disk. I don’t often need more than a year’s worth of email backlog, and when I do, backups are always available somewhere nearby.

When the archiving process was finished I ran my rdiff-backup script once more, and the shuffling of email seems to have generated an rdiff-backup increment of 946 KB:

$ rdiff-backup --list-increment-sizes /archive/keramida/mailnews
        Time                       Size        Cumulative size
Sat Jan 17 00:53:28 2009         1.85 GB           1.85 GB   (current mirror)
Fri Jan 16 18:22:24 2009          946 KB           1.85 GB   (2008 archive mirror)
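
For the curious, the yearly shuffle boils down to something like the following sketch. The folder names and paths here are made up for illustration; the real script is a bit more involved.

```shell
#!/bin/sh
# Illustrative sketch only: the mail folder names are hypothetical,
# and the real archiving script does more than this.
YEAR=2008
MAILDIR="$HOME/Mail"
ARCHIVE="$MAILDIR/mail.archive-$YEAR"

mkdir -p "$ARCHIVE"

# Move last year's mail folders into the archive folder.
for folder in "$MAILDIR/sent-$YEAR" "$MAILDIR/incoming-$YEAR"; do
    if [ -e "$folder" ]; then
        mv "$folder" "$ARCHIVE/"
    fi
done

# Push a new increment to the backup area.
rdiff-backup "$MAILDIR" /archive/keramida/mailnews
```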

I am beginning to like rdiff-backup. A lot :-)

10 thoughts on “Yearly Email Archiving”

  1. Panagiotis Atmatzidis

    I need a backup solution for an office computer. I use an external 500 GB USB disk. I’m seriously thinking of deploying rdiff-backup using anacron. I also considered Duplicity, which has GPG support, but I need to do further testing with it.
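
    If it helps anyone, an /etc/anacrontab entry for this could be as simple as the fragment below (the job identifier and paths are just an example):

```shell
# /etc/anacrontab fragment (example only): run the backup once a day,
# even if the machine was powered off at the scheduled time.
# period(days)  delay(min)  job-identifier  command
1               10          daily-backup    rdiff-backup /home/atma /media/disk/backup
```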

  2. Panagiotis Atmatzidis


    I started using rdiff-backup on my office computer. The backups are saved on the external USB disk drive. Here is my script:

    root@Humildus:~# cat /etc/cron.daily/daily-backup
    # remove year-old backups
    rdiff-backup --remove-older-than 1Y /media/disk/backup

    # start a new daily backup
    rdiff-backup --exclude /home/atma/.gvfs --exclude /home/atma/.gvfs /home/atma /media/disk/backup

    Now the problem is that although I excluded the .gvfs directory I get this error:

    root@Humildus:~# bash /etc/cron.daily/daily-backup
    No increments older than Mon Jan 21 19:55:06 2008 found, exiting.
    ListError .gtk-bookmarks/.gvfs [Errno 13] Permission denied: '/home/atma/.gvfs'

    any ideas? :-/


    ps. I should post this to a forum, I guess :-P

  3. keramida Post author

    atma: I think --exclude shouldn’t iterate into that directory at all, but I will have to test how this works with rdiff-backup. Are you sure /home/atma/.gvfs is the file that fails here, though? I see .gtk-bookmarks in the ListError message.
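
    Untested, but excluding both suspicious entries explicitly might be worth a try, something like:

```shell
# Untested sketch: list both problematic entries as separate excludes.
rdiff-backup \
    --exclude /home/atma/.gtk-bookmarks \
    --exclude /home/atma/.gvfs \
    /home/atma /media/disk/backup
```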

  4. Panagiotis Atmatzidis

    Yes, sorry, it does not iterate into the dir; that was a bad copy/paste. But if the cron job is launched as root, why on earth should “he” get a “permission denied”? :-( I will try in the evening to exclude .gtk-bookmarks as well.

  5. George Notaras

    I keep staring at the numbers (1.85 GB) shown in this increment sizes list! I wonder how Evolution or Thunderbird would perform while handling such a tremendous amount of data…

    Regarding rdiff-backup, I am another satisfied user. I’ve been using it for about a year and a half without any serious problems. I have heard though that when trying to restore very old data, it might require several minutes until the data is finally restored, but I guess this is a problem with the logic behind incremental backups and not an rdiff-backup specific issue. Never had to restore any really old data, so I do not have any experience on that.

    Lately, I got so excited about this program, that I decided to replace my old wrapper bash script (nowhere near the quality of your shell scripts) with a new one, written from scratch in python. Although I have spent many days on it, it’s not done yet. I just hope this time I’ve got it right :)

  6. Giorgos Keramidas

    George Not.: There is only one ‘catch’ I have noticed with rdiff-backup. Depending on the verbosity level, the backup.log and restore.log files in archive-dir/rdiff-backup-data/ can get very large. For small archives and a few hundred increments in a single archive directory, the log files may grow larger than the actual archive data.

    I am successfully using rdiff-backup to keep incremental backups of various sets of files though:

    (1) My ~/Mail and ~/News files. I am using Gnus and downloading all my email and Usenet articles locally, so I can read and reply to email and news posts even when I am offline for long periods. Having a nice archive of everything, copied periodically on an external USB-attachable disk, makes me feel a bit safer about losing all of this.

    (2) The Oddmuse Wiki instance that powers my http://localhost/ pages. I use a local wiki, accessible only from localhost for now, to keep various random notes and other trivial information. Oddmuse stores the wiki pages in flat text files, so I keep hourly incremental backups of the wiki pages, and of the lighttpd.conf configuration file that powers the wiki.

    (3) A HELLUG-internal Oddmuse Wiki instance where I have started keeping notes for the team of HELLUG’s administrators. This is a work in progress that has been going on for a couple of months now, but I found that rdiff-backup is a wonderful solution for keeping hourly backups of that wiki instance too. After having tried Oddmuse and rdiff-backup at home for a while, I felt quite comfortable with the way they work.

    (4) My parents’ documents. They share their Windows folders to the local network, and a Samba-mounted copy is archived a few times a day, using rdiff-backup, to one of my FreeBSD machines. We didn’t really bother trying to find a Windows port of rdiff-backup; the script can pull the files from a read-only Samba mount just fine.
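
    The relevant part of that script is roughly the following; the share and mount point names are invented for the example.

```shell
# Rough sketch (share and mount point names are made up): mount the
# Windows share over Samba, pull an increment, then unmount.
mount_smbfs -N //guest@winbox/Documents /mnt/docs
rdiff-backup /mnt/docs /archive/parents/documents
umount /mnt/docs
```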


    One of the excellent aspects of keeping an archive of a file set with rdiff-backup is that you can then use plain rsync to mirror the archive in other places. This is how I am mirroring the HELLUG admin wiki at home, for example. Even if the archive is lost because of a major snafu, the rsync’ed copy at my home is a fully functional, standalone copy that works as a drop-in replacement of the original wiki archive.
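
    As a sketch (the host and path names here are invented), mirroring an rdiff-backup archive is just:

```shell
# Copy the whole archive, increments included, and delete anything
# that has vanished upstream.  Host and paths are illustrative only.
rsync -a --delete hellug-server:/archive/adminwiki/ /home/keramida/mirrors/adminwiki/
```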

    PS: Thanks for the kind words about my scripts. I hope you are having fun with the Python project. I also think that after a certain point (let’s say a few thousand lines) shell scripts beg to be rewritten in Perl, Python or another language, and Python is IMO an excellent choice :-)

  7. George Notaras

    Interesting remark about the ever-growing size of the rdiff-backup logfile. This led me to check the man page more thoroughly and find out that the --terminal-verbosity option also determines the amount of information that is recorded to the logfile. It seems that a verbosity level over 4 will cause rdiff-backup to record information about all changed files. Also, if the --print-statistics option is used, the statistics are written to the logfile too. I happened to use a verbosity level of 4 in my old wrapper script; at that level only a line marking the beginning of the backup operation and any errors that occur are recorded, so the logfiles are not that big.

    I checked if it was possible to rotate all those */rdiff-backup-data/backup.log files, but my problem was that I could not find a shell pattern that could be used in the logrotate configuration and match them all at various directory depths…
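
    If globbing will not cut it, maybe a find-based cron job could trim them instead of logrotate; an untested idea:

```shell
# Untested idea: truncate any rdiff-backup log that has grown beyond
# 10 MB, at any depth under the backup root (path and size are examples).
find /backups -path '*/rdiff-backup-data/*.log' -size +10M \
    -exec truncate -s 0 {} +
```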

  8. George Notaras

    Ooops, I must have overlooked the first paragraph of your last comment, in which you had already described what I wrote above. But, hey, it was a good opportunity to take a thorough look at that rdiff-backup man page! :)

  9. Mark Larson

    You may also want to try archive manager.
    You can really save on email storage space with it because it keeps only a single copy of all messages and attachments and removes the duplicates.

Comments are closed.