Mercurial command demo: hg verify

One of the wonderful commands that Mercurial supports is the “verify” command. Running it in a Mercurial workspace walks through the backing store of the repository and makes sure that the history and contents of the versioned files are not corrupt, missing, or otherwise in a “bad” state.

An example of how you can use this Mercurial command is described here.

I keep a shallow “clone” of the FreeBSD src/ repository on my laptop, with all the history of the main trunk of development since 2008-01-01. The clone includes some local patches that I haven’t yet cleaned up for the main Subversion tree of FreeBSD, and it would be bad if I lost some of them, or corrupted some of the changes to the point that building from my private clone became impossible. This is why I develop the patches in one place, the “/hg/bsd/src” directory, while my main build directory, “/usr/src”, is a second clone of the development tree. Right before firing up a buildworld+buildkernel run, I run the following:

% cd /usr/src
% /usr/bin/time hg --debug verify
repository uses revlog format 1
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
41996 files, 5061 changesets, 55043 total revisions
       52.22 real        34.25 user        17.33 sys
%

The extra time it takes to verify “/usr/src” (almost a minute here) may look like a lot, but it means that if a crash leaves my disk in a bad state, I will not even start compiling in this particular clone. Instead, I will go back to my backups and restore the workspace from a clean copy.
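The “verify before building” habit can be wrapped in a small shell function, so a build never starts in a workspace that failed verification. This is a minimal sketch, assuming a POSIX sh and relying on “hg verify” returning a non-zero exit status when it finds errors; the function name verify_then_build is hypothetical and is not part of Mercurial or of the FreeBSD build system.

```shell
#!/bin/sh
# Hypothetical helper: run "hg verify" in REPO_DIR and only run the build
# command if verification succeeds.
# Usage: verify_then_build REPO_DIR COMMAND [ARGS...]
verify_then_build() {
    repo=$1
    shift
    # "hg verify" exits with a non-zero status when it finds problems.
    # The subshell keeps the caller's working directory unchanged.
    if ! (cd "$repo" && hg verify); then
        echo "hg verify failed in $repo: restore it from a clean copy before building" >&2
        return 1
    fi
    # Only reached when verification succeeded.
    (cd "$repo" && "$@")
}
```

For the workflow described above, the call would be something like “verify_then_build /usr/src make buildworld buildkernel”.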

On the same laptop I keep a converted GNU Emacs repository (from the main CVS repository of Emacs, converted first to Git at git://git.sv.gnu.org/emacs and then to Hg with the convert extension). The main trunk of Emacs development now includes more than 94,538 commits, with almost 23.5 years of unbroken commit history! This is probably why running “hg verify” in Emacs takes a bit more time:

% cd /hg/emacs/head
% /usr/bin/time hg --debug verify
repository uses revlog format 1
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
4630 files, 94539 changesets, 162061 total revisions
      295.14 real       243.10 user         5.38 sys
%

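That’s slower than the FreeBSD src/ tree, but this repository also has a different shape. Instead of many files, it contains many more changes to a smaller set of “currently active” files: about a tenth of the files (4,630 vs. 41,996), but nearly three times as many file revisions (162,061 vs. 55,043) and more than 18 times as many changesets (94,539 vs. 5,061).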

Now it would be interesting to see how well “hg verify” works with millions of files, or with a few files and a couple of million changes to these files :-)

This entry was posted in Computers, Emacs, Free software, FreeBSD, Mercurial, Open source, Programming, SCM, Software.

One Response to Mercurial command demo: hg verify

  1. Michael Iatrou says:

    Millions of files?

    grok:~$ du -h --max-depth=0 /home /usr /var /opt
    76G /home
    2.8G /usr
    524M /var
    118M /opt

    grok:~$ find /home /usr /var /opt -type f | wc -l
    527437

    A couple of hundred thousand files should be enough for your next experiment. But it is a pretty simple observation that revisions are the bottleneck, and that actually aligns with the constraints of the revlog design.

    Still, I would find it more interesting to see the correlation between files and revisions at constant time, let’s say 10 sec, which is a realistic expectation for everyday use. Even more, I would like to see how the number of changesets affects the results.
