A bunch of updates for the Greek FreeBSD/doc translations

Translating technical documentation from English to Greek is a relatively difficult task. It takes a certain level of attention to detail and a fairly good command of both languages. Then there is the minor issue of keeping the translations up to date with their English counterparts.

Updating translations (the old style)

We have a growing body of translated work at the FreeBSD Greek documentation project team, and it was getting rather unwieldy to go through each file manually, checking whether the English version has updates that we would like to pull out of CVS, translate from scratch or re-translate, and commit to our main translation tree. Back when I started writing the original Greek translation build glue, I copied a tagging scheme used by existing translations that was helpful for this sort of manual check. Each translated file had a comment of the form:

<!-- Original revision: 1.17 -->

When looking for updates, one had to manually perform the following steps for each file in the doc/el_GR.ISO8859-7/ directory:

  • Check if the file includes an “Original revision” comment.
  • Extract the revision number from that comment, and note it down somewhere.
  • Make an educated guess about the pathname of the original English text. Sometimes the path is easy to guess by replacing el_GR.ISO8859-7 with en_US.ISO8859-1 in the file’s pathname. Other times, it isn’t so easy (especially for files in the el_GR.ISO8859-7/share directory).
  • Locate the $FreeBSD: ... $ line in the original English text.
  • Compare with the saved revision from the comment of the Greek text, and see if there are updates to translate.
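
The first three steps of this manual check are mechanical enough to sketch in Python. This is only an illustration of the old procedure, not part of any existing tool: the function names original_revision and guess_english_path are made up for this example, and the path guess is the naive substitution that, as noted above, does not work for every file.

```python
import re

# Matches comments like: <!-- Original revision: 1.17 -->
ORIGINAL_REV = re.compile(r"<!--\s*Original revision:\s*(\d+(?:\.\d+)+)\s*-->")


def original_revision(text):
    """Return the revision from an "Original revision" comment, or None."""
    match = ORIGINAL_REV.search(text)
    return match.group(1) if match else None


def guess_english_path(greek_path):
    """Naive guess: swap the language directory name in the path (hypothetical helper)."""
    return greek_path.replace("el_GR.ISO8859-7", "en_US.ISO8859-1", 1)
```

A file without the comment yields None, which corresponds to the case where the first step of the list fails and the translator has to dig for the revision by hand.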

There are just five steps for each file in this checking process. When translated files are just a bunch of articles, and a few makefiles, it’s boring to repeat these steps for each file, but it isn’t so difficult that nobody can do it. Now that we have Greek translations for a large part of the FreeBSD Handbook, and I am a bit more pressed for time, manually performing these steps for each file of the Greek translation tree has become very difficult to do in a timely manner.

New tools (checkupdate)

This was the main reason for writing the checkupdate script. With a lot of help from Gabor Pali, one of the committers who work on the Hungarian FreeBSD translations, I wrote a Python script called checkupdate and designed a tagging scheme that would make this part of the translator’s work much easier. We started by defining how a translator can “tag” a translated source file with the revision of the last fully translated English version. The idea we came up with was:

Each translated file will contain a pair of tags called “%SOURCE%” and “%SRCID%”. The %SOURCE% tag will point to the relative path of the English text under the doc/ tree. The %SRCID% tag will refer to the last fully translated revision of the %SOURCE% file.

An example for one of the translated Greek articles is:

$ pwd
/ws/bsd/doc
$ head -10 el_GR.ISO8859-7/articles/new-users/article.sgml
<!--

  $FreeBSD: doc/el_GR.ISO8859-7/articles/new-users/article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $

  Για Χρήστες Νέους τόσο στο FreeBSD όσο και στο Unix

  The FreeBSD Greek Documentation Project

  %SOURCE%      en_US.ISO8859-1/articles/new-users/article.sgml
  %SRCID%       1.24
$
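
Pulling the two tags out of a header like this one takes only a few lines of Python. The sketch below is an assumption about how such parsing could be done, not a copy of the real checkupdate code; the name parse_tags and the exact regular expression are invented for this example.

```python
import re

# A tag line looks like:
#   %SOURCE%      en_US.ISO8859-1/articles/new-users/article.sgml
TAG_LINE = re.compile(r"^[ \t]*%(SOURCE|SRCID)%[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def parse_tags(text):
    """Return a dict mapping "SOURCE" and "SRCID" to the values found in text.

    Hypothetical helper: tags that are missing from the text are simply
    absent from the returned dict.
    """
    return {name: value for name, value in TAG_LINE.findall(text)}
```

For the article header shown above, parse_tags would return the English path under "SOURCE" and the string "1.24" under "SRCID".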

Then we wrote a Python script that can “parse” the %SOURCE% and %SRCID% tags, look up the CVS (or Subversion) revision number of the original English text, and report any differences. The “interface” of the script was quite simple: a list of filenames is fed to the script through standard input, and it assumes they are relative pathnames under the top of a doc/ checkout. This way, to check all the files of the Greek translation one would run:

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 | checkupdate

To check multiple translation trees at once, one can either loop through the translations:

$ pwd
/ws/bsd/doc
$ for dname in el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 ; do \
    find "${dname}" | checkupdate ; \
done

or just pass their names directly to find:

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 | checkupdate
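
The core of such a check can be sketched in Python as follows. This is a simplified illustration under a few assumptions: it reads the English revision from the $FreeBSD$ line of the file (the real script may look the revision up differently), and all function names are hypothetical. The one detail worth getting right is that revision numbers must be compared piece by piece as integers, because 1.9 is older than 1.25 even though it is "larger" as a string or as a decimal number.

```python
import re
import sys

# Matches e.g. "$FreeBSD: doc/.../article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $"
FREEBSD_ID = re.compile(r"\$FreeBSD: \S+,v (\d+(?:\.\d+)+) ")


def id_revision(text):
    """Return the revision recorded in a $FreeBSD$ line, or None."""
    match = FREEBSD_ID.search(text)
    return match.group(1) if match else None


def revision_key(revision):
    """Turn "1.25" into (1, 25) so that 1.9 sorts before 1.25."""
    return tuple(int(part) for part in revision.split("."))


def is_behind(srcid, english_revision):
    """True when the English text is newer than the translated revision."""
    return revision_key(srcid) < revision_key(english_revision)


def read_paths(stream=sys.stdin):
    """Yield the non-empty pathnames fed to the script, one per line."""
    for line in stream:
        path = line.strip()
        if path:
            yield path
```

Since find also prints directory names, a driver loop built on read_paths would have to skip anything that is not a regular file before trying to read tags from it.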

The first version of the script tried to include as much information about each translated file as possible, so it used a relatively verbose output format. This is the default output format even today. For the current version of the el_GR.ISO8859-7 translation tree the checkupdate script output includes the following:

$ find el_GR.ISO8859-7 | checkupdate
el_GR.ISO8859-7/articles/Makefile rev. 1.16
    1.39       -> 1.60        en_US.ISO8859-1/articles/Makefile

el_GR.ISO8859-7/articles/laptop/article.sgml rev. 1.4
    1.9        -> 1.25        en_US.ISO8859-1/articles/laptop/article.sgml

[...]

Gabor (pgj) later added an option for compact output, because he likes seeing one line of output for each file. The compact mode is enabled with the -c option of the checkupdate script:

$ find el_GR.ISO8859-7 | checkupdate -c
1.39       -> 1.60       el_GR.ISO8859-7/articles/Makefile
1.9        -> 1.25       el_GR.ISO8859-7/articles/laptop/article.sgml
[...]
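
The two output styles differ only in how the same pieces of information are laid out. A rough Python sketch of both follows; the column widths are my reading of the sample output above and the function names are invented, so treat it as an approximation rather than the script’s actual formatting code.

```python
def format_verbose(path, revision, srcid, current, source):
    """Two lines per file: the translated file, then the English revision range."""
    return "%s rev. %s\n    %-10s -> %-10s  %s" % (
        path, revision, srcid, current, source)


def format_compact(path, srcid, current):
    """One line per file, in the style of the -c output."""
    return "%-10s -> %-10s %s" % (srcid, current, path)
```

The compact form drops the English pathname and the revision of the translated file, which is why it fits on a single line.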

The checkupdate script has now been committed to the FreeBSD doc/ tree in CVS, and it includes a short manpage too. The script and manpage sources are browsable online at:

http://cvsweb.freebsd.org/doc/el_GR.ISO8859-7/share/tools/checkupdate/

Updating translations (new style)

Updating a translation is much easier now, using the checkupdate script and a CVS checkout of the doc/ tree. I usually open two side-by-side terminals, and keep running CVS diff commands in one of them and checkupdate in the other. A typical MFen (“merge from English”) session for one of the Greek articles includes:

  • Picking one of the translated files to update, from the output of checkupdate. For this example, let’s assume I want to update the laptop/article.sgml file.
  • Running “cvs log” and “cvs diff” in the second terminal window, to look at each change committed in CVS:

    $ cvs log -r1.9:1.25 en_US.ISO8859-1/articles/laptop/article.sgml | more
    $ cvs diff -r1.9 -r1.25 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • If the diffs seem too large to translate in one go, I may opt to translate each CVS change as a separate piece. The FreeBSD doc committers try to keep content and indentation changes separate, so it is often the case that translating revision 1.9 (a content change) as a standalone change is a lot easier than trying to decipher what changed between 1.8 and 1.10 (because revision 1.10 rewrapped and reformatted lots of text and it makes looking for the content changes of 1.9 unnecessarily hard).
  • Looking at only one revision of a file is slightly boring in CVS, but not really tough:

    $ cvs diff -r1.8 -r1.9 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • When the translation of revision 1.9 is done, I commit it to the Mercurial tree I am using for local work, taking care to update the %SRCID% comment in the file to show that it is now synchronized with English revision 1.9.
  • Some time later, a bunch of changes are pushed to the main Mercurial tree at http://hg.hellug.gr/freebsd/doc-el/.
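
When going through a file one revision at a time, typing each pair of -r options by hand gets tedious. The following Python sketch generates the list of consecutive revision pairs and the matching cvs diff argument lists. It is a hypothetical helper, not something checkupdate does, and it assumes both revisions are on the same branch, so that only the last number changes.

```python
def revision_steps(old, new):
    """Return consecutive (previous, next) revision pairs from old to new.

    Assumes both revisions are on the same branch, so they differ only in
    the last number (for example 1.9 and 1.25).
    """
    old_parts = old.split(".")
    new_parts = new.split(".")
    if old_parts[:-1] != new_parts[:-1]:
        raise ValueError("revisions are on different branches")
    prefix = ".".join(old_parts[:-1])
    first, last = int(old_parts[-1]), int(new_parts[-1])
    return [("%s.%d" % (prefix, n), "%s.%d" % (prefix, n + 1))
            for n in range(first, last)]


def cvs_diff_command(previous, following, path):
    """Build the argument list for one "cvs diff -rA -rB path" invocation."""
    return ["cvs", "diff", "-r" + previous, "-r" + following, path]
```

For the laptop article, revision_steps("1.9", "1.25") produces sixteen pairs, starting with ("1.9", "1.10") and ending with ("1.24", "1.25"); each pair can be handed to cvs_diff_command and run from the top of the doc/ checkout.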

Recent updates

Using the checkupdate script and the CVS diff commands described so far, I have merged a fair number of updates from the English text since last night. The commit emails started trickling in late at night, when I extracted the patches from my personal Mercurial tree and committed them into CVS:

2008-08-31 [  29: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/books/developers-handbook/policies chapter.s$
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml mailing-lists.ent
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml freebsd.ent
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books Makefile.inc
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/releng extra.css
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  13: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook colophon.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  13: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml

The number of commits looks scary, but in reality this was only because I was experimenting with separate MFen commits of each English revision.

In retrospect, this may not be a very good idea. We don’t really need *all* the English versions translated in CVS (some may be broken, others may be intermediate commits, or may be missing some bits). It doesn’t make sense to include all the false starts of the English docs in the el_GR.ISO8859-7 tree too. So the last two commits to CVS included a bunch of English revision merges in “collapsed” form:

keramida    2008-09-02 13:56:43 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.11 -> 1.13  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.5       +9 -4      doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

keramida    2008-09-02 13:57:41 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.13 -> 1.17  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.6       +198 -3    doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

I think I like this commit style a bit better, and after a short discussion on the translators’ mailing list, Manolis seems to like this style too.


One Response to A bunch of updates for the Greek FreeBSD/doc translations

  1. Manolis says:

    And may I add, that thanks to the checkupdate script we also have this page – which makes my recent work very easy:

    http://www.freebsdgr.org/sgml.php

    which shows in color what we need to update: black=in sync, red=behind en_US revision, blue=ahead en_US revision (either wrong value of %SRCID% or cvs merge needed)

    I am quite glad we have introduced this system, it takes out a lot of tedious work ;)
