The Cat’s Tongue

Documentation is all well and good, but it’s not much use if you can’t understand a word of it. I’ve often said that man pages are great once you’ve learned how to read man pages, and people often cite man man as the way to learn. The catch is that you already need to know how to read them (and know that the Jargon File exists) to get anything out of it. Have you seen its synopsis?

With the recent release of the 1.5 series of Mahara, we’ve been putting a lot of work into making the manual translatable, so that users don’t need to learn English to benefit from it. Currently the manual is hosted on readthedocs.org, but sadly the site doesn’t yet support generating translated versions on the fly.

Overall, we needed to:

  • Have translated versions of the manual
  • Make the screenshots translatable
  • Keep the learning curve for translators low
  • Avoid maintaining multiple copies of the source
  • Automate the whole thing
  • Avoid re-deploying every time a new translation is started
  • Install everything via apt-get, not easy_install and friends

Sounds like it should be simple? I wish! There were quite a few hiccups in getting this all going:

  • The first challenge was getting Sphinx to use the translations we had (there’s a conf.py sketch after this list showing the wiring). After a ridiculous amount of fiddling and cursing, it turned out that in Ubuntu releases preceding 12.04, the .mo files did not get packaged with the locales.
  • Unicode isn’t well supported in the default Sphinx LaTeX setup, so I decided to swap over to XeLaTeX. This involved a substantial amount of tweaking.
  • Docutils changed its API (from version 0.8 onwards, which is what ships in Ubuntu 12.04) and began reporting, for example, ‘ngerman’ instead of ‘de’. Sphinx wasn’t expecting that. This is fixed in the bleeding-edge version of Sphinx, and via a patch in Ubuntu precise/quantal and Debian sid.
  • If a language isn’t supported natively by Sphinx, it will not apply the gettext translations to certain build types.
  • I had never used Sphinx, rST or LaTeX before, and my Python is effectively non-existent, so I had no idea what I was doing.
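
For reference, the Sphinx side of the wiring is small once the catalogs exist. A minimal conf.py sketch (the gettext_compact setting is my assumption to match the per-potfile .mo layout described further down; the language itself gets overridden per build on the command line):

# conf.py (sketch): tell Sphinx where the compiled .mo catalogs live,
# relative to the source directory.
locale_dirs = ['locales/']
# One catalog per document; an assumption matching the per-potfile
# layout produced by the script below.
gettext_compact = False
# Default language; the build targets override this with -D language=...
language = 'en'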

Finally, after working out how to get around some of those, I think I have it all working. The resulting solution:

  • Uses only packages from the Ubuntu repositories
  • Grabs the .po files from Launchpad (where the translating happens), converts them to .mo files, and places them under the necessary paths in source/locales
  • Grabs the translated image sets from git and drops them over the default English versions in source/images
  • Patches the generated LaTeX source and its PDF Makefile to use XeLaTeX cleanly

You’ll need to install:

gettext, git-core, bzr, make, ttf-wqy-microhei, ttf-freefont, mendexk, texlive-latex-extra, texlive-fonts-recommended, texlive-latex-recommended, texlive-xetex, ttf-indic-fonts-core, texlive-lang-all, python-pybabel
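
On Ubuntu, that boils down to one long apt-get line:

sudo apt-get install gettext git-core bzr make ttf-wqy-microhei \
    ttf-freefont mendexk texlive-latex-extra texlive-fonts-recommended \
    texlive-latex-recommended texlive-xetex ttf-indic-fonts-core \
    texlive-lang-all python-pybabel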

For grabbing the .po files from Launchpad, I wrote a little bash script which gets passed a version number ($1 = the Mahara version, such as 1.5):

#!/bin/bash

if which bzr >/dev/null; then
    echo "Starting import of translations..."
else
    echo "Please install bzr before continuing."
    exit 1
fi

if [ ! -d launchpad ]; then
    echo "Checking out the launchpad .po files"
    bzr branch "lp:~mahara-lang/mahara-manual/$1_STABLE-export" launchpad
else
    echo "Updating .po collection from launchpad"
    cd launchpad && bzr pull && cd ..
fi

echo "Cleaning up from last time"
rm -rf source/locales # msgfmt would merge with stale .mo files otherwise

for dir in launchpad/potfiles/*; do
    echo "Creating $dir .mo files"
    for file in "$dir"/*; do
        # The language code is the .po file's basename; the catalog name
        # comes from the potfiles subdirectory.
        lang="$(basename "$file" .po)"
        mkdir -p "source/locales/$lang/LC_MESSAGES"
        msgfmt "$file" -o "source/locales/$lang/LC_MESSAGES/$(basename "$dir").mo"
    done
done
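
The script takes the version number, just as the update target further down invokes it:

sh generate-mo-files.sh 1.5

which leaves the compiled catalogs under source/locales/<language>/LC_MESSAGES/.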

To avoid dumping the translators into a pile of Sphinx configuration and rST source, I set up an external git repo with the necessary assortment of image directories, then added it to the manual repository as a git submodule. Getting the images is now a case of running another small script (again, $1 = the Mahara version):

#!/bin/bash

if which git >/dev/null; then
    echo "Starting import of localised images..."
else
    echo "Please install git before continuing."
    exit 1
fi

echo "Updating the image submodule"
git submodule init
git submodule update

echo "Updating image collection from gitorious"
cd localeimages || exit 1
git checkout "$1_STABLE"
git pull
cd ..
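
Again, it just takes the version, matching how the update target calls it:

sh get-localised-images.sh 1.5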

That’s the external data fetched; now the real fun begins: Sphinx’s main Makefile required quite some mutilation.

I needed to teach it about locales and give it a way to pass in the Mahara version, since there are several versions of the documentation:

MAHARA        =
CLEAN         = bn cs da de en es et fa fi fr hr hu it lt lv ne nl pl pt_BR ru sk sl sv tr uk_UA
PATCHED       = ca hi ja ko zh_CN zh_TW
UNSUPPORTED   = hi
TRANSLATIONS  = $(CLEAN) $(PATCHED)

All of these can be overridden when invoking make (there’s an example after the list).

  • MAHARA is the version of the Mahara documentation we’re building.
  • CLEAN lists the languages for which no patching of the LaTeX and generated Makefile is necessary.
  • PATCHED lists the languages whose LaTeX and generated Makefile need some extra tweaking to do what we want.
  • UNSUPPORTED lists the languages not supported natively by Sphinx.
  • TRANSLATIONS is just the patched and clean collections grouped together for convenience.
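
For instance, to rebuild only the German html manual for 1.5 without touching the other languages:

make html MAHARA=1.5 TRANSLATIONS=de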

Then I tweaked the cleanup and added a new update target for fetching updates without blowing everything away.

clean:
	-rm -rf $(BUILDDIR)/*
	-rm -rf source/locales/*

update:
	git checkout .
	git checkout $(MAHARA)_STABLE
	git pull
	sh generate-mo-files.sh $(MAHARA)
	sh get-localised-images.sh $(MAHARA)

For each manual format, I needed to make the target iterate over the translations. Here’s the html export as an example:

html:
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		git checkout source/images ; \
		cp -ra localeimages/$(TRANSLATION)/* source/images/ ; \
		$(SPHINXBUILD) -a -D language=$(TRANSLATION) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/$(MAHARA)/html/$(TRANSLATION) \
	;)
	git checkout source/images
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/$(MAHARA)/html/."

And that’s roughly how it looks for all the formats…

Except the latexpdf option.

It genuinely surprises me how clumsy LaTeX is when it comes to Unicode (yes, I know TeX predates Unicode, but still!), and it surprises me more that Sphinx doesn’t default to a more Unicode-friendly engine such as XeLaTeX. To get a process that copes with trivial things such as Writing Systems That Aren’t Latin, and fun things like “→” instead of “->”, I used the Japanese support as a guide. Japanese has its own separate PDF compilation rule in the generated LaTeX Makefile, so I took that, made it awesome by converting it from pLaTeX to XeLaTeX, and now use it for ALL the locales.
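
To give a rough idea of that conversion (an illustrative sketch, not the actual patched rule, which also deals with index generation via mendex and re-running until references stabilise): where the pLaTeX rule ran platex and then converted the DVI to a PDF, XeLaTeX produces the PDF directly, so the heart of the rule shrinks to something like:

# Sketch only; the real all-pdf-ja rule does considerably more.
all-pdf-ja:
	for f in *.tex; do xelatex "$$f" && xelatex "$$f"; done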

All the locales get modified by patches. One patch swaps the pLaTeX Makefile rules for XeLaTeX ones and tweaks a .sty file that was overriding the heading font. Another patch modifies the .tex file to make it work with XeLaTeX. Finally, a handful of patches are applied to a few individual translations to make sure they use decent fonts for their writing systems, plus other select tweaks.
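
Most of those per-language patches boil down to a font swap. For the Chinese builds, for instance, the effect is roughly this (illustrative, not the literal diff; WenQuanYi Micro Hei comes from the ttf-wqy-microhei package installed earlier):

% Illustrative per-language tweak, not the actual patch:
\setmainfont{WenQuanYi Micro Hei}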

The majority of the XeLaTeX changes could be put into the preamble in the conf.py file:

latex_preamble = '''
\\RequirePackage{ifxetex}
\\RequireXeTeX
\\usepackage{xltxtra} %xltxtra = fontspec, xunicode, etc.
\\usepackage{verbatim}
\\usepackage{url}
\\usepackage{fontspec}
\\setmainfont{FreeSerif}
\\usepackage{amsmath}
\\usepackage{amsfonts}
\\usepackage{xunicode}
'''

The resulting Make incantation is thus (and not for the faint of heart):

latexpdf:
	$(foreach TRANSLATION,$(UNSUPPORTED), \
		mkdir -p source/locales/$(TRANSLATION)/LC_MESSAGES ; \
		cp -n /usr/share/locale-langpack/en_AU/LC_MESSAGES/sphinx.mo source/locales/$(TRANSLATION)/LC_MESSAGES/sphinx.mo \
	;)
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		git checkout source/images ; \
		cp -ra localeimages/$(TRANSLATION)/* source/images ; \
		$(SPHINXBUILD) -a -D language=$(TRANSLATION) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) ; \
		cp patches/makesty.patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) ; \
		cp patches/tex.patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) ; \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/makesty.patch ; \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/tex.patch \
	;)
	$(foreach TRANSLATION,$(CLEAN), \
		make -C $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) all-pdf-ja \
	;)
	$(foreach TRANSLATION,$(PATCHED), \
		cp patches/$(TRANSLATION).patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) ; \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/$(TRANSLATION).patch ; \
		make -C $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) all-pdf-ja ; \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/$(TRANSLATION).patch \
	;)
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/tex.patch ; \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/makesty.patch \
	;)
	git checkout source/images
	@echo "xelatex finished; the PDF files are in $(BUILDDIR)/$(MAHARA)/latex."

In summary, what this does is:

  • Copies an existing sphinx.mo into the locales directory for the unsupported languages
  • Makes sure the images are at their git defaults, then copies the localised images over the top of the default ones for each translation
  • Runs sphinx-build to generate the LaTeX sources
  • Applies the common patches
  • Invokes make on the locales that need no additional patching
  • Applies the per-language patches
  • Invokes make on the locales that did need the additional patching
  • Reverses all the patches
  • Makes sure the images are clean as per git once more

After all of that, cron (or whatever triggers the build) should run something like this:

make update html epub latexpdf MAHARA=1.5
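
For example, a nightly crontab entry might look like this (the path is illustrative):

# m h dom mon dow  command
0 3 * * * cd /path/to/mahara-manual && make update html epub latexpdf MAHARA=1.5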

Once we have a server for this to live on and more people contributing translations, there will undoubtedly be edge cases that I haven’t accounted for (I just don’t have the data right now to find them), and the process will need tweaking.

I’m pretty sure there are better ways to do some of this (since I was learning many of the components while winging it), but this is what I’ve ended up with. What started out as a seemingly simple task, wasn’t.


