The Cat’s Tongue

Documentation is all well and good, but it’s not much use if you can’t understand a word of it. I’ve often said that man pages are great once you’ve learned how to read man pages, and people often cite man man as the way to learn. The trouble is, you still need to know how to read a man page (and know that the Jargon File exists) to make sense of that one. Have you seen its synopsis?

With the recent release of the 1.5 series of Mahara, we’ve been putting a lot of work into making the manual translatable, so that users don’t necessarily need to learn English to benefit from the Mahara manual. Currently the manual is hosted on readthedocs.org, but sadly the site doesn’t yet support generating translated versions on the fly.

Overall, we needed to:

  • Have translated versions of the manual.
  • Make the screenshots translatable.
  • Keep the learning curve for translators low.
  • Avoid maintaining multiple copies of the source.
  • Make the whole thing automated.
  • Avoid having to re-deploy every time a new translation is started.
  • Use only tools installable via apt-get, not easy_install and friends.

Sounds like it should be simple? I wish! There were quite a few hiccups in getting this all going:

  • The first challenge was to get Sphinx to use the translations we had. After a ridiculous amount of fiddling and cursing, it turned out that in Ubuntu releases before 12.04 the .mo files did not get packaged with the locales.
  • Unicode isn’t handled well by the default Sphinx LaTeX setup, so I decided to switch to XeLaTeX. This involved a substantial amount of tweaking.
  • Docutils changed its API (from version 0.8 onwards, which is what ships in Ubuntu 12.04) and began reporting language names like ‘ngerman’ instead of codes like ‘de’. Sphinx wasn’t expecting that. This is fixed in the bleeding-edge version of Sphinx, and via a patch in Ubuntu precise/quantal and Debian sid.
  • If a language isn’t supported natively by Sphinx, the gettext translations don’t get applied for certain build types.
  • I had never used Sphinx, rST or LaTeX before, and my Python is effectively non-existent, so I have no idea what I’m doing.

Finally, after working out how to get around some of those, I think I have it all working. The resulting solution:

  • Uses packages from the Ubuntu repositories.
  • Grabs the .po files from Launchpad (where the translating happens), converts them to .mo files and places them under the necessary paths in source/locales.
  • Grabs the translated image sets from git and drops them over the default English versions in source/images.
  • Patches the generated LaTeX source and its PDF Makefile to use XeLaTeX cleanly.

You’ll need to install:

gettext, git-core, bzr, make, ttf-wqy-microhei, ttf-freefont, mendexk, texlive-latex-extra, texlive-fonts-recommended, texlive-latex-recommended, texlive-xetex, ttf-indic-fonts-core, texlive-lang-all, python-pybabel
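
On Ubuntu that boils down to something like:

sudo apt-get install gettext git-core bzr make ttf-wqy-microhei ttf-freefont mendexk texlive-latex-extra texlive-fonts-recommended texlive-latex-recommended texlive-xetex ttf-indic-fonts-core texlive-lang-all python-pybabel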

For grabbing the .po files from Launchpad, I wrote a little bash script which gets passed a version number ($1 = the Mahara version, such as 1.5):

#!/bin/bash
# Fetch the manual's .po files from Launchpad and compile them into .mo
# files under source/locales/<language>/LC_MESSAGES/.

if which bzr >/dev/null; then
    echo "Starting import of translations..."
else
    echo "Please install bzr before continuing."
    exit 1
fi

if [ ! -d launchpad ]; then
    echo "Checking out the launchpad .po files"
    bzr branch "lp:~mahara-lang/mahara-manual/$1_STABLE-export" launchpad
else
    echo "Updating .po collection from launchpad"
    cd launchpad && bzr pull && cd ..
fi

echo "Cleaning up from last time"
rm -r source/locales # msgfmt will do merging otherwise

for dir in launchpad/potfiles/*; do
    echo "Creating $dir .mo files"
    for file in "$dir"/*; do
        language="$(basename "$file" | sed 's%\.po$%%')"
        mofile="$language/LC_MESSAGES$(echo "$dir" | sed 's%launchpad/potfiles%%').mo"
        mkdir -p "source/locales/$language/LC_MESSAGES"
        msgfmt "$dir/$(basename "$file")" -o "source/locales/$mofile"
    done
done
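
With that saved as generate-mo-files.sh (the filename the Makefile expects later on), pulling down and compiling the 1.5 translations is just:

sh generate-mo-files.sh 1.5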

To avoid the translators being dumped into a pile of Sphinx configuration and rST source, I set up an external git repository holding the necessary assortment of image directories, then added it to the manual checkout as a git submodule.
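
Registering the submodule was a one-off command along these lines (the repository URL is a placeholder, since the real one lives on Gitorious; localeimages is the directory name the rest of the build expects):

git submodule add git://gitorious.org/<project>/<images-repo>.git localeimages

Getting the images is now a case of using another small script (again, $1 = the Mahara version):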

#!/bin/bash
# Fetch the translated screenshot sets from the localeimages git submodule.

if which git >/dev/null; then
    echo "Starting import of localised images..."
else
    echo "Please install git before continuing."
    exit 1
fi

echo "Updating the image submodule"
git submodule init
git submodule update
echo "Updating image collection from gitorious"
cd localeimages
git checkout "$1_STABLE"
git pull
cd ..
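
As before, fetching the 1.5 image sets is just:

sh get-localised-images.sh 1.5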

That’s the external data fetched; now the real fun begins. The main Sphinx Makefile required quite some mutilation.

I needed to teach it about locales and give it a way to pass in the Mahara version, since there are several versions of the documentation:

MAHARA        =
CLEAN         = bn cs da de en es et fa fi fr hr hu it lt lv ne nl pl pt_BR ru sk sl sv tr uk_UA
PATCHED       = ca hi ja ko zh_CN zh_TW
UNSUPPORTED   = hi
TRANSLATIONS  = $(CLEAN) $(PATCHED)

All of these can be overridden when invoking Make.

  • MAHARA is the version of the Mahara documentation we’re building.
  • CLEAN means no patching of the generated LaTeX and Makefile is necessary.
  • PATCHED means the generated LaTeX and Makefile need some extra tweaking to do what we want.
  • UNSUPPORTED means the language isn’t supported natively by Sphinx.
  • TRANSLATIONS is just the clean and patched collections grouped together for convenience.
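
For example, rebuilding just the German and French HTML manuals for the 1.5 documentation would look something like this (a hypothetical invocation of the html target defined further down):

make html MAHARA=1.5 TRANSLATIONS="de fr"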

Then I tweaked the clean target and added a new update target for pulling in updates without blowing everything away.

clean:
	-rm -rf $(BUILDDIR)/*
	-rm -rf source/locales/*

update:
	git checkout .
	git checkout $(MAHARA)_STABLE
	git pull
	sh generate-mo-files.sh $(MAHARA)
	sh get-localised-images.sh $(MAHARA)
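
Refreshing the external data for a given manual version is then just, for example:

make update MAHARA=1.5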

For each manual format, I needed to make the target iterate over the translations. Here’s the html target as an example.

html:
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		git checkout source/images ; \
		cp -ra localeimages/$(TRANSLATION)/* source/images/ ; \
		$(SPHINXBUILD) -a -D language=$(TRANSLATION) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/$(MAHARA)/html/$(TRANSLATION) \
	;)
	git checkout source/images
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/$(MAHARA)/html/."

And that’s roughly how it looks for all the formats…

Except the latexpdf option.

It genuinely surprises me how clumsy LaTeX is when it comes to Unicode (yes, I know TeX predates Unicode, but still!), and it surprises me more that Sphinx doesn’t default to a more Unicode-friendly engine such as XeLaTeX. To get a more Unicode-friendly process going, so that we could use trivial things such as Writing Systems That Aren’t Latin and do fun things like “→” instead of “->”, I used the Japanese support as a guide. It has its own separate PDF compilation rule in the generated LaTeX Makefile, so I took that, made it awesome by converting it from pLaTeX to XeLaTeX, and used it for ALL the locales.

All the locales get modified by patches: one swaps the pLaTeX bits of the generated Makefile for XeLaTeX and tweaks a .sty file that was overriding the heading font, and another modifies the generated .tex file so that it works with XeLaTeX. Finally, a handful of per-translation patches make sure a few languages use decent fonts for their writing systems, along with other select tweaks.

The majority of the XeLaTeX changes could be put into the preamble in the conf.py file:

latex_preamble = '''
\\RequirePackage{ifxetex}
\\RequireXeTeX
\\usepackage{xltxtra} %xltxtra = fontspec, xunicode, etc.
\\usepackage{verbatim}
\\usepackage{url}
\\usepackage{fontspec}
\\setmainfont{FreeSerif}
\\usepackage{amsmath}
\\usepackage{amsfonts}
\\usepackage{xunicode}
'''

The resulting Make incantation is thus (and not for the faint of heart):

latexpdf:
	$(foreach TRANSLATION,$(UNSUPPORTED), \
		mkdir -p source/locales/$(TRANSLATION)/LC_MESSAGES ; \
		cp -n /usr/share/locale-langpack/en_AU/LC_MESSAGES/sphinx.mo source/locales/$(TRANSLATION)/LC_MESSAGES/sphinx.mo \
	;)
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		git checkout source/images ; \
		cp -ra localeimages/$(TRANSLATION)/* source/images ; \
		$(SPHINXBUILD) -a -D language=$(TRANSLATION) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION); \
		cp patches/makesty.patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION); \
		cp patches/tex.patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION); \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/makesty.patch; \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/tex.patch \
	;)
	$(foreach TRANSLATION,$(CLEAN), \
		make -C $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) all-pdf-ja \
	;)
	$(foreach TRANSLATION,$(PATCHED), \
		cp patches/$(TRANSLATION).patch $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION); \
		patch --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/$(TRANSLATION).patch; \
		make -C $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) all-pdf-ja; \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/$(TRANSLATION).patch \
	;)
	$(foreach TRANSLATION,$(TRANSLATIONS), \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/tex.patch; \
		patch -R --directory=$(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION) -p1 < $(BUILDDIR)/$(MAHARA)/latex/$(TRANSLATION)/makesty.patch \
	;)
	git checkout source/images
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/$(MAHARA)/latex."

In summary, what this does is:

  • Copies an existing sphinx.mo into the locales directory for each unsupported language.
  • Resets the images to the defaults from git, then copies the localised images for each translation over the top.
  • Runs sphinx-build for each translation.
  • Applies the common patches.
  • Invokes make for the locales that need no additional patching.
  • Applies the per-translation patches.
  • Invokes make for the locales that do need the additional patching.
  • Reverses all the patches.
  • Resets the images to the git defaults once more.

After all of that, cron (or whatever triggers the build) should run something like this:

make update html epub latexpdf MAHARA=1.5
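
As a hypothetical crontab entry (the path and schedule here are placeholders), that could look like:

0 3 * * * cd /path/to/manual && make update html epub latexpdf MAHARA=1.5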

Once we have a server for this to live on and more people contributing translations, there will undoubtedly be edge cases that I haven’t accounted for (I just don’t have the data right now to find them), and the process will need tweaking.

I’m pretty sure there are better ways to do some of this (since I was learning many of the components while winging it), but this is what I’ve ended up with. What started out as a seemingly simple task wasn’t.


Open Source Library system threatened by trademark

Koha is an open source library management system that has been around for about 12 years. It began its life when a rural New Zealand library decided to bake their own system rather than change to something else. The Horowhenua Library Trust opened the source and gave Koha to the world. Koha is now used in a huge number of libraries globally and is highly respected.

I found out on Tuesday via a colleague* that the project is now facing a legal threat in the form of a trademark application.

While I won’t go into the finer details about the offending company in question (the last link has hints if you’re so inclined), I can assure you that they are not acting in the best interests of this open source project.

Koha is actually a word in Te Reo Māori, the local Māori language, meaning gift; it also refers to the custom of gift giving. The offending company is not only threatening the name of a project, it is attempting to trademark an important word in the local culture. Te Reo Māori is far from a dead language: over 130,000 people can hold at least some conversation in it, and in New Zealand many signs (especially government ones) include a Te Reo translation.

The Trust has made up its mind about how to deal with this, and that is its choice and its choice alone: it is going to defend the name and the Māori word by challenging the trademark application. To do this, however, it will need help.

Right now the best help you can bestow upon the Trust and the project is to chip in a few dollars to help with the associated costs, or if the pennies are tight, you can assist in spreading the word.

After I spoke to Jono on Tuesday about the situation, he graciously tweeted to help — please follow his example.

If you don’t feel this is worthwhile, then please please please don’t interject to be dismissive. Be courteous and respectful; this is far more than a trademark issue to many of the people involved. The decision to defend the name is their prerogative: either help, or let it be.

 

* Full disclosure: The company I work for contributes to Koha and offers training and other related services.


Countries and avatars and PEARs

Nearly three months ago now I left Australia and moved to New Zealand. I was fortunate enough to essentially transfer within the company I work for, so it was reasonably painless (except for the 9 weeks of waiting for my boxes of stuff to get shipped).

The office here is significantly larger than the Australian office (being where the company started and all) and I’m really enjoying having lots of smart people around and getting to work on a much larger variety of projects.

As these things go in free/libre/open-source land, one can accidentally get involved in colleagues’ projects.

Libravatar is an AGPL-licensed, federated alternative to the Gravatar service, allowing domain holders to serve their own avatars for email addresses and OpenIDs from their own infrastructure rather than trusting an external closed service. Francois will be presenting it at OSCON. He has stickers.

One of the things we’re working on to encourage adoption of Libravatar is making sure libraries are available for various languages, so applications can incorporate it as easily as they can Gravatar (or more easily!). I volunteered to have a crack at making a PEAR package.

I’d never done PEAR before. The documentation is somewhat obscure, but I found a good article on, of all places, Zend.com.

So that’s sweet. Write your class (in the correct file structure), copy and customise the makepackage.php file from the docs for PEAR_PackageFileManager2 (pear install PEAR_PackageFileManager2), run php makepackage.php make to generate the XML build instructions (which expire at midnight, for fun!), and then simply pear package. Voila, you made a PEAR package. Amazingly simple once you know how.
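
In shell terms, the whole dance is roughly this (makepackage.php being whatever you called your customised copy):

pear install PEAR_PackageFileManager2   # one-off: the tool that writes package.xml
php makepackage.php make                # generate the package.xml build instructions
pear package                            # roll the release tarball from package.xml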

The next part, however, isn’t simple.

If you want your package to be in the official PEAR channel, you have to walk over some burning coals, hop through a ring of fire and endure some other random tribulations thrown in for good measure.

The easiest of these is getting your code to comply with the PEAR Coding Standards. I strongly recommend employing phpcs, aka PHP_CodeSniffer (it’s a PEAR package! pear install PHP_CodeSniffer).
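
Installing and running it is a one-liner each way; the path below is just whatever your package’s source directory happens to be:

pear install PHP_CodeSniffer
phpcs --standard=PEAR path/to/your/package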

You may be tempted to ship some phpdoc-generated documentation in your actual package, because it’s useful and would make your stuff more usable. You can’t: the generated .css files don’t pass the PHP_CodeSniffer PEAR standards checks. It took me nearly a month to get a definitive answer on whether it was possible to keep them in.

The next part is the peer review. I first proposed the package about a month ago. I got some suggestions about style, some pointing at the coding standards and file structure, and a suggestion to depend on a PEAR package to do something a core PHP function does better, all because of PHP 5.2, which has been EOL’d. I refused the dependency-for-dependency’s-sake suggestion. Because, lolwat?

What this review stage didn’t pick up was a handful of bugs that made the package not work right. They’ve been fixed.

Earlier this week I moved the proposal to the voting stage. Now, apparently, this is where the code review actually starts. So far I’ve had suggestions to… basically rewrite it, plus the dependency-for-dependency’s-sake thing again. This was stated as a condition for a +1 vote.

Sigh.

Well, I’ll see how it goes. I can’t see any technical reasoning behind the suggestions other than preference, and nor can others, who are fed up with the PEAR proposal process just from watching me go through it. Even if the package fails to get into the main PEAR channel, it’s not game over.

Back when I was looking for an answer as to whether I could keep the phpdoc-generated documentation in the package, I asked a friend who is involved in the PHP community if she knew. She asked around for me, and the overwhelming response she got was that PEAR was out of fashion because of the politics, and not to bother. Given how other languages successfully manage to have extension management, and how predominant PHP is, it’s truly a crying shame.

The CPAN and PyPI libraries have been available for a while now, either as part of the Gravatar library (CPAN) or independently (PyPI). PEAR doesn’t even offer a Gravatar library. Just sayin’!


Ouch, my brain…

I’m in the web development business. I fully realise that consolidating sign-on stuff is complex and all that, but surely, surely, there’s got to be something better than this:

  1. Clicking a login link on the Ubuntu wiki to tend to a page.
  2. Being told that “OpenID verification requires that you click this button”.
  3. Being redirected to a form for my username and password.
  4. Being logged in and landing on the SSO profile page, with the list of every place I’ve ever logged in to and no indication of a “here’s where you were trying to log in to” link.
  5. Clicking the wiki.ubuntu.com one, because that’s where I was trying to get to.
  6. Getting sent to the front page of wiki.ubuntu.com… logged out.
  7. Clicking the login link, again.
  8. Being told that “OpenID verification requires that you click this button”.
  9. Confirming that the information on the screen is me.
  10. Getting sent back to the wiki.ubuntu.com front page.
  11. Now all I have to do is find the page I was trying to edit.

I’m really hoping this was not normal operation, or that I did something wrong.

Is this really what our new contributors face?

Please say no. Pretty please. With a cherry…


Dear USA, This is a tea party. Love, the Brits.


DoctorMo and some fellow UK folk find a quiet corner at UDS for a real tea party. Complete with biscuits.
