Internationalization and Localization

Genshi provides basic supporting infrastructure for internationalizing and localizing templates. That includes functionality for extracting localizable strings from templates, as well as a template filter that can apply translations to templates as they get rendered.

This support is based on gettext message catalogs and the gettext Python module. The extraction process can be used from the API level, or through the front-ends implemented by the Babel project, for which Genshi provides a plugin.

1   Basics

The simplest way to internationalize and translate templates would be to wrap all localizable strings in a gettext() function call (which is often aliased to _() for brevity). In that case, no extra template filter is required.

<p>${_("Hello, world!")}</p>

However, this approach results in significant “character noise” in templates, making them harder to read and preview.

The genshi.filters.Translator filter allows you to get rid of the explicit gettext function calls, so you can continue to just write:

<p>Hello, world!</p>

This text will still be extracted and translated as if you had wrapped it in a _() call.


For parameterized or pluralizable messages, you need to continue using the appropriate gettext functions.
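
As a minimal sketch of what such calls look like on the Python side, the following uses only the standard gettext module. The function names greeting and item_count are hypothetical, and NullTranslations stands in for a real catalog, so messages come back untranslated:

```python
import gettext

# NullTranslations returns messages unchanged; a real application would use
# a catalog-backed gettext.GNUTranslations instance instead.
translations = gettext.NullTranslations()
_ = translations.gettext
ngettext = translations.ngettext


def greeting(name):
    # Parameterized message: the placeholder stays inside the message id,
    # and the value is substituted after the lookup.
    return _("Hello, %(name)s!") % {"name": name}


def item_count(num):
    # Pluralizable message: ngettext picks the form based on num.
    return ngettext("You have %d item", "You have %d items", num) % num
```

Substituting values after the lookup keeps the message id constant, so translators see a single string with a placeholder rather than one string per value.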

You can control which tags should be ignored by this process; for example, it doesn't really make sense to translate the content of the HTML <script></script> element. Both <script> and <style> are excluded by default.

Attribute values can also be automatically translated. The default is to consider the attributes abbr, alt, label, prompt, standby, summary, and title, which is a list that makes sense for HTML documents. Of course, you can tell the translator to use a different set of attribute names, or none at all.

In addition, you can control automatic translation in your templates using the xml:lang attribute. If the value of that attribute is a literal string, the contents and attributes of the element will be ignored:

<p xml:lang="en">Hello, world!</p>

On the other hand, if the value of the xml:lang attribute contains a Python expression, the element contents and attributes are still considered for automatic translation:

<html xml:lang="$locale">

2   Extraction

The Translator class provides a method called extract, which is a generator yielding all localizable strings found in a template or markup stream. This includes literal strings in text nodes and attribute values, as well as strings in gettext() calls in embedded Python code. See the API documentation for details on how to use this method directly.

This functionality is integrated into the message extraction framework provided by the Babel project. Babel provides a command-line interface as well as commands that can be used from setup.py scripts using Setuptools or Distutils.

The first thing you need to do to make Babel extract messages from Genshi templates is to let Babel know which files are Genshi templates. This is done using a “mapping configuration”, which can be stored in a configuration file, or specified directly in your setup.py.

In a configuration file, the mapping may look like this:

# Python source
[python:**.py]
# Genshi templates
[genshi:**/templates/**.html]
include_attrs = title
[genshi:**/templates/**.txt]
template_class = genshi.template:TextTemplate
encoding = latin-1

Please consult the Babel documentation for details on configuration.
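
For a mapping specified directly in setup.py, a sketch along the following lines may help. It assumes Babel's message_extractors keyword for setup(); the package name myproject and the glob patterns are hypothetical:

```python
# Hypothetical package name and file patterns, shown only as an illustration.
message_extractors = {
    "myproject": [
        # (file pattern, extraction method, options)
        ("**.py", "python", None),
        ("**/templates/**.html", "genshi", {"include_attrs": "title"}),
        ("**/templates/**.txt", "genshi", {
            "template_class": "genshi.template:TextTemplate",
            "encoding": "latin-1",
        }),
    ],
}

# In setup.py this dictionary would be passed along as
# setup(..., message_extractors=message_extractors).
```

Each entry pairs a file pattern with an extraction method and an optional dictionary of the plugin options described in the next section.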

If all goes well, running the extraction with Babel should create a POT file containing the strings from your Genshi templates and your Python source files.


Genshi currently does not support “translator comments”, i.e. text in template comments that would get added to the POT file. This support may or may not be added in future versions.

2.1   Configuration Options

The Genshi extraction plugin for Babel supports the following options:

2.1.1   template_class

The concrete Template class that the file should be loaded with. Specify the package/module name and the class name, separated by a colon.

The default is to use genshi.template:MarkupTemplate, and you'll want to set it to genshi.template:TextTemplate for text templates.

2.1.2   encoding

The encoding of the template file. This is only used for text templates. The default is to assume “utf-8”.

2.1.3   include_attrs

Comma-separated list of attribute names that should be considered to have localizable values. Only used for markup templates.

2.1.4   ignore_tags

Comma-separated list of tag names that should be ignored. Only used for markup templates.

2.1.5   extract_text

Whether text outside explicit gettext function calls should be extracted. By default, any text nodes not inside ignored tags, and the values of attributes in the include_attrs list, are extracted. If this option is disabled, only strings in gettext function calls are extracted.


If you disable this option, it's not necessary to add the translation filter as described above. You only need to make sure that the template has access to the gettext functions it uses.

3   Translation

If you have prepared MO files for use with Genshi using the appropriate tools, you can access the message catalogs with the gettext Python module. You'll probably want to create a gettext.GNUTranslations instance, and make the translation functions it provides available to your templates by putting them in the template context.
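
A minimal sketch of that setup using only the standard library follows. The domain name "messages", the directory "locale" and the language "de" are assumptions; with fallback=True, gettext.translation returns a NullTranslations object when no matching MO file is found, so the code also runs without a catalog:

```python
import gettext

# Hypothetical domain, directory and language; adjust to your project layout.
translations = gettext.translation(
    "messages", localedir="locale", languages=["de"], fallback=True
)

# Template data: the keys are the function names used inside templates.
context = {
    "_": translations.gettext,
    "ngettext": translations.ngettext,
}
```

The dictionary could then be passed as keyword arguments when generating output, for example template.generate(**context), alongside the rest of the template data.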

The Translator filter needs to be added to the filters of the template (applying it as a stream filter will likely not have the desired effect). Furthermore, it needs to be the first filter in the list, ahead of even the internal filters that Genshi adds itself:

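from genshi.filters import Translator
from genshi.template import MarkupTemplate

# `translations` is assumed to be the gettext.GNUTranslations instance
# created as described above.
template = MarkupTemplate("...")
template.filters.insert(0, Translator(translations.ugettext))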

If you're using TemplateLoader, you should specify a callback function in which you add the filter:

from genshi.filters import Translator
from genshi.template import TemplateLoader

def template_loaded(template):
    # Invoked by the loader for each newly loaded template.
    template.filters.insert(0, Translator(translations.ugettext))

loader = TemplateLoader('templates', callback=template_loaded)
template = loader.load("...")

This approach ensures that the filter is added only once per template, rather than every time the template is requested from the loader, which would cause it to be applied multiple times.

