Last change on this file was 1150, checked in by jruigrok, 13 years ago

Pull up r1147 to trunk.

Correct reference to i18n namespace in documentation.

.. -*- mode: rst; encoding: utf-8 -*-

=====================================
Internationalization and Localization
=====================================

Genshi provides comprehensive supporting infrastructure for internationalizing
and localizing templates. That includes functionality for extracting
localizable strings from templates, as well as a template filter and special
directives that can apply translations to templates as they get rendered.

This support is based on `gettext`_ message catalogs and the `gettext Python
module`_. The extraction process can be used from the API level, or through
the front-ends implemented by the `Babel`_ project, for which Genshi provides
a plugin.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`gettext python module`: http://docs.python.org/lib/module-gettext.html
.. _`babel`: http://babel.edgewall.org/


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

The simplest way to internationalize and translate templates would be to wrap
all localizable strings in a ``gettext()`` function call (which is often
aliased to ``_()`` for brevity). In that case, no extra template filter is
required.

.. code-block:: genshi

  <p>${_("Hello, world!")}</p>

However, this approach results in significant “character noise” in templates,
making them harder to read and preview.

The ``genshi.filters.Translator`` filter allows you to get rid of the
explicit `gettext`_ function calls, so you can (often) just continue to write:

.. code-block:: genshi

  <p>Hello, world!</p>

This text will still be extracted and translated as if you had wrapped it in a
``_()`` call.

.. note:: For parameterized or pluralizable messages, you need to use the
          special `template directives`_ described below, or use the
          corresponding ``gettext`` function in embedded Python expressions.

You can control which tags should be ignored by this process; for example, it
doesn't really make sense to translate the content of the HTML
``<script></script>`` element. Both ``<script>`` and ``<style>`` are excluded
by default.

Attribute values can also be automatically translated. The default is to
consider the attributes ``abbr``, ``alt``, ``label``, ``prompt``, ``standby``,
``summary``, and ``title``, which is a list that makes sense for HTML
documents.  Of course, you can tell the translator to use a different set of
attribute names, or none at all.

----------------
Language Tagging
----------------

You can control automatic translation in your templates using the ``xml:lang``
attribute. If the value of that attribute is a literal string, the contents and
attributes of the element will be ignored:

.. code-block:: genshi

  <p xml:lang="en">Hello, world!</p>

On the other hand, if the value of the ``xml:lang`` attribute contains a Python
expression, the element contents and attributes are still considered for
automatic translation:

.. code-block:: genshi

  <html xml:lang="$locale">
    ...
  </html>


.. _`template directives`:

Template Directives
===================

Sometimes localizable strings in templates may contain dynamic parameters, or
they may depend on the numeric value of some variable to choose a proper
plural form. Sometimes the strings contain embedded markup, such as tags for
emphasis or hyperlinks, and you don't want to rely on the people doing the
translations to know the syntax and escaping rules of HTML and XML.

In those cases the simple text extraction and translation process described
above is not sufficient. You could just use ``gettext`` API functions in
embedded Python expressions for parameters and pluralization, but that does
not help when messages contain embedded markup. Genshi provides special
template directives for internationalization that attempt to provide a
comprehensive solution for this problem space.

To enable these directives, you'll need to register them with the templates
they are used in. You can do this by adding them manually via the
``Template.add_directives(namespace, factory)`` method (where ``namespace``
would be “http://genshi.edgewall.org/i18n” and ``factory`` would be an instance
of the ``Translator`` class). Or you can just call the ``setup(template)``
method of a ``Translator`` instance, which both registers the directives and
adds the translation filter.

After the directives have been registered with the template engine on the
Python side of your application, you need to declare the corresponding
directive namespace in all markup templates that use them. For example:

.. code-block:: genshi

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/i18n">
    …
  </html>

These directives only make sense in the context of `markup templates`_. For
`text templates`_, you can just use the corresponding ``gettext`` API calls as needed.

.. note:: The internationalization directives are still somewhat experimental
          and have some known issues. However, the attribute language they
          implement should be stable and is not subject to change
          substantially in future versions.

.. _`markup templates`: xml-templates.html
.. _`text templates`: text-templates.html

--------
Messages
--------

``i18n:msg``
------------

This is the basic directive for defining localizable text passages that
contain parameters and/or markup.

For example, consider the following template snippet:

.. code-block:: genshi

  <p>
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

Without further annotation, the translation filter would treat this sentence
as two separate messages (“Please visit” and “for help”), and the translator
would have no control over the position of the link in the sentence.

However, when you use the Genshi internationalization directives, you simply
add an ``i18n:msg`` attribute to the enclosing ``<p>`` element:

.. code-block:: genshi

  <p i18n:msg="name">
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

Genshi is then able to identify the text in the ``<p>`` element as a single
message for translation purposes. You'll see the following string in your
message catalog::

  Please visit [1:%(name)s] for help.

The ``<a>`` element with its attribute has been replaced by a part in square
brackets, which does not include the tag name or the attributes of the element.

The value of the ``i18n:msg`` attribute is a comma-separated list of parameter
names, which serve as simplified aliases for the actual Python expressions the
message contains. The order of the parameter names in the list must correspond
to the order of the expressions in the text. In this example, there is only
one parameter: its alias for translation is “name”, while the corresponding
expression is ``${site.name}``.

The translator now has complete control over the structure of the sentence.
The translator does still need to make sure that any bracketed parts are not
removed, and that the ``name`` parameter is preserved correctly. But those are
things that can be easily checked by validating the message catalogs. The
important thing is that the translator can change the sentence structure, and
has no way to break the application by forgetting to close a tag, for example.
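
The kind of catalog validation mentioned above can be approximated in a few
lines of Python. This is only a sketch and not part of Genshi: the
``placeholders()`` and ``is_consistent()`` helpers are hypothetical names, and
the regular expressions assume the ``[N:...]`` and ``%(name)s`` notation used
in the message shown earlier.

.. code-block:: python

  import re

  def placeholders(message):
      """Return the bracketed part numbers and parameter names in a message."""
      parts = set(re.findall(r"\[(\d+):", message))
      params = set(re.findall(r"%\((\w+)\)s", message))
      return parts, params

  def is_consistent(msgid, msgstr):
      """Check that a translation keeps every bracketed part and parameter."""
      return placeholders(msgid) == placeholders(msgstr)

  msgid = "Please visit [1:%(name)s] for help."
  good = "Um Hilfe zu erhalten, besuchen Sie bitte [1:%(name)s]"
  bad = "Um Hilfe zu erhalten, besuchen Sie bitte %(name)s"

  print(is_consistent(msgid, good))
  print(is_consistent(msgid, bad))

A check like this only compares which parts and parameters occur. It does not
verify that the square brackets in the translation are balanced.
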

So if the German translator of this snippet decided to translate it to::

  Um Hilfe zu erhalten, besuchen Sie bitte [1:%(name)s]

the resulting output might be:

.. code-block:: xml

  <p>
    Um Hilfe zu erhalten, besuchen Sie bitte
    <a href="http://example.com/">Example</a>
  </p>

Messages may contain multiple tags, and they may also be nested. For example:

.. code-block:: genshi

  <p i18n:msg="name">
    <i>Please</i> visit <b>the site <a href="${site.url}">${site.name}</a></b>
    for help.
  </p>

This would result in the following message ID::

  [1:Please] visit [2:the site [3:%(name)s]] for help.

Again, the translator has full control over the structure of the sentence. So
the German translation could actually look like this::

  Um Hilfe zu erhalten besuchen Sie [1:bitte]
  [3:%(name)s], [2:das ist eine Web-Site]

Genshi would recompose this into the following output:

.. code-block:: xml

  <p>
    Um Hilfe zu erhalten besuchen Sie <i>bitte</i>
    <a href="http://example.com/">Example</a>, <b>das ist eine Web-Site</b>
  </p>

Note how the translation has changed the order and even the nesting of the
tags.

.. warning:: Please note that ``i18n:msg`` directives do not support other
             nested directives. Directives commonly change the structure of
             the generated markup dynamically, which often would result in the
             structure of the text changing, thus making translation as a
             single message ineffective.

``i18n:choose``, ``i18n:singular``, ``i18n:plural``
---------------------------------------------------

Translatable strings that vary based on some number of objects, such as “You
have 1 new message” or “You have 3 new messages”, present their own challenge,
in particular when you consider that different languages have different rules
for pluralization. For example, while English and most western languages have
two plural forms (one for ``n == 1`` and another for ``n != 1``), Welsh has
five different plural forms, while Hungarian only has one.

The ``gettext`` framework has long supported this via the ``ngettext()``
family of functions. You specify two default messages, one singular and one
plural, and the number of items. The translations however may contain any
number of plural forms for the message, depending on how many are commonly
used in the language. ``ngettext`` will choose the correct plural form of the
translated message based on the specified number of items.
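
The selection performed by ``ngettext()`` can be tried out with the
``gettext`` module of the Python standard library. The sketch below assumes
Python 3 and uses ``gettext.NullTranslations``, which has no catalog and
therefore falls back to the default rule (singular for exactly one item,
plural otherwise). The ``new_message_notice()`` helper is a hypothetical name
used for illustration only.

.. code-block:: python

  import gettext

  # NullTranslations has no catalog, so it applies the default rule:
  # the singular message for n == 1, the plural message otherwise.
  translations = gettext.NullTranslations()

  def new_message_notice(num):
      template = translations.ngettext(
          "You have %(num)d new message.",
          "You have %(num)d new messages.",
          num,
      )
      return template % {"num": num}

  print(new_message_notice(1))
  print(new_message_notice(3))

With a real catalog loaded, the same call would pick among however many plural
forms the catalog declares for the target language.
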

Genshi provides a variant of the ``i18n:msg`` directive described above that
allows choosing the proper plural form based on the numeric value of a given
variable. The pluralization support is implemented in a set of three
directives that must be used together: ``i18n:choose``, ``i18n:singular``, and
``i18n:plural``.

The ``i18n:choose`` directive is used to set up the context of the message: it
simply wraps the singular and plural variants.

The value of this directive is split into two parts: the first is the
*numeral*, a Python expression that evaluates to a number to determine which
plural form should be chosen. The second part, separated by a semicolon, lists
the parameter names. This part is equivalent to the value of the ``i18n:msg``
directive.

For example:

.. code-block:: genshi

  <p i18n:choose="len(messages); num">
    <i18n:singular>You have <b>${len(messages)}</b> new message.</i18n:singular>
    <i18n:plural>You have <b>${len(messages)}</b> new messages.</i18n:plural>
  </p>

All three directives can be used either as elements or as attributes. So the
above example could also be written as follows:

.. code-block:: genshi

  <i18n:choose numeral="len(messages)" params="num">
    <p i18n:singular="">You have <b>${len(messages)}</b> new message.</p>
    <p i18n:plural="">You have <b>${len(messages)}</b> new messages.</p>
  </i18n:choose>

When used as an element, the two parts of the ``i18n:choose`` value are split
into two different attributes: ``numeral`` and ``params``. The
``i18n:singular`` and ``i18n:plural`` directives do not require or support any
value (or any extra attributes).

--------------------
Comments and Domains
--------------------

``i18n:comment``
----------------

The ``i18n:comment`` directive can be used to supply a comment for the
translator. For example, if a template snippet is not easily understood
outside of its context, you can add a translator comment to help the
translator understand in what context the message will be used:

.. code-block:: genshi

  <p i18n:msg="name" i18n:comment="Link to the relevant support site">
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

This comment will be extracted together with the message itself, and will
commonly be placed alongside the message in the message catalog, so that it is
easily visible to the person doing the translation.

This directive has no impact on how the template is rendered, and is ignored
outside of the extraction process.

``i18n:domain``
---------------

In larger projects, message catalogs are commonly split up into different
*domains*. For example, you might have a core application domain, and then
separate domains for extensions or libraries.

Genshi provides a directive called ``i18n:domain`` that lets you choose the
translation domain for a particular scope. For example:

.. code-block:: genshi

  <div i18n:domain="examples">
    <p>Hello, world!</p>
  </div>


Extraction
==========

The ``Translator`` class provides a method called ``extract``, which is
a generator yielding all localizable strings found in a template or markup
stream. This includes both literal strings in text nodes and attribute values,
as well as strings in ``gettext()`` calls in embedded Python code. See the API
documentation for details on how to use this method directly.

-----------------
Babel Integration
-----------------

This functionality is integrated with the message extraction framework provided
by the `Babel`_ project. Babel provides a command-line interface as well as
commands that can be used from ``setup.py`` scripts using `Setuptools`_ or
`Distutils`_.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
.. _`distutils`: http://docs.python.org/dist/dist.html

The first thing you need to do to make Babel extract messages from Genshi
templates is to let Babel know which files are Genshi templates. This is done
using a “mapping configuration”, which can be stored in a configuration file,
or specified directly in your ``setup.py``.

In a configuration file, the mapping may look like this:

.. code-block:: ini

  # Python source
  [python:**.py]

  # Genshi templates
  [genshi:**/templates/**.html]
  include_attrs = title

  [genshi:**/templates/**.txt]
  template_class = genshi.template:TextTemplate
  encoding = latin-1

Please consult the Babel documentation for details on configuration.
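
When the mapping is specified in ``setup.py`` instead, it takes the form of a
Python dictionary. The sketch below mirrors the configuration file above and
assumes the ``message_extractors`` keyword of Babel's Setuptools integration.
Whether that keyword is available depends on the Babel release in use, so
treat this as an illustration rather than a reference. The package name
``myapp`` is hypothetical.

.. code-block:: python

  # Hypothetical package name; replace it with the package holding your code.
  PACKAGE = "myapp"

  # Each entry is (filename pattern, extraction method, options or None).
  message_extractors = {
      PACKAGE: [
          ("**.py", "python", None),
          ("**/templates/**.html", "genshi", {"include_attrs": "title"}),
          ("**/templates/**.txt", "genshi", {
              "template_class": "genshi.template:TextTemplate",
              "encoding": "latin-1",
          }),
      ],
  }

  # In setup.py this dictionary would be passed along as:
  #   setup(..., message_extractors=message_extractors)
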

If all goes well, running the extraction with Babel should create a POT file
containing the strings from your Genshi templates and your Python source files.


---------------------
Configuration Options
---------------------

The Genshi extraction plugin for Babel supports the following options:

``template_class``
------------------
The concrete ``Template`` class that the file should be loaded with. Specify
the package/module name and the class name, separated by a colon.

The default is to use ``genshi.template:MarkupTemplate``, and you'll want to
set it to ``genshi.template:TextTemplate`` for `text templates`_.

.. _`text templates`: text-templates.html

``encoding``
------------
The encoding of the template file. This is only used for text templates. The
default is to assume “utf-8”.

``include_attrs``
-----------------
Comma-separated list of attribute names that should be considered to have
localizable values. Only used for markup templates.

``ignore_tags``
---------------
Comma-separated list of tag names that should be ignored. Only used for markup
templates.

``extract_text``
----------------
Whether text outside explicit ``gettext`` function calls should be extracted.
By default, any text nodes not inside ignored tags, and values of attributes in
the ``include_attrs`` list are extracted. If this option is disabled, only
strings in ``gettext`` function calls are extracted.

.. note:: If you disable this option, and do not make use of the
          internationalization directives, it's not necessary to add the
          translation filter as described above. You only need to make sure
          that the template has access to the ``gettext`` functions it uses.


Translation
===========

If you have prepared MO files for use with Genshi using the appropriate tools,
you can access the message catalogs with the `gettext Python module`_. You'll
probably want to create a ``gettext.GNUTranslations`` instance, and make the
translation functions it provides available to your templates by putting them
in the template context.
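
As a rough sketch of that step, the following uses only the standard library
and Python 3 method names (``gettext`` and ``ngettext``; on Python 2 the
unicode variants are ``ugettext`` and ``ungettext``). The domain ``messages``,
the directory ``locale`` and the language ``de`` are assumptions made for the
example, not values required by Genshi.

.. code-block:: python

  import gettext

  # "messages" and "locale" are hypothetical names for the catalog domain
  # and the directory that holds the compiled MO files. With fallback=True,
  # a NullTranslations object is returned if no catalog can be found.
  translations = gettext.translation(
      "messages", localedir="locale", languages=["de"], fallback=True
  )

  # Data made available to the template when it is rendered.
  context = {
      "_": translations.gettext,
      "ngettext": translations.ngettext,
  }

The dictionary would then be passed as template data, for example via
``template.generate(**context)``.
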

The ``Translator`` filter needs to be added to the filters of the template
(applying it as a stream filter will likely not have the desired effect).
Furthermore it needs to be the first filter in the list, including the internal
filters that Genshi adds itself:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import MarkupTemplate

  # ``translations`` is the gettext.GNUTranslations instance described above.
  # ``ugettext`` is the Python 2 name; on Python 3 the method is ``gettext``.
  template = MarkupTemplate("...")
  template.filters.insert(0, Translator(translations.ugettext))

The ``Translator`` class also provides the convenience method ``setup()``,
which will both add the filter and register the i18n directives:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import MarkupTemplate

  # ``translations`` is the gettext.GNUTranslations instance described above.
  template = MarkupTemplate("...")
  translator = Translator(translations.ugettext)
  translator.setup(template)

.. warning:: If you're using ``TemplateLoader``, you should specify a
             `callback function`_ in which you add the filter. That ensures
             that the filter is not added every time the template is rendered,
             thereby being applied multiple times.

.. _`callback function`: loader.html#callback-interface


Related Considerations
======================

If you intend to produce an application that is fully prepared for an
international audience, there are a couple of other things to keep in mind:

-------
Unicode
-------

Use ``unicode`` internally, not encoded bytestrings. Only encode/decode where
data enters or exits the system. This means that your code works with characters
and not just with bytes, which is an important distinction for example when
calculating the length of a piece of text. When you need to decode/encode, it's
probably a good idea to use UTF-8.
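
The difference between characters and bytes is easy to see in a short example.
The sketch assumes Python 3, where ``str`` plays the role that ``unicode`` has
on Python 2. The sample word is arbitrary.

.. code-block:: python

  # "Grüße", written with escapes so the example does not depend on the
  # encoding of the source file.
  text = "Gr\u00fc\u00dfe"

  # Encode only where data leaves the system.
  encoded = text.encode("utf-8")

  print(len(text))     # number of characters
  print(len(encoded))  # number of bytes

  # Decode again where data enters the system.
  decoded = encoded.decode("utf-8")
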

-------------
Date and Time
-------------

If your application uses datetime information that should be displayed to users
in different timezones, you should try to work with UTC (universal time)
internally. Do the conversion from and to "local time" when the data enters or
exits the system. Make use of the Python `datetime`_ module and the third-party
`pytz`_ package.
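
A minimal sketch of this approach, using only the standard library on
Python 3: the timestamp and the fixed UTC+01:00 offset are made-up values for
illustration. A real application would look up a named time zone (for example
with pytz) so that daylight saving time is handled.

.. code-block:: python

  from datetime import datetime, timedelta, timezone

  # Stored and processed internally in UTC.
  created = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

  # Hypothetical user time zone, given here as a fixed UTC+01:00 offset.
  user_tz = timezone(timedelta(hours=1))

  # Convert to local time only when the value is displayed.
  local = created.astimezone(user_tz)
  print(local.isoformat())

Both objects refer to the same instant. Only the representation shown to the
user changes.
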

--------------------------
Formatting and Locale Data
--------------------------

Make sure you check out the functionality provided by the `Babel`_ project for
things like number and date formatting, locale display strings, etc.

.. _`datetime`: http://docs.python.org/lib/module-datetime.html
.. _`pytz`: http://pytz.sourceforge.net/