view doc/i18n.txt @ 886:9bd255289d75

Set reST MIME type on new loader doc page.
author cmlenz
date Fri, 16 Apr 2010 22:44:29 +0000
.. -*- mode: rst; encoding: utf-8 -*-

=====================================
Internationalization and Localization
=====================================

Genshi provides comprehensive supporting infrastructure for internationalizing
and localizing templates. That includes functionality for extracting
localizable strings from templates, as well as a template filter that can
apply translations to templates as they get rendered.

This support is based on `gettext`_ message catalogs and the `gettext Python 
module`_. The extraction process can be used from the API level, or through
the front-ends implemented by the `Babel`_ project, for which Genshi provides
a plugin.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`gettext python module`: http://docs.python.org/lib/module-gettext.html
.. _`babel`: http://babel.edgewall.org/


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

The simplest way to internationalize and translate templates would be to wrap
all localizable strings in a ``gettext()`` function call (which is often
aliased to ``_()`` for brevity). In that case, no extra template filter is
required.

.. code-block:: genshi

  <p>${_("Hello, world!")}</p>

However, this approach results in significant “character noise” in templates,
making them harder to read and preview.

The ``genshi.filters.Translator`` filter allows you to get rid of the 
explicit `gettext`_ function calls, so you can continue to just write:

.. code-block:: genshi

  <p>Hello, world!</p>

This text will still be extracted and translated as if you had wrapped it in a
``_()`` call.

.. note:: For parameterized or pluralizable messages, you need to use the
          special `template directives`_ described below, or use the
          corresponding ``gettext`` function in embedded Python expressions.

You can control which tags should be ignored by this process; for example, it
doesn't really make sense to translate the content of the HTML 
``<script></script>`` element. Both ``<script>`` and ``<style>`` are excluded
by default.

Attribute values can also be automatically translated. The default is to 
consider the attributes ``abbr``, ``alt``, ``label``, ``prompt``, ``standby``,
``summary``, and ``title``, which is a list that makes sense for HTML
documents.  Of course, you can tell the translator to use a different set of
attribute names, or none at all.

----------------
Language Tagging
----------------

You can control automatic translation in your templates using the ``xml:lang``
attribute. If the value of that attribute is a literal string, the contents and
attributes of the element will be ignored:

.. code-block:: genshi

  <p xml:lang="en">Hello, world!</p>

On the other hand, if the value of the ``xml:lang`` attribute contains a Python
expression, the element contents and attributes are still considered for 
automatic translation:

.. code-block:: genshi

  <html xml:lang="$locale">
    ...
  </html>


.. _`template directives`:

Template Directives
===================

Sometimes localizable strings in templates may contain dynamic parameters, or
they may depend on the numeric value of some variable to choose a proper
plural form. Sometimes the strings contain embedded markup, such as tags for
emphasis or hyperlinks, and you don't want to rely on the people doing the
translations to know the syntax and escaping rules of HTML and XML.

In those cases the simple text extraction and translation process described
above is not sufficient. You could just use ``gettext`` API functions in
embedded Python expressions for parameters and pluralization, but Genshi also
provides special template directives for internationalization that attempt to
provide a comprehensive solution for this problem space.

To enable these directives, you'll need to register them with the templates
they are used in. You can do this by adding them manually via the
``Template.add_directives(namespace, factory)`` method (where ``namespace``
would be “http://genshi.edgewall.org/i18n” and ``factory`` would be an instance
of the ``Translator`` class). Or you can just call the ``setup(template)``
method of a ``Translator`` instance, which both registers the directives and
adds the translation filter.

.. note:: The internationalization directives are still somewhat experimental
          and have some known issues. However, the attribute language they
          implement should be stable and is not subject to change
          substantially in future versions.


--------
Messages
--------

``i18n:msg``
------------

This is the basic directive for defining localizable text passages that
contain parameters and/or markup.

For example, consider the following template snippet:

.. code-block:: genshi

  <p>
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

Without further annotation, the translation filter would treat this sentence
as two separate messages (“Please visit” and “for help.”), and the translator would
have no control over the position of the link in the sentence.

However, when you use the Genshi internationalization directives, you simply
add an ``i18n:msg`` attribute to the enclosing ``<p>`` element:

.. code-block:: genshi

  <p i18n:msg="name">
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

Genshi is then able to identify the text in the ``<p>`` element as a single
message for translation purposes. You'll see the following string in your
message catalog::

  Please visit [1:%(name)s] for help.

The ``<a>`` element with its attribute has been replaced by a part in square
brackets, which does not include the tag name or the attributes of the element.

The value of the ``i18n:msg`` attribute is a comma-separated list of parameter
names, which serve as simplified aliases for the actual Python expressions the
message contains. The order of the parameter names in the list must correspond
to the order of the expressions in the text. In this example, there is only
one parameter: its alias for translation is “name”, while the corresponding
expression is ``${site.name}``.

The translator now has complete control over the structure of the sentence. He
or she certainly does need to make sure that any bracketed parts are not
removed, and that the ``name`` parameter is preserved correctly. But those are
things that can be easily checked by validating the message catalogs. The
important thing is that the translator can change the sentence structure, and
has no way to break the application by forgetting to close a tag, for example.

Suppose the German translator of this snippet decided to translate it to::

  Um Hilfe zu erhalten, besuchen Sie bitte [1:%(name)s]

The resulting output might be:

.. code-block:: html

  <p>
    Um Hilfe zu erhalten, besuchen Sie bitte
    <a href="http://example.com/">Example</a>
  </p>

Please note that messages may contain multiple tags, and they may also be
nested. For example:

.. code-block:: genshi

  <p i18n:msg="name">
    <i>Please</i> visit <b>the site <a href="${site.url}">${site.name}</a></b>
    for help.
  </p>

This would result in the following message ID::

  [1:Please] visit [2:the site [3:%(name)s]] for help.

Again, the translator has full control over the structure of the sentence. So
the German translation could actually look like this::

  Um Hilfe zu erhalten besuchen Sie [1:bitte]
  [3:%(name)s], [2:das ist eine Web-Site]

Genshi would recompose this into the following output:

.. code-block:: html

  <p>
    Um Hilfe zu erhalten besuchen Sie <i>bitte</i>
    <a href="http://example.com/">Example</a>, <b>das ist eine Web-Site</b>
  </p>

Note how the translation has changed the order and even the nesting of the
tags.
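
The checks a translated message needs (no bracketed part removed, every
parameter preserved) could be automated along the following lines. This is
only a minimal sketch: ``placeholders`` and ``is_consistent`` are hypothetical
helpers, not part of Genshi or Babel, and the regular expressions assume the
``[N:...]`` and ``%(name)s`` forms used in the examples in this section.

.. code-block:: python

  import re

  # Matches the opening of a bracketed part such as "[1:" and named
  # parameters such as "%(name)s".
  PART_RE = re.compile(r"\[(\d+):")
  PARAM_RE = re.compile(r"%\((\w+)\)s")


  def placeholders(message):
      """Return the sets of part numbers and parameter names in a message."""
      parts = set(PART_RE.findall(message))
      params = set(PARAM_RE.findall(message))
      return parts, params


  def is_consistent(msgid, msgstr):
      """Check that a translation keeps every part and parameter of the msgid."""
      return placeholders(msgid) == placeholders(msgstr)


  msgid = "[1:Please] visit [2:the site [3:%(name)s]] for help."
  good = ("Um Hilfe zu erhalten besuchen Sie [1:bitte] "
          "[3:%(name)s], [2:das ist eine Web-Site]")
  # Hypothetical broken translation: part 2 has been dropped.
  bad = "Um Hilfe zu erhalten besuchen Sie [1:bitte] [3:%(name)s]"

  print(is_consistent(msgid, good))
  print(is_consistent(msgid, bad))

The comparison uses sets, so a translation is free to reorder or re-nest the
parts, which matches the freedom the message format gives the translator.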

.. warning:: Please note that ``i18n:msg`` directives do not support other
             nested directives. Directives commonly change the structure of
             the generated markup dynamically, which often would result in the
             structure of the text changing, thus making translation as a
             single message ineffective.

``i18n:comment``
----------------

The ``i18n:comment`` directive can be used to supply a comment for the
translator. For example, if a template snippet is not easily understood
outside of its context, you can add a translator comment to help the
translator understand in what context the message will be used:

.. code-block:: genshi

  <p i18n:msg="name" i18n:comment="Link to the relevant support site">
    Please visit <a href="${site.url}">${site.name}</a> for help.
  </p>

This comment will be extracted together with the message itself, and will
commonly be placed alongside the message in the message catalog, so that it is
easily visible to the person doing the translation.

This directive has no impact on how the template is rendered, and is ignored
outside of the extraction process.


-------------
Pluralization
-------------

``i18n:choose``, ``i18n:singular``, ``i18n:plural``
---------------------------------------------------

TODO

-------------------
Translation Domains
-------------------

``i18n:domain``
---------------

TODO

Extraction
==========

The ``Translator`` class provides a class method called ``extract``, which is
a generator yielding all localizable strings found in a template or markup 
stream. This includes both literal strings in text nodes and attribute values,
as well as strings in ``gettext()`` calls in embedded Python code. See the API
documentation for details on how to use this method directly.
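
As a rough illustration, the following sketch calls ``extract`` on the stream
of an inline template. It assumes that Genshi is installed and that each
yielded item is a ``(lineno, function, message, comments)`` tuple, as described
in the API documentation; the template content and the variable names are made
up for the example.

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import MarkupTemplate

  # A small made-up template with a text node, a localizable attribute and
  # an explicit gettext call in an embedded expression.
  tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
    <body>
      <h1 title="Greeting">Example</h1>
      <p>${_("Hello, %(name)s") % dict(name=username)}</p>
    </body>
  </html>''')

  # Collect (function name, message) pairs from the unrendered stream.
  messages = [
      (func, msg)
      for _lineno, func, msg, _comments in Translator().extract(tmpl.stream)
  ]

  for func, msg in messages:
      print(func, msg)

Messages found in text nodes and attribute values are reported with ``None``
as the function name, while messages from ``gettext``-style calls carry the
name of the function that was called.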

-----------------
Babel Integration
-----------------

This functionality is integrated with the message extraction framework provided
by the `Babel`_ project. Babel provides a command-line interface as well as 
commands that can be used from ``setup.py`` scripts using `Setuptools`_ or 
`Distutils`_.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
.. _`distutils`: http://docs.python.org/dist/dist.html

The first thing you need to do to make Babel extract messages from Genshi 
templates is to let Babel know which files are Genshi templates. This is done
using a “mapping configuration”, which can be stored in a configuration file,
or specified directly in your ``setup.py``.

In a configuration file, the mapping may look like this:

.. code-block:: ini

  # Python source
  [python:**.py]

  # Genshi templates
  [genshi:**/templates/**.html]
  include_attrs = title

  [genshi:**/templates/**.txt]
  template_class = genshi.template:TextTemplate
  encoding = latin-1
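
When the mapping is specified in ``setup.py`` instead, Babel's setuptools
integration reads it from a ``message_extractors`` keyword argument. The
following sketch mirrors the configuration file above; ``yourpackage`` is a
placeholder name, and the exact options should be checked against the Babel
version you are using.

.. code-block:: python

  # Maps a package name to a list of (pattern, method, options) tuples.
  # "yourpackage" is a placeholder for the package containing the templates.
  message_extractors = {
      "yourpackage": [
          ("**.py", "python", None),
          ("**/templates/**.html", "genshi", {"include_attrs": "title"}),
          ("**/templates/**.txt", "genshi", {
              "template_class": "genshi.template:TextTemplate",
              "encoding": "latin-1",
          }),
      ],
  }

  # The dictionary would then be passed to setup() in setup.py:
  #   setup(..., message_extractors=message_extractors)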

Please consult the Babel documentation for details on configuration.

If all goes well, running the extraction with Babel should create a POT file
containing the strings from your Genshi templates and your Python source files.


---------------------
Configuration Options
---------------------

The Genshi extraction plugin for Babel supports the following options:

``template_class``
------------------
The concrete ``Template`` class that the file should be loaded with. Specify
the package/module name and the class name, separated by a colon.

The default is to use ``genshi.template:MarkupTemplate``, and you'll want to
set it to ``genshi.template:TextTemplate`` for `text templates`_.

.. _`text templates`: text-templates.html

``encoding``
------------
The encoding of the template file. This is only used for text templates. The
default is to assume “utf-8”.

``include_attrs``
-----------------
Comma-separated list of attribute names that should be considered to have 
localizable values. Only used for markup templates.

``ignore_tags``
---------------
Comma-separated list of tag names that should be ignored. Only used for markup 
templates.

``extract_text``
----------------
Whether text outside explicit ``gettext`` function calls should be extracted.
By default, any text nodes not inside ignored tags, and values of attributes in
the ``include_attrs`` list are extracted. If this option is disabled, only 
strings in ``gettext`` function calls are extracted.

.. note:: If you disable this option, it's not necessary to add the translation
          filter as described above. You only need to make sure that the
          template has access to the ``gettext`` functions it uses.


Translation
===========

If you have prepared MO files for use with Genshi using the appropriate tools,
you can access the message catalogs with the `gettext Python module`_. You'll
probably want to create a ``gettext.GNUTranslations`` instance, and make the
translation functions it provides available to your templates by putting them
in the template context.
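
A minimal sketch of that setup, using only the standard library, might look
like this. The domain ``messages``, the directory ``locale`` and the language
``de`` are assumptions for the example. The sketch targets Python 3, where the
``gettext`` and ``ngettext`` methods return text strings and take the place of
``ugettext`` and ``ungettext``.

.. code-block:: python

  import gettext

  # "messages" and "locale" are hypothetical; use your own domain and
  # catalog directory. With fallback=True, a NullTranslations instance
  # (which returns messages unchanged) is used when no MO file is found.
  translations = gettext.translation(
      "messages", localedir="locale", languages=["de"], fallback=True
  )

  # Names the templates can use in embedded Python expressions.
  context = {
      "_": translations.gettext,
      "gettext": translations.gettext,
      "ngettext": translations.ngettext,
  }

  print(context["_"]("Hello, world!"))

The entries of ``context`` would then be passed as keyword arguments when
generating the template output, alongside the application's own data.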

The ``Translator`` filter needs to be added to the filters of the template
(applying it as a stream filter will likely not have the desired effect).
Furthermore it needs to be the first filter in the list, including the internal
filters that Genshi adds itself:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import MarkupTemplate
  
  # ``translations`` is a gettext.GNUTranslations instance, created as
  # described above
  template = MarkupTemplate("...")
  template.filters.insert(0, Translator(translations.ugettext))

If you're using ``TemplateLoader``, you should specify a callback function in 
which you add the filter:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import TemplateLoader
  
  def template_loaded(template):
      Translator(translations.ugettext).setup(template)
  
  loader = TemplateLoader('templates', callback=template_loaded)
  template = loader.load('test.html')

This approach ensures that the filter is added only once per template, rather
than every time the template is loaded, which would cause it to be applied
multiple times.


Related Considerations
======================

If you intend to produce an application that is fully prepared for an 
international audience, there are a couple of other things to keep in mind:

-------
Unicode
-------

Use ``unicode`` internally, not encoded bytestrings. Only encode/decode where
data enters or exits the system. This means that your code works with characters
and not just with bytes, which is an important distinction for example when 
calculating the length of a piece of text. When you need to decode/encode, it's
probably a good idea to use UTF-8.
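
The difference between characters and bytes can be seen in a short sketch.
It is written for Python 3, where ``str`` plays the role of ``unicode``; the
sample bytes are made up and stand in for data arriving from outside the
application.

.. code-block:: python

  # UTF-8 encoded bytes for the German word "Grüße", as they might arrive
  # from a request or a file.
  raw = b"Gr\xc3\xbc\xc3\x9fe"

  # Decode once, at the point where the data enters the system.
  text = raw.decode("utf-8")

  # Lengths differ: bytes count encoded units, text counts characters.
  byte_length = len(raw)
  char_length = len(text)

  # Encode once, at the point where the data leaves the system.
  output = text.encode("utf-8")

  print(byte_length, char_length)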

-------------
Date and Time
-------------

If your application uses datetime information that should be displayed to users 
in different timezones, you should try to work with UTC (universal time) 
internally. Do the conversion from and to "local time" when the data enters or 
exits the system. Make use of the Python `datetime`_ module and the third-party 
`pytz`_ package.
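
The pattern of converting at the boundaries can be sketched with the standard
library alone. The fixed offsets below are a simplification introduced for
this example: they stand in for real time zones, and a real application would
take the zone rules (including daylight saving time) from a time zone
database. ``to_utc`` and ``to_local`` are hypothetical helper names.

.. code-block:: python

  from datetime import datetime, timedelta, timezone

  # Hypothetical fixed offsets standing in for real time zones.
  BERLIN_SUMMER = timezone(timedelta(hours=2), "CEST")
  NEW_YORK_SUMMER = timezone(timedelta(hours=-4), "EDT")


  def to_utc(local_dt):
      """Convert an aware local datetime to UTC when data enters the system."""
      return local_dt.astimezone(timezone.utc)


  def to_local(utc_dt, tz):
      """Convert a stored UTC datetime to a user's zone for display."""
      return utc_dt.astimezone(tz)


  entered = datetime(2010, 4, 16, 18, 30, tzinfo=BERLIN_SUMMER)
  stored = to_utc(entered)                    # kept in UTC internally
  shown = to_local(stored, NEW_YORK_SUMMER)   # converted only for display

  print(stored.isoformat())
  print(shown.isoformat())

All three values refer to the same instant; only the representation changes
at the edges of the system.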

--------------------------
Formatting and Locale Data
--------------------------

Make sure you check out the functionality provided by the `Babel`_ project for 
things like number and date formatting, locale display strings, etc.

.. _`datetime`: http://docs.python.org/lib/module-datetime.html
.. _`pytz`: http://pytz.sourceforge.net/

Copyright (C) 2012-2017 Edgewall Software