diff doc/i18n.txt @ 528:f38ce008ab0a
Integrated [http://babel.edgewall.org/ Babel] message extraction plugin, and added I18n doc page.
author | cmlenz |
---|---|
date | Wed, 20 Jun 2007 09:48:55 +0000 |
parents | |
children | 2a6cf641cb5e |
.. -*- mode: rst; encoding: utf-8 -*-

=====================================
Internationalization and Localization
=====================================

Genshi provides basic supporting infrastructure for internationalizing
and localizing templates. That includes functionality for extracting localizable
strings from templates, as well as a template filter that can apply translations
to templates as they get rendered.

This support is based on `gettext`_ message catalogs and the `gettext Python
module`_. The extraction process can be used from the API level, or through the
front-ends implemented by the `Babel`_ project, for which Genshi provides a
plugin.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`gettext python module`: http://docs.python.org/lib/module-gettext.html
.. _`babel`: http://babel.edgewall.org/


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

The simplest way to internationalize and translate templates would be to wrap
all localizable strings in a ``gettext()`` function call (which is often aliased
to ``_()`` for brevity). In that case, no extra template filter is required.

.. code-block:: genshi

  <p>${_("Hello, world!")}</p>

However, this approach results in significant “character noise” in templates,
making them harder to read and preview.

The ``genshi.filters.Translator`` filter allows you to get rid of the
explicit `gettext`_ function calls, so you can continue to just write:

.. code-block:: genshi

  <p>Hello, world!</p>

This text will still be extracted and translated as if you had wrapped it in a
``_()`` call.

.. note:: For parameterized or pluralizable messages, you need to continue using
          the appropriate ``gettext`` functions.
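
As a rough illustration of the note above, the following sketch shows what
parameterized and pluralizable messages look like when they are passed through
``gettext`` functions explicitly. It makes a few assumptions: it uses
``gettext.NullTranslations`` from the standard library as a stand-in for a real
catalog, it uses the Python 3 method names ``gettext`` and ``ngettext``, and the
helper names ``greeting`` and ``file_count`` are made up for this example.

.. code-block:: python

  import gettext

  # Stand-in catalog that returns messages untranslated; a real application
  # would load a GNUTranslations instance from an MO file instead.
  translations = gettext.NullTranslations()
  _ = translations.gettext
  ngettext = translations.ngettext


  def greeting(name):
      # Parameterized message: translate the format string first, then
      # substitute the values.
      return _("Hello, %(name)s!") % {"name": name}


  def file_count(num):
      # Pluralizable message: ngettext picks the singular or plural form.
      return ngettext("%(num)d file", "%(num)d files", num) % {"num": num}

In a template, such calls would appear inside expressions, for example
``${_("Hello, %(name)s!") % dict(name=user)}``, where ``user`` is a
hypothetical context variable.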

You can control which tags should be ignored by this process; for example, it
doesn't really make sense to translate the content of the HTML
``<script></script>`` element. Both ``<script>`` and ``<style>`` are excluded
by default.

Attribute values can also be automatically translated. The default is to
consider the attributes ``abbr``, ``alt``, ``label``, ``prompt``, ``standby``,
``summary``, and ``title``, which is a list that makes sense for HTML documents.
Of course, you can tell the translator to use a different set of attribute
names, or none at all.

In addition, you can control automatic translation in your templates using the
``xml:lang`` attribute. If the value of that attribute is a literal string, the
contents and attributes of the element will be ignored:

.. code-block:: genshi

  <p xml:lang="en">Hello, world!</p>

On the other hand, if the value of the ``xml:lang`` attribute contains a Python
expression, the element contents and attributes are still considered for
automatic translation:

.. code-block:: genshi

  <html xml:lang="$locale">
    ...
  </html>


Extraction
==========

The ``Translator`` class provides a class method called ``extract``, which is
a generator yielding all localizable strings found in a template or markup
stream. This includes both literal strings in text nodes and attribute values,
as well as strings in ``gettext()`` calls in embedded Python code. See the API
documentation for details on how to use this method directly.

This functionality is integrated into the message extraction framework provided
by the `Babel`_ project. Babel provides a command-line interface as well as
commands that can be used from ``setup.py`` scripts using `Setuptools`_ or
`Distutils`_.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
.. _`distutils`: http://docs.python.org/dist/dist.html

The first thing you need to do to make Babel extract messages from Genshi
templates is to let Babel know which files are Genshi templates. This is done
using a “mapping configuration”, which can be stored in a configuration file,
or specified directly in your ``setup.py``.

In a configuration file, the mapping may look like this:

.. code-block:: ini

  # Python source
  [python:**.py]

  # Genshi templates
  [genshi:**/templates/**.html]
  include_attrs = title

  [genshi:**/templates/**.txt]
  template_class = genshi.template:TextTemplate
  encoding = latin-1

Please consult the Babel documentation for details on configuration.

If all goes well, running the extraction with Babel should create a POT file
containing the strings from your Genshi templates and your Python source files.

.. note:: Genshi currently does not support “translator comments”, i.e. text in
          template comments that would get added to the POT file. This support
          may or may not be added in future versions.


---------------------
Configuration Options
---------------------

The Genshi extraction plugin for Babel supports the following options:

``template_class``
------------------
The concrete ``Template`` class that the file should be loaded with. Specify
the package/module name and the class name, separated by a colon.

The default is to use ``genshi.template:MarkupTemplate``, and you'll want to
set it to ``genshi.template:TextTemplate`` for `text templates`_.

.. _`text templates`: text-templates.html

``encoding``
------------------
The encoding of the template file. This is only used for text templates. The
default is to assume “utf-8”.

``include_attrs``
------------------
Comma-separated list of attribute names that should be considered to have
localizable values. Only used for markup templates.

``ignore_tags``
------------------
Comma-separated list of tag names that should be ignored. Only used for markup
templates.


Translation
===========

If you have prepared MO files for use with Genshi using the appropriate tools,
you can access the message catalogs with the `gettext Python module`_. You'll
probably want to create a ``gettext.GNUTranslations`` instance, and make the
translation functions it provides available to your templates by putting them
in the template context.

The ``Translator`` filter needs to be added to the filters of the template
(applying it as a stream filter will likely not have the desired effect).
Furthermore, it needs to be the first filter in the list, including the internal
filters that Genshi adds itself:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import MarkupTemplate

  # ``translations`` is the gettext.GNUTranslations instance created earlier
  template = MarkupTemplate("...")
  template.filters.insert(0, Translator(translations.ugettext))

If you're using ``TemplateLoader``, you should specify a callback function in
which you add the filter:

.. code-block:: python

  from genshi.filters import Translator
  from genshi.template import TemplateLoader

  def template_loaded(template):
      # Put the translator ahead of all other filters
      template.filters.insert(0, Translator(translations.ugettext))

  loader = TemplateLoader('templates', callback=template_loaded)
  template = loader.load("...")

This approach ensures that the filter is not added every time the template is
loaded, which would cause it to be applied multiple times.


Related Considerations
======================

If you intend to produce an application that is fully prepared for an
international audience, there are a couple of other things to keep in mind:

-------
Unicode
-------

Use ``unicode`` internally, not encoded bytestrings. Only encode/decode where
data enters or exits the system. This means that your code works with characters
and not just with bytes, which is an important distinction, for example, when
calculating the length of a piece of text. When you need to decode/encode, it's
probably a good idea to use UTF-8.

-------------
Date and Time
-------------

If your application uses datetime information that should be displayed to users
in different timezones, you should try to work with UTC (universal time)
internally. Do the conversion from and to "local time" when the data enters or
exits the system. Make use of the Python `datetime`_ module and the third-party
`pytz`_ package.

--------------------------
Formatting and Locale Data
--------------------------

Make sure you check out the functionality provided by the `Babel`_ project for
things like number and date formatting, locale display strings, etc.

.. _`datetime`: http://docs.python.org/lib/module-datetime.html
.. _`pytz`: http://pytz.sourceforge.net/
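
The boundary rules from the Unicode and Date and Time sections can be sketched
as follows. This is only an illustration under a few assumptions: it uses
Python 3 (where ``str`` plays the role of ``unicode``), a fixed UTC offset from
the standard library instead of a real ``pytz`` time zone, and the hypothetical
helper names ``read_text``, ``write_text``, ``to_utc`` and ``to_local``.

.. code-block:: python

  from datetime import datetime, timedelta, timezone


  def read_text(raw_bytes, encoding="utf-8"):
      # Decode once, where data enters the system.
      return raw_bytes.decode(encoding)


  def write_text(text, encoding="utf-8"):
      # Encode once, where data leaves the system.
      return text.encode(encoding)


  def to_utc(local_dt):
      # Normalize an aware local datetime to UTC for internal use.
      return local_dt.astimezone(timezone.utc)


  def to_local(utc_dt, tz):
      # Convert back to the user's time zone only for display.
      return utc_dt.astimezone(tz)


  # Fixed +02:00 offset as a simplified stand-in for a real time zone.
  user_tz = timezone(timedelta(hours=2))

  text = read_text(b"Gr\xc3\xbc\xc3\x9fe")  # UTF-8 bytes for "Grüße"
  stored = to_utc(datetime(2007, 6, 20, 11, 48, 55, tzinfo=user_tz))
  shown = to_local(stored, user_tz)

Here ``len(text)`` is 5, while the encoded form returned by ``write_text(text)``
is 7 bytes long, which is the character versus byte distinction described in
the Unicode section.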