diff 0.8.x/doc/messages.txt @ 142:4a7af44e6695 stable
Create branch for 0.8.x releases.
author | cmlenz |
---|---|
date | Wed, 20 Jun 2007 10:09:07 +0000 |
parents | |
children | 32a242175da5 |
new file mode 100644
--- /dev/null
+++ b/0.8.x/doc/messages.txt
@@ -0,0 +1,276 @@
+.. -*- mode: rst; encoding: utf-8 -*-
+
+=============================
+Working with Message Catalogs
+=============================
+
+.. contents:: Contents
+   :depth: 2
+.. sectnum::
+
+
+Introduction
+============
+
+The ``gettext`` translation system enables you to mark any strings used in your
+application as subject to localization, by wrapping them in functions such as
+``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
+``gettext`` function is often aliased to ``_(str)``, so you can write:
+
+.. code-block:: python
+
+    print _("Hello")
+
+instead of just:
+
+.. code-block:: python
+
+    print "Hello"
+
+to make the string "Hello" localizable.
+
+Message catalogs are collections of translations for such localizable messages
+used in an application. They are commonly stored in PO (Portable Object) and MO
+(Machine Object) files, the formats of which are defined by the GNU `gettext`_
+tools and the GNU `translation project`_.
+
+    .. _`gettext`: http://www.gnu.org/software/gettext/
+    .. _`translation project`: http://sourceforge.net/projects/translation
+
+The general procedure for building message catalogs looks something like this:
+
+ * use a tool (such as ``xgettext``) to extract localizable strings from the
+   code base and write them to a POT (PO Template) file.
+ * make a copy of the POT file for a specific locale (for example, "en_US")
+   and start translating the messages
+ * use a tool such as ``msgfmt`` to compile the locale PO file into a binary
+   MO file
+ * later, when code changes make it necessary to update the translations, you
+   regenerate the POT file and merge the changes into the various
+   locale-specific PO files, for example using ``msgmerge``
+
+Python provides the `gettext module`_ as part of the standard library, which
+enables applications to work with appropriately generated MO files.
+
+    .. _`gettext module`: http://docs.python.org/lib/module-gettext.html
+
+As ``gettext`` provides a solid and well-supported foundation for translating
+application messages, Babel does not reinvent the wheel, but rather reuses this
+infrastructure, and makes it easier to build message catalogs for Python
+applications.
+
+
+Message Extraction
+==================
+
+Babel provides functionality similar to that of the ``xgettext`` program,
+except that only extraction from Python source files is built-in, while support
+for other file formats can be added using a simple extension mechanism.
+
+Unlike ``xgettext``, which is usually invoked once for every file, the routines
+for message extraction in Babel operate on directories. While the per-file
+approach of ``xgettext`` works nicely with projects using a ``Makefile``,
+Python projects rarely use ``make``, and thus a different mechanism is needed
+for extracting messages from the heterogeneous collection of source files that
+many Python projects are composed of.
+
+When message extraction is based on directories instead of individual files,
+there needs to be a way to configure which files should be treated in which
+manner. For example, while many projects may contain ``.html`` files, some of
+those files may be static HTML files that don't contain localizable messages,
+while others may be `Django`_ templates, and still others may contain `Genshi`_
+markup templates. Some projects may even mix HTML files for different template
+languages (for whatever reason). Therefore, the way in which messages are
+extracted from source files cannot depend on the file extension alone, but
+needs to be controllable in a precise manner.
+
+.. _`Django`: http://www.djangoproject.com/
+.. _`Genshi`: http://genshi.edgewall.org/
+
+Babel accepts a configuration file to specify this mapping of files to
+extraction methods, which is described below.
+
+
+.. _`mapping`:
+
+-------------------------------------------
+Extraction Method Mapping and Configuration
+-------------------------------------------
+
+The mapping of extraction methods to files in Babel is done via a configuration
+file. This file maps extended glob patterns to the names of the extraction
+methods, and can also set various options for each pattern (which options are
+available depends on the specific extraction method).
+
+For example, the following configuration adds extraction of messages from both
+Genshi markup templates and text templates:
+
+.. code-block:: ini
+
+    # Extraction from Python source files
+
+    [python: foobar/**.py]
+
+    # Extraction from Genshi HTML and text templates
+
+    [genshi: foobar/**/templates/**.html]
+    ignore_tags = script,style
+    include_attrs = alt title summary
+
+    [genshi: foobar/**/templates/**.txt]
+    template_class = genshi.template.text:TextTemplate
+    encoding = ISO-8859-15
+
+The configuration file syntax is based on the format commonly found in ``.INI``
+files on Windows systems, and as supported by the ``ConfigParser`` module in
+the Python standard library. Section names (the strings enclosed in square
+brackets) specify both the name of the extraction method, and the extended glob
+pattern to specify the files that this extraction method should be used for,
+separated by a colon. The options in the sections are passed to the extraction
+method. Which options are available is specific to the extraction method used.
+
+The extended glob patterns used in this configuration are similar to the glob
+patterns provided by most shells. A single asterisk (``*``) is a wildcard for
+any number of characters (except for the pathname component separator "/"),
+while a question mark (``?``) only matches a single character. In addition,
+two consecutive asterisk characters (``**``) can be used to make the wildcard
+match any directory level, so the pattern ``**.txt`` matches any file with the
+extension ``.txt`` in any directory.
+
+Lines that start with a ``#`` or ``;`` character are ignored and can be used
+for comments. Empty lines are ignored, too.
+
+.. note:: If you're performing message extraction using the command Babel
+          provides for integration into ``setup.py`` scripts (see below), you
+          can also provide this configuration in a different way, namely as a
+          keyword argument to the ``setup()`` function.
+
+
+----------
+Front-Ends
+----------
+
+Babel provides two different front-ends to access its functionality for working
+with message catalogs:
+
+ * A `Command-line interface <cmdline.html>`_, and
+ * `Integration with distutils/setuptools <setup.html>`_
+
+Which one you choose depends on the nature of your project. For most modern
+Python projects, the distutils/setuptools integration is probably more
+convenient.
+
+
+--------------------------
+Writing Extraction Methods
+--------------------------
+
+Adding new methods for extracting localizable messages is easy. First, you'll
+need to implement a function that complies with the following interface:
+
+.. code-block:: python
+
+    def extract_xxx(fileobj, keywords, comment_tags, options):
+        """Extract messages from XXX files.
+
+        :param fileobj: the file-like object the messages should be extracted
+                        from
+        :param keywords: a list of keywords (i.e. function names) that should
+                         be recognized as translation functions
+        :param comment_tags: a list of translator tags to search for and
+                             include in the results
+        :param options: a dictionary of additional options (optional)
+        :return: an iterator over ``(lineno, funcname, message, comments)``
+                 tuples
+        :rtype: ``iterator``
+        """
+
+.. note:: Any strings in the tuples produced by this function must be either
+          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
+          That means that if sources contain strings using other encodings, it
+          is the job of the extractor implementation to do the decoding to
+          ``unicode`` objects.
+
+Next, you should register that function as an entry point. This requires your
+``setup.py`` script to use `setuptools`_, and your package to be installed with
+the necessary metadata. If that's taken care of, add something like the
+following to your ``setup.py`` script:
+
+.. code-block:: python
+
+    setup(...
+
+        entry_points = """
+        [babel.extractors]
+        xxx = your.package:extract_xxx
+        """,
+
+That is, add your extraction method to the entry point group
+``babel.extractors``, where the name of the entry point is the name that people
+will use to reference the extraction method, and the value is the module and
+the name of the function (separated by a colon) implementing the actual
+extraction.
+
+.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
+
+Comment Tags and Translator Comments
+....................................
+
+First of all, what are comment tags? Comment tags are excerpts of text to
+search for in comments (and only in comments) right before the
+`python gettext`_ calls, as shown in the following example:
+
+    .. _`python gettext`: http://docs.python.org/lib/module-gettext.html
+
+.. code-block:: python
+
+    # NOTE: This is a comment about `Foo Bar`
+    _('Foo Bar')
+
+The comment tag for the above example would be ``NOTE:``, and the translator
+comment for that tag would be ``This is a comment about `Foo Bar```.
+
+The resulting output in the catalog template would be something like::
+
+    #. This is a comment about `Foo Bar`
+    #: main.py:2
+    msgid "Foo Bar"
+    msgstr ""
+
+Now, you might ask, why would I need that?
+
+Consider this simple case: you have a menu item called “Manual”. You know what
+it means, but when the translator sees this they will wonder whether you meant:
+
+1. a document or help manual, or
+2. a manual process?
+
+This is the simplest case where a translation comment such as
+“The installation manual” helps to clarify the situation and makes a translator
+more productive.
+
+**More examples of the need for translation comments**
+
+Real-world examples are best. This is a discussion of the use of the word
+“Forward” in Northern Sotho:
+
+“When you go forward. You go ‘Pele’, but when you forward the document,
+you ‘Fetišetša pele’. So if you just say forward, we don’t know what you are
+talking about.
+It is better if it's in a sentence. But in this case i think we will use ‘pele’
+because on the string no. 86 and 88 there is “show previous page in history”
+and “show next page in history”.”
+
+Was the translators' guess correct? I think so, but it makes it so much easier
+if they don’t need to be super `sleuths`_ as well as translators.
+
+    .. _`sleuths`: http://www.thefreedictionary.com/sleuth
+
+
+*Explanation Borrowed From:* `Wordforge`_
+
+    .. _`Wordforge`: http://www.wordforge.org/static/translation_comments.html
+
+**Note**: Translator comments are currently only supported in Python source
+code.
+