.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 2
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* Use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file.
* Make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages.
* Use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file.
* Later, when code changes make it necessary to update the translations,
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``.

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well-supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.

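
As a rough illustration of the directory-oriented approach, the following
sketch walks a source tree and picks an extraction method for each file purely
by its extension. It uses only the standard library; the names
``METHODS_BY_EXTENSION`` and ``collect_files`` are hypothetical and are not
part of Babel's API.

.. code-block:: python

    import os

    # Hypothetical mapping from file extension to extraction method name.
    METHODS_BY_EXTENSION = {".py": "python", ".html": "genshi"}


    def collect_files(dirname):
        """Yield ``(relative_path, method)`` pairs for files below ``dirname``."""
        for root, dirnames, filenames in os.walk(dirname):
            dirnames.sort()  # make the traversal order deterministic
            for filename in sorted(filenames):
                extension = os.path.splitext(filename)[1]
                method = METHODS_BY_EXTENSION.get(extension)
                if method is not None:
                    path = os.path.relpath(os.path.join(root, filename), dirname)
                    # Report paths with "/" separators on every platform.
                    yield path.replace(os.sep, "/"), method

Selecting by extension alone treats every ``.html`` file the same way, whether
it is static HTML or a template.
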

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: foobar/**.py]

    # Extraction from Genshi HTML and text templates

    [genshi: foobar/**/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: foobar/**/templates/**.txt]
    template_class = genshi.template.text:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern that selects the files this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored as well.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts (see below), you
          can also provide this configuration in a different way, namely as a
          keyword argument to the ``setup()`` function.


----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface `_, and
* `Integration with distutils/setuptools `_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    setup(...

        entry_points = """
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools

Comments Tags And Translator Comments Explanation
.................................................

First of all, what are comments tags? Comments tags are excerpts of text to
search for in comments (and only in comments) that appear right before the
`python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comments tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “Manual”. You know what
it means, but when translators see it they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translation comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

**More examples of the need for translation comments**

Real world examples are best. This is a discussion over the use of the word
“Forward” in Northern Sotho:

    “When you go forward. You go ‘Pele’, but when you forward the document,
    you ‘Fetišetša pele’. So if you just say forward, we don’t know what you are
    talking about.
    It is better if it's in a sentence. But in this case i think we will use ‘pele’
    because on the string no. 86 and 88 there is “show previous page in history”
    and “show next page in history”.”

Was the translator's guess correct? I think so, but it makes it so much easier
if they don’t need to be super `sleuths`_ as well as translators.

.. _`sleuths`: http://www.thefreedictionary.com/sleuth


*Explanation Borrowed From:* `Wordforge`_

.. _`Wordforge`: http://www.wordforge.org/static/translation_comments.html

**Note**: Translator comments are currently only supported in Python source
code.
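
To make the extractor interface described under *Writing Extraction Methods*
more concrete, here is a minimal sketch of an extraction function for a
hypothetical line-oriented format in which translatable strings appear as
``keyword("message")`` and comments start with ``#``. The name
``extract_simple`` and the format itself are assumptions made for illustration;
this is not how Babel's built-in Python extractor works, and the sketch makes no
attempt to handle escapes, plurals, or strings spanning several lines. It is
written for Python 3, where decoding the input bytes yields text strings.

.. code-block:: python

    import io
    import re


    def extract_simple(fileobj, keywords, comment_tags, options):
        """Yield ``(lineno, funcname, message, comments)`` tuples.

        Hypothetical extractor, for illustration only.
        """
        encoding = options.get("encoding", "utf-8")
        # Match e.g. _("text") or gettext('text') for any of the given keywords.
        call_re = re.compile(
            r"""\b(%s)\(\s*(["'])(.*?)\2\s*\)"""
            % "|".join(re.escape(keyword) for keyword in keywords)
        )
        comments = []
        for lineno, line in enumerate(fileobj, 1):
            if isinstance(line, bytes):
                line = line.decode(encoding)  # decoding is the extractor's job
            stripped = line.strip()
            if stripped.startswith("#"):
                text = stripped[1:].strip()
                # Keep the comment only if it starts with one of the tags.
                for tag in comment_tags:
                    if text.startswith(tag):
                        comments.append(text[len(tag):].strip())
                        break
                continue
            for match in call_re.finditer(line):
                yield lineno, match.group(1), match.group(3), comments
            # Collected comments apply only to the line right after them.
            comments = []


    source = io.BytesIO(b"# NOTE: This is a comment about `Foo Bar`\n_('Foo Bar')\n")
    results = list(extract_simple(source, ["_"], ["NOTE:"], {}))
    # results == [(2, '_', 'Foo Bar', ['This is a comment about `Foo Bar`'])]

Resetting the collected comments after every non-comment line mirrors the rule
that a tagged comment only counts when it appears right before the call.
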