.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 2
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file
* later, when code changes make it necessary to update the translations,
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well-supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.
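
As a rough illustration of what directory-based extraction means, the following
sketch walks a source tree using only the standard library and collects the
string literals passed to ``_()`` or ``gettext()`` in ``.py`` files. It is not
Babel's implementation; the function names ``extract_from_source`` and
``extract_from_tree`` are hypothetical and exist only for this example:

.. code-block:: python

    import ast
    import os


    def extract_from_source(source):
        """Yield (lineno, message) for each _("...") or gettext("...") call."""
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in ("_", "gettext")
                    and node.args
                    and isinstance(node.args[0], ast.Constant)
                    and isinstance(node.args[0].value, str)):
                yield node.lineno, node.args[0].value


    def extract_from_tree(dirname):
        """Yield (relative_path, lineno, message) for every .py file below dirname."""
        for root, dirs, files in os.walk(dirname):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue  # this sketch only knows how to handle Python sources
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as f:
                    source = f.read()
                for lineno, message in sorted(extract_from_source(source)):
                    yield os.path.relpath(path, dirname), lineno, message

A single call such as ``list(extract_from_tree("foobar"))`` then covers the
whole tree, which is the property that makes a ``Makefile`` rule per source
file unnecessary. The sketch decides how to treat a file purely by its
extension, which is too coarse for real projects.
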

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: foobar/**.py]

    # Extraction from Genshi HTML and text templates

    [genshi: foobar/**/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: foobar/**/templates/**.txt]
    template_class = genshi.template.text:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Each section name (the string enclosed in square
brackets) specifies the name of the extraction method, followed by a colon and
the extended glob pattern selecting the files that method should be used for.
The options in a section are passed to the extraction method. Which options are
available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) matches exactly one character. In addition,
two consecutive asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored as well.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts (see below), you
          can also provide this configuration in a different way, namely as a
          keyword argument to the ``setup()`` function.


----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface `_, and
* `Integration with distutils/setuptools `_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


--------------------------
Writing Extraction Methods
--------------------------

(TODO: write)



Extended ``Translations`` Class
===============================

Many web-based applications are composed of a variety of different components
(possibly using some kind of plugin system), and some of those components may
provide their own message catalogs that need to be integrated into the larger
system.

To support this usage pattern, Babel provides a ``Translations`` class that is
derived from the ``GNUTranslations`` class in the ``gettext`` module. This
class adds a ``merge()`` method that takes another ``Translations`` instance,
and merges its contents into the catalog:

.. code-block:: python

    from babel.support import Translations

    translations = Translations.load('main')
    translations.merge(Translations.load('plugin1'))
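
The effect of merging can be pictured with a sketch that uses only the standard
library. ``DictTranslations`` below is a hypothetical stand-in, not Babel's
``Translations`` class: it keeps its messages in a plain dictionary instead of
reading an MO file, and it assumes that entries from the merged catalog take
precedence when both catalogs define the same message:

.. code-block:: python

    import gettext


    class DictTranslations(gettext.NullTranslations):
        """Hypothetical in-memory catalog used only to illustrate merging."""

        def __init__(self, catalog=None):
            super().__init__()
            self._catalog = dict(catalog or {})

        def gettext(self, message):
            # Fall back to the untranslated message when no entry exists.
            return self._catalog.get(message, message)

        def merge(self, other):
            # Assumption: entries from `other` override existing ones.
            self._catalog.update(other._catalog)
            return self


    main = DictTranslations({"Hello": "Hallo"})
    plugin = DictTranslations({"Settings": "Einstellungen"})
    main.merge(plugin)

    print(main.gettext("Hello"))     # Hallo
    print(main.gettext("Settings"))  # Einstellungen

After the merge, a single catalog object answers lookups for both the
application's own messages and those contributed by the plugin, so calling code
does not need to know which component a message came from.
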