.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 2
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file
* later, when code changes make it necessary to update the translations,
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.
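
To make the directory-based approach concrete, here is a minimal sketch that
uses only the Python standard library. It is not Babel's implementation: the
function name ``extract_from_directory`` and its default keywords are
assumptions made for illustration, and it only understands Python source files.

.. code-block:: python

    import ast
    import os


    def extract_from_directory(dirname, keywords=("_", "gettext")):
        """Yield ``(filename, lineno, message)`` for keyword calls in ``*.py`` files."""
        for root, dirnames, filenames in os.walk(dirname):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                if not name.endswith(".py"):
                    continue  # this sketch only understands Python sources
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as fileobj:
                    tree = ast.parse(fileobj.read(), filename=path)
                found = []
                for node in ast.walk(tree):
                    # Look for calls such as _("Hello") with a literal string argument.
                    if (isinstance(node, ast.Call)
                            and isinstance(node.func, ast.Name)
                            and node.func.id in keywords
                            and node.args
                            and isinstance(node.args[0], ast.Constant)
                            and isinstance(node.args[0].value, str)):
                        found.append((node.lineno, node.args[0].value))
                for lineno, message in sorted(found):
                    yield os.path.relpath(path, dirname), lineno, message

A single call on the project root then covers every Python file beneath it, so
no per-file invocation from a ``Makefile`` is needed.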

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend only on the file extension, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: foobar/**.py]

    # Extraction from Genshi HTML and text templates

    [genshi: foobar/**/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: foobar/**/templates/**.txt]
    template_class = genshi.template.text:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern that selects the files this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are also ignored.

.. note:: if you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts (see below), you
          can also provide this configuration in a different way, namely as a
          keyword argument to the ``setup()`` function.


----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface `_, and
* `Integration with distutils/setuptools `_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message)`` tuples
        :rtype: ``iterator``
        """

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata.
If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    from setuptools import setup

    setup(
        # ... your other setup() arguments go here ...

        entry_points = """
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,
    )

That is, add your extraction method to the entry point group
``babel.extractors``. The name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
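
As an illustration of the interface described above, the following is a
minimal sketch of an extraction method for a hypothetical line-oriented "XXX"
format. The regular expression and the ``encoding`` option are assumptions made
for this example, and the three-argument signature simply follows this
document; check the Babel version you are using, as its expected signature may
differ.

.. code-block:: python

    import io
    import re


    def extract_xxx(fileobj, keywords, options):
        """Yield ``(lineno, funcname, message)`` tuples found in ``fileobj``."""
        options = options or {}
        if not keywords:
            return
        # Match calls such as _("...") for any of the given keywords.
        pattern = re.compile(
            r'\b(%s)\(\s*"((?:[^"\\]|\\.)*)"'
            % "|".join(re.escape(keyword) for keyword in keywords)
        )
        for lineno, line in enumerate(fileobj, start=1):
            if isinstance(line, bytes):
                # Assumed option: the encoding used to decode binary input.
                line = line.decode(options.get("encoding", "utf-8"))
            for match in pattern.finditer(line):
                yield lineno, match.group(1), match.group(2)


    sample = io.StringIO('title = _("Hello")\n')
    print(list(extract_xxx(sample, ["_"], {})))  # prints [(1, '_', 'Hello')]

Writing the function as a generator keeps it simple and satisfies the
requirement that it return an iterator. Note that this sketch only recognizes
double-quoted strings on a single line and returns them without processing
escape sequences.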