.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 3
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file
* later, when code changes make it necessary to update the translations,
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well-supported foundation for translating
application messages, Babel does not reinvent the wheel. Instead, it reuses this
infrastructure and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.

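
To make the directory-based approach more concrete, here is a minimal,
hypothetical sketch (not Babel's own implementation) of what such a routine has
to do: walk a directory tree and decide, for each file, which extraction method
applies by matching its relative path against an ordered list of
``(pattern, method)`` pairs. The pattern rules assumed here (``*`` stays within
one path component, ``?`` matches one character, ``**`` also crosses directory
levels) follow the extended glob syntax described under Extraction Method
Mapping and Configuration. The helper names ``glob_to_regex``, ``find_method``
and ``walk_sources`` are made up for illustration.

.. code-block:: python

    import os
    import re


    def glob_to_regex(pattern):
        """Translate an extended glob pattern into a compiled regex."""
        parts = []
        i = 0
        while i < len(pattern):
            if pattern.startswith('**/', i):
                parts.append('(?:.*/)?')   # zero or more leading directories
                i += 3
            elif pattern.startswith('**', i):
                parts.append('.*')         # any characters, "/" included
                i += 2
            elif pattern[i] == '*':
                parts.append('[^/]*')      # stays within one path component
                i += 1
            elif pattern[i] == '?':
                parts.append('[^/]')       # exactly one non-separator char
                i += 1
            else:
                parts.append(re.escape(pattern[i]))
                i += 1
        return re.compile(''.join(parts) + r'\Z')


    def find_method(path, mapping):
        """Return the method of the first pattern matching path, or None."""
        for pattern, method in mapping:
            if glob_to_regex(pattern).match(path):
                return method
        return None


    def walk_sources(dirname, mapping):
        """Yield (relative_path, method) for each file that has a method."""
        for root, dirs, files in os.walk(dirname):
            dirs.sort()                    # deterministic traversal order
            for name in sorted(files):
                path = os.path.relpath(os.path.join(root, name), dirname)
                path = path.replace(os.sep, '/')   # patterns use "/"
                method = find_method(path, mapping)
                if method is not None:
                    yield path, method

For example, with the mapping ``[('**.py', 'python'), ('**/templates/**.html',
'genshi')]``, the file ``app/templates/index.html`` is assigned to ``genshi``,
while ``app/static/page.html`` matches no pattern and is skipped. Letting the
first matching pattern win is simply the rule this sketch uses to resolve
overlapping patterns.
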

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`frontends`:

----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface `_, and
* `Integration with distutils/setuptools `_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, as supported by the ``ConfigParser`` module in the
Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern selecting the files that this extraction method should be used for,
separated by a colon. The options in each section are passed to the extraction
method; which options are available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells.
A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisks (``**``) can be used to make the wildcard match
any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.

.. _`distutils/setuptools integration`: setup.html


Default Extraction Methods
--------------------------

Babel comes with only two built-in extractors: ``python`` (which extracts
messages from Python source files) and ``ignore`` (which extracts nothing).

The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in the
example above).


.. _`referencing extraction methods`:

Referencing Extraction Methods
------------------------------

To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its metadata (the `egg-info`_).

If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:

.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo

Note that the built-in extraction methods ``python`` and ``ignore`` are
available by default, even if `pkg_resources`_ is not installed. You should
never need to explicitly define them in the ``[extractors]`` section.

.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    from setuptools import setup

    setup(
        # ... your other setup() arguments ...
        entry_points="""
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,
    )

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools


-------------------
Translator Comments
-------------------

First of all, what are comment tags? Comment tags are short markers that the
extractor searches for in comments (and only in comments) immediately preceding
the `python gettext`_ calls, as in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag in the above example is ``NOTE:``, and the translator comment
for that tag is ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it, they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

Even in a case this simple, a translator comment such as “The installation
manual” clarifies the situation and makes the translator more productive.

.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.
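
To show how the extractor interface and translator comments fit together, here
is a minimal sketch of an extractor for a hypothetical line-based "XXX" format.
It is not Babel's Python extractor, which works on tokens rather than regular
expressions. It assumes that each call sits on a single line as
``keyword("text")``, that comments start with ``#``, and that ``fileobj`` yields
bytes, so the extractor decodes them itself using an ``encoding`` option (that
option name is an assumption of this sketch). Following the description above,
the comment tag itself (for example ``NOTE:``) is dropped from the collected
comment.

.. code-block:: python

    import re

    # Matches calls such as _("text") or gettext('text') on a single line.
    # Deliberately simple for this sketch: no escapes, no nested calls.
    CALL_RE = re.compile(r"""(\w+)\(\s*(['"])(.*?)\2\s*\)""")


    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from a hypothetical line-based "XXX" format.

        A run of "#" comment lines whose first line begins with one of
        ``comment_tags`` is attached to calls on the line directly below it.
        """
        encoding = (options or {}).get('encoding', 'utf-8')
        pending = []          # translator comments waiting for the next call
        collecting = False    # True while inside a tagged comment block
        for lineno, raw in enumerate(fileobj, start=1):
            line = raw.decode(encoding).strip()   # decode bytes to text here
            if line.startswith('#'):
                text = line[1:].strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        # Start a new tagged block, dropping the tag itself
                        pending = [text[len(tag):].strip()]
                        collecting = True
                        break
                else:
                    if collecting:
                        pending.append(text)  # continues the tagged block
                continue
            for funcname, _quote, message in CALL_RE.findall(line):
                if funcname in keywords:
                    yield lineno, funcname, message, pending
            # Comments only apply to the line directly after them
            pending, collecting = [], False

Called on the two-line example above with keywords ``['_']`` and comment tags
``['NOTE:']``, this sketch yields
``(2, '_', 'Foo Bar', ['This is a comment about `Foo Bar`'])``. Calls whose
function name is not in ``keywords`` are ignored, and an untagged comment
contributes nothing.
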