.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 3
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file
* later, when code changes make it necessary to update the translations, you
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.


When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`frontends`:

----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface `_, and
* `Integration with distutils/setuptools `_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file.
This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method, and the extended glob
pattern to specify the files that this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character.
In addition, two consecutive asterisk characters (``**``) can be used to make
the wildcard match any directory level, so the pattern ``**.txt`` matches any
file with the extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.

.. _`distutils/setuptools integration`: setup.html


Default Extraction Methods
--------------------------

Babel comes with only two builtin extractors: ``python`` (which extracts
messages from Python source files) and ``ignore`` (which extracts nothing).

The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in the
example above).


.. _`referencing extraction methods`:

Referencing Extraction Methods
------------------------------

To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its metadata (the `egg-info`_).


If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:

.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo

Note that the builtin extraction methods ``python`` and ``ignore`` are available
by default, even if `pkg_resources`_ is not installed. You should never need to
explicitly define them in the ``[extractors]`` section.

.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    setup(...

        entry_points = """
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools


-------------------
Translator Comments
-------------------

First of all, what are comment tags? Comment tags are excerpts of text to
search for in comments, and only in comments, that appear right before
`python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it, they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translator comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.
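
To illustrate how a custom extraction method might support translator comments,
here is a minimal sketch that follows the interface described under Writing
Extraction Methods. It is not Babel's own implementation: the function name
``extract_simple``, the ``encoding`` option, and the one-call-per-line source
format matched by ``CALL_RE`` are assumptions made purely for this example.

.. code-block:: python

    import io
    import re

    # Hypothetical pattern: a call such as _("text") or gettext('text') with a
    # single string literal argument, written on one line.
    CALL_RE = re.compile(r"""\b(\w+)\(\s*(["'])(.*?)\2\s*\)""")

    def extract_simple(fileobj, keywords, comment_tags, options):
        """Yield ``(lineno, funcname, message, comments)`` tuples."""
        # 'encoding' is an assumed option name, not one defined by Babel.
        encoding = options.get('encoding', 'utf-8')
        pending = []  # translator comments waiting for the next message
        for lineno, raw in enumerate(fileobj, 1):
            line = raw.decode(encoding).strip()
            if line.startswith('#'):
                text = line[1:].strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        # Keep the comment text without its tag.
                        pending.append(text[len(tag):].strip())
                        break
                else:
                    # An untagged comment breaks the run of tagged comments.
                    pending = []
                continue
            # Comments only apply to calls on the line right after them.
            comments, pending = pending, []
            for match in CALL_RE.finditer(line):
                if match.group(1) in keywords:
                    yield lineno, match.group(1), match.group(3), comments
                    comments = []

    source = io.BytesIO(
        b"# NOTE: This is a comment about `Foo Bar`\n"
        b"_('Foo Bar')\n"
    )
    for result in extract_simple(source, ['_'], ['NOTE:'], {}):
        print(result)

The sketch decodes each line itself, in keeping with the note that decoding is
the extractor's job, and it drops collected comments as soon as a line without
a tagged comment intervenes, so that only comments right before a call are
attached to its message.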