.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 3
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

 * use a tool (such as ``xgettext``) to extract localizable strings from the
   code base and write them to a POT (PO Template) file
 * make a copy of the POT file for a specific locale (for example, "en_US")
   and start translating the messages
 * use a tool such as ``msgfmt`` to compile the locale PO file into a binary
   MO file
 * later, when code changes make it necessary to update the translations,
   regenerate the POT file and merge the changes into the various
   locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.

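
As a rough illustration of what directory-based extraction involves, the
following sketch walks a source tree and collects the string literals passed to
``_()`` or ``gettext()``. It is not Babel's implementation: the function name
``extract_from_tree`` and the shape of its results are assumptions made for
this example, it is written for Python 3 using only the standard library, and
it looks at ``.py`` files only.

.. code-block:: python

    import ast
    import os


    def extract_from_tree(dirname, keywords=("_", "gettext")):
        """Yield (relative path, lineno, message) for each translatable call.

        Hypothetical helper: only calls whose first argument is a plain
        string literal are reported.
        """
        for root, dirs, files in os.walk(dirname):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue  # every other file type is skipped in this sketch
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as source:
                    tree = ast.parse(source.read(), filename=path)
                for node in ast.walk(tree):
                    if (isinstance(node, ast.Call)
                            and isinstance(node.func, ast.Name)
                            and node.func.id in keywords
                            and node.args
                            and isinstance(node.args[0], ast.Constant)
                            and isinstance(node.args[0].value, str)):
                        yield (os.path.relpath(path, dirname),
                               node.lineno,
                               node.args[0].value)

Because the extension check is hard-coded here, the sketch cannot tell a static
HTML file from a template, which is the kind of limitation a configurable
mapping of files to extraction methods is meant to address.
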

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`frontends`:

----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

 * A command-line interface, and
 * `Distutils/Setuptools Integration`_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates, as well as from JavaScript files:

.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

    # Extraction from JavaScript files

    [javascript: **.js]
    extract_messages = $._, jQuery._

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern that selects the files this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.

.. _`distutils/setuptools integration`: setup.html


Default Extraction Methods
--------------------------

Babel comes with a few builtin extractors: ``python`` (which extracts
messages from Python source files), ``javascript``, and ``ignore`` (which
extracts nothing).

The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in the
example above).


.. _`referencing extraction methods`:

Referencing Extraction Methods
------------------------------

To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its meta data (the `egg-info`_).

If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:

.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo

Note that the builtin extraction methods ``python`` and ``ignore`` are available
by default, even if `pkg_resources`_ is not installed. You should never need to
explicitly define them in the ``[extractors]`` section.

.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    from setuptools import setup

    setup(
        # ... the other arguments of your setup() call go here ...
        entry_points="""
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,
    )

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools


-------------------
Translator Comments
-------------------

First of all, what are comment tags? Comment tags are short strings that the
extractor searches for in comments (and only in comments) placed directly
before `python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translation comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.
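
To make the interplay between the extraction interface and comment tags more
concrete, here is a minimal sketch of an extraction method that honours
``comment_tags``. It is an illustration only, not the extractor shipped with
Babel: the name ``extract_tagged`` is hypothetical, the function assumes a
text-mode file object, and it recognizes just one simple call per line with a
quoted literal that contains no escape sequences. Stripping the tag from the
comment text is a choice made here to match the catalog output shown above.

.. code-block:: python

    import io
    import re


    def extract_tagged(fileobj, keywords, comment_tags, options):
        """Yield ``(lineno, funcname, message, comments)`` tuples.

        Hypothetical example extractor; ``options`` is accepted but unused.
        """
        if not keywords:
            return
        names = "|".join(re.escape(keyword) for keyword in keywords)
        # A keyword that is not part of a longer name, then a quoted literal.
        call_re = re.compile(r"(?<![\w.])(%s)\(\s*(['\"])(.*?)\2" % names)
        pending = []  # translator comments found directly above the call
        for lineno, line in enumerate(fileobj, 1):
            stripped = line.strip()
            if stripped.startswith("#"):
                text = stripped[1:].strip()
                tag = next((t for t in comment_tags if text.startswith(t)), None)
                if tag is not None:
                    pending = [text[len(tag):].strip()]
                elif pending:
                    pending.append(text)  # continuation of a tagged comment
                continue
            match = call_re.search(line)
            if match:
                yield lineno, match.group(1), match.group(3), pending
            pending = []  # comments only apply to the very next line


    source = io.StringIO(
        "# NOTE: This is a comment about `Foo Bar`\n"
        "_('Foo Bar')\n"
    )
    for entry in extract_tagged(source, ["_"], ["NOTE:"], {}):
        print(entry)

A function like this could be referenced from an ``[extractors]`` section or
registered under the ``babel.extractors`` entry point group in the same way as
any other extraction method.
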