.. -*- mode: rst; encoding: utf-8 -*-
|
=============================
Working with Message Catalogs
=============================
|
.. contents:: Contents
   :depth: 3
.. sectnum::
|
Introduction
============
|
The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:
|
.. code-block:: python

    print(_("Hello"))
|
instead of just:
|
.. code-block:: python

    print("Hello")
|
to make the string "Hello" localizable.
|
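The aliasing convention can be tried with the standard library alone. In this sketch the domain name ``myapp``, the ``locale`` directory and the German language code are placeholders. With ``fallback=True``, ``gettext.translation()`` returns a ``NullTranslations`` object when no matching MO file exists, so messages come back untranslated instead of raising an error.

.. code-block:: python

    import gettext

    # "myapp" and "locale" are placeholder names for this sketch. With
    # fallback=True a missing catalog gives a NullTranslations object, which
    # returns every message unchanged instead of raising an error.
    translations = gettext.translation("myapp", localedir="locale",
                                       languages=["de"], fallback=True)
    _ = translations.gettext          # the conventional short alias
    ngettext = translations.ngettext  # plural-aware lookup

    print(_("Hello"))
    count = 3
    print(ngettext("%d file", "%d files", count) % count)

When no catalog is found, this prints ``Hello`` and ``3 files``. Once a compiled catalog exists at ``locale/de/LC_MESSAGES/myapp.mo``, the same calls return the German translations.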
|
Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.
|
.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation
|
The general procedure for building message catalogs looks something like this:
|
* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file.
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages.
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file.
* later, when code changes make it necessary to update the translations, you
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``.
|
Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.
|
.. _`gettext module`: http://docs.python.org/lib/module-gettext.html
|
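To show what an "appropriately generated" MO file looks like to the ``gettext`` module, this sketch builds a minimal catalog in memory and loads it with ``gettext.GNUTranslations``. The helper ``build_mo`` is hypothetical and deliberately limited: it has no plural forms and no hash table. In practice a tool such as ``msgfmt`` produces these files.

.. code-block:: python

    import gettext
    import io
    import struct

    MO_MAGIC = 0x950412DE  # GNU MO magic number, written little-endian here

    def build_mo(messages):
        """Return the bytes of a minimal MO file for a msgid -> msgstr dict.

        Hypothetical helper: no plural forms and no hash table. The empty
        msgid carries the header that declares the UTF-8 charset.
        """
        catalog = dict(messages)
        catalog[""] = "Content-Type: text/plain; charset=UTF-8\n"
        keys = sorted(catalog)  # GNU gettext expects msgids in sorted order
        ids = [key.encode("utf-8") for key in keys]
        strs = [catalog[key].encode("utf-8") for key in keys]

        count = len(keys)
        originals_offset = 7 * 4  # right after the seven 32-bit header fields
        translations_offset = originals_offset + count * 8
        data_offset = translations_offset + count * 8

        # Each table entry is (length, offset). Strings are NUL-terminated,
        # and the NUL byte is not counted in the length.
        table = b""
        data = b""
        for string in ids + strs:
            table += struct.pack("<2I", len(string), data_offset + len(data))
            data += string + b"\0"

        # magic, revision, count, both table offsets, hash size, hash offset
        header = struct.pack("<7I", MO_MAGIC, 0, count, originals_offset,
                             translations_offset, 0, 0)
        return header + table + data

    mo_bytes = build_mo({"Hello": "Hallo", "Goodbye": "Tschüss"})
    translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
    _ = translations.gettext
    print(_("Hello"))    # found in the catalog: Hallo
    print(_("Welcome"))  # not in the catalog, so returned unchanged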
|
As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.
|
Message Extraction
==================
|
Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.
|
Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.
|
When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.
|
.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/
|
Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.
|
.. _`frontends`:
|
----------
Front-Ends
----------
|
Babel provides two different front-ends to access its functionality for working
with message catalogs:
|
* A `Command-line interface <cmdline.html>`_, and
* `Integration with distutils/setuptools <setup.html>`_
|
Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.
|
.. _`mapping`:
|
-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------
|
The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).
|
For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:
|
.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15
|
The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method, and the extended glob
pattern to specify the files that this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.
|
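The section-name convention can be illustrated with the standard ``configparser`` module. The helper ``parse_mapping_sketch`` is hypothetical and is not Babel's own parser. It only splits each ``[method: pattern]`` header at the first colon and collects the options that would be handed to the extraction method.

.. code-block:: python

    import configparser

    MAPPING = """
    # Extraction from Python source files
    [python: **.py]

    # Extraction from Genshi HTML and text templates
    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15
    """

    def parse_mapping_sketch(text):
        """Return a list of (method, pattern, options) tuples in file order."""
        # Interpolation is disabled so values containing "%" are kept verbatim.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        rules = []
        for section in parser.sections():
            if ":" not in section:
                continue  # e.g. an [extractors] section, not a file pattern
            method, pattern = (part.strip() for part in section.split(":", 1))
            rules.append((method, pattern, dict(parser.items(section))))
        return rules

    for method, pattern, options in parse_mapping_sketch(MAPPING):
        print(method, pattern, options)

Because the option line is split at its first ``=``, a value such as ``genshi.template:TextTemplate`` keeps its colon intact.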
|
The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.
|
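One way to express these rules is to translate each pattern into a regular expression. The sketch below is a hypothetical illustration of the semantics described above, not Babel's implementation. In this sketch, ``**/`` is read as "zero or more leading directories", so ``**/templates/**.html`` also matches a top-level ``templates`` directory.

.. code-block:: python

    import re

    def glob_to_regex(pattern):
        """Compile an extended glob pattern into a regular expression.

        Hypothetical sketch: ``**/`` matches zero or more leading
        directories, ``**`` matches any characters including "/", ``*``
        matches any characters except "/", and ``?`` matches exactly one
        character other than "/".
        """
        parts = []
        i = 0
        while i < len(pattern):
            if pattern.startswith("**/", i):
                parts.append("(?:.*/)?")
                i += 3
            elif pattern.startswith("**", i):
                parts.append(".*")
                i += 2
            elif pattern[i] == "*":
                parts.append("[^/]*")
                i += 1
            elif pattern[i] == "?":
                parts.append("[^/]")
                i += 1
            else:
                parts.append(re.escape(pattern[i]))
                i += 1
        return re.compile("".join(parts))

    def glob_matches(pattern, filename):
        """Return True if the whole "/"-separated filename matches pattern."""
        return glob_to_regex(pattern).fullmatch(filename) is not None

    print(glob_matches("**.txt", "docs/api/index.txt"))
    print(glob_matches("**/templates/**.html", "myapp/templates/admin/index.html"))
    print(glob_matches("*.py", "myapp/models.py"))

The first two calls match. The third does not, because a single ``*`` never crosses a "/".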
|
Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.
|
.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.
|
.. _`distutils/setuptools integration`: setup.html
|
Default Extraction Methods
--------------------------
|
Babel comes with only two builtin extractors: ``python`` (which extracts
messages from Python source files) and ``ignore`` (which extracts nothing).
|
The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in
the example above).
|
.. _`referencing extraction methods`:
|
Referencing Extraction Methods
------------------------------
|
To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its metadata (the `egg-info`_).
|
If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:
|
.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo
|
Note that the builtin extraction methods ``python`` and ``ignore`` are available
by default, even if `pkg_resources`_ is not installed. You should never need to
explicitly define them in the ``[extractors]`` section.
|
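A value such as ``mypackage.module:extract_custom`` names a module and an object inside it, separated by a colon. The sketch below shows one way to resolve such a string with the standard ``importlib`` module. ``load_function`` is a hypothetical name and is not part of Babel's API. A standard-library function stands in for a real extractor so that the example runs anywhere.

.. code-block:: python

    import importlib
    import os.path

    def load_function(spec):
        """Resolve a "package.module:function" string to the object it names."""
        module_name, sep, attr_path = spec.partition(":")
        if not sep or not module_name or not attr_path:
            raise ValueError("expected 'module:function', got %r" % (spec,))
        obj = importlib.import_module(module_name)
        # Walk dotted attribute paths such as "module:Class.method".
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
        return obj

    # os.path.join stands in for an extraction function in this sketch.
    join = load_function("os.path:join")
    print(join is os.path.join)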
|
.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources
|
--------------------------
Writing Extraction Methods
--------------------------
|
Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:
|
.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """
|
.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.
|
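As a concrete illustration of this interface, here is a sketch of an extractor for a hypothetical line-oriented format in which each message is a call like ``_("...")`` on a single line. The function name ``extract_simple`` and its ``encoding`` option are assumptions made for this example. The function decodes byte input itself, so every string it yields is text.

.. code-block:: python

    import io
    import re

    def extract_simple(fileobj, keywords, comment_tags, options):
        """Extract messages from a hypothetical line-oriented format.

        A message is a call such as ``_("Hello")`` on a single line, where
        the function name is one of ``keywords``. A ``#`` comment line that
        starts with one of ``comment_tags`` becomes a translator comment for
        the message on the next line. Escapes inside strings are not decoded.
        """
        options = options or {}
        encoding = options.get("encoding", "utf-8")  # assumed option name
        names = "|".join(re.escape(keyword) for keyword in keywords)
        call_re = re.compile(r'\b(%s)\("((?:[^"\\]|\\.)*)"\)' % names)
        pending = []  # translator comments for the next non-comment line
        for lineno, raw in enumerate(fileobj, start=1):
            # Decode bytes here so that every yielded string is text.
            line = raw.decode(encoding) if isinstance(raw, bytes) else raw
            stripped = line.strip()
            if stripped.startswith("#"):
                text = stripped[1:].strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        pending.append(text[len(tag):].strip())
                        break
                continue
            for match in call_re.finditer(line):
                yield lineno, match.group(1), match.group(2), list(pending)
            pending = []  # comments only apply to the line right below them

    source = io.BytesIO(b'# NOTE: Shown in the main menu\n'
                        b'_("Open file")\n'
                        b'title = gettext("Caf\xc3\xa9")\n')
    for entry in extract_simple(source, ["_", "gettext"], ["NOTE:"], {}):
        print(entry)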
|
Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:
|
.. code-block:: python

    from setuptools import setup

    setup(
        # ... name, version, packages and your other arguments ...
        entry_points="""
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,
    )
|
That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.
|
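On the consuming side, a tool can look up such an entry point by group and name. The sketch below uses the standard ``importlib.metadata`` module (Python 3.8 and later) rather than `pkg_resources`_. ``find_extractor`` is a hypothetical helper, and this is not necessarily how Babel itself resolves extractor names.

.. code-block:: python

    from importlib.metadata import entry_points

    def find_extractor(name, group="babel.extractors"):
        """Load the object registered as ``name`` in ``group``, or return None."""
        eps = entry_points()
        if hasattr(eps, "select"):  # Python 3.10 and later
            matches = eps.select(group=group, name=name)
        else:  # Python 3.8 and 3.9 return a dict keyed by group name
            matches = [ep for ep in eps.get(group, ()) if ep.name == name]
        for ep in matches:
            return ep.load()  # imports the module and returns the function
        return None

    # None unless an installed package registers an extractor named "xxx".
    print(find_extractor("xxx"))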
|
.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.
|
.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools
|
-------------------
Translator Comments
-------------------
|
First of all, what are comment tags? Comment tags are excerpts of text that
are searched for in comments, and only in comments, directly before the
`python gettext`_ calls, as shown in the following example:
|
.. _`python gettext`: http://docs.python.org/lib/module-gettext.html
|
.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')
|
The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.
|
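To make the mechanism concrete, here is a sketch of how a Python extractor could pick up tagged comments using the standard ``tokenize`` module. It is not Babel's implementation. ``extract_tagged_calls`` is a hypothetical name. It only recognizes a keyword followed by a single string literal, and it attaches comments only when they end on the line directly above the call.

.. code-block:: python

    import ast
    import io
    import tokenize

    def extract_tagged_calls(source, keywords=("_",), comment_tags=("NOTE:",)):
        """Yield (lineno, funcname, message, comments) from Python source text."""
        comments = []          # tagged comments waiting for the next call
        last_comment_line = 0  # line number of the latest tagged comment
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
        for i, tok in enumerate(tokens):
            if tok.type == tokenize.COMMENT:
                text = tok.string.lstrip("#").strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        if tok.start[0] != last_comment_line + 1:
                            comments = []  # not contiguous with earlier ones
                        comments.append(text[len(tag):].strip())
                        last_comment_line = tok.start[0]
                        break
            elif (tok.type == tokenize.NAME and tok.string in keywords
                  and i + 2 < len(tokens)
                  and tokens[i + 1].string == "("
                  and tokens[i + 2].type == tokenize.STRING):
                lineno = tok.start[0]
                attached = comments if last_comment_line == lineno - 1 else []
                message = ast.literal_eval(tokens[i + 2].string)
                yield lineno, tok.string, message, attached
                comments = []

    SOURCE = '''\
    # NOTE: This is a comment about `Foo Bar`
    _("Foo Bar")

    # NOTE: separated from the call by a blank line

    _("Hello")
    '''

    for entry in extract_tagged_calls(SOURCE):
        print(entry)

The first call receives the comment directly above it. The second does not, because a blank line separates it from its tagged comment.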
|
The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""
|
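The two comment lines in this output follow the PO file conventions: ``#.`` marks an extracted comment (here, the translator comment) and ``#:`` marks a source reference. The sketch below renders such an entry. ``format_po_entry`` is a hypothetical helper that handles only single-line messages and escapes nothing but backslashes and double quotes.

.. code-block:: python

    def format_po_entry(msgid, locations, auto_comments=()):
        """Render one untranslated PO template entry as a string."""
        lines = ["#. %s" % comment for comment in auto_comments]
        lines += ["#: %s:%d" % (name, lineno) for name, lineno in locations]
        # Escape backslashes first so the quote escapes are not doubled.
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('msgid "%s"' % escaped)
        lines.append('msgstr ""')
        return "\n".join(lines)

    print(format_po_entry("Foo Bar", [("main.py", 2)],
                          ["This is a comment about `Foo Bar`"]))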
|
Now, you might ask, why would I need that?
|
Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it, they will wonder whether you meant:
|
1. a document or help manual, or
2. a manual process?
|
This is the simplest case where a translator comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.
|
.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.