.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 2
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.

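As a minimal sketch of the plural form mentioned above, the following uses
``gettext.NullTranslations`` from the standard library, which returns messages
untranslated. The ``describe`` helper is a hypothetical name used only for
illustration:

.. code-block:: python

    import gettext

    # NullTranslations performs no lookup: it returns the message unchanged.
    translations = gettext.NullTranslations()
    _ = translations.gettext
    ngettext = translations.ngettext

    def describe(count):
        """Return a count-dependent message (hypothetical helper)."""
        # ngettext selects the singular or plural form based on ``count``.
        return ngettext("%(num)d file", "%(num)d files", count) % {"num": count}

    print(_("Hello"))
    print(describe(1))
    print(describe(3))

With a real catalog in place of ``NullTranslations``, the same calls would
return the translated strings.
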
Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

 * use a tool (such as ``xgettext``) to extract localizable strings from the
   code base and write them to a POT (PO Template) file
 * make a copy of the POT file for a specific locale (for example, "en_US")
   and start translating the messages
 * use a tool such as ``msgfmt`` to compile the locale PO file into a binary
   MO file
 * later, when code changes make it necessary to update the translations, you
   regenerate the POT file and merge the changes into the various
   locale-specific PO files, for example using ``msgmerge``

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

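Loading a compiled catalog with the standard library might look like the
sketch below. The domain ``messages``, the directory ``locale`` and the locale
``de_DE`` are assumptions chosen for illustration; ``fallback=True`` makes the
call return untranslated messages when no matching MO file is found:

.. code-block:: python

    import gettext

    def load_translations(domain, localedir, languages):
        """Load a catalog, falling back to untranslated messages."""
        return gettext.translation(
            domain, localedir=localedir, languages=languages, fallback=True
        )

    # Hypothetical layout: locale/de_DE/LC_MESSAGES/messages.mo
    translations = load_translations("messages", "locale", ["de_DE"])
    _ = translations.gettext
    print(_("Hello"))
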
As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: foobar/**.py]

    # Extraction from Genshi HTML and text templates

    [genshi: foobar/**/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: foobar/**/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern selecting the files that this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.

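To illustrate how such section names break down, here is a rough sketch that
reads a mapping with the standard library ``configparser`` module and splits
each section name at the first colon. The ``parse_mapping`` function is a
hypothetical helper, not Babel's own parser:

.. code-block:: python

    from configparser import RawConfigParser

    MAPPING = """
    # Extraction from Python source files
    [python: foobar/**.py]

    [genshi: foobar/**/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15
    """

    def parse_mapping(text):
        """Return a list of (method, pattern, options) tuples."""
        parser = RawConfigParser()
        # Strip the indentation that the triple-quoted literal carries.
        parser.read_string("\n".join(line.strip() for line in text.splitlines()))
        entries = []
        for section in parser.sections():
            # "method: pattern" -> ("method", "pattern")
            method, pattern = [part.strip() for part in section.split(":", 1)]
            entries.append((method, pattern, dict(parser.items(section))))
        return entries

    for entry in parse_mapping(MAPPING):
        print(entry)
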
The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) matches exactly one character. In addition,
two consecutive asterisks (``**``) can be used to make the wildcard match
across any number of directory levels, so the pattern ``**.txt`` matches any
file with the extension ``.txt`` in any directory.

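The matching rules above can be approximated by translating a pattern into a
regular expression. This is a simplified sketch, not Babel's implementation;
``glob_to_regex`` is a hypothetical name, and treating ``?`` as never matching
"/" is an assumption:

.. code-block:: python

    import re

    def glob_to_regex(pattern):
        """Translate an extended glob pattern into a compiled regex."""
        parts = []
        i = 0
        while i < len(pattern):
            if pattern.startswith("**", i):
                parts.append(".*")       # any characters, including "/"
                i += 2
            elif pattern[i] == "*":
                parts.append("[^/]*")    # any characters except "/"
                i += 1
            elif pattern[i] == "?":
                parts.append("[^/]")     # one character (assumed: not "/")
                i += 1
            else:
                parts.append(re.escape(pattern[i]))
                i += 1
        return re.compile("".join(parts) + r"\Z")

    print(bool(glob_to_regex("**.txt").match("docs/guide/readme.txt")))
    print(bool(glob_to_regex("*.txt").match("docs/readme.txt")))

The first call matches because ``**`` may span directory separators; the
second does not, because a single ``*`` stops at "/".
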
Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored as well.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts (see below), you
          can also provide this configuration in a different way, namely as a
          keyword argument to the ``setup()`` function.


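As a rough illustration of the keyword-argument form mentioned in the note,
the mapping shown earlier might be expressed as a Python dictionary like the
one below. The argument name ``message_extractors`` and the
``(pattern, method, options)`` tuple layout are assumptions here; consult the
setup integration documentation for the exact form:

.. code-block:: python

    # Assumed shape: package directory -> list of (pattern, method, options).
    message_extractors = {
        "foobar": [
            ("**.py", "python", None),
            ("**/templates/**.html", "genshi", {
                "ignore_tags": "script,style",
                "include_attrs": "alt title summary",
            }),
            ("**/templates/**.txt", "genshi", {
                "template_class": "genshi.template:TextTemplate",
                "encoding": "ISO-8859-15",
            }),
        ],
    }

    # The dictionary would then be passed along the lines of
    # setup(..., message_extractors=message_extractors).
    for pattern, method, options in message_extractors["foobar"]:
        print(method, pattern, options)
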
----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

 * A `Command-line interface <cmdline.html>`_, and
 * `Integration with distutils/setuptools <setup.html>`_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.

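A minimal sketch of an extractor following this interface is shown below. It
assumes a hypothetical file format in which every non-empty line that does not
start with ``#`` is a message, that ``fileobj`` yields bytes, and that leaving
the function name as ``None`` is acceptable; the ``encoding`` option is
likewise an assumption:

.. code-block:: python

    import io

    def extract_lines(fileobj, keywords, comment_tags, options):
        """Yield one message per non-empty, non-comment line."""
        # Decoding is the extractor's job; the option name is assumed.
        encoding = options.get("encoding", "utf-8")
        for lineno, raw in enumerate(fileobj, 1):
            line = raw.decode(encoding).strip()
            if line and not line.startswith("#"):
                # No translation function is involved in this format.
                yield lineno, None, line, []

    sample = io.BytesIO(b"# a comment\nHello\n\nBye\n")
    for item in extract_lines(sample, [], [], {}):
        print(item)
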
Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    setup(...

        entry_points = """
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools

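The ``module:function`` form of an entry point value can be illustrated with a
few lines of standard library code. setuptools performs this lookup itself, so
the ``resolve`` helper below is purely a hypothetical sketch, demonstrated with
``os.path:join`` rather than a real extractor:

.. code-block:: python

    from importlib import import_module

    def resolve(reference):
        """Import the object named by a ``module:attribute`` reference."""
        module_name, _sep, attribute = reference.partition(":")
        return getattr(import_module(module_name), attribute)

    join = resolve("os.path:join")
    print(join("foobar", "templates"))
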
Comment Tags and Translator Comments Explanation
................................................

First of all, what are comment tags? Comment tags are excerpts of text to
search for in comments, and only in comments, right before the
`python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

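The idea of pairing a tagged comment with the call that follows it can be
sketched with a simple line-based scan. This is a deliberate simplification
and not how Babel's Python extractor works; ``find_translator_comments`` is a
hypothetical name, and only calls written as ``_(`` at the start of a line are
recognized:

.. code-block:: python

    def find_translator_comments(source, comment_tags):
        """Return (lineno, comment) pairs for tagged comments before ``_(``."""
        results = []
        pending = None  # comment text waiting for a translation call
        for lineno, line in enumerate(source.splitlines(), 1):
            stripped = line.strip()
            if stripped.startswith("#"):
                text = stripped[1:].strip()
                pending = None
                for tag in comment_tags:
                    if text.startswith(tag):
                        # Keep only the text after the tag.
                        pending = text[len(tag):].strip()
                        break
            else:
                if pending is not None and stripped.startswith("_("):
                    results.append((lineno, pending))
                pending = None
        return results

    SOURCE = "# NOTE: This is a comment about `Foo Bar`\n_('Foo Bar')\n"
    print(find_translator_comments(SOURCE, ["NOTE:"]))
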
Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “Manual”. You know what
it means, but when translators see it they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translation comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

**More examples of the need for translation comments**

Real-world examples are best. This is a discussion over the use of the word
“Forward” in Northern Sotho:

  “When you go forward. You go ‘Pele’, but when you forward the document,
  you ‘Fetišetša pele’. So if you just say forward, we don’t know what you are
  talking about.
  It is better if it's in a sentence. But in this case i think we will use ‘pele’
  because on the string no. 86 and 88 there is “show previous page in history”
  and “show next page in history”.”

Was the translators' guess correct? I think so, but it makes it so much easier
if they don’t need to be super `sleuths`_ as well as translators.

.. _`sleuths`: http://www.thefreedictionary.com/sleuth


*Explanation Borrowed From:* `Wordforge`_

.. _`Wordforge`: http://www.wordforge.org/static/translation_comments.html

**Note**: Translator comments are currently only supported in Python source
code.