comparison 0.9.x/doc/messages.txt @ 263:5b7d3f9f7d74 stable
Create branch for 0.9.x maintenance.
author | cmlenz |
---|---|
date | Mon, 20 Aug 2007 08:34:32 +0000 |
parents | |
children | cec8c26302bd |
.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 3
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print(_("Hello"))

instead of just:

.. code-block:: python

    print("Hello")

to make the string "Hello" localizable.
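
As a minimal sketch of the aliasing described above, the functions from the
standard library's ``gettext`` module can be bound to the short names directly.
No catalog is installed here, so the calls fall back to returning the original
strings; the message texts are only placeholders:

.. code-block:: python

    import gettext

    # Alias the lookup functions to the conventional short names.
    _ = gettext.gettext
    ngettext = gettext.ngettext

    print(_("Hello"))

    # Without a catalog, ngettext falls back to the singular form for a
    # count of 1 and to the plural form otherwise.
    for count in (1, 3):
        print(ngettext("%d file", "%d files", count) % count)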

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file
* later, when code changes make it necessary to update the translations, you
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``
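
To make the compile step less abstract, the following sketch builds a tiny
binary MO file in memory and loads it with the standard library.
``compile_mo`` is a hypothetical helper written for this illustration only. It
is not part of ``gettext`` or Babel, it ignores plural forms and message
contexts, and real projects should rely on ``msgfmt`` or Babel's own tooling:

.. code-block:: python

    import gettext
    import io
    import struct

    def compile_mo(messages):
        """Serialize a {msgid: msgstr} dict into GNU MO bytes (illustration only)."""
        # The empty msgid carries the catalog header; it declares the charset.
        entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
        entries.update(messages)
        keys = sorted(entries, key=lambda key: key.encode("utf-8"))
        ids = [key.encode("utf-8") for key in keys]
        strs = [entries[key].encode("utf-8") for key in keys]

        count = len(keys)
        ids_table = 7 * 4                    # right after the 7-word header
        strs_table = ids_table + count * 8   # each table entry is (length, offset)
        data_start = strs_table + count * 8

        table, blob = [], b""
        for raw in ids + strs:
            table.append((len(raw), data_start + len(blob)))
            blob += raw + b"\0"              # strings are NUL-terminated

        # magic, version, count, id table offset, str table offset, hash size/offset
        out = struct.pack("<7I", 0x950412DE, 0, count, ids_table, strs_table, 0, 0)
        for length, offset in table:
            out += struct.pack("<2I", length, offset)
        return out + blob

    mo_bytes = compile_mo({"Hello": "Hallo"})
    translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
    print(translations.gettext("Hello"))

Messages that are missing from the catalog are returned unchanged, which is
why an application keeps working while its translations are still incomplete.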

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`frontends`:

----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface <cmdline.html>`_, and
* `Integration with distutils/setuptools <setup.html>`_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method, and the extended glob
pattern to specify the files that this extraction method should be used for,
separated by a colon. The options in the sections are passed to the extraction
method. Which options are available is specific to the extraction method used.
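
The section-name convention can be illustrated with the standard library's
``configparser`` module (the Python 3 name of ``ConfigParser``). This is only a
sketch of the idea, not Babel's own parser; ``parse_mapping`` and ``SAMPLE``
are names made up for this example:

.. code-block:: python

    import configparser

    SAMPLE = """\
    # Extraction from Python source files
    [python: **.py]

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary
    """

    def parse_mapping(text):
        """Return a list of (method, pattern, options) tuples."""
        parser = configparser.RawConfigParser()
        parser.read_string(text)
        mapping = []
        for section in parser.sections():
            if ":" not in section:
                continue  # not a "method: pattern" section
            method, pattern = [part.strip() for part in section.split(":", 1)]
            mapping.append((method, pattern, dict(parser.items(section))))
        return mapping

    for method, pattern, options in parse_mapping(SAMPLE):
        print(method, pattern, options)

Each tuple pairs an extraction method with one glob pattern and the options
that would be handed to that method.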

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two subsequent asterisk characters (``**``) can be used to make the wildcard
match any directory level, so the pattern ``**.txt`` matches any file with the
extension ``.txt`` in any directory.
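
One way to read these rules is as a translation from glob patterns to regular
expressions. The sketch below follows the description above rather than
Babel's actual matching code, so edge cases may differ; ``glob_to_regex`` and
``matches`` are hypothetical helper names:

.. code-block:: python

    import re

    def glob_to_regex(pattern):
        """Translate an extended glob pattern into a compiled regex."""
        pieces = []
        # Split on the wildcard tokens, longest first, and keep them.
        for part in re.split(r"(\*\*/|\*\*|\*|\?)", pattern):
            if part == "**/":
                pieces.append(r"(?:.+/)?")   # any number of leading directories
            elif part == "**":
                pieces.append(r".*")         # anything, including "/"
            elif part == "*":
                pieces.append(r"[^/]*")      # anything within one path component
            elif part == "?":
                pieces.append(r"[^/]")       # exactly one character
            else:
                pieces.append(re.escape(part))
        return re.compile("".join(pieces) + r"\Z")

    def matches(pattern, path):
        return glob_to_regex(pattern).match(path) is not None

    print(matches("**.txt", "docs/notes/readme.txt"))
    print(matches("*.txt", "docs/readme.txt"))

Here the first call matches, because ``**`` crosses directory levels, while
the second does not, because a single ``*`` stops at the "/" separator.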

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.

.. _`distutils/setuptools integration`: setup.html


Default Extraction Methods
--------------------------

Babel comes with only two builtin extractors: ``python`` (which extracts
messages from Python source files) and ``ignore`` (which extracts nothing).

The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in
the example above).


.. _`referencing extraction methods`:

Referencing Extraction Methods
------------------------------

To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its meta data (the `egg-info`_).

If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:

.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo

Note that the builtin extraction methods ``python`` and ``ignore`` are available
by default, even if `pkg_resources`_ is not installed. You should never need to
explicitly define them in the ``[extractors]`` section.
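
A fully qualified name of the form ``module:function`` can be turned into a
callable with a few lines of standard-library code. The following is a sketch
of that lookup, not Babel's implementation; ``resolve`` is a hypothetical
helper, and ``json:dumps`` merely stands in for a real extraction function:

.. code-block:: python

    import importlib

    def resolve(spec):
        """Import the module named before the colon and return the attribute after it."""
        module_name, func_name = spec.split(":", 1)
        module = importlib.import_module(module_name)
        return getattr(module, func_name)

    # A stand-in target so that the sketch runs without extra packages.
    dumps = resolve("json:dumps")
    print(dumps({"ok": True}))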

.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.
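
As an illustration of this interface, here is a deliberately simple extractor
for an imaginary file format in which translatable strings appear as
``name("text")`` calls on a single line. The format, the regular expression
and the ``encoding`` option are assumptions made for this sketch. It is
written for Python 3, where decoded ``str`` values play the role of the
``unicode`` objects mentioned above:

.. code-block:: python

    import io
    import re

    # Matches calls such as _("Hello") and captures the name and the string.
    CALL_RE = re.compile(r'(?P<func>\w+)\(\s*"(?P<msg>[^"]*)"\s*\)')

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Yield (lineno, funcname, message, comments) tuples."""
        encoding = options.get("encoding", "utf-8")
        for lineno, raw in enumerate(fileobj, 1):
            # Decode bytes so that only text strings are handed back.
            line = raw.decode(encoding) if isinstance(raw, bytes) else raw
            for match in CALL_RE.finditer(line):
                if match.group("func") in keywords:
                    # This simple sketch does not collect translator comments.
                    yield lineno, match.group("func"), match.group("msg"), []

    source = io.BytesIO(b'title = _("Hello")\nskip("not a message")\n')
    for result in extract_xxx(source, ["_"], [], {}):
        print(result)

Only calls whose name appears in ``keywords`` are reported, so the ``skip``
call in the sample input is ignored.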

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    setup(...

        entry_points = """
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.

.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools


-------------------
Translator Comments
-------------------

First of all, what are comment tags? Comment tags are excerpts of text to
search for in comments, and only in comments, immediately before the
`python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""

Now, you might ask, why would I need that?

Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it, they may wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translation comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.
296 Now, you might ask, why would I need that? | |
297 | |
298 Consider this simple case; you have a menu item called “manual”. You know what | |
299 it means, but when the translator sees this they will wonder did you mean: | |
300 | |
301 1. a document or help manual, or | |
302 2. a manual process? | |
303 | |
304 This is the simplest case where a translation comment such as | |
305 “The installation manual” helps to clarify the situation and makes a translator | |
306 more productive. | |
307 | |
308 .. note:: Whether translator comments can be extracted depends on the extraction | |
309 method in use. The Python extractor provided by Babel does implement | |
310 this feature, but others may not. |