comparison 0.9.x/doc/messages.txt @ 263:5b7d3f9f7d74 stable

Create branch for 0.9.x maintenance.
author cmlenz
date Mon, 20 Aug 2007 08:34:32 +0000
.. -*- mode: rst; encoding: utf-8 -*-

=============================
Working with Message Catalogs
=============================

.. contents:: Contents
   :depth: 3
.. sectnum::


Introduction
============

The ``gettext`` translation system enables you to mark any strings used in your
application as subject to localization, by wrapping them in functions such as
``gettext(str)`` and ``ngettext(singular, plural, num)``. For brevity, the
``gettext`` function is often aliased to ``_(str)``, so you can write:

.. code-block:: python

    print _("Hello")

instead of just:

.. code-block:: python

    print "Hello"

to make the string "Hello" localizable.
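
For comparison, here is a minimal sketch of the same idea using the standard
library's ``gettext`` module with Python 3 syntax. It uses
``gettext.NullTranslations``, which returns every message unchanged, so it runs
without any catalog files. The ``count_files`` helper is hypothetical and only
shows how ``ngettext`` chooses between a singular and a plural message.

.. code-block:: python

    import gettext

    # NullTranslations passes messages through untranslated, which is also what
    # users see when no catalog is available for their locale.
    translations = gettext.NullTranslations()
    _ = translations.gettext
    ngettext = translations.ngettext


    def count_files(num):
        """Hypothetical helper: pick the singular or plural message for num."""
        return ngettext("%(num)d file", "%(num)d files", num) % {"num": num}


    print(_("Hello"))      # Hello
    print(count_files(1))  # 1 file
    print(count_files(3))  # 3 files

With a real catalog loaded, ``_`` and ``ngettext`` would return the translated
strings, and the calling code would stay exactly the same.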

Message catalogs are collections of translations for such localizable messages
used in an application. They are commonly stored in PO (Portable Object) and MO
(Machine Object) files, the formats of which are defined by the GNU `gettext`_
tools and the GNU `translation project`_.

.. _`gettext`: http://www.gnu.org/software/gettext/
.. _`translation project`: http://sourceforge.net/projects/translation

The general procedure for building message catalogs looks something like this:

* use a tool (such as ``xgettext``) to extract localizable strings from the
  code base and write them to a POT (PO Template) file.
* make a copy of the POT file for a specific locale (for example, "en_US")
  and start translating the messages.
* use a tool such as ``msgfmt`` to compile the locale PO file into a binary
  MO file.
* later, when code changes make it necessary to update the translations, you
  regenerate the POT file and merge the changes into the various
  locale-specific PO files, for example using ``msgmerge``.

Python provides the `gettext module`_ as part of the standard library, which
enables applications to work with appropriately generated MO files.
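
To make the relationship between these pieces concrete, the sketch below
assembles a minimal MO file in memory, roughly what ``msgfmt`` would produce for
a single translated message, and loads it with the standard library's
``gettext.GNUTranslations`` class. The ``build_mo`` helper is hypothetical. It
writes a little-endian file without the optional hash table and handles only
ASCII messages without plural forms or a metadata header. Treat it as an
illustration of the file layout, not a replacement for ``msgfmt``.

.. code-block:: python

    import gettext
    import io
    import struct

    MO_MAGIC = 0x950412DE  # GNU MO magic number, written little-endian


    def build_mo(messages):
        """Hypothetical helper: return MO bytes for a {msgid: msgstr} dict."""
        msgids = sorted(messages)  # original strings are stored sorted
        count = len(msgids)
        ids_table = 7 * 4          # the header is seven 32-bit words
        strs_table = ids_table + count * 8
        data_offset = strs_table + count * 8

        data = b""
        id_entries, str_entries = [], []
        for msgid in msgids:
            encoded = msgid.encode("ascii")
            id_entries.append((len(encoded), data_offset + len(data)))
            data += encoded + b"\0"  # strings are NUL-terminated
        for msgid in msgids:
            encoded = messages[msgid].encode("ascii")
            str_entries.append((len(encoded), data_offset + len(data)))
            data += encoded + b"\0"

        # magic, revision, count, table offsets, hash table size and offset
        out = struct.pack("<7I", MO_MAGIC, 0, count, ids_table, strs_table, 0, 0)
        for length, offset in id_entries + str_entries:
            out += struct.pack("<2I", length, offset)
        return out + data


    catalog = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Hallo"})))
    print(catalog.gettext("Hello"))    # Hallo
    print(catalog.gettext("Goodbye"))  # Goodbye (no translation, msgid returned)

In practice, an application would locate a compiled catalog on disk with
``gettext.translation()`` instead of building one by hand.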

.. _`gettext module`: http://docs.python.org/lib/module-gettext.html

As ``gettext`` provides a solid and well-supported foundation for translating
application messages, Babel does not reinvent the wheel, but rather reuses this
infrastructure, and makes it easier to build message catalogs for Python
applications.


Message Extraction
==================

Babel provides functionality similar to that of the ``xgettext`` program,
except that only extraction from Python source files is built-in, while support
for other file formats can be added using a simple extension mechanism.

Unlike ``xgettext``, which is usually invoked once for every file, the routines
for message extraction in Babel operate on directories. While the per-file
approach of ``xgettext`` works nicely with projects using a ``Makefile``,
Python projects rarely use ``make``, and thus a different mechanism is needed
for extracting messages from the heterogeneous collection of source files that
many Python projects are composed of.
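
As a rough illustration of the directory-based approach, the sketch below walks
a source tree with ``os.walk`` and yields each file's path relative to the
root. It always uses ``/`` as the separator, regardless of platform, so that
the result can be matched against glob patterns. The function name
``iter_relative_paths`` is hypothetical and is not part of Babel's API.

.. code-block:: python

    import os


    def iter_relative_paths(root):
        """Yield every file below root as a "/"-separated relative path."""
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for filename in sorted(filenames):
                relative = os.path.relpath(os.path.join(dirpath, filename), root)
                # Normalize separators so patterns can always use "/"
                yield relative.replace(os.sep, "/")

For a tree containing ``setup.py`` and ``myapp/templates/index.html``, this
yields ``setup.py`` followed by ``myapp/templates/index.html``. Each of those
paths then still has to be assigned an extraction method.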

When message extraction is based on directories instead of individual files,
there needs to be a way to configure which files should be treated in which
manner. For example, while many projects may contain ``.html`` files, some of
those files may be static HTML files that don't contain localizable messages,
while others may be `Django`_ templates, and still others may contain `Genshi`_
markup templates. Some projects may even mix HTML files for different template
languages (for whatever reason). Therefore, the way in which messages are
extracted from source files cannot depend on the file extension alone, but
needs to be controllable in a precise manner.

.. _`Django`: http://www.djangoproject.com/
.. _`Genshi`: http://genshi.edgewall.org/

Babel accepts a configuration file to specify this mapping of files to
extraction methods, which is described below.


.. _`frontends`:

----------
Front-Ends
----------

Babel provides two different front-ends to access its functionality for working
with message catalogs:

* A `Command-line interface <cmdline.html>`_, and
* `Integration with distutils/setuptools <setup.html>`_

Which one you choose depends on the nature of your project. For most modern
Python projects, the distutils/setuptools integration is probably more
convenient.


.. _`mapping`:

-------------------------------------------
Extraction Method Mapping and Configuration
-------------------------------------------

The mapping of extraction methods to files in Babel is done via a configuration
file. This file maps extended glob patterns to the names of the extraction
methods, and can also set various options for each pattern (which options are
available depends on the specific extraction method).

For example, the following configuration adds extraction of messages from both
Genshi markup templates and text templates:

.. code-block:: ini

    # Extraction from Python source files

    [python: **.py]

    # Extraction from Genshi HTML and text templates

    [genshi: **/templates/**.html]
    ignore_tags = script,style
    include_attrs = alt title summary

    [genshi: **/templates/**.txt]
    template_class = genshi.template:TextTemplate
    encoding = ISO-8859-15

The configuration file syntax is based on the format commonly found in ``.INI``
files on Windows systems, and as supported by the ``ConfigParser`` module in
the Python standard library. Section names (the strings enclosed in square
brackets) specify both the name of the extraction method and the extended glob
pattern that selects the files this extraction method should be used for,
separated by a colon. The options in each section are passed to the extraction
method; which options are available is specific to the extraction method used.
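
Because the format follows ``ConfigParser`` conventions, a mapping like the one
above can be read with the standard library's ``configparser`` module (the
Python 3 name of ``ConfigParser``). The following sketch uses a hypothetical
``parse_mapping_sketch`` helper, which is not Babel's own parser. It splits each
section name into method and pattern, keeps the options for each pattern, and
skips sections without a colon, such as ``[extractors]``.

.. code-block:: python

    import configparser
    import textwrap

    MAPPING = textwrap.dedent("""\
        # Extraction from Python source files
        [python: **.py]

        # Extraction from Genshi HTML and text templates
        [genshi: **/templates/**.html]
        ignore_tags = script,style
        include_attrs = alt title summary

        [genshi: **/templates/**.txt]
        template_class = genshi.template:TextTemplate
        encoding = ISO-8859-15
    """)


    def parse_mapping_sketch(text):
        """Return (method_map, options_map) parsed from mapping config text."""
        # RawConfigParser: no "%" interpolation; "#" and ";" lines are comments
        parser = configparser.RawConfigParser()
        parser.read_string(text)
        method_map, options_map = [], {}
        for section in parser.sections():  # sections keep file order
            method, sep, pattern = section.partition(":")
            if not sep:
                continue  # e.g. an [extractors] section, not a file mapping
            pattern = pattern.strip()
            method_map.append((pattern, method.strip()))
            options_map[pattern] = dict(parser.items(section))
        return method_map, options_map


    method_map, options_map = parse_mapping_sketch(MAPPING)
    print(method_map[0])                                   # ('**.py', 'python')
    print(options_map["**/templates/**.txt"]["encoding"])  # ISO-8859-15

Keeping ``method_map`` as an ordered list preserves the order in which patterns
appear in the file, which matters if the first matching pattern is used.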

The extended glob patterns used in this configuration are similar to the glob
patterns provided by most shells. A single asterisk (``*``) is a wildcard for
any number of characters (except for the pathname component separator "/"),
while a question mark (``?``) only matches a single character. In addition,
two consecutive asterisks (``**``) can be used to make the wildcard match any
directory level, so the pattern ``**.txt`` matches any file with the extension
``.txt`` in any directory.
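
These pattern rules can be sketched as a translation into a regular
expression. In the hypothetical ``match_path`` helper below, ``*`` and ``?``
never cross a ``/``, a bare ``**`` matches anything including separators, and
``**/`` matches zero or more leading directories. The exact treatment of ``?``
and ``**/`` is an assumption of this sketch, and Babel's own matcher may differ
in edge cases.

.. code-block:: python

    import re


    def glob_to_regex(pattern):
        """Hypothetical helper: compile an extended glob pattern to a regex."""
        parts = []
        i = 0
        while i < len(pattern):
            if pattern.startswith("**/", i):
                parts.append("(?:.*/)?")  # zero or more leading directories
                i += 3
            elif pattern.startswith("**", i):
                parts.append(".*")        # anything, including "/"
                i += 2
            elif pattern[i] == "*":
                parts.append("[^/]*")     # anything within one path component
                i += 1
            elif pattern[i] == "?":
                parts.append("[^/]")      # exactly one non-separator character
                i += 1
            else:
                parts.append(re.escape(pattern[i]))
                i += 1
        return re.compile("".join(parts) + r"\Z")


    def match_path(pattern, path):
        """Return True if the "/"-separated path matches the pattern."""
        return glob_to_regex(pattern).match(path) is not None


    print(match_path("**.txt", "docs/a/readme.txt"))  # True
    print(match_path("*.txt", "docs/readme.txt"))     # False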

Lines that start with a ``#`` or ``;`` character are ignored and can be used
for comments. Empty lines are ignored, too.

.. note:: If you're performing message extraction using the command Babel
          provides for integration into ``setup.py`` scripts, you can also
          provide this configuration in a different way, namely as a keyword
          argument to the ``setup()`` function. See `Distutils/Setuptools
          Integration`_ for more information.

.. _`distutils/setuptools integration`: setup.html


Default Extraction Methods
--------------------------

Babel comes with only two built-in extractors: ``python`` (which extracts
messages from Python source files) and ``ignore`` (which extracts nothing).

The ``python`` extractor is by default mapped to the glob pattern ``**.py``,
meaning it'll be applied to all files with the ``.py`` extension in any
directory. If you specify your own mapping configuration, this default mapping
is discarded, so you need to explicitly add it to your mapping (as shown in
the example above).
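
The sketch below models how a custom mapping interacts with the default one:
once a mapping is supplied, only its patterns are consulted, so Python files are
extracted only if the mapping includes a ``python`` entry. It assumes, as a
simplification, that each file is handled by the first pattern that matches it.
It also uses a reduced pattern matcher in which ``**`` crosses directories,
``*`` does not, and ``?`` is not supported. ``DEFAULT_MAPPING`` and
``choose_method`` are hypothetical names, not Babel's API.

.. code-block:: python

    import re

    # Hypothetical stand-in for the built-in default: Python files only.
    DEFAULT_MAPPING = [("**.py", "python")]


    def _pattern_regex(pattern):
        # Reduced matcher: "**" may cross "/", a single "*" may not.
        regex = re.escape(pattern).replace(r"\*\*", ".*").replace(r"\*", "[^/]*")
        return re.compile(regex + r"\Z")


    def choose_method(path, method_map=None):
        """Return the extraction method name for path, or None."""
        if method_map is None:
            method_map = DEFAULT_MAPPING  # used only when no mapping is given
        for pattern, method in method_map:
            if _pattern_regex(pattern).match(path):
                return method  # first matching pattern wins
        return None  # no pattern matched: the file is skipped


    custom = [("**/tests/**", "ignore"), ("**.html", "genshi")]
    print(choose_method("app/views.py"))          # python
    print(choose_method("app/views.py", custom))  # None: default was replaced

Appending ``("**.py", "python")`` to ``custom``, as the configuration example
does with its ``[python: **.py]`` section, restores extraction from Python
files. The ``ignore`` entry placed first shows one way to exclude test modules.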


.. _`referencing extraction methods`:

Referencing Extraction Methods
------------------------------

To be able to use short extraction method names such as “genshi”, you need to
have `pkg_resources`_ installed, and the package implementing that extraction
method needs to have been installed with its metadata (the `egg-info`_).

If this is not possible for some reason, you need to map the short names to
fully qualified function names in an ``[extractors]`` section in the mapping
configuration. For example:

.. code-block:: ini

    # Some custom extraction method

    [extractors]
    custom = mypackage.module:extract_custom

    [custom: **.ctm]
    some_option = foo

Note that the built-in extraction methods ``python`` and ``ignore`` are
available by default, even if `pkg_resources`_ is not installed. You should
never need to explicitly define them in the ``[extractors]`` section.
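
To show what a fully qualified name such as
``mypackage.module:extract_custom`` refers to, the sketch below resolves a
``package.module:function`` string with ``importlib``. The helper name
``load_extractor`` is hypothetical, and this is not necessarily how Babel
performs the lookup internally.

.. code-block:: python

    import importlib


    def load_extractor(spec):
        """Resolve a "package.module:function" string to the named object."""
        module_name, sep, attr_path = spec.partition(":")
        if not sep:
            raise ValueError("expected 'module:function', got %r" % spec)
        obj = importlib.import_module(module_name)
        for attr in attr_path.split("."):  # also allows "module:Class.method"
            obj = getattr(obj, attr)
        return obj


    # A standard library function stands in for mypackage.module:extract_custom
    join = load_extractor("os.path:join")
    print(join("locale", "messages.pot"))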

.. _`egg-info`: http://peak.telecommunity.com/DevCenter/PythonEggs
.. _`pkg_resources`: http://peak.telecommunity.com/DevCenter/PkgResources


--------------------------
Writing Extraction Methods
--------------------------

Adding new methods for extracting localizable messages is easy. First, you'll
need to implement a function that complies with the following interface:

.. code-block:: python

    def extract_xxx(fileobj, keywords, comment_tags, options):
        """Extract messages from XXX files.

        :param fileobj: the file-like object the messages should be extracted
                        from
        :param keywords: a list of keywords (i.e. function names) that should
                         be recognized as translation functions
        :param comment_tags: a list of translator tags to search for and
                             include in the results
        :param options: a dictionary of additional options (optional)
        :return: an iterator over ``(lineno, funcname, message, comments)``
                 tuples
        :rtype: ``iterator``
        """

.. note:: Any strings in the tuples produced by this function must be either
          ``unicode`` objects, or ``str`` objects using plain ASCII characters.
          That means that if sources contain strings using other encodings, it
          is the job of the extractor implementation to do the decoding to
          ``unicode`` objects.
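
As an example of this interface, here is a sketch of an extractor for a
hypothetical line-based format. The format is invented here for illustration:
a line such as ``_: Manual`` marks a message for the keyword ``_``, and a ``#``
comment that starts with one of the translator tags is attached to the next
message. Following the note above, the function decodes the raw bytes so that
it only yields text strings. The ``encoding`` option it reads is an assumption
of this sketch.

.. code-block:: python

    import io


    def extract_custom(fileobj, keywords, comment_tags, options):
        """Extract messages from a hypothetical line-based format.

        A line like ``_: Some message`` is a message for keyword ``_``; a
        ``# TAG text`` comment above it becomes a translator comment.
        """
        encoding = options.get("encoding", "utf-8")  # assumed option name
        comments = []
        for lineno, raw_line in enumerate(fileobj, start=1):
            line = raw_line.decode(encoding).strip()  # yield text, not bytes
            if line.startswith("#"):
                text = line[1:].strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        comments.append(text[len(tag):].strip())
                        break
                continue
            funcname, sep, message = line.partition(":")
            if sep and funcname.strip() in keywords:
                yield lineno, funcname.strip(), message.strip(), comments
            comments = []  # collected comments reset after any non-comment line


    source = io.BytesIO(b"# NOTE: menu entry\n_: Manual\n_: Quit\n")
    for entry in extract_custom(source, ["_"], ["NOTE:"], {}):
        print(entry)
    # (2, '_', 'Manual', ['menu entry'])
    # (3, '_', 'Quit', [])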

Next, you should register that function as an entry point. This requires your
``setup.py`` script to use `setuptools`_, and your package to be installed with
the necessary metadata. If that's taken care of, add something like the
following to your ``setup.py`` script:

.. code-block:: python

    from setuptools import setup

    setup(
        # ... other arguments (name, version, packages, and so on) ...
        entry_points="""
        [babel.extractors]
        xxx = your.package:extract_xxx
        """,
    )

That is, add your extraction method to the entry point group
``babel.extractors``, where the name of the entry point is the name that people
will use to reference the extraction method, and the value is the module and
the name of the function (separated by a colon) implementing the actual
extraction.
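
To illustrate how such an entry point can be found at runtime, the sketch below
looks up a name in the ``babel.extractors`` group. This documentation relies on
`pkg_resources`_ for that purpose. The sketch instead uses ``importlib.metadata``
from the Python 3.8+ standard library, so it shows the general mechanism rather
than Babel's own code. ``find_extractor`` is a hypothetical name.

.. code-block:: python

    from importlib.metadata import entry_points


    def find_extractor(name, group="babel.extractors"):
        """Load the entry point called name from group, or return None."""
        eps = entry_points()
        if hasattr(eps, "select"):  # Python 3.10 and later
            candidates = eps.select(group=group, name=name)
        else:  # Python 3.8 and 3.9 return a dict of group -> entry points
            candidates = [ep for ep in eps.get(group, ()) if ep.name == name]
        for entry_point in candidates:
            return entry_point.load()  # imports the module, returns the function
        return None


    print(find_extractor("xxx"))  # None unless an installed package registers "xxx"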

.. note:: As shown in `Referencing Extraction Methods`_, declaring an entry
          point is not strictly required, as users can still reference the
          extraction function directly. But whenever possible, the entry point
          should be declared to make configuration more convenient.

.. _`setuptools`: http://peak.telecommunity.com/DevCenter/setuptools


-------------------
Translator Comments
-------------------

First of all, what are comment tags? Comment tags are excerpts of text to
search for in comments (and only in comments) immediately preceding the
`python gettext`_ calls, as shown in the following example:

.. _`python gettext`: http://docs.python.org/lib/module-gettext.html

.. code-block:: python

    # NOTE: This is a comment about `Foo Bar`
    _('Foo Bar')

The comment tag for the above example would be ``NOTE:``, and the translator
comment for that tag would be ``This is a comment about `Foo Bar```.

The resulting output in the catalog template would be something like::

    #. This is a comment about `Foo Bar`
    #: main.py:2
    msgid "Foo Bar"
    msgstr ""
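
The following sketch shows how an extractor might collect such comments from
Python source using the standard library's ``tokenize`` module. It is a
simplified illustration, not Babel's Python extractor. It only recognizes a
keyword followed directly by ``(`` and a string literal, and it only attaches a
tagged comment that sits on the line immediately above the call.
``extract_with_comments`` is a hypothetical name.

.. code-block:: python

    import ast
    import io
    import tokenize


    def extract_with_comments(source, keywords=("_",), comment_tags=("NOTE:",)):
        """Yield (lineno, funcname, message, comments) tuples from source text."""
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
        comment, comment_line = None, None
        for index, tok in enumerate(tokens):
            if tok.type == tokenize.COMMENT:
                text = tok.string.lstrip("#").strip()
                for tag in comment_tags:
                    if text.startswith(tag):
                        comment = text[len(tag):].strip()
                        comment_line = tok.start[0]
                        break
            elif (tok.type == tokenize.NAME and tok.string in keywords
                  and index + 2 < len(tokens)
                  and tokens[index + 1].string == "("
                  and tokens[index + 2].type == tokenize.STRING):
                lineno = tok.start[0]
                message = ast.literal_eval(tokens[index + 2].string)
                # Only a tagged comment on the line directly above the call counts
                comments = [comment] if comment_line == lineno - 1 else []
                yield lineno, tok.string, message, comments
                comment, comment_line = None, None


    SOURCE = "# NOTE: This is a comment about `Foo Bar`\n_('Foo Bar')\n"
    print(list(extract_with_comments(SOURCE)))
    # [(2, '_', 'Foo Bar', ['This is a comment about `Foo Bar`'])]

The line number ``2`` in the result corresponds to the ``#: main.py:2``
reference in the catalog output.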

Now, you might ask: why would I need that?

Consider this simple case: you have a menu item called “manual”. You know what
it means, but when translators see it, they will wonder whether you meant:

1. a document or help manual, or
2. a manual process?

This is the simplest case where a translator comment such as
“The installation manual” helps to clarify the situation and makes a translator
more productive.

.. note:: Whether translator comments can be extracted depends on the extraction
          method in use. The Python extractor provided by Babel does implement
          this feature, but others may not.

Copyright (C) 2012-2017 Edgewall Software
Copyright (C) 2012-2017 Edgewall Software