annotate babel/messages/extract.py @ 179:31beb381d62f trunk

added 'N_' (gettext noop) to the extractor's default keywords fixes #25
author pjenvey
date Wed, 27 Jun 2007 22:43:26 +0000
parents e1199c0fb3bf
children 4052570f109d
rev   line source
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
2 #
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
3 # Copyright (C) 2007 Edgewall Software
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
4 # All rights reserved.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
5 #
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
8 # are also available at http://babel.edgewall.org/wiki/License.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
9 #
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
12 # history and logs, available at http://babel.edgewall.org/log/.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
13
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
14 """Basic infrastructure for extracting localizable messages from source files.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
15
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
16 This module defines an extensible system for collecting localizable message
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
17 strings from a variety of sources. A native extractor for Python source files
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
18 is built in; extractors for other sources can be added using very simple plugins.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
19
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
20 The main entry points into the extraction functionality are the functions
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
21 `extract_from_dir` and `extract_from_file`.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
22 """
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
23
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
24 import os
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
25 try:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
26 set
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
27 except NameError:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
28 from sets import Set as set
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
29 import sys
162
32be08ab2440 alphabetize imports
pjenvey
parents: 154
diff changeset
30 from tokenize import generate_tokens, COMMENT, NAME, OP, STRING
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
31
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
32 from babel.util import parse_encoding, pathmatch, relpath
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
33
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
34 __all__ = ['extract', 'extract_from_dir', 'extract_from_file']
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
35 __docformat__ = 'restructuredtext en'
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
36
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
37 GROUP_NAME = 'babel.extractors'
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
38
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
39 DEFAULT_KEYWORDS = {
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
40 '_': None,
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
41 'gettext': None,
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
42 'ngettext': (1, 2),
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
43 'ugettext': None,
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
44 'ungettext': (1, 2),
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
45 'dgettext': (2,),
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
46 'dngettext': (2, 3),
179
31beb381d62f added 'N_' (gettext noop) to the extractor's default keywords
pjenvey
parents: 164
diff changeset
47 'N_': None
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
48 }
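The tuples in `DEFAULT_KEYWORDS` are 1-based argument positions, which `extract` later turns into list indices with `index - 1`. The sketch below illustrates that selection step in isolation. `select_messages` is a hypothetical helper, not part of Babel, and treating `None` as "use the first argument" is an assumption made for this example only.

```python
# Copy of the default keywords shown in the listing.
DEFAULT_KEYWORDS = {
    '_': None,
    'gettext': None,
    'ngettext': (1, 2),
    'ugettext': None,
    'ungettext': (1, 2),
    'dgettext': (2,),
    'dngettext': (2, 3),
    'N_': None,
}


def select_messages(funcname, messages, keywords=DEFAULT_KEYWORDS):
    """Pick the localizable arguments of a call (hypothetical helper).

    `messages` is either a single string or a sequence holding the
    positional arguments of the call to `funcname`.
    """
    if isinstance(messages, (list, tuple)):
        # Assumption for this sketch: None means "first argument only".
        indices = keywords[funcname] or (1,)
        # Keyword indices are 1-based, list indices are 0-based.
        messages = tuple(messages[index - 1] for index in indices)
        if len(messages) == 1:
            messages = messages[0]
    return messages


singular_plural = select_messages('ngettext', ('%d file', '%d files', 'n'))
domain_message = select_messages('dgettext', ('mydomain', 'Hello'))
```

With these inputs, `singular_plural` holds both plural forms as a tuple, while `domain_message` is a plain string because only one argument was selected.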
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
49
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
50 DEFAULT_MAPPING = [('**.py', 'python')]
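Because the mapping is an ordered list and `extract_from_dir` stops at the first pattern that matches, the order of entries decides which method handles a file. The sketch below shows that behaviour. `simple_pathmatch` is a simplified stand-in written for this example, an assumption about how extended glob patterns could be matched rather than the actual `babel.util.pathmatch`, and `pick_method` is likewise a hypothetical helper.

```python
import re

# Regex fragments for the glob tokens used in this sketch (assumed semantics).
_SYMBOLS = {
    '?': '[^/]',
    '?/': '[^/]/',
    '*': '[^/]+',
    '*/': '[^/]+/',
    '**/': '(?:.+/)*?',
    '**': '(?:.+/)*?[^/]+',
}


def simple_pathmatch(pattern, filename):
    """Match a '/'-separated path against an extended glob pattern."""
    buf = []
    # Odd-indexed parts are glob tokens, even-indexed parts are literals.
    for idx, part in enumerate(re.split('([?*]+/?)', pattern)):
        if idx % 2:
            buf.append(_SYMBOLS[part])
        elif part:
            buf.append(re.escape(part))
    return re.match(''.join(buf) + '$', filename) is not None


def pick_method(filename, method_map):
    """Return the method of the first matching pattern, or None."""
    for pattern, method in method_map:
        if simple_pathmatch(pattern, filename):
            return method
    return None


method_map = [
    ('**/templates/**.*', 'genshi'),
    ('**.py', 'python'),
]
chosen = pick_method('app/templates/helpers.py', method_map)
```

Here a Python file inside a `templates` directory matches both patterns, so `chosen` depends only on which entry comes first in the list.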
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
51
47
f8469ab4b257 Support passing extraction method mapping and options from the frontends (see #4). No distutils/setuptools keyword supported yet, but the rest seems to be working okay.
cmlenz
parents: 44
diff changeset
52 def extract_from_dir(dirname=os.getcwd(), method_map=DEFAULT_MAPPING,
f8469ab4b257 Support passing extraction method mapping and options from the frontends (see #4). No distutils/setuptools keyword supported yet, but the rest seems to be working okay.
cmlenz
parents: 44
diff changeset
53 options_map=None, keywords=DEFAULT_KEYWORDS,
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
54 comment_tags=(), callback=None):
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
55 """Extract messages from any source files found in the given directory.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
56
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
57 This function generates tuples of the form:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
58
82
540bb484f6e0 Missed some param's documentation regarding translator comments.
palgarvio
parents: 81
diff changeset
59 ``(filename, lineno, message, comments)``
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
60
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
61 Which extraction method is used per file is determined by the `method_map`
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
62 parameter, which maps extended glob patterns to extraction method names.
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
63 For example, the following is the default mapping:
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
64
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
65 >>> method_map = [
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
66 ... ('**.py', 'python')
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
67 ... ]
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
68
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
69 This basically says that files with the filename extension ".py" at any
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
70 level inside the directory should be processed by the "python" extraction
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
71 method. Files that don't match any of the mapping patterns are ignored. See
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
72 the documentation of the `pathmatch` function for details on the pattern
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
73 syntax.
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
74
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
75 The following extended mapping would also use the "genshi" extraction
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
76 method on any file in a "templates" subdirectory:
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
77
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
78 >>> method_map = [
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
79 ... ('**/templates/**.*', 'genshi'),
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
80 ... ('**.py', 'python')
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
81 ... ]
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
82
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
83 The dictionary provided by the optional `options_map` parameter augments
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
84 these mappings. It uses extended glob patterns as keys, and the values are
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
85 dictionaries mapping option names to option values (both strings).
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
86
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
87 The glob patterns of the `options_map` do not necessarily need to be the
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
88 same as those used in the method mapping. For example, while all files in
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
89 the ``templates`` folders in an application may be Genshi templates, the
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
90 options for those files may differ based on extension:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
91
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
92 >>> options_map = {
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
93 ... '**/templates/**.txt': {
144
14fe2a8fb842 Some doc fixes.
cmlenz
parents: 138
diff changeset
94 ... 'template_class': 'genshi.template:TextTemplate',
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
95 ... 'encoding': 'latin-1'
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
96 ... },
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
97 ... '**/templates/**.html': {
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
98 ... 'include_attrs': ''
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
99 ... }
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
100 ... }
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
101
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
102 :param dirname: the path to the directory to extract messages from
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
103 :param method_map: a list of ``(pattern, method)`` tuples that maps
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
104 extended glob patterns to extraction method names
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
105 :param options_map: a dictionary of additional options (optional)
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
106 :param keywords: a dictionary mapping keywords (i.e. names of functions
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
107 that should be recognized as translation functions) to
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
108 tuples that specify which of their arguments contain
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
109 localizable strings
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
110 :param comment_tags: a list of tags of translator comments to search for
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
111 and include in the results
47
f8469ab4b257 Support passing extraction method mapping and options from the frontends (see #4). No distutils/setuptools keyword supported yet, but the rest seems to be working okay.
cmlenz
parents: 44
diff changeset
112 :param callback: a function that is called for every file that messages are
f8469ab4b257 Support passing extraction method mapping and options from the frontends (see #4). No distutils/setuptools keyword supported yet, but the rest seems to be working okay.
cmlenz
parents: 44
diff changeset
113 extracted from, just before the extraction itself is
75
0f74337264ce Fixed MIME type of new doc page.
cmlenz
parents: 62
diff changeset
114 performed; the function is passed the filename, the name
0f74337264ce Fixed MIME type of new doc page.
cmlenz
parents: 62
diff changeset
115 of the extraction method and the options dictionary as
0f74337264ce Fixed MIME type of new doc page.
cmlenz
parents: 62
diff changeset
116 positional arguments, in that order
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
117 :return: an iterator over ``(filename, lineno, message, comments)`` tuples
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
118 :rtype: ``iterator``
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
119 :see: `pathmatch`
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
120 """
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
121 if options_map is None:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
122 options_map = {}
56
f40fc143439c Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 54
diff changeset
123
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
124 absname = os.path.abspath(dirname)
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
125 for root, dirnames, filenames in os.walk(absname):
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
126 for subdir in dirnames[:]:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
127 if subdir.startswith('.') or subdir.startswith('_'):
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
128 dirnames.remove(subdir)
154
31478eb3fb9e The default ordering of messages in generated POT files, which is based on the order those messages are found when walking the source tree, is no longer subject to differences between platforms; directory and file names are now always sorted alphabetically.
cmlenz
parents: 147
diff changeset
129 dirnames.sort()
31478eb3fb9e The default ordering of messages in generated POT files, which is based on the order those messages are found when walking the source tree, is no longer subject to differences between platforms; directory and file names are now always sorted alphabetically.
cmlenz
parents: 147
diff changeset
130 filenames.sort()
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
131 for filename in filenames:
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
132 filename = relpath(
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
133 os.path.join(root, filename).replace(os.sep, '/'),
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
134 dirname
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
135 )
62
2df27f49c320 The order of extraction methods is now preserved (see #10).
cmlenz
parents: 57
diff changeset
136 for pattern, method in method_map:
44
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
137 if pathmatch(pattern, filename):
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
138 filepath = os.path.join(absname, filename)
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
139 options = {}
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
140 for opattern, odict in options_map.items():
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
141 if pathmatch(opattern, filename):
a524b547ea7e Some work towards #4.
cmlenz
parents: 36
diff changeset
142 options = odict
47
f8469ab4b257 Support passing extraction method mapping and options from the frontends (see #4). No distutils/setuptools keyword supported yet, but the rest seems to be working okay.
cmlenz
parents: 44
diff changeset
143 if callback:
57
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
144 callback(filename, method, options)
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
145 for lineno, message, comments in \
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
146 extract_from_file(method, filepath,
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
147 keywords=keywords,
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
148 comment_tags=comment_tags,
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
149 options=options):
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
150 yield filename, lineno, message, comments
57
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
151 break
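Pruning `dirnames` in place is what keeps `os.walk` from descending into hidden directories. One detail worth illustrating: calling `remove()` on a list while iterating over that same list can skip the entry that follows each removed item. The sketch below contrasts that with slice assignment. Both function names are hypothetical and the code is an illustration, not the module's implementation.

```python
import os


def prune_while_iterating(names):
    """Remove hidden names with remove() during iteration (can skip entries)."""
    names = list(names)
    for name in names:
        if name.startswith(('.', '_')):
            names.remove(name)
    return names


def walk_visible(top):
    """Yield (root, filenames) for directories not starting with '.' or '_'."""
    for root, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the list that os.walk holds, so pruned
        # directories are not visited and no entry is skipped.
        dirnames[:] = sorted(
            d for d in dirnames if not d.startswith(('.', '_'))
        )
        yield root, sorted(filenames)


leftover = prune_while_iterating(['.svn', '_build', 'src'])
```

In this run `leftover` still contains `'_build'`, because removing `'.svn'` shifts the list under the running iteration. Iterating over a copy, or rebuilding the list as in `walk_visible`, avoids that.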
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
152
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
153 def extract_from_file(method, filename, keywords=DEFAULT_KEYWORDS,
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
154 comment_tags=(), options=None):
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
155 """Extract messages from a specific file.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
156
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
157 This function returns a list of tuples of the form:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
158
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
159 ``(lineno, message, comments)``
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
160
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
161 :param filename: the path to the file to extract messages from
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
162 :param method: a string specifying the extraction method (e.g. "python")
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
163 :param keywords: a dictionary mapping keywords (i.e. names of functions
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
164 that should be recognized as translation functions) to
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
165 tuples that specify which of their arguments contain
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
166 localizable strings
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
167 :param comment_tags: a list of translator tags to search for and include
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
168 in the results
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
169 :param options: a dictionary of additional options (optional)
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
170 :return: the list of extracted messages
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
171 :rtype: `list`
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
172 """
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
173 fileobj = open(filename, 'U')
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
174 try:
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
175 return list(extract(method, fileobj, keywords, comment_tags, options))
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
176 finally:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
177 fileobj.close()
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
178
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
179 def extract(method, fileobj, keywords=DEFAULT_KEYWORDS, comment_tags=(),
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
180 options=None):
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
181 """Extract messages from the given file-like object using the specified
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
182 extraction method.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
183
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
184 This function generates tuples of the form:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
185
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
186 ``(lineno, message, comments)``
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
187
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
188 The implementation dispatches the actual extraction to plugins, based on the
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
189 value of the ``method`` parameter.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
190
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
191 >>> source = '''# foo module
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
192 ... def run(argv):
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
193 ... print _('Hello, world!')
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
194 ... '''
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
195
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
196 >>> from StringIO import StringIO
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
197 >>> for message in extract('python', StringIO(source)):
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
198 ... print message
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
199 (3, u'Hello, world!', [])
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
200
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
201 :param method: a string specifying the extraction method (e.g. "python")
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
202 :param fileobj: the file-like object the messages should be extracted from
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
203 :param keywords: a dictionary mapping keywords (i.e. names of functions
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
204 that should be recognized as translation functions) to
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
205 tuples that specify which of their arguments contain
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
206 localizable strings
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
207 :param comment_tags: a list of translator tags to search for and include
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
208 in the results
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
209 :param options: a dictionary of additional options (optional)
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
210 :return: an iterator over ``(lineno, message, comments)`` tuples
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
211 :rtype: ``iterator``
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
212 :raise ValueError: if the extraction method is not registered
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
213 """
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
214 from pkg_resources import working_set
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
215
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
216 for entry_point in working_set.iter_entry_points(GROUP_NAME, method):
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
217 func = entry_point.load(require=True)
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
218 results = func(fileobj, keywords.keys(), comment_tags,
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
219 options=options or {})
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
220 for lineno, funcname, messages, comments in results:
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
221 if isinstance(messages, (list, tuple)):
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
222 msgs = []
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
223 for index in keywords[funcname]:
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
224 msgs.append(messages[index - 1])
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
225 messages = tuple(msgs)
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
226 if len(messages) == 1:
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
227 messages = messages[0]
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
228 yield lineno, messages, comments
10
4130d9c6cb34 Both Babel's [source:trunk/babel/catalog/frontend.py frontend] and [source:trunk/babel/catalog/extract.py extract] now handle keyword indices. Also added an extra boolean flag so that the default keywords defined by Babel are not included in the keywords to search for when extracting strings.
palgarvio
parents: 1
diff changeset
229 return
12
e6ba3e878b10 * Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents: 10
diff changeset
230
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
231 raise ValueError('Unknown extraction method %r' % method)
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
232
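The keyword-index handling in `extract` (listing lines 221 to 227) can be sketched in isolation. This is a minimal illustration, not the module's API: `select_messages` and the `KEYWORDS` table are hypothetical names, and the indices are assumed to be 1-based argument positions, as the `index - 1` lookup in the listing suggests.

```python
def select_messages(messages, indices):
    """Keep only the call arguments named by the 1-based `indices`.

    A single string is passed through untouched; a one-element
    selection is collapsed to a bare string, mirroring the listing.
    """
    if isinstance(messages, (list, tuple)):
        messages = tuple(messages[index - 1] for index in indices)
        if len(messages) == 1:
            messages = messages[0]
    return messages


# Hypothetical keyword table: function name -> argument positions to keep.
KEYWORDS = {'ngettext': (1, 2), 'dngettext': (2, 3)}

# For dngettext(domain, singular, plural, n) only positions 2 and 3 matter.
args = ('mydomain', 'one file', 'many files')
selected = select_messages(args, KEYWORDS['dngettext'])
```

Here `selected` holds the singular and plural forms, and the domain argument is dropped.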
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
233 def extract_nothing(fileobj, keywords, comment_tags, options):
57
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
234 """Pseudo extractor that does not actually extract anything, but simply
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
235 returns an empty list.
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
236 """
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
237 return []
d930a3dfbf3d * The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents: 56
diff changeset
238
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
239 def extract_python(fileobj, keywords, comment_tags, options):
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
240 """Extract messages from Python source code.
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
241
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
242 :param fileobj: the seekable, file-like object the messages should be
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
243 extracted from
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
244 :param keywords: a list of keywords (i.e. function names) that should be
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
245 recognized as translation functions
84
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
246 :param comment_tags: a list of translator tags to search for and include
3ae316b58231 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
247 in the results
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
248 :param options: a dictionary of additional options (optional)
81
85af04c72ccd Fixed and added some documentation about the translator comments implemented in [81].
palgarvio
parents: 80
diff changeset
249 :return: an iterator over ``(lineno, funcname, message, comments)`` tuples
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
250 :rtype: ``iterator``
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
251 """
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
252 funcname = None
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
253 lineno = None
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
254 buf = []
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
255 messages = []
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
256 translator_comments = []
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
257 in_args = False
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
258 in_translator_comments = False
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
259
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
260 encoding = parse_encoding(fileobj) or options.get('encoding', 'ascii')
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
261
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
262 tokens = generate_tokens(fileobj.readline)
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
263 for tok, value, (lineno, _), _, _ in tokens:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
264 if funcname and tok == OP and value == '(':
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
265 in_args = True
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
266 elif tok == COMMENT:
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
267 # Strip the comment token from the line
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
268 value = value.decode(encoding)[1:].strip()
147
63a93d33511a simplify
pjenvey
parents: 144
diff changeset
269 if in_translator_comments and \
93
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
270 translator_comments[-1][0] == lineno - 1:
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
271 # We're already inside a translator comment, continue appending
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
272 # XXX: Should we check if the programmer keeps adding the
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
273 # comment_tag for every comment line??? probably not!
93
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
274 translator_comments.append((lineno, value))
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
275 continue
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
276 # If execution reaches this point, let's see if the comment line
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
277 # starts with one of the comment tags
85
04a2f16bdd04 Fixed de-pluralization bug introduced in [85] regarding the extraction of translator comments.
palgarvio
parents: 84
diff changeset
278 for comment_tag in comment_tags:
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
279 if value.startswith(comment_tag):
147
63a93d33511a simplify
pjenvey
parents: 144
diff changeset
280 in_translator_comments = True
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
281 comment = value[len(comment_tag):].strip()
93
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
282 translator_comments.append((lineno, comment))
92
ccb9da614597 Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents: 91
diff changeset
283 break
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
284 elif funcname and in_args:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
285 if tok == OP and value == ')':
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
286 in_args = in_translator_comments = False
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
287 if buf:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
288 messages.append(''.join(buf))
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
289 del buf[:]
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
290 if filter(None, messages):
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
291 if len(messages) > 1:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
292 messages = tuple(messages)
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
293 else:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
294 messages = messages[0]
93
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
295 # Comments don't apply unless they immediately precede the
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
296 # message
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
297 if translator_comments and \
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
298 translator_comments[-1][0] < lineno - 1:
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
299 translator_comments = []
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
300
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
301 yield (lineno, funcname, messages,
f008662b5d6e Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents: 92
diff changeset
302 [comment[1] for comment in translator_comments])
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
303 funcname = lineno = None
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
304 messages = []
80
116e34b8cefa Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 75
diff changeset
305 translator_comments = []
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
306 elif tok == STRING:
164
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
307 # Unwrap quotes in a safe manner, maintaining the string's
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
308 # encoding
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
309 # https://sourceforge.net/tracker/?func=detail&atid=355470&aid=617979&group_id=5470
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
310 value = eval('# coding=%s\n%s' % (encoding, value),
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
311 {'__builtins__':{}}, {})
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
312 if isinstance(value, str):
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
313 value = value.decode(encoding)
e1199c0fb3bf made the python extractor detect source file encodings from the magic encoding
pjenvey
parents: 162
diff changeset
314 buf.append(value)
1
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
315 elif tok == OP and value == ',':
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
316 messages.append(''.join(buf))
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
317 del buf[:]
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
318 elif funcname:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
319 funcname = None
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
320 elif tok == NAME and value in keywords:
7870274479f5 Import of initial code base.
cmlenz
parents:
diff changeset
321 funcname = value
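The token loop of `extract_python` can be approximated on Python 3 as follows. This is a simplified sketch under stated assumptions, not the code above: `find_calls` and the default `NOTE:` tag are hypothetical, `ast.literal_eval` is swapped in for the `eval` call in the listing, only one string argument per call is collected, and nested parentheses are not handled.

```python
import ast
import io
from tokenize import COMMENT, NAME, OP, STRING, generate_tokens


def find_calls(source, keywords=('_',), comment_tags=('NOTE:',)):
    """Yield (lineno, funcname, message, comments) for simple calls."""
    funcname = None
    in_args = False
    buf = []
    comments = []  # (lineno, text) pairs, as in the listing
    readline = io.StringIO(source).readline
    for tok, value, (lineno, _), _, _ in generate_tokens(readline):
        if tok == COMMENT:
            text = value[1:].strip()
            if comments and comments[-1][0] == lineno - 1:
                # Continuation of a tagged comment on the previous line.
                comments.append((lineno, text))
            else:
                for tag in comment_tags:
                    if text.startswith(tag):
                        comments = [(lineno, text[len(tag):].strip())]
                        break
        elif funcname and not in_args:
            if tok == OP and value == '(':
                in_args = True
            else:
                funcname = None  # the keyword was not followed by a call
        elif funcname and in_args:
            if tok == STRING:
                literal = ast.literal_eval(value)
                if isinstance(literal, str):
                    buf.append(literal)
            elif tok == OP and value == ')':
                # Comments only apply if they end on the preceding line.
                if comments and comments[-1][0] < lineno - 1:
                    comments = []
                if buf:
                    yield (lineno, funcname, ''.join(buf),
                           [text for _, text in comments])
                funcname, in_args, buf, comments = None, False, [], []
        elif tok == NAME and value in keywords:
            funcname = value


sample = (
    "# NOTE: shown on the login page\n"
    "# keep it short\n"
    "msg = _('Hello')\n"
    "other = _('Bye')\n"
)
results = list(find_calls(sample))
```

The first result carries both comment lines because the second one directly follows the tagged line. The second call gets no comments, since the comment list is reset after each message.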