annotate babel/messages/extract.py @ 527:540cbe76f413
Get rid of the utility code for itemgetter(), we now simply import this
from operator.
author | jruigrok |
---|---|
date | Sat, 05 Mar 2011 15:06:28 +0000 |
parents | eef19ada4296 |
children | 85e1beadacb0 |
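
As a reading aid for the listing that follows: `extract` accepts an explicit function reference in either ``package.module:funcname`` or ``package.module.funcname`` form and imports the named function. The sketch below shows that splitting rule in isolation. It assumes Python 3 and uses `importlib.import_module` in place of the `__import__` call in the listing; `split_reference` and `resolve_reference` are hypothetical names used for illustration only, not part of Babel.

```python
import importlib


def split_reference(method):
    """Split 'pkg.mod:func' or 'pkg.mod.func' into (module, attribute).

    Hypothetical helper: callers are assumed to pass a string that
    contains at least one ':' or '.', as the listing checks beforehand.
    """
    if ':' in method:
        # Colon form: everything before the first colon is the module.
        module, attrname = method.split(':', 1)
    else:
        # Dotted form: the text after the last dot is the attribute name.
        module, _, attrname = method.rpartition('.')
    return module, attrname


def resolve_reference(method):
    """Import the module part and return the attribute it names."""
    module, attrname = split_reference(method)
    return getattr(importlib.import_module(module), attrname)
```

Both spellings resolve to the same object, so `resolve_reference('os.path:join')` and `resolve_reference('os.path.join')` return the same function.
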
rev | line source |
---|---|
1 | 1 # -*- coding: utf-8 -*- |
2 # | |
3 # Copyright (C) 2007 Edgewall Software | |
4 # All rights reserved. | |
5 # | |
6 # This software is licensed as described in the file COPYING, which | |
7 # you should have received as part of this distribution. The terms | |
8 # are also available at http://babel.edgewall.org/wiki/License. | |
9 # | |
10 # This software consists of voluntary contributions made by many | |
11 # individuals. For the exact contribution history, see the revision | |
12 # history and logs, available at http://babel.edgewall.org/log/. | |
13 | |
14 """Basic infrastructure for extracting localizable messages from source files. | |
15 | |
16 This module defines an extensible system for collecting localizable message | |
17 strings from a variety of sources. A native extractor for Python source files | |
18 is built in; extractors for other sources can be added using very simple plugins. | |
19 | |
20 The main entry points into the extraction functionality are the functions | |
21 `extract_from_dir` and `extract_from_file`. | |
22 """ | |
23 | |
24 import os | |
25 import sys | |
162 | 26 from tokenize import generate_tokens, COMMENT, NAME, OP, STRING |
1 | 27 |
525 | 28 from babel.util import parse_encoding, pathmatch, relpath |
338 | 29 from textwrap import dedent |
1 | 30 |
31 __all__ = ['extract', 'extract_from_dir', 'extract_from_file'] | |
32 __docformat__ = 'restructuredtext en' | |
33 | |
34 GROUP_NAME = 'babel.extractors' | |
35 | |
12 | 36 DEFAULT_KEYWORDS = { |
10 | 37 '_': None, |
38 'gettext': None, | |
39 'ngettext': (1, 2), | |
40 'ugettext': None, | |
41 'ungettext': (1, 2), | |
42 'dgettext': (2,), | |
43 'dngettext': (2, 3), | |
179 | 44 'N_': None |
10 | 45 } |
1 | 46 |
62 | 47 DEFAULT_MAPPING = [('**.py', 'python')] |
1 | 48 |
222 | 49 empty_msgid_warning = ( |
50 '%s: warning: Empty msgid. It is reserved by GNU gettext: gettext("") ' | |
51 'returns the header entry with meta information, not the empty string.') | |
52 | |
338 | 53 |
54 def _strip_comment_tags(comments, tags): | |
55 """Helper function for `extract` that strips comment tags from strings | |
56 in a list of comment lines. This function operates in-place. | |
57 """ | |
58 def _strip(line): | |
59 for tag in tags: | |
60 if line.startswith(tag): | |
61 return line[len(tag):].strip() | |
62 return line | |
63 comments[:] = map(_strip, comments) | |
64 | |
340 | 65 |
47 | 66 def extract_from_dir(dirname=os.getcwd(), method_map=DEFAULT_MAPPING, |
67 options_map=None, keywords=DEFAULT_KEYWORDS, | |
338 | 68 comment_tags=(), callback=None, strip_comment_tags=False): |
1 | 69 """Extract messages from any source files found in the given directory. |
224 | 70 |
1 | 71 This function generates tuples of the form: |
224 | 72 |
82 | 73 ``(filename, lineno, message, comments)`` |
224 | 74 |
44 | 75 Which extraction method is used per file is determined by the `method_map` |
76 parameter, which maps extended glob patterns to extraction method names. | |
77 For example, the following is the default mapping: | |
224 | 78 |
62 | 79 >>> method_map = [ |
80 ... ('**.py', 'python') | |
81 ... ] | |
224 | 82 |
1 | 83 This basically says that files with the filename extension ".py" at any |
84 level inside the directory should be processed by the "python" extraction | |
44 | 85 method. Files that don't match any of the mapping patterns are ignored. See |
86 the documentation of the `pathmatch` function for details on the pattern | |
87 syntax. | |
224 | 88 |
62 | 89 The following extended mapping would also use the "genshi" extraction |
90 method on any file in a "templates" subdirectory: | |
224 | 91 |
62 | 92 >>> method_map = [ |
93 ... ('**/templates/**.*', 'genshi'), | |
94 ... ('**.py', 'python') | |
95 ... ] | |
224 | 96 |
44 | 97 The dictionary provided by the optional `options_map` parameter augments |
62 | 98 these mappings. It uses extended glob patterns as keys, and the values are |
99 dictionaries mapping option names to option values (both strings). | |
224 | 100 |
44 | 101 The glob patterns of the `options_map` do not necessarily need to be the |
62 | 102 same as those used in the method mapping. For example, while all files in |
103 the ``templates`` folders in an application may be Genshi templates, the | |
44 | 104 options for those files may differ based on extension: |
224 | 105 |
44 | 106 >>> options_map = { |
107 ... '**/templates/**.txt': { | |
144 | 108 ... 'template_class': 'genshi.template:TextTemplate', |
44 | 109 ... 'encoding': 'latin-1' |
110 ... }, | |
111 ... '**/templates/**.html': { | |
112 ... 'include_attrs': '' | |
113 ... } | |
1 | 114 ... } |
224 | 115 |
1 | 116 :param dirname: the path to the directory to extract messages from |
62 | 117 :param method_map: a list of ``(pattern, method)`` tuples that maps |
118 extended glob patterns to extraction method names | |
44 | 119 :param options_map: a dictionary of additional options (optional) |
12 | 120 :param keywords: a dictionary mapping keywords (i.e. names of functions |
121 that should be recognized as translation functions) to | |
122 tuples that specify which of their arguments contain | |
123 localizable strings | |
84 | 124 :param comment_tags: a list of tags of translator comments to search for |
125 and include in the results | |
47 | 126 :param callback: a function that is called for every file that messages are |
127 extracted from, just before the extraction itself is | |
75 | 128 performed; the function is passed the filename, the name |
129 of the extraction method and the options dictionary as | |
130 positional arguments, in that order | |
338 | 131 :param strip_comment_tags: a flag that if set to `True` causes all comment |
132 tags to be removed from the collected comments. | |
1 | 133 :return: an iterator over ``(filename, lineno, message, comments)`` tuples |
134 :rtype: ``iterator`` | |
44 | 135 :see: `pathmatch` |
1 | 136 """ |
44 | 137 if options_map is None: |
138 options_map = {} | |
56 | 139 |
44 | 140 absname = os.path.abspath(dirname) |
141 for root, dirnames, filenames in os.walk(absname): | |
142 for subdir in dirnames: | |
143 if subdir.startswith('.') or subdir.startswith('_'): | |
144 dirnames.remove(subdir) | |
154 | 145 dirnames.sort() |
146 filenames.sort() | |
44 | 147 for filename in filenames: |
148 filename = relpath( | |
149 os.path.join(root, filename).replace(os.sep, '/'), | |
150 dirname | |
151 ) | |
62 | 152 for pattern, method in method_map: |
44 | 153 if pathmatch(pattern, filename): |
154 filepath = os.path.join(absname, filename) | |
155 options = {} | |
156 for opattern, odict in options_map.items(): | |
157 if pathmatch(opattern, filename): | |
158 options = odict | |
47 | 159 if callback: |
57 | 160 callback(filename, method, options) |
80 | 161 for lineno, message, comments in \ |
338 | 162 extract_from_file(method, filepath, |
163 keywords=keywords, | |
164 comment_tags=comment_tags, | |
165 options=options, | |
166 strip_comment_tags= | |
167 strip_comment_tags): | |
80 | 168 yield filename, lineno, message, comments |
57 | 169 break |
1 | 170 |
340 | 171 |
12 | 172 def extract_from_file(method, filename, keywords=DEFAULT_KEYWORDS, |
338 | 173 comment_tags=(), options=None, strip_comment_tags=False): |
1 | 174 """Extract messages from a specific file. |
224 | 175 |
1 | 176 This function returns a list of tuples of the form: |
224 | 177 |
1 | 178 ``(lineno, message, comments)`` |
224 | 179 |
1 | 180 :param filename: the path to the file to extract messages from |
181 :param method: a string specifying the extraction method (e.g. "python") | |
12 | 182 :param keywords: a dictionary mapping keywords (i.e. names of functions |
183 that should be recognized as translation functions) to | |
184 tuples that specify which of their arguments contain | |
185 localizable strings | |
84 | 186 :param comment_tags: a list of translator tags to search for and include |
187 in the results | |
338 | 188 :param strip_comment_tags: a flag that if set to `True` causes all comment |
189 tags to be removed from the collected comments. | |
1 | 190 :param options: a dictionary of additional options (optional) |
191 :return: the list of extracted messages | |
192 :rtype: `list` | |
193 """ | |
194 fileobj = open(filename, 'U') | |
195 try: | |
338 | 196 return list(extract(method, fileobj, keywords, comment_tags, options, |
197 strip_comment_tags)) | |
1 | 198 finally: |
199 fileobj.close() | |
200 | |
340 | 201 |
84 | 202 def extract(method, fileobj, keywords=DEFAULT_KEYWORDS, comment_tags=(), |
338 | 203 options=None, strip_comment_tags=False): |
1 | 204 """Extract messages from the given file-like object using the specified |
205 extraction method. | |
224 | 206 |
1 | 207 This function returns a list of tuples of the form: |
224 | 208 |
80 | 209 ``(lineno, message, comments)`` |
224 | 210 |
1 | 211 The implementation dispatches the actual extraction to plugins, based on the |
212 value of the ``method`` parameter. | |
224 | 213 |
1 | 214 >>> source = '''# foo module |
215 ... def run(argv): | |
216 ... print _('Hello, world!') | |
217 ... ''' | |
224 | 218 |
1 | 219 >>> from StringIO import StringIO |
220 >>> for message in extract('python', StringIO(source)): | |
221 ... print message | |
164 | 222 (3, u'Hello, world!', []) |
224 | 223 |
250 | 224 :param method: a string specifying the extraction method (e.g. "python"); |
225 if this is a simple name, the extraction function will be | |
226 looked up by entry point; if it is an explicit reference | |
329 | 227 to a function (of the form ``package.module:funcname`` or |
228 ``package.module.funcname``), the corresponding function | |
229 will be imported and used | |
1 | 230 :param fileobj: the file-like object the messages should be extracted from |
12 | 231 :param keywords: a dictionary mapping keywords (i.e. names of functions |
232 that should be recognized as translation functions) to | |
233 tuples that specify which of their arguments contain | |
234 localizable strings | |
84 | 235 :param comment_tags: a list of translator tags to search for and include |
236 in the results | |
1 | 237 :param options: a dictionary of additional options (optional) |
338 | 238 :param strip_comment_tags: a flag that if set to `True` causes all comment |
239 tags to be removed from the collected comments. | |
1 | 240 :return: the list of extracted messages |
241 :rtype: `list` | |
242 :raise ValueError: if the extraction method is not registered | |
243 """ | |
322 | 244 func = None |
329 | 245 if ':' in method or '.' in method: |
246 if ':' not in method: | |
247 lastdot = method.rfind('.') | |
248 module, attrname = method[:lastdot], method[lastdot + 1:] | |
249 else: | |
250 module, attrname = method.split(':', 1) | |
251 func = getattr(__import__(module, {}, {}, [attrname]), attrname) | |
250 | 252 else: |
253 try: | |
254 from pkg_resources import working_set | |
255 except ImportError: | |
256 # pkg_resources is not available, so we resort to looking up the | |
257 # builtin extractors directly | |
258 builtin = {'ignore': extract_nothing, 'python': extract_python} | |
259 func = builtin.get(method) | |
260 else: | |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
261 for entry_point in working_set.iter_entry_points(GROUP_NAME, |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
262 method): |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
263 func = entry_point.load(require=True) |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
264 break |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
265 if func is None: |
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
266 raise ValueError('Unknown extraction method %r' % method) |
222 | 267 |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
268 results = func(fileobj, keywords.keys(), comment_tags, |
269 options=options or {}) |
366
64dc3f943d3b
Test and respective fix for gettext calls that spawn multiple lines. Fixes #119.
palgarvio
parents:
343
diff
changeset
|
270 |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
271 for lineno, funcname, messages, comments in results: |
272 if funcname: |
273 spec = keywords[funcname] or (1,) |
274 else: |
275 spec = (1,) |
276 if not isinstance(messages, (list, tuple)): |
277 messages = [messages] |
258
f2cd4422d2cf
skip messages that have less arguments than the keyword spec calls for
pjenvey
parents:
250
diff
changeset
|
278 if not messages: |
279 continue |
222 | 280 |
258
f2cd4422d2cf
skip messages that have less arguments than the keyword spec calls for
pjenvey
parents:
250
diff
changeset
|
281 # Validate the messages against the keyword's specification |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
282 msgs = [] |
283 invalid = False |
258
f2cd4422d2cf
skip messages that have less arguments than the keyword spec calls for
pjenvey
parents:
250
diff
changeset
|
284 # last_index is 1 based like the keyword spec |
285 last_index = len(messages) |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
286 for index in spec: |
258
f2cd4422d2cf
skip messages that have less arguments than the keyword spec calls for
pjenvey
parents:
250
diff
changeset
|
287 if last_index < index: |
288 # Not enough arguments |
289 invalid = True |
290 break |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
291 message = messages[index - 1] |
292 if message is None: |
293 invalid = True |
294 break |
295 msgs.append(message) |
296 if invalid: |
297 continue |
222 | 298 |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
299 first_msg_index = spec[0] - 1 |
300 if not messages[first_msg_index]: |
301 # An empty string msgid isn't valid, emit a warning |
302 where = '%s:%i' % (hasattr(fileobj, 'name') and \ |
303 fileobj.name or '(unknown)', lineno) |
304 print >> sys.stderr, empty_msgid_warning % where |
305 continue |
12
a2c54ef107c2
* Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents:
10
diff
changeset
|
306 |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
307 messages = tuple(msgs) |
308 if len(messages) == 1: |
309 messages = messages[0] |
338
6fe060286ff0
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
310 |
311 if strip_comment_tags: |
312 _strip_comment_tags(comments, comment_tags) |
250
cd7e378b8190
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
313 yield lineno, messages, comments |
1 | 314 |
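
The keyword-spec check in lines 271-309 above can be read in isolation. The following is a minimal sketch of that selection step only, written so that it also runs on Python 3. The name `select_messages` is hypothetical and not part of Babel. Unlike the original, it returns `None` for a skipped entry instead of using `continue`, and it omits the stderr warning for an empty msgid.

```python
def select_messages(messages, spec=(1,)):
    """Pick the arguments named by a 1-based keyword spec.

    Returns a single message, a tuple of messages, or None when the
    extracted call should be skipped. Hypothetical helper for illustration.
    """
    if not isinstance(messages, (list, tuple)):
        messages = [messages]
    if not messages:
        return None

    selected = []
    for index in spec:
        if len(messages) < index:
            # Fewer arguments than the spec asks for
            return None
        message = messages[index - 1]
        if message is None:
            # The argument was not a string literal
            return None
        selected.append(message)

    if not messages[spec[0] - 1]:
        # An empty msgid is not valid (the original also prints a warning)
        return None

    if len(selected) == 1:
        return selected[0]
    return tuple(selected)
```

With a spec of `(1, 2)`, as one might configure for a plural function, a call that supplies only one string argument is dropped rather than reported with a missing plural form.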
340
f7269b43236d
added some newlines to extract and jslexer to stay consistent with the rest of the sourcecode.
aronacher
parents:
339
diff
changeset
|
315 |
84
4ff9cc26c11b
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
316 def extract_nothing(fileobj, keywords, comment_tags, options): |
57
a6183d300a6e
* The `extract_messages` distutils command now operators on configurable input directories again, instead of the complete current directory. The `input_dirs` default to the package directories.
cmlenz
parents:
56
diff
changeset
|
317 """Pseudo extractor that does not actually extract anything, but simply |
318 returns an empty list. |
319 """ |
320 return [] |
321 |
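
`extract_nothing` shows the smallest possible extractor: any callable that accepts `(fileobj, keywords, comment_tags, options)` and returns an iterable of `(lineno, funcname, messages, comments)` tuples. As an illustration of that interface, here is a sketch of a toy extractor that treats every non-blank line of a text file as one message. The name `extract_lines` is an assumption made for this example, not a Babel function. Because it yields `None` as the function name, `extract()` would apply the default spec `(1,)` (lines 272-275 above).

```python
import io


def extract_lines(fileobj, keywords, comment_tags, options):
    """Hypothetical extractor: each non-blank line becomes one message."""
    for lineno, line in enumerate(fileobj, 1):
        text = line.strip()
        if text:
            # funcname is None, and there are no translator comments
            yield lineno, None, text, []


sample = io.StringIO(u"Hello\n\nWorld\n")
results = list(extract_lines(sample, [], [], {}))
```

Judging from lines 245-251 above, such a function could presumably be referenced from a mapping as `some.module:extract_lines` or `some.module.extract_lines`, where `some.module` stands for whatever module actually holds it.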
340
f7269b43236d
added some newlines to extract and jslexer to stay consistent with the rest of the sourcecode.
aronacher
parents:
339
diff
changeset
|
322 |
84
4ff9cc26c11b
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
323 def extract_python(fileobj, keywords, comment_tags, options): |
1 | 324 """Extract messages from Python source code. |
224
7b9c20c81c07
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
325 |
164
84a9e5f97658
made the python extractor detect source file encodings from the magic encoding
pjenvey
parents:
162
diff
changeset
|
326 :param fileobj: the seekable, file-like object the messages should be |
327 extracted from |
1 | 328 :param keywords: a list of keywords (i.e. function names) that should be |
329 recognized as translation functions | |
84
4ff9cc26c11b
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
330 :param comment_tags: a list of translator tags to search for and include |
331 in the results |
1 | 332 :param options: a dictionary of additional options (optional) |
81
1e89661e6b26
Fixed and added some documentation about the translator comments implemented in [81].
palgarvio
parents:
80
diff
changeset
|
333 :return: an iterator over ``(lineno, funcname, message, comments)`` tuples |
1 | 334 :rtype: ``iterator`` |
335 """ | |
222 | 336 funcname = lineno = message_lineno = None |
337 call_stack = -1 | |
1 | 338 buf = [] |
339 messages = [] | |
80
9c84b9fa5d30
Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents:
75
diff
changeset
|
340 translator_comments = [] |
222 | 341 in_def = in_translator_comments = False |
338
6fe060286ff0
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
342 comment_tag = None |
1 | 343 |
222 | 344 encoding = parse_encoding(fileobj) or options.get('encoding', 'iso-8859-1') |
164
84a9e5f97658
made the python extractor detect source file encodings from the magic encoding
pjenvey
parents:
162
diff
changeset
|
345 |
1 | 346 tokens = generate_tokens(fileobj.readline) |
347 for tok, value, (lineno, _), _, _ in tokens: | |
222 | 348 if call_stack == -1 and tok == NAME and value in ('def', 'class'): |
349 in_def = True | |
350 elif tok == OP and value == '(': | |
351 if in_def: | |
352 # Avoid false positives for declarations such as: | |
353 # def gettext(arg='message'): | |
354 in_def = False | |
355 continue | |
356 if funcname: | |
357 message_lineno = lineno | |
358 call_stack += 1 | |
223 | 359 elif in_def and tok == OP and value == ':': |
360 # End of a class definition without parens | |
361 in_def = False | |
362 continue | |
222 | 363 elif call_stack == -1 and tok == COMMENT: |
92
5bac3678e60d
Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents:
91
diff
changeset
|
364 # Strip the comment token from the line |
164
84a9e5f97658
made the python extractor detect source file encodings from the magic encoding
pjenvey
parents:
162
diff
changeset
|
365 value = value.decode(encoding)[1:].strip() |
147 | 366 if in_translator_comments and \ |
93
1ce6692ed625
Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents:
92
diff
changeset
|
367 translator_comments[-1][0] == lineno - 1: |
92
5bac3678e60d
Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents:
91
diff
changeset
|
368 # We're already inside a translator comment, continue appending |
93
1ce6692ed625
Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents:
92
diff
changeset
|
369 translator_comments.append((lineno, value)) |
92
5bac3678e60d
Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents:
91
diff
changeset
|
370 continue |
371 # If execution reaches this point, let's see if comment line |
372 # starts with one of the comment tags |
85
dc260efaed34
Fixed de-pluralization bug introduced in [85] regarding the extraction of translator comments.
palgarvio
parents:
84
diff
changeset
|
373 for comment_tag in comment_tags: |
92
5bac3678e60d
Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents:
91
diff
changeset
|
374 if value.startswith(comment_tag): |
147 | 375 in_translator_comments = True |
338
6fe060286ff0
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
376 translator_comments.append((lineno, value)) |
92
5bac3678e60d
Fixed bug introduced in [92], bad use of `lstrip()`. Added a unittest to test multiple translator comment tags.
palgarvio
parents:
91
diff
changeset
|
377 break |
222 | 378 elif funcname and call_stack == 0: |
1 | 379 if tok == OP and value == ')': |
380 if buf: | |
381 messages.append(''.join(buf)) | |
382 del buf[:] | |
222 | 383 else: |
384 messages.append(None) | |
93
1ce6692ed625
Commiting patch provided by pjenvey: Translator comments don't apply unless they immediately preceed the message.
palgarvio
parents:
92
diff
changeset
|
385 |
222 | 386 if len(messages) > 1: |
387 messages = tuple(messages) | |
388 else: | |
389 messages = messages[0] | |
390 # Comments don't apply unless they immediately precede the | |
391 # message | |
392 if translator_comments and \ | |
393 translator_comments[-1][0] < message_lineno - 1: | |
394 translator_comments = [] | |
395 | |
396 yield (message_lineno, funcname, messages, | |
397 [comment[1] for comment in translator_comments]) | |
398 | |
399 funcname = lineno = message_lineno = None | |
400 call_stack = -1 | |
1 | 401 messages = [] |
80
9c84b9fa5d30
Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents:
75
diff
changeset
|
402 translator_comments = [] |
222 | 403 in_translator_comments = False |
1 | 404 elif tok == STRING: |
164
84a9e5f97658
made the python extractor detect source file encodings from the magic encoding
pjenvey
parents:
162
diff
changeset
|
405 # Unwrap quotes in a safe manner, maintaining the string's |
406 # encoding |
222 | 407 # https://sourceforge.net/tracker/?func=detail&atid=355470& |
408 # aid=617979&group_id=5470 | |
164
84a9e5f97658
made the python extractor detect source file encodings from the magic encoding
pjenvey
parents:
162
diff
changeset
|
409 value = eval('# coding=%s\n%s' % (encoding, value), |
410 {'__builtins__':{}}, {}) |
411 if isinstance(value, str): |
412 value = value.decode(encoding) |
413 buf.append(value) |
1 | 414 elif tok == OP and value == ',': |
222 | 415 if buf: |
416 messages.append(''.join(buf)) | |
417 del buf[:] | |
418 else: | |
419 messages.append(None) | |
366
64dc3f943d3b
Test and respective fix for gettext calls that spawn multiple lines. Fixes #119.
palgarvio
parents:
343
diff
changeset
|
420 if translator_comments: |
421 # We have translator comments, and since we're on a |
422 # comma(,) user is allowed to break into a new line |
423 # Let's increase the last comment's lineno in order |
424 # for the comment to still be a valid one |
425 old_lineno, old_comment = translator_comments.pop() |
426 translator_comments.append((old_lineno+1, old_comment)) |
222 | 427 elif call_stack > 0 and tok == OP and value == ')': |
428 call_stack -= 1 | |
429 elif funcname and call_stack == -1: | |
1 | 430 funcname = None |
431 elif tok == NAME and value in keywords: | |
432 funcname = value | |
339 | 433 |
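
The token-driven approach of `extract_python` can be illustrated with a much-reduced sketch. It is not the code above: `find_messages` is a hypothetical name, it targets Python 3, it swaps the restricted `eval` of lines 409-410 for `ast.literal_eval`, and it leaves out translator comments, encoding detection and the `def`/`class` guard. It only shows how adjacent string tokens are buffered and how commas and the closing parenthesis split them into arguments.

```python
import ast
import io
from tokenize import generate_tokens, NAME, OP, STRING


def find_messages(source, keywords=('_', 'gettext', 'ngettext')):
    """Yield (lineno, funcname, args) for simple calls such as _('text')."""
    funcname = None   # keyword seen; waiting for, or inside, its call
    lineno = None
    depth = 0         # parenthesis depth inside the keyword call
    buf = []          # adjacent string literals of the current argument
    args = []         # finished arguments (None for non-string arguments)
    readline = io.StringIO(source).readline
    for tok, value, (row, _col), _end, _line in generate_tokens(readline):
        if funcname is None:
            if tok == NAME and value in keywords:
                funcname, lineno = value, row
        elif depth == 0:
            if tok == OP and value == '(':
                depth = 1
            else:
                funcname = None  # keyword was not followed by a call
        elif tok == OP and value == '(':
            depth += 1
        elif tok == OP and value == ')':
            depth -= 1
            if depth == 0:
                args.append(''.join(buf) if buf else None)
                yield lineno, funcname, tuple(args)
                funcname, buf, args = None, [], []
        elif depth == 1 and tok == STRING:
            literal = ast.literal_eval(value)
            if isinstance(literal, str):
                buf.append(literal)
        elif depth == 1 and tok == OP and value == ',':
            args.append(''.join(buf) if buf else None)
            buf = []


found = list(find_messages("x = _('Hello')\ny = ngettext('one', 'many', n)\n"))
```

As in the original, an argument that is not a string literal (here `n`) is recorded as `None`, which lets a later keyword-spec check decide whether the call is usable.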
340
f7269b43236d
added some newlines to extract and jslexer to stay consistent with the rest of the sourcecode.
aronacher
parents:
339
diff
changeset
|
434 |
339 | 435 def extract_javascript(fileobj, keywords, comment_tags, options): |
436 """Extract messages from JavaScript source code. | |
437 | |
438 :param fileobj: the seekable, file-like object the messages should be | |
439 extracted from | |
440 :param keywords: a list of keywords (i.e. function names) that should be | |
441 recognized as translation functions | |
442 :param comment_tags: a list of translator tags to search for and include | |
443 in the results | |
444 :param options: a dictionary of additional options (optional) | |
445 :return: an iterator over ``(lineno, funcname, message, comments)`` tuples | |
446 :rtype: ``iterator`` | |
447 """ | |
448 from babel.messages.jslexer import tokenize, unquote_string | |
449 funcname = message_lineno = None | |
450 messages = [] | |
451 last_argument = None | |
452 translator_comments = [] | |
405
17ff4bb26dc8
Added support for string concatenation to javascript lexer. _("foo" + "bar") is now equivalent to _("foobar")
aronacher
parents:
366
diff
changeset
|
453 concatenate_next = False |
339 | 454 encoding = options.get('encoding', 'utf-8') |
455 last_token = None | |
456 call_stack = -1 | |
457 | |
458 for token in tokenize(fileobj.read().decode(encoding)): | |
459 if token.type == 'operator' and token.value == '(': | |
460 if funcname: | |
461 message_lineno = token.lineno | |
462 call_stack += 1 | |
463 | |
464 elif call_stack == -1 and token.type == 'linecomment': | |
465 value = token.value[2:].strip() | |
466 if translator_comments and \ | |
467 translator_comments[-1][0] == token.lineno - 1: | |
468 translator_comments.append((token.lineno, value)) | |
469 continue | |
470 | |
471 for comment_tag in comment_tags: | |
472 if value.startswith(comment_tag): | |
473 translator_comments.append((token.lineno, value.strip())) | |
474 break | |
475 | |
476 elif token.type == 'multilinecomment': | |
477 # only one multi-line comment may precede a translation | |
478 translator_comments = [] | |
479 value = token.value[2:-2].strip() | |
480 for comment_tag in comment_tags: | |
481 if value.startswith(comment_tag): | |
482 lines = value.splitlines() | |
483 if lines: | |
484 lines[0] = lines[0].strip() | |
485 lines[1:] = dedent('\n'.join(lines[1:])).splitlines() | |
486 for offset, line in enumerate(lines): | |
487 translator_comments.append((token.lineno + offset, | |
488 line)) | |
489 break | |
490 | |
491 elif funcname and call_stack == 0: | |
492 if token.type == 'operator' and token.value == ')': | |
493 if last_argument is not None: | |
494 messages.append(last_argument) | |
495 if len(messages) > 1: | |
496 messages = tuple(messages) | |
497 elif messages: | |
498 messages = messages[0] | |
499 else: | |
500 messages = None | |
501 | |
426 | 502 # Comments don't apply unless they immediately precede the |
339 | 503 # message |
504 if translator_comments and \ | |
505 translator_comments[-1][0] < message_lineno - 1: | |
506 translator_comments = [] | |
507 | |
508 if messages is not None: | |
509 yield (message_lineno, funcname, messages, | |
510 [comment[1] for comment in translator_comments]) | |
511 | |
512 funcname = message_lineno = last_argument = None | |
405
17ff4bb26dc8
Added support for string concatenation to javascript lexer. _("foo" + "bar") is now equivalent to _("foobar")
aronacher
parents:
366
diff
changeset
|
513 concatenate_next = False |
339 | 514 translator_comments = [] |
515 messages = [] | |
516 call_stack = -1 | |
517 | |
518 elif token.type == 'string': | |
405
17ff4bb26dc8
Added support for string concatenation to javascript lexer. _("foo" + "bar") is now equivalent to _("foobar")
aronacher
parents:
366
diff
changeset
|
519 new_value = unquote_string(token.value) |
520 if concatenate_next: |
521 last_argument = (last_argument or '') + new_value |
522 concatenate_next = False |
523 else: |
524 last_argument = new_value |
339 | 525 |
405
17ff4bb26dc8
Added support for string concatenation to javascript lexer. _("foo" + "bar") is now equivalent to _("foobar")
aronacher
parents:
366
diff
changeset
|
526 elif token.type == 'operator': |
527 if token.value == ',': |
528 if last_argument is not None: |
529 messages.append(last_argument) |
530 last_argument = None |
531 else: |
532 messages.append(None) |
533 concatenate_next = False |
534 elif token.value == '+': |
535 concatenate_next = True |
339 | 536 |
537 elif call_stack > 0 and token.type == 'operator' \ | |
538 and token.value == ')': | |
539 call_stack -= 1 | |
540 | |
541 elif funcname and call_stack == -1: | |
542 funcname = None | |
543 | |
544 elif call_stack == -1 and token.type == 'name' and \ | |
545 token.value in keywords and \ | |
546 (last_token is None or last_token.type != 'name' or | |
547 last_token.value != 'function'): | |
548 funcname = token.value | |
549 | |
550 last_token = token |
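
The multi-line comment branch of `extract_javascript` (lines 476-489 above) turns one `/* ... */` token into per-line translator comments. A minimal standalone sketch of that step follows. The function name `split_block_comment` and its return shape are assumptions for illustration; the body mirrors the slicing, `dedent` and `enumerate` logic shown above.

```python
from textwrap import dedent


def split_block_comment(raw, lineno, comment_tags):
    """Return [(lineno, text), ...] for a tagged /* ... */ comment.

    Returns an empty list when the comment does not start with any of
    the given tags. Hypothetical helper, not part of Babel.
    """
    value = raw[2:-2].strip()            # drop the /* and */ delimiters
    for comment_tag in comment_tags:
        if value.startswith(comment_tag):
            lines = value.splitlines()
            if lines:
                lines[0] = lines[0].strip()
                # Remove the indentation shared by the continuation lines
                lines[1:] = dedent('\n'.join(lines[1:])).splitlines()
            return [(lineno + offset, line)
                    for offset, line in enumerate(lines)]
    return []


comment = "/* NOTE: first line\n   second line\n   third */"
parts = split_block_comment(comment, 10, ['NOTE:'])
```

Each output line keeps its own line number, so the later check that the last comment line sits directly above the message (lines 504-506) also works for block comments.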