annotate babel/messages/extract.py @ 593:99983baf1067 trunk
resort to hard-coded message extractors/checkers if pkg_resources is installed but no egg-info was found (#230)
author | fschwarz |
---|---|
date | Thu, 09 Aug 2012 11:20:25 +0000 |
parents | 1b801a0cb2cb |
children |
# -*- coding: utf-8 -*-
#
# Copyright (C) 2007-2011 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://babel.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://babel.edgewall.org/log/.

"""Basic infrastructure for extracting localizable messages from source files.

This module defines an extensible system for collecting localizable message
strings from a variety of sources. A native extractor for Python source files
is built in; extractors for other sources can be added using very simple
plugins.

The main entry points into the extraction functionality are the functions
`extract_from_dir` and `extract_from_file`.
"""

import os
import sys
from tokenize import generate_tokens, COMMENT, NAME, OP, STRING

from babel.util import parse_encoding, pathmatch, relpath
from textwrap import dedent

__all__ = ['extract', 'extract_from_dir', 'extract_from_file']
__docformat__ = 'restructuredtext en'

GROUP_NAME = 'babel.extractors'

DEFAULT_KEYWORDS = {
    '_': None,
    'gettext': None,
    'ngettext': (1, 2),
    'ugettext': None,
    'ungettext': (1, 2),
    'dgettext': (2,),
    'dngettext': (2, 3),
    'N_': None,
    'pgettext': ((1, 'c'), 2)
}

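The keyword specs above can be read as argument positions of the translation call. The following is a minimal sketch, not Babel's own code: `pick_messages` is a hypothetical helper, and it assumes that positions are 1-based, that `None` means "first argument only", and that a nested tuple such as `(1, 'c')` marks the argument holding the message context.

```python
def pick_messages(spec, args):
    """Hypothetical helper: apply a keyword spec to the arguments of a call.

    Returns a ``(messages, context)`` pair.
    """
    spec = spec or (1,)  # assumption: None means "first argument only"
    context = None
    messages = []
    for index in spec:
        if isinstance(index, tuple):
            # e.g. (1, 'c'): the first argument is the message context
            context = args[index[0] - 1]
        else:
            # plain integer: 1-based position of a localizable string
            messages.append(args[index - 1])
    return messages, context


# gettext('Hello') with spec None
print(pick_messages(None, ['Hello']))
# ngettext('apple', 'apples', 3) with spec (1, 2)
print(pick_messages((1, 2), ['apple', 'apples', 3]))
# pgettext('menu', 'Open') with spec ((1, 'c'), 2)
print(pick_messages(((1, 'c'), 2), ['menu', 'Open']))
```

Under these assumptions, only the listed positions are collected, so the count argument of `ngettext` and the domain argument of `dgettext` are left out of the messages.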
DEFAULT_MAPPING = [('**.py', 'python')]

empty_msgid_warning = (
    '%s: warning: Empty msgid. It is reserved by GNU gettext: gettext("") '
    'returns the header entry with meta information, not the empty string.')


def _strip_comment_tags(comments, tags):
    """Helper function for `extract` that strips comment tags from strings
    in a list of comment lines. This function operates in-place.
    """
    def _strip(line):
        # Remove the first matching tag prefix, then surrounding whitespace.
        for tag in tags:
            if line.startswith(tag):
                return line[len(tag):].strip()
        return line
    # Slice assignment keeps the caller's list object and replaces its items.
    comments[:] = map(_strip, comments)


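The in-place behaviour of the helper above can be illustrated with a small standalone sketch. It restates the same logic under a hypothetical name (`strip_tags_in_place`) so that it runs without Babel; the sample comment strings are made up for illustration.

```python
def strip_tags_in_place(comments, tags):
    """Standalone restatement of the tag-stripping helper (hypothetical name)."""
    def _strip(line):
        for tag in tags:
            if line.startswith(tag):
                return line[len(tag):].strip()
        return line
    # Slice assignment mutates the existing list instead of rebinding the name.
    comments[:] = [_strip(line) for line in comments]


comments = ['NOTE: shown in the title bar', 'second line without a tag']
alias = comments  # a second reference to the same list object
strip_tags_in_place(comments, ['NOTE:'])
print(alias)
```

Because the list is modified through slice assignment, every reference to it (here `alias`) sees the stripped lines, and lines that do not start with a tag are left unchanged.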
def extract_from_dir(dirname=os.getcwd(), method_map=DEFAULT_MAPPING,
                     options_map=None, keywords=DEFAULT_KEYWORDS,
                     comment_tags=(), callback=None, strip_comment_tags=False):
    """Extract messages from any source files found in the given directory.

    This function generates tuples of the form:

        ``(filename, lineno, message, comments, context)``

    Which extraction method is used per file is determined by the `method_map`
    parameter, which maps extended glob patterns to extraction method names.
    For example, the following is the default mapping:

    >>> method_map = [
    ...     ('**.py', 'python')
    ... ]

    This basically says that files with the filename extension ".py" at any
    level inside the directory should be processed by the "python" extraction
    method. Files that don't match any of the mapping patterns are ignored. See
    the documentation of the `pathmatch` function for details on the pattern
    syntax.

    The following extended mapping would also use the "genshi" extraction
    method on any file in a "templates" subdirectory:

    >>> method_map = [
    ...     ('**/templates/**.*', 'genshi'),
    ...     ('**.py', 'python')
    ... ]

    The dictionary provided by the optional `options_map` parameter augments
    these mappings. It uses extended glob patterns as keys, and the values are
    dictionaries mapping option names to option values (both strings).

    The glob patterns of the `options_map` do not necessarily need to be the
    same as those used in the method mapping. For example, while all files in
    the ``templates`` folders in an application may be Genshi templates, the
    options for those files may differ based on extension:

    >>> options_map = {
    ...     '**/templates/**.txt': {
    ...         'template_class': 'genshi.template:TextTemplate',
    ...         'encoding': 'latin-1'
    ...     },
    ...     '**/templates/**.html': {
    ...         'include_attrs': ''
    ...     }
    ... }

    :param dirname: the path to the directory to extract messages from
    :param method_map: a list of ``(pattern, method)`` tuples that maps
                       extended glob patterns to extraction method names
    :param options_map: a dictionary of additional options (optional)
    :param keywords: a dictionary mapping keywords (i.e. names of functions
                     that should be recognized as translation functions) to
                     tuples that specify which of their arguments contain
                     localizable strings
    :param comment_tags: a list of tags of translator comments to search for
                         and include in the results
    :param callback: a function that is called for every file that messages
                     are extracted from, just before the extraction itself is
                     performed; the function is passed the filename, the name
                     of the extraction method and the options dictionary as
                     positional arguments, in that order
    :param strip_comment_tags: a flag that if set to `True` causes all comment
                               tags to be removed from the collected comments.
    :return: an iterator over ``(filename, lineno, message, comments,
             context)`` tuples
    :rtype: ``iterator``
    :see: `pathmatch`
    """
    if options_map is None:
        options_map = {}

    absname = os.path.abspath(dirname)
    for root, dirnames, filenames in os.walk(absname):
        # Prune hidden and private directories. Iterate over a copy, because
        # removing entries from the list being iterated would skip siblings.
        for subdir in list(dirnames):
            if subdir.startswith('.') or subdir.startswith('_'):
                dirnames.remove(subdir)
        # Sort so that the extraction order does not depend on the platform.
        dirnames.sort()
        filenames.sort()
        for filename in filenames:
            filename = relpath(
                os.path.join(root, filename).replace(os.sep, '/'),
                dirname
            )
            # The first pattern in `method_map` that matches decides the
            # extraction method for this file.
            for pattern, method in method_map:
                if pathmatch(pattern, filename):
                    filepath = os.path.join(absname, filename)
                    options = {}
                    for opattern, odict in options_map.items():
                        if pathmatch(opattern, filename):
                            options = odict
                    if callback:
                        callback(filename, method, options)
                    for lineno, message, comments, context in \
                          extract_from_file(method, filepath,
                                            keywords=keywords,
                                            comment_tags=comment_tags,
                                            options=options,
                                            strip_comment_tags=
                                                strip_comment_tags):
                        yield filename, lineno, message, comments, context
                    break


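The pattern handling in `extract_from_dir` can be sketched in isolation. The example below is an illustration only: `glob_match` is a small regex-based stand-in for `pathmatch` (an assumption, not Babel's implementation, and it handles only the wildcard forms listed in its table), while `pick_method` and `pick_options` are hypothetical helpers that mirror the two loops above.

```python
import re

# Assumed translation of the extended glob wildcards into regex fragments.
_SYMBOLS = {
    '?': '[^/]',
    '?/': '[^/]/',
    '*': '[^/]+',
    '*/': '[^/]+/',
    '**/': '(?:.+/)*?',
    '**': '(?:.+/)*?[^/]+',
}


def glob_match(pattern, filename):
    """Stand-in for `pathmatch`: match a '/'-separated relative path."""
    buf = []
    # re.split with a capturing group puts the wildcard runs at odd indices.
    for idx, part in enumerate(re.split(r'([?*]+/?)', pattern)):
        if idx % 2:
            buf.append(_SYMBOLS[part])
        elif part:
            buf.append(re.escape(part))
    return re.match(''.join(buf) + '$', filename) is not None


def pick_method(filename, method_map):
    """Return the method of the first matching pattern, or None."""
    for pattern, method in method_map:
        if glob_match(pattern, filename):
            return method
    return None


def pick_options(filename, options_map):
    """Return the options of the last matching pattern, or an empty dict."""
    options = {}
    for pattern, odict in options_map.items():
        if glob_match(pattern, filename):
            options = odict
    return options


method_map = [('**/templates/**.*', 'genshi'), ('**.py', 'python')]
options_map = {'**/templates/**.txt': {'encoding': 'latin-1'}}

for name in ['pkg/mod.py', 'app/templates/views.py', 'README.txt']:
    print(name, pick_method(name, method_map))
print(pick_options('app/templates/mail.txt', options_map))
```

The order of `method_map` matters because the loop stops at the first match, which is why it is a list of tuples rather than a dictionary. The options loop has no `break`, so when several option patterns match one file, the one visited last replaces the earlier ones.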
def extract_from_file(method, filename, keywords=DEFAULT_KEYWORDS,
                      comment_tags=(), options=None, strip_comment_tags=False):
    """Extract messages from a specific file.

    This function returns a list of tuples of the form:

        ``(lineno, message, comments, context)``

    :param filename: the path to the file to extract messages from
    :param method: a string specifying the extraction method (e.g. "python")
    :param keywords: a dictionary mapping keywords (i.e. names of functions
                     that should be recognized as translation functions) to
                     tuples that specify which of their arguments contain
                     localizable strings
    :param comment_tags: a list of translator tags to search for and include
                         in the results
    :param strip_comment_tags: a flag that if set to `True` causes all comment
                               tags to be removed from the collected comments.
    :param options: a dictionary of additional options (optional)
    :return: the list of extracted messages
    :rtype: `list`
    """
    fileobj = open(filename, 'U')
    try:
        # Materialize the generator before the file is closed.
        return list(extract(method, fileobj, keywords, comment_tags, options,
                            strip_comment_tags))
    finally:
        fileobj.close()


84
3ae316b58231
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
204 def extract(method, fileobj, keywords=DEFAULT_KEYWORDS, comment_tags=(), |
338
b39145076d8a
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
205 options=None, strip_comment_tags=False): |
1 | 206 """Extract messages from the given file-like object using the specified |
207 extraction method. | |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
208 |
560
86524be05b60
fix docstring for babel.messages.extract() so it mentions the correct return type
fschwarz
parents:
530
diff
changeset
|
209 This function returns tuples of the form: |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
210 |
80
116e34b8cefa
Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents:
75
diff
changeset
|
211 ``(lineno, message, comments)`` |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
212 |
1 | 213 The implementation dispatches the actual extraction to plugins, based on the |
214 value of the ``method`` parameter. | |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
215 |
1 | 216 >>> source = '''# foo module |
217 ... def run(argv): | |
218 ... print _('Hello, world!') | |
219 ... ''' | |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
220 |
1 | 221 >>> from StringIO import StringIO |
222 >>> for message in extract('python', StringIO(source)): | |
223 ... print message | |
569
1b801a0cb2cb
Support for context-aware methods during message extraction (fixes #229, patch by David Rios)
fschwarz
parents:
560
diff
changeset
|
224 (3, u'Hello, world!', [], None) |
224
0a71b675fc48
Fix for message extractors which return `None` as the gettext call.
palgarvio
parents:
223
diff
changeset
|
225 |
250
6c06570af1b9
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
226 :param method: a string specifying the extraction method (.e.g. "python"); |
6c06570af1b9
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
227 if this is a simple name, the extraction function will be |
6c06570af1b9
Soften dependency on setuptools. Extraction methods can now be referenced using a special section in the mapping configuration, mapping short names to fully-qualified function references.
cmlenz
parents:
224
diff
changeset
|
228 looked up by entry point; if it is an explicit reference |
329
35c19c01e4b5
Allow extraction method specification to use a dot instead of the colon for separating module and function names. See #105.
cmlenz
parents:
322
diff
changeset
|
229 to a function (of the form ``package.module:funcname`` or |
35c19c01e4b5
Allow extraction method specification to use a dot instead of the colon for separating module and function names. See #105.
cmlenz
parents:
322
diff
changeset
|
230 ``package.module.funcname``), the corresponding function |
35c19c01e4b5
Allow extraction method specification to use a dot instead of the colon for separating module and function names. See #105.
cmlenz
parents:
322
diff
changeset
|
231 will be imported and used |
1 | 232 :param fileobj: the file-like object the messages should be extracted from |
12
e6ba3e878b10
* Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents:
10
diff
changeset
|
233 :param keywords: a dictionary mapping keywords (i.e. names of functions |
e6ba3e878b10
* Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents:
10
diff
changeset
|
234 that should be recognized as translation functions) to |
e6ba3e878b10
* Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents:
10
diff
changeset
|
235 tuples that specify which of their arguments contain |
e6ba3e878b10
* Removed pkg_resources/setuptools requirement from various places.
cmlenz
parents:
10
diff
changeset
|
236 localizable strings |
84
3ae316b58231
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
237 :param comment_tags: a list of translator tags to search for and include |
3ae316b58231
Some cosmetic changes for the new translator comments support.
cmlenz
parents:
82
diff
changeset
|
238 in the results |
1 | 239 :param options: a dictionary of additional options (optional) |
338
b39145076d8a
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
240 :param strip_comment_tags: a flag that if set to `True` causes all comment |
b39145076d8a
Stripping of comment tags is optional now. If enabled it will strip the tags from all lines of a comment now.
aronacher
parents:
329
diff
changeset
|
241 tags to be removed from the collected comments. |
560
86524be05b60
fix docstring for babel.messages.extract() so it mentions the correct return type
fschwarz
parents:
530
diff
changeset
|
242 :return: an iterator over ``(lineno, message, comments)`` tuples |
86524be05b60
fix docstring for babel.messages.extract() so it mentions the correct return type
fschwarz
parents:
530
diff
changeset
|
243 :rtype: `iterator` |
1 | 244 :raise ValueError: if the extraction method is not registered |
245 """ | |
322 | 246 func = None |
329 | 247 if ':' in method or '.' in method: |
248 if ':' not in method: | |
249 lastdot = method.rfind('.') | |
250 module, attrname = method[:lastdot], method[lastdot + 1:] | |
251 else: | |
252 module, attrname = method.split(':', 1) | |
253 func = getattr(__import__(module, {}, {}, [attrname]), attrname) | |
250 | 254 else: |
255 try: | |
256 from pkg_resources import working_set | |
257 except ImportError: | |
593 | 258 pass |
250 | 259 else: |
260 for entry_point in working_set.iter_entry_points(GROUP_NAME, | |
261 method): | |
262 func = entry_point.load(require=True) | |
263 break | |
593 | 264 if func is None: |
265 # if pkg_resources is not available or no usable egg-info was found | |
266 # (see #230), we resort to looking up the builtin extractors | |
267 # directly | |
268 builtin = {'ignore': extract_nothing, 'python': extract_python} | |
269 func = builtin.get(method) | |
250 | 270 if func is None: |
271 raise ValueError('Unknown extraction method %r' % method) | |
222 | 272 |
250 | 273 results = func(fileobj, keywords.keys(), comment_tags, |
274 options=options or {}) | |
366 | 275 |
250 | 276 for lineno, funcname, messages, comments in results: |
277 if funcname: | |
278 spec = keywords[funcname] or (1,) | |
279 else: | |
280 spec = (1,) | |
281 if not isinstance(messages, (list, tuple)): | |
282 messages = [messages] | |
258 | 283 if not messages: |
284 continue | |
222 | 285 |
258 | 286 # Validate the messages against the keyword's specification |
569 | 287 context = None |
250 | 288 msgs = [] |
289 invalid = False | |
258 | 290 # last_index is 1 based like the keyword spec |
291 last_index = len(messages) | |
250 | 292 for index in spec: |
569 | 293 if isinstance(index, tuple): |
294 context = messages[index[0] - 1] | |
295 continue | |
258 | 296 if last_index < index: |
297 # Not enough arguments | |
298 invalid = True | |
299 break | |
250 | 300 message = messages[index - 1] |
301 if message is None: | |
302 invalid = True | |
303 break | |
304 msgs.append(message) | |
305 if invalid: | |
306 continue | |
222 | 307 |
569 | 308 # keyword spec indexes are 1 based, therefore '-1' |
309 if isinstance(spec[0], tuple): | |
310 # context-aware *gettext method | |
311 first_msg_index = spec[1] - 1 | |
312 else: | |
313 first_msg_index = spec[0] - 1 | |
250 | 314 if not messages[first_msg_index]: |
315 # An empty string msgid isn't valid, emit a warning | |
316 where = '%s:%i' % (hasattr(fileobj, 'name') and \ | |
317 fileobj.name or '(unknown)', lineno) | |
318 print >> sys.stderr, empty_msgid_warning % where | |
319 continue | |
12 | 320 |
250 | 321 messages = tuple(msgs) |
322 if len(messages) == 1: | |
323 messages = messages[0] | |
338 | 324 |
325 if strip_comment_tags: | |
326 _strip_comment_tags(comments, comment_tags) | |
569 | 327 yield lineno, messages, comments, context |
1 | 328 |
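The validation loop in `extract` (source lines 286 to 327) can be read as a small pure function. The following is a minimal Python 3 sketch of that logic, not the module's own code: the name `apply_keyword_spec` is hypothetical, the spec shape `((1, 'c'), 2)` for a context-aware keyword is an assumption used for illustration, and the empty-msgid warning is left out.

```python
def apply_keyword_spec(messages, spec):
    """Pick the arguments named by a 1-based keyword spec.

    Returns (message_or_tuple, context), or None when the call does not
    satisfy the spec. Hypothetical helper for illustration only.
    """
    if not isinstance(messages, (list, tuple)):
        messages = [messages]
    if not messages:
        return None
    context = None
    msgs = []
    for index in spec:
        if isinstance(index, tuple):
            # A tuple entry marks the position of the context argument.
            context = messages[index[0] - 1]
            continue
        if len(messages) < index:
            return None  # not enough arguments for this spec
        message = messages[index - 1]
        if message is None:
            return None  # the argument was not a string literal
        msgs.append(message)
    picked = tuple(msgs)
    # A single message is unwrapped, as in the listing above.
    return (picked[0] if len(picked) == 1 else picked), context


single = apply_keyword_spec('Hello', (1,))
plural = apply_keyword_spec(('one', 'many', None), (1, 2))
with_context = apply_keyword_spec(('menu', 'Open'), ((1, 'c'), 2))
```

Returning `None` here stands in for the `continue` statements of the original loop, which silently skip calls that do not match the spec.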
340 | 329 |
84 | 330 def extract_nothing(fileobj, keywords, comment_tags, options): |
57 | 331 """Pseudo extractor that does not actually extract anything, but simply |
332 returns an empty list. | |
333 """ | |
334 return [] | |
335 | |
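`extract_nothing` is the smallest function that satisfies the extractor interface: it accepts `(fileobj, keywords, comment_tags, options)` and returns an iterable of `(lineno, funcname, message, comments)` tuples. As a rough illustration of the same interface with actual output, here is a Python 3 sketch of a made-up extractor. The function name, the `marker` option and its default are all hypothetical and are not part of Babel.

```python
import io


def extract_marked_lines(fileobj, keywords, comment_tags, options):
    """Hypothetical extractor: every line starting with a marker is a message."""
    marker = options.get('marker', '>>> ')  # assumed option name
    for lineno, line in enumerate(fileobj, 1):
        if line.startswith(marker):
            # funcname is None because no translation function is involved;
            # the listing above then falls back to the default spec (1,).
            yield lineno, None, line[len(marker):].rstrip('\n'), []


results = list(extract_marked_lines(io.StringIO('skip\n>>> Hello\n'), [], [], {}))
```

Such a function could be referenced from `extract` by a `module:function` string, which is the form handled at source lines 247 to 253.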
340 | 336 |
84 | 337 def extract_python(fileobj, keywords, comment_tags, options): |
1 | 338 """Extract messages from Python source code. |
224 | 339 |
164 | 340 :param fileobj: the seekable, file-like object the messages should be |
341 extracted from | |
1 | 342 :param keywords: a list of keywords (i.e. function names) that should be |
343 recognized as translation functions | |
84 | 344 :param comment_tags: a list of translator tags to search for and include |
345 in the results | |
1 | 346 :param options: a dictionary of additional options (optional) |
81 | 347 :return: an iterator over ``(lineno, funcname, message, comments)`` tuples |
1 | 348 :rtype: ``iterator`` |
349 """ | |
222 | 350 funcname = lineno = message_lineno = None |
351 call_stack = -1 | |
1 | 352 buf = [] |
353 messages = [] | |
80 | 354 translator_comments = [] |
222 | 355 in_def = in_translator_comments = False |
338 | 356 comment_tag = None |
1 | 357 |
222 | 358 encoding = parse_encoding(fileobj) or options.get('encoding', 'iso-8859-1') |
164 | 359 |
1 | 360 tokens = generate_tokens(fileobj.readline) |
361 for tok, value, (lineno, _), _, _ in tokens: | |
222 | 362 if call_stack == -1 and tok == NAME and value in ('def', 'class'): |
363 in_def = True | |
364 elif tok == OP and value == '(': | |
365 if in_def: | |
366 # Avoid false positives for declarations such as: | |
367 # def gettext(arg='message'): | |
368 in_def = False | |
369 continue | |
370 if funcname: | |
371 message_lineno = lineno | |
372 call_stack += 1 | |
223 | 373 elif in_def and tok == OP and value == ':': |
374 # End of a class definition without parens | |
375 in_def = False | |
376 continue | |
222 | 377 elif call_stack == -1 and tok == COMMENT: |
92 | 378 # Strip the comment token from the line |
164 | 379 value = value.decode(encoding)[1:].strip() |
147 | 380 if in_translator_comments and \ |
93 | 381 translator_comments[-1][0] == lineno - 1: |
92 | 382 # We're already inside a translator comment, continue appending |
93 | 383 translator_comments.append((lineno, value)) |
92 | 384 continue |
385 # If execution reaches this point, let's see if comment line | |
386 # starts with one of the comment tags | |
85 | 387 for comment_tag in comment_tags: |
92 | 388 if value.startswith(comment_tag): |
147 | 389 in_translator_comments = True |
338 | 390 translator_comments.append((lineno, value)) |
92 | 391 break |
222 | 392 elif funcname and call_stack == 0: |
1 | 393 if tok == OP and value == ')': |
394 if buf: | |
395 messages.append(''.join(buf)) | |
396 del buf[:] | |
222 | 397 else: |
398 messages.append(None) | |
93 | 399 |
222 | 400 if len(messages) > 1: |
401 messages = tuple(messages) | |
402 else: | |
403 messages = messages[0] | |
404 # Comments don't apply unless they immediately precede the | |
405 # message | |
406 if translator_comments and \ | |
407 translator_comments[-1][0] < message_lineno - 1: | |
408 translator_comments = [] | |
409 | |
410 yield (message_lineno, funcname, messages, | |
411 [comment[1] for comment in translator_comments]) | |
412 | |
413 funcname = lineno = message_lineno = None | |
414 call_stack = -1 | |
1 | 415 messages = [] |
80 | 416 translator_comments = [] |
222 | 417 in_translator_comments = False |
1 | 418 elif tok == STRING: |
164 | 419 # Unwrap quotes in a safe manner, maintaining the string's |
420 # encoding | |
222 | 421 # https://sourceforge.net/tracker/?func=detail&atid=355470& |
422 # aid=617979&group_id=5470 | |
164 | 423 value = eval('# coding=%s\n%s' % (encoding, value), |
424 {'__builtins__':{}}, {}) | |
425 if isinstance(value, str): | |
426 value = value.decode(encoding) | |
427 buf.append(value) | |
1 | 428 elif tok == OP and value == ',': |
222 | 429 if buf: |
430 messages.append(''.join(buf)) | |
431 del buf[:] | |
432 else: | |
433 messages.append(None) | |
366 | 434 if translator_comments: |
435 # We have translator comments, and since we're on a | |
436 # comma(,) user is allowed to break into a new line | |
437 # Let's increase the last comment's lineno in order | |
438 # for the comment to still be a valid one | |
439 old_lineno, old_comment = translator_comments.pop() | |
440 translator_comments.append((old_lineno+1, old_comment)) | |
222 | 441 elif call_stack > 0 and tok == OP and value == ')': |
442 call_stack -= 1 | |
443 elif funcname and call_stack == -1: | |
1 | 444 funcname = None |
445 elif tok == NAME and value in keywords: | |
446 funcname = value | |
339 | 447 |
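The core of `extract_python` is a small state machine over the token stream: remember a keyword name, start collecting when its opening parenthesis appears, and emit the collected string literals at the matching closing parenthesis. Below is a much reduced Python 3 sketch of that idea. It is an assumption-laden illustration rather than the module's code: `find_calls` is a hypothetical name, only the string literals of a call are joined into one message, translator comments and the `def`/`class` check are omitted, and `ast.literal_eval` is swapped in for the restricted `eval` used in the listing.

```python
import ast
import io
from tokenize import generate_tokens, NAME, OP, STRING


def find_calls(source, keywords):
    """Yield (lineno, funcname, message) for calls such as _('text')."""
    funcname = None
    call_lineno = None
    depth = -1  # -1 means "not inside a tracked call"
    buf = []
    tokens = generate_tokens(io.StringIO(source).readline)
    for tok, value, (lineno, _), _, _ in tokens:
        if tok == OP and value == '(':
            if funcname:
                call_lineno = lineno
                depth += 1
        elif funcname and depth == 0:
            if tok == OP and value == ')':
                message = ''.join(buf) if buf else None
                yield (call_lineno, funcname, message)
                funcname, depth, buf = None, -1, []
            elif tok == STRING:
                # Adjacent literals are concatenated, as in Python itself.
                buf.append(ast.literal_eval(value))
        elif depth > 0 and tok == OP and value == ')':
            depth -= 1
        elif funcname and depth == -1:
            # The keyword was not followed by '(' so it is not a call.
            funcname = None
        elif tok == NAME and value in keywords:
            funcname = value


calls = list(find_calls("print(_('Hello'))\nx = 1\n", ['_']))
```

The sketch keeps the `-1` sentinel of the original `call_stack` variable, so the order of the branches matters in the same way as in the listing.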
340 | 448 |
339 | 449 def extract_javascript(fileobj, keywords, comment_tags, options): |
450 """Extract messages from JavaScript source code. | |
451 | |
452 :param fileobj: the seekable, file-like object the messages should be | |
453 extracted from | |
454 :param keywords: a list of keywords (i.e. function names) that should be | |
455 recognized as translation functions | |
456 :param comment_tags: a list of translator tags to search for and include | |
457 in the results | |
458 :param options: a dictionary of additional options (optional) | |
459 :return: an iterator over ``(lineno, funcname, message, comments)`` tuples | |
460 :rtype: ``iterator`` | |
461 """ | |
462 from babel.messages.jslexer import tokenize, unquote_string | |
463 funcname = message_lineno = None | |
464 messages = [] | |
465 last_argument = None | |
466 translator_comments = [] | |
405 | 467 concatenate_next = False |
339 | 468 encoding = options.get('encoding', 'utf-8') |
469 last_token = None | |
470 call_stack = -1 | |
471 | |
472 for token in tokenize(fileobj.read().decode(encoding)): | |
473 if token.type == 'operator' and token.value == '(': | |
474 if funcname: | |
475 message_lineno = token.lineno | |
476 call_stack += 1 | |
477 | |
478 elif call_stack == -1 and token.type == 'linecomment': | |
479 value = token.value[2:].strip() | |
480 if translator_comments and \ | |
481 translator_comments[-1][0] == token.lineno - 1: | |
482 translator_comments.append((token.lineno, value)) | |
483 continue | |
484 | |
485 for comment_tag in comment_tags: | |
486 if value.startswith(comment_tag): | |
487 translator_comments.append((token.lineno, value.strip())) | |
488 break | |
489 | |
490 elif token.type == 'multilinecomment': | |
491 # only one multi-line comment may precede a translation | |
492 translator_comments = [] | |
493 value = token.value[2:-2].strip() | |
494 for comment_tag in comment_tags: | |
495 if value.startswith(comment_tag): | |
496 lines = value.splitlines() | |
497 if lines: | |
498 lines[0] = lines[0].strip() | |
499 lines[1:] = dedent('\n'.join(lines[1:])).splitlines() | |
500 for offset, line in enumerate(lines): | |
501 translator_comments.append((token.lineno + offset, | |
502 line)) | |
503 break | |
504 | |
505 elif funcname and call_stack == 0: | |
506 if token.type == 'operator' and token.value == ')': | |
507 if last_argument is not None: | |
508 messages.append(last_argument) | |
509 if len(messages) > 1: | |
510 messages = tuple(messages) | |
511 elif messages: | |
512 messages = messages[0] | |
513 else: | |
514 messages = None | |
515 | |
426 | 516 # Comments don't apply unless they immediately precede the |
339 | 517 # message |
518 if translator_comments and \ | |
519 translator_comments[-1][0] < message_lineno - 1: | |
520 translator_comments = [] | |
521 | |
522 if messages is not None: | |
523 yield (message_lineno, funcname, messages, | |
524 [comment[1] for comment in translator_comments]) | |
525 | |
526 funcname = message_lineno = last_argument = None | |
405 | 527 concatenate_next = False |
339 | 528 translator_comments = [] |
529 messages = [] | |
530 call_stack = -1 | |
531 | |
532 elif token.type == 'string': | |
405 | 533 new_value = unquote_string(token.value) |
534 if concatenate_next: |
535 last_argument = (last_argument or '') + new_value |
536 concatenate_next = False |
537 else: |
538 last_argument = new_value |
339 | 539 |
405 | 540 elif token.type == 'operator': |
541 if token.value == ',': |
542 if last_argument is not None: |
543 messages.append(last_argument) |
544 last_argument = None |
545 else: |
546 messages.append(None) |
547 concatenate_next = False |
548 elif token.value == '+': |
549 concatenate_next = True |
339 | 550 |
551 elif call_stack > 0 and token.type == 'operator' \ | |
552 and token.value == ')': | |
553 call_stack -= 1 | |
554 | |
555 elif funcname and call_stack == -1: | |
556 funcname = None | |
557 | |
558 elif call_stack == -1 and token.type == 'name' and \ | |
559 token.value in keywords and \ | |
560 (last_token is None or last_token.type != 'name' or | |
561 last_token.value != 'function'): | |
562 funcname = token.value | |
563 | |
564 last_token = token |
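
The argument-collection branch of the listing (source lines 505 to 549) can be hard to follow in annotate form, so here is a minimal standalone sketch of the same idea: string pieces joined by `+` are merged into one argument, `,` closes an argument, and non-string arguments are recorded as `None`. This is not Babel's API. The `Token` tuple and the `collect_arguments` helper are hypothetical names introduced only for illustration, token values are assumed to be already unquoted, and nested calls are ignored.

```python
from collections import namedtuple

# Hypothetical stand-in for the lexer's token objects (assumption: only
# the `type` and `value` fields matter for this sketch).
Token = namedtuple('Token', 'type value')


def collect_arguments(tokens):
    """Collect the arguments of one call from the tokens between its
    parentheses, merging "a" + "b" into a single string.

    Returns a single string, a tuple of arguments, or None.
    """
    messages = []
    last_argument = None
    concatenate_next = False

    for token in tokens:
        if token.type == 'string':
            if concatenate_next:
                # A preceding '+' means this piece extends the argument.
                last_argument = (last_argument or '') + token.value
                concatenate_next = False
            else:
                last_argument = token.value
        elif token.type == 'operator':
            if token.value == ',':
                # End of an argument; non-string arguments become None.
                messages.append(last_argument)
                last_argument = None
                concatenate_next = False
            elif token.value == '+':
                concatenate_next = True

    # Equivalent of reaching the closing parenthesis: flush what is pending.
    if last_argument is not None:
        messages.append(last_argument)

    if len(messages) > 1:
        return tuple(messages)
    if messages:
        return messages[0]
    return None


if __name__ == '__main__':
    concatenated = [Token('string', 'foo'), Token('operator', '+'),
                    Token('string', 'bar')]
    print(collect_arguments(concatenated))
```

Under these assumptions, the tokens for `_("foo" + "bar")` and for `_("foobar")` yield the same message, which matches the changeset description for revision 405.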