annotate markup/path.py @ 111:8a4d9064f363
Some fixes and more unit tests for the XPath engine.
author | cmlenz |
---|---|
date | Mon, 31 Jul 2006 17:25:43 +0000 |
parents | 61fa4cadb766 |
children | 8f53c3ad385c |
# -*- coding: utf-8 -*-
#
# Copyright (C) 2006 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://markup.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://markup.edgewall.org/log/.

"""Basic support for evaluating XPath expressions against streams.

>>> from markup.input import XML
>>> doc = XML('''<doc>
...   <items count="2">
...     <item status="new">
...       <summary>Foo</summary>
...     </item>
...     <item status="closed">
...       <summary>Bar</summary>
...     </item>
...   </items>
... </doc>''')
>>> print doc.select('items/item[@status="closed"]/summary/text()')
Bar

Because the XPath engine operates on markup streams (as opposed to tree
structures), it only implements a subset of the full XPath 1.0 language.
"""

import re

from markup.core import QName, Stream, START, END, TEXT, COMMENT, PI

__all__ = ['Path', 'PathSyntaxError']


class Path(object):
    """Implements basic XPath support on streams.

    Instances of this class represent a "compiled" XPath expression, and provide
    methods for testing the path against a stream, as well as extracting a
    substream matching that path.
    """

    def __init__(self, text):
        """Create the path object from a string.

        @param text: the path expression
        """
        self.source = text
        self.paths = _PathParser(text).parse()

    def __repr__(self):
        return '<%s "%s">' % (self.__class__.__name__, self.source)

    def select(self, stream):
        """Returns a substream of the given stream that matches the path.

        If there are no matches, this method returns an empty stream.

        >>> from markup.input import XML
        >>> xml = XML('<root><elem><child>Text</child></elem></root>')

        >>> print Path('child').select(xml)
        <child>Text</child>

        >>> print Path('child/text()').select(xml)
        Text

        @param stream: the stream to select from
        @return: the substream matching the path, or an empty stream
        """
        stream = iter(stream)
        def _generate():
            test = self.test()
            for kind, data, pos in stream:
                result = test(kind, data, pos)
                if result is True:
                    yield kind, data, pos
                    depth = 1
                    while depth > 0:
                        subkind, subdata, subpos = stream.next()
                        if subkind is START:
                            depth += 1
                        elif subkind is END:
                            depth -= 1
                        yield subkind, subdata, subpos
                        test(subkind, subdata, subpos)
                elif result:
                    yield result
        return Stream(_generate())

    def test(self, ignore_context=False):
        """Returns a function that can be used to track whether the path matches
        a specific stream event.

        The function returned expects the positional arguments `kind`, `data`,
        and `pos`, i.e. basically an unpacked stream event. If the path matches
        the event, the function returns the match (for example, a `START` or
        `TEXT` event.) Otherwise, it returns `None`.

        >>> from markup.input import XML
        >>> xml = XML('<root><elem><child id="1"/></elem><child id="2"/></root>')
        >>> test = Path('child').test()
        >>> for kind, data, pos in xml:
        ...     if test(kind, data, pos):
        ...         print kind, data
        START (u'child', [(u'id', u'1')])
        START (u'child', [(u'id', u'2')])
        """
        paths = [(idx, steps, len(steps), [0])
                 for idx, steps in enumerate(self.paths)]

        def _test(kind, data, pos):
            for idx, steps, size, stack in paths:
                if not stack:
                    continue
                cursor = stack[-1]

                if kind is END:
                    stack.pop()
                    continue

                elif kind is START:
                    stack.append(cursor)

                matched = None
                while 1:
                    axis, node_test, predicates = steps[cursor]

                    matched = node_test(kind, data, pos)
                    if matched and predicates:
                        for predicate in predicates:
                            if not predicate(kind, data, pos):
                                matched = None
                                break

                    if matched:
                        if cursor + 1 == size: # the last location step
                            if ignore_context or \
                                    kind is not START or \
                                    axis in ('attribute', 'self') or \
                                    len(stack) > 2:
                                return matched
                        else:
                            cursor += 1
                            stack[-1] = cursor

                    if axis != 'self':
                        break

                if not matched and kind is START \
                        and not axis.startswith('descendant'):
                    # If this step is not a closure, it cannot be matched until
                    # the current element is closed... so we need to move the
                    # cursor back to the last closure and retest that against
                    # the current element
                    backsteps = [step for step in steps[:cursor]
                                 if step[0].startswith('descendant')]
                    backsteps.reverse()
                    for axis, node_test, predicates in backsteps:
                        matched = node_test(kind, data, pos)
                        if not matched:
                            cursor -= 1
                        break
                    stack[-1] = cursor

            return None

        return _test


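The depth counter that `select` uses to emit a whole matched element can be tried outside the library. The following is only a sketch under stated assumptions, not the module's code: events are plain `(kind, data, pos)` tuples, `START`, `END` and `TEXT` are hypothetical string constants standing in for the ones in `markup.core`, and `extract_subtrees` with its `matches` callback is an illustrative name. It is written for Python 3, so it calls `next(stream)` where the listing calls `stream.next()`.

```python
# Hypothetical stand-ins for the event kinds defined in markup.core.
START, END, TEXT = 'START', 'END', 'TEXT'


def extract_subtrees(events, matches):
    """Yield each element whose START event satisfies `matches`,
    followed by every event up to and including its matching END."""
    stream = iter(events)
    for kind, data, pos in stream:
        if kind is START and matches(data):
            yield kind, data, pos
            depth = 1
            # Keep pulling from the same iterator until the element closes;
            # nested START events raise the depth, END events lower it.
            while depth > 0:
                sub = next(stream)
                if sub[0] is START:
                    depth += 1
                elif sub[0] is END:
                    depth -= 1
                yield sub


# <root><elem><child>Text</child></elem></root> as a flat event list
EVENTS = [
    (START, 'root', 0), (START, 'elem', 1), (START, 'child', 2),
    (TEXT, 'Text', 3), (END, 'child', 4), (END, 'elem', 5), (END, 'root', 6),
]

selected = list(extract_subtrees(EVENTS, lambda name: name == 'child'))
```

Because the outer `for` loop and the inner `while` loop share one iterator, the events of a matched element are consumed exactly once, and the outer loop resumes right after the closing `END` event. The sketch assumes well-formed input in which every `START` has a matching `END`.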
def _node_test_current_element():
    def _node_test_current_element(kind, *_):
        return kind is START
    _node_test_current_element.axis = 'self'
    return _node_test_current_element

def _node_test_any_child_element():
    def _node_test_any_child_element(kind, *_):
        return kind is START
    _node_test_any_child_element.axis = 'child'
    return _node_test_any_child_element

def _node_test_child_element_by_name(name):
    def _node_test_child_element_by_name(kind, data, _):
        return kind is START and data[0].localname == name
    _node_test_child_element_by_name.axis = 'child'
    return _node_test_child_element_by_name

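Each node test in this listing is built by a factory function that returns a closure and tags it with an `axis` attribute. A minimal, self-contained sketch of that pattern follows. The names `make_name_test` and `name_test` are hypothetical, the `START` and `END` constants are stand-ins for those in `markup.core`, and the element name is assumed to be a plain string rather than a `QName` with a `localname` attribute.

```python
START, END = 'START', 'END'  # hypothetical stand-ins for markup.core constants


def make_name_test(name):
    """Return a test for START events of the element called `name`."""
    def name_test(kind, data, pos):
        # data is assumed to be a (tag_name, attributes) pair for START events
        return kind is START and data[0] == name
    # Functions are ordinary objects, so metadata can be attached as attributes.
    name_test.axis = 'child'
    return name_test


is_item = make_name_test('item')
results = [
    is_item(START, ('item', {}), None),
    is_item(START, ('summary', {}), None),
    is_item(END, 'item', None),
]
```

Storing the axis on the function object lets a caller inspect `is_item.axis` without a wrapper class, which matches the commit note about using closures instead of callable classes.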
def _node_test_any_attribute():
    def _node_test_any_attribute(kind, data, _):
        if kind is START and data[1]:
            return data[1]
    _node_test_any_attribute.axis = 'attribute'
    return _node_test_any_attribute

def _node_test_attribute_by_name(name):
    def _node_test_attribute_by_name(kind, data, pos):
        if kind is START and name in data[1]:
            return TEXT, data[1].get(name), pos
    _node_test_attribute_by_name.axis = 'attribute'
    return _node_test_attribute_by_name

def _function_comment():
    def _function_comment(kind, data, pos):
        return kind is COMMENT and (kind, data, pos)
    _function_comment.axis = None
    return _function_comment

def _function_node():
    def _function_node(kind, data, pos):
        if kind is START:
            return True
        return kind, data, pos
    _function_node.axis = None
    return _function_node

def _function_processing_instruction(name=None):
    def _function_processing_instruction(kind, data, pos):
        if kind is PI and (not name or data[0] == name):
            return (kind, data, pos)
    _function_processing_instruction.axis = None
    return _function_processing_instruction

def _function_text():
    def _function_text(kind, data, pos):
        return kind is TEXT and (kind, data, pos)
    _function_text.axis = None
    return _function_text

def _literal_string(text):
    def _literal_string(*_):
        return TEXT, text, (None, -1, -1)
    _literal_string.axis = None
    return _literal_string

def _operator_eq(lval, rval):
    def _operator_eq(kind, data, pos):
        lv = lval(kind, data, pos)
        rv = rval(kind, data, pos)
        return (lv and lv[1]) == (rv and rv[1])
    _operator_eq.axis = None
    return _operator_eq

def _operator_neq(lval, rval):
    def _operator_neq(kind, data, pos):
        lv = lval(kind, data, pos)
        rv = rval(kind, data, pos)
        return (lv and lv[1]) != (rv and rv[1])
    _operator_neq.axis = None
    return _operator_neq

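The comparison operators take two evaluators, run both against the same event, and compare the data slot of each result. The sketch below shows how a predicate such as `@status="closed"` could be composed from such pieces. It is an illustration under assumptions rather than the module's parser output: `literal`, `attribute` and `equals` are hypothetical names, `START` and `TEXT` are stand-in constants, and attributes are held in a plain dict.

```python
START, TEXT = 'START', 'TEXT'  # hypothetical stand-ins for markup.core constants


def literal(text):
    """Evaluator that ignores the event and yields a fixed TEXT result."""
    def _literal(kind, data, pos):
        return TEXT, text, (None, -1, -1)
    return _literal


def attribute(name):
    """Evaluator that yields an attribute value as a TEXT result, or None."""
    def _attribute(kind, data, pos):
        # data is assumed to be (tag_name, attrs_dict) for START events
        if kind is START and name in data[1]:
            return TEXT, data[1][name], pos
        return None
    return _attribute


def equals(lval, rval):
    """Combine two evaluators into an equality predicate."""
    def _equals(kind, data, pos):
        lv = lval(kind, data, pos)
        rv = rval(kind, data, pos)
        # A missing result stays None, so it never equals a real string.
        return (lv and lv[1]) == (rv and rv[1])
    return _equals


status_is_closed = equals(attribute('status'), literal('closed'))
```

Calling `status_is_closed(START, ('item', {'status': 'closed'}), None)` evaluates both sides against the same event. An element with no `status` attribute compares `None` against `'closed'` and is therefore rejected.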
61fa4cadb766
Complete rewrite of the XPath parsing, which was a mess before. Closes #19.
cmlenz
parents:
77
diff
changeset
|
def _operator_and(lval, rval):
    def _operator_and(kind, data, pos):
        lv = lval(kind, data, pos)
        if not lv or (lv is not True and not lv[1]):
            return False
        rv = rval(kind, data, pos)
        if not rv or (rv is not True and not rv[1]):
            return False
        return True
    _operator_and.axis = None
    return _operator_and

def _operator_or(lval, rval):
    def _operator_or(kind, data, pos):
        lv = lval(kind, data, pos)
        if lv and (lv is True or lv[1]):
            return True
        rv = rval(kind, data, pos)
        if rv and (rv is True or rv[1]):
            return True
        return False
    _operator_or.axis = None
    return _operator_or


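The `and`/`or` operators above are closures over two operand tests, and the right operand is only evaluated when the left one does not already decide the result. The following is a minimal standalone sketch of that behaviour, not the module's own code. It assumes operand results are either `True`, a falsy value, or a pair whose second item carries the value (which is what the `lv[1]` checks suggest). The names `truthy`, `recording`, `calls`, and the placeholder `event` are hypothetical.

```python
def truthy(result):
    """Mirror the truth rule used by the operators above."""
    return bool(result) and (result is True or bool(result[1]))


def operator_and(lval, rval):
    def _operator_and(kind, data, pos):
        # rval is only called when lval is truthy (short-circuit).
        return truthy(lval(kind, data, pos)) and truthy(rval(kind, data, pos))
    return _operator_and


def operator_or(lval, rval):
    def _operator_or(kind, data, pos):
        # rval is only called when lval is falsy (short-circuit).
        return truthy(lval(kind, data, pos)) or truthy(rval(kind, data, pos))
    return _operator_or


calls = []


def recording(name, value):
    """Hypothetical operand: logs its name and returns a (kind, value) pair."""
    def _test(kind, data, pos):
        calls.append(name)
        return ('TEXT', value)
    return _test


both = operator_and(recording('left', ''), recording('right', 'x'))
either = operator_or(recording('left', 'x'), recording('right', ''))
event = ('START', None, None)  # placeholder event; the operands ignore it

print(both(*event), calls)    # False ['left']
del calls[:]
print(either(*event), calls)  # True ['left']
```

In both calls only the left operand is consulted, which is the same short-circuit order as the early `return` statements above.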
class PathSyntaxError(Exception):
    """Exception raised when an XPath expression is syntactically incorrect."""

    def __init__(self, message, filename=None, lineno=-1, offset=-1):
        if filename:
            message = '%s (%s, line %d)' % (message, filename, lineno)
        Exception.__init__(self, message)
        self.filename = filename
        self.lineno = lineno
        self.offset = offset


class _PathParser(object):
    """Tokenizes and parses an XPath expression."""

    _QUOTES = (("'", "'"), ('"', '"'))
    _TOKENS = ('::', ':', '..', '.', '//', '/', '[', ']', '()', '(', ')', '@',
               '=', '!=', '!', '|')
    # Raw string so that `\s` reaches the regex engine unchanged
    _tokenize = re.compile(r'(%s)|([^%s\s]+)|\s+' % (
        '|'.join([re.escape(t) for t in _TOKENS]),
        ''.join([re.escape(t[0]) for t in _TOKENS]))).findall

    def __init__(self, text):
        # Build a real list (not a lazy filter object) so len() and indexing work
        self.tokens = [a or b for a, b in self._tokenize(text) if a or b]
        self.pos = 0

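The tokenizer pattern above can be exercised on its own. This is a minimal sketch that rebuilds the same token table and regular expression outside the class; the module-level names `TOKENS`, `TOKEN_RE`, and `tokenize` are hypothetical and exist only for this illustration.

```python
import re

# Same token table as `_PathParser._TOKENS`; longer tokens come before their
# prefixes so that '::' wins over ':' and '()' wins over '('.
TOKENS = ('::', ':', '..', '.', '//', '/', '[', ']', '()', '(', ')', '@',
          '=', '!=', '!', '|')

# Group 1: a known token. Group 2: a run of characters that cannot start a
# token and is not whitespace. Whitespace itself matches with no group.
TOKEN_RE = re.compile(r'(%s)|([^%s\s]+)|\s+' % (
    '|'.join(re.escape(t) for t in TOKENS),
    ''.join(re.escape(t[0]) for t in TOKENS)))


def tokenize(text):
    """Return the non-empty tokens of an XPath expression as a list."""
    return [a or b for a, b in TOKEN_RE.findall(text) if a or b]


print(tokenize('items/item[@status="new"]/text()'))
print(tokenize('*|text()'))
```

Because the second group stops at whitespace, a quoted literal that contains a space is split into two tokens by this pattern.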
    # Tokenizer

    at_end = property(lambda self: self.pos == len(self.tokens) - 1)
    cur_token = property(lambda self: self.tokens[self.pos])

    def next_token(self):
        self.pos += 1
        return self.tokens[self.pos]

    def peek_token(self):
        if not self.at_end:
            return self.tokens[self.pos + 1]
        return None

    # Recursive descent parser

    def parse(self):
        """Parses the XPath expression and returns a list of location path
        tests.

        For union expressions (such as `*|text()`), this function returns one
        test for each operand in the union. For path expressions that don't
        use the union operator, the function always returns a list of size 1.

        Each path test in turn is a sequence of tests that correspond to the
        location steps, each a tuple of the form
        `(axis, testfunc, predicates)`.
        """
        paths = [self._location_path()]
        while self.cur_token == '|':
            self.next_token()
            paths.append(self._location_path())
        if not self.at_end:
            raise PathSyntaxError('Unexpected token %r after end of expression'
                                  % self.cur_token)
        return paths

    def _location_path(self):
        next_is_closure = True
        steps = []
        while True:
            if self.cur_token == '//':
                next_is_closure = True
                self.next_token()
            elif self.cur_token == '/' and not steps:
                raise PathSyntaxError('Absolute location paths not supported')

            axis, node_test, predicates = self._location_step()
            if axis == 'child' and next_is_closure:
                # An implicit child step becomes descendant-or-self
                axis = 'descendant-or-self'
            steps.append((axis, node_test, predicates))
            next_is_closure = False

            if self.at_end or not self.cur_token.startswith('/'):
                break
            self.next_token()

        return steps

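The loop above turns a token sequence into a list of steps and promotes the implicit `child` axis of the first step to `descendant-or-self`. Below is a simplified standalone port of that loop, restricted to plain names and `@name` steps, with no predicates or node-type tests. The function `location_steps` is hypothetical, it returns `(axis, name)` pairs instead of `(axis, testfunc, predicates)` tuples, and it raises `ValueError` as a stand-in for `PathSyntaxError`.

```python
def location_steps(tokens):
    """Return (axis, name) pairs for a token list of names, '@', '/' and '//'."""
    steps = []
    pos = 0
    next_is_closure = True  # the first step may match at any depth
    while True:
        if tokens[pos] == '//':
            next_is_closure = True
            pos += 1
        elif tokens[pos] == '/' and not steps:
            # Stand-in for PathSyntaxError in the parser above
            raise ValueError('Absolute location paths not supported')

        # Simplified location step: optional '@' followed by a name
        axis = 'child'
        if tokens[pos] == '@':
            axis = 'attribute'
            pos += 1
        name = tokens[pos]
        pos += 1

        if axis == 'child' and next_is_closure:
            axis = 'descendant-or-self'
        steps.append((axis, name))
        next_is_closure = False

        if pos >= len(tokens) or not tokens[pos].startswith('/'):
            break
        pos += 1  # skip the separator before the next step
    return steps


print(location_steps(['items', '/', 'item']))
print(location_steps(['//', 'a', '/', '@', 'id']))
```

An explicit axis such as `attribute` is left untouched; only the implicit `child` axis is promoted.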
    def _location_step(self):
        if self.cur_token == '@':
            axis = 'attribute'
            self.next_token()
        elif self.cur_token == '.':
            axis = 'self'
        elif self.peek_token() == '::':
            axis = self.cur_token
            if axis not in ('attribute', 'child', 'descendant',
                            'descendant-or-self', 'namespace', 'self'):
                raise PathSyntaxError('Unsupported axis "%s"' % axis)
            self.next_token()
            self.next_token()
        else:
            axis = 'child'
        node_test = self._node_test(axis)
        predicates = []
        while self.cur_token == '[':
            predicates.append(self._predicate())
        return axis, node_test, predicates

    def _node_test(self, axis=None):
        test = None
        if self.peek_token() in ('(', '()'):  # Node type test
            test = self._node_type()

        else:  # Name test
            if axis == 'attribute':
                if self.cur_token == '*':
                    test = _node_test_any_attribute()
                else:
                    test = _node_test_attribute_by_name(self.cur_token)
            elif axis == 'self':
                test = _node_test_current_element()
            else:
                if self.cur_token == '*':
                    test = _node_test_any_child_element()
                else:
                    test = _node_test_child_element_by_name(self.cur_token)

        if not self.at_end:
            self.next_token()
        return test

    def _node_type(self):
        name = self.cur_token
        self.next_token()
        if name == 'comment':
            return _function_comment()
        elif name == 'node':
            return _function_node()
        elif name == 'processing-instruction':
            args = []
            if self.cur_token != '()':
                # The processing-instruction() function optionally accepts the
                # name of the PI as argument, which must be a literal string
                self.next_token()  # (
                if self.cur_token != ')':
                    string = self.cur_token
                    if (string[0], string[-1]) in self._QUOTES:
                        string = string[1:-1]
                    args.append(string)
            return _function_processing_instruction(*args)
        elif name == 'text':
            return _function_text()
        else:
            raise PathSyntaxError('%s() not allowed here' % name)

    def _predicate(self):
        assert self.cur_token == '['
        self.next_token()
        expr = self._or_expr()
        assert self.cur_token == ']'
        if not self.at_end:
            self.next_token()
        return expr

    def _or_expr(self):
        expr = self._and_expr()
        while self.cur_token == 'or':
            self.next_token()
            expr = _operator_or(expr, self._and_expr())
        return expr

    def _and_expr(self):
        expr = self._equality_expr()
        while self.cur_token == 'and':
            self.next_token()
            expr = _operator_and(expr, self._equality_expr())
        return expr

    def _equality_expr(self):
        expr = self._primary_expr()
        while self.cur_token in ('=', '!='):
            op = {'=': _operator_eq, '!=': _operator_neq}[self.cur_token]
            self.next_token()
            expr = op(expr, self._primary_expr())
        return expr

    def _primary_expr(self):
        token = self.cur_token
        if len(token) > 1 and (token[0], token[-1]) in self._QUOTES:
            self.next_token()
            return _literal_string(token[1:-1])
        elif token[0].isdigit():
            self.next_token()
            return _literal_number(float(token))
        else:
            axis = None
            if token == '@':
                axis = 'attribute'
                self.next_token()
            return self._node_test(axis)
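
The predicate methods above form a precedence ladder: `or` binds loosest, then `and`, then `=` and `!=`, and finally primary expressions. The following standalone sketch shows the same ladder on a flat token list. It builds nested tuples rather than the closures the parser above returns, treats every primary as a bare token, and uses a hypothetical class name (`PredicateParser`) and method names chosen only for illustration.

```python
class PredicateParser(object):
    """Parse a flat token list into nested tuples (illustrative only)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token, or None once the input is exhausted."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return None

    def advance(self):
        """Consume and return the current token."""
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def or_expr(self):
        expr = self.and_expr()
        while self.peek() == 'or':
            self.advance()
            expr = ('or', expr, self.and_expr())
        return expr

    def and_expr(self):
        expr = self.equality_expr()
        while self.peek() == 'and':
            self.advance()
            expr = ('and', expr, self.equality_expr())
        return expr

    def equality_expr(self):
        expr = self.primary_expr()
        while self.peek() in ('=', '!='):
            op = self.advance()
            expr = (op, expr, self.primary_expr())
        return expr

    def primary_expr(self):
        # Simplification: any single token counts as a primary expression
        return self.advance()


tokens = ['a', '=', '"x"', 'or', 'b', 'and', 'c', '!=', '"y"']
print(PredicateParser(tokens).or_expr())
# ('or', ('=', 'a', '"x"'), ('and', 'b', ('!=', 'c', '"y"')))
```

Each level loops on its own operator and delegates operands to the next tighter level, so repeated operators of the same kind associate to the left.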