annotate markup/core.py @ 17:74cc70129d04 trunk

Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs. Also, output filters are now applied in the `Stream.serialize()` method instead of by the `Template.generate()` method, which just makes more sense.
author cmlenz
date Sun, 18 Jun 2006 22:33:33 +0000
parents f77f7a91aa46
children 5420cfe42d36
rev   line source
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
2 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
3 # Copyright (C) 2006 Christopher Lenz
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
4 # All rights reserved.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
5 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
8 # are also available at http://trac.edgewall.com/license.html.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
9 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
12 # history and logs, available at http://projects.edgewall.com/trac/.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
13
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
14 """Core classes for markup processing."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
15
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
16 import htmlentitydefs
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
17 import re
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
18 from StringIO import StringIO
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
19
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
20 __all__ = ['Stream', 'Markup', 'escape', 'unescape', 'Namespace', 'QName']
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
21
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
22
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
23 class StreamEventKind(str):
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
24 """A kind of event on an XML stream."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
25
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
26
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
27 class Stream(object):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
28 """Represents a stream of markup events.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
29
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
30 This class is basically an iterator over the events.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
31
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
32 Also provided are ways to serialize the stream to text. The `serialize()`
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
33 method will return an iterator over generated strings, while `render()`
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
34 returns the complete generated text at once. Both accept various parameters
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
35 that impact the way the stream is serialized.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
36
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
37 Stream events are tuples of the form:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
38
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
39 (kind, data, position)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
40
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
41 where `kind` is the event kind (such as `START`, `END`, `TEXT`, etc.), `data`
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
42 depends on the kind of event, and `position` is a `(line, offset)` tuple
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
43 that contains the location of the original element or text in the input.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
44 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
45 __slots__ = ['events']
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
46
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
47 START = StreamEventKind('START') # a start tag
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
48 END = StreamEventKind('END') # an end tag
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
49 TEXT = StreamEventKind('TEXT') # literal text
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
50 PROLOG = StreamEventKind('PROLOG') # XML prolog
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
51 DOCTYPE = StreamEventKind('DOCTYPE') # doctype declaration
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
52 START_NS = StreamEventKind('START-NS') # start namespace mapping
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
53 END_NS = StreamEventKind('END-NS') # end namespace mapping
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
54 PI = StreamEventKind('PI') # processing instruction
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
55 COMMENT = StreamEventKind('COMMENT') # comment
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
56
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
57 def __init__(self, events):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
58 """Initialize the stream with a sequence of markup events.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
59
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
60 @param events: a sequence or iterable providing the events
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
61 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
62 self.events = events
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
63
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
64 def __iter__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
65 return iter(self.events)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
66
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
67 def render(self, method='xml', encoding='utf-8', filters=None, **kwargs):
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
68 """Return a string representation of the stream.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
69
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
70 @param method: determines how the stream is serialized; can be either
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
71 'xml' or 'html', or a custom `Serializer` subclass
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
72 @param encoding: how the output string should be encoded; if set to
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
73 `None`, this method returns a `unicode` object
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
74
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
75 Any additional keyword arguments are passed to the serializer, and thus
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
76 depend on the `method` parameter value.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
77 """
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
78 generator = self.serialize(method=method, filters=filters, **kwargs)
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
79 output = u''.join(generator)
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
80 if encoding is not None:
9
5dc4bfe67c20 Actually use the specified encoding in `Stream.render()`.
cmlenz
parents: 8
diff changeset
81 return output.encode(encoding)
8
3710e3d0d4a2 `Stream.render()` was masking `TypeError`s (fix based on suggestion by Matt Good).
cmlenz
parents: 6
diff changeset
82 return output
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
83
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
84 def select(self, path):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
85 """Return a new stream that contains the events matching the given
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
86 XPath expression.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
87
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
88 @param path: a string containing the XPath expression
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
89 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
90 from markup.path import Path
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
91 return Path(path).select(self)
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
92
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
93 def serialize(self, method='xml', filters=None, **kwargs):
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
94 """Generate strings corresponding to a specific serialization of the
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
95 stream.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
96
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
97 Unlike the `render()` method, this method is a generator that returns
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
98 the serialized output incrementally, as opposed to returning a single
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
99 string.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
100
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
101 @param method: determines how the stream is serialized; can be either
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
102 'xml' or 'html', or a custom `Serializer` subclass
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
103 """
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
104 from markup.filters import WhitespaceFilter
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
105 from markup import output
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
106 cls = method
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
107 if isinstance(method, basestring):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
108 cls = {'xml': output.XMLSerializer,
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
109 'html': output.HTMLSerializer}[method]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
110 else:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
111 assert issubclass(cls, output.Serializer)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
112 serializer = cls(**kwargs)
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
113
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
114 stream = self
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
115 if filters is None:
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
116 filters = [WhitespaceFilter()]
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
117 for filter_ in filters:
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
118 stream = filter_(iter(stream))
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
119
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
120 return serializer.serialize(stream)
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
121
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
122 def __str__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
123 return self.render()
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
124
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
125 def __unicode__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
126 return self.render(encoding=None)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
127
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
128
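The `Stream` class above treats markup as a sequence of `(kind, data, position)` tuples that passes through a chain of filters before being serialized. The following is a minimal Python 3 sketch of that idea, not the module's actual implementation: the event kinds are plain strings, `data` is simplified to a tag name or text, and `strip_blank_text` is a hypothetical stand-in for `WhitespaceFilter`.

```python
# Simplified event kinds (assumption: plain strings instead of StreamEventKind).
START, END, TEXT = 'START', 'END', 'TEXT'


def strip_blank_text(events):
    """Hypothetical filter: drop TEXT events that contain only whitespace."""
    for kind, data, pos in events:
        if kind == TEXT and not data.strip():
            continue
        yield kind, data, pos


def serialize(events, filters=None):
    """Apply each filter in order, then yield one string per event."""
    if filters is None:
        filters = [strip_blank_text]
    stream = iter(events)
    for filter_ in filters:
        stream = filter_(stream)
    for kind, data, _pos in stream:
        if kind == START:
            yield '<%s>' % data
        elif kind == END:
            yield '</%s>' % data
        elif kind == TEXT:
            yield data


def render(events, encoding='utf-8', filters=None):
    """Join the serialized chunks; encode unless encoding is None."""
    output = ''.join(serialize(events, filters))
    if encoding is not None:
        return output.encode(encoding)
    return output


events = [
    (START, 'p', (1, 0)),
    (TEXT, 'Hello', (1, 3)),
    (TEXT, '   ', (1, 8)),
    (END, 'p', (1, 11)),
]
```

Passing `filters=[]` disables the default filter, which mirrors how `serialize()` above only installs `WhitespaceFilter` when `filters` is `None`.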
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
129 class Attributes(list):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
130
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
131 def __init__(self, attrib=None):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
132 list.__init__(self, map(lambda (k, v): (QName(k), v), attrib or []))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
133
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
134 def __contains__(self, name):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
135 return name in [attr for attr, value in self]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
136
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
137 def get(self, name, default=None):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
138 for attr, value in self:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
139 if attr == name:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
140 return value
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
141 return default
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
142
5
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
143 def remove(self, name):
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
144 for idx, (attr, _) in enumerate(self):
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
145 if attr == name:
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
146 del self[idx]
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
147 break
dbb08edbc615 Improved `py:attrs` directive so that it removes existing attributes if they evaluate to `None` (AFAICT matching Kid behavior).
cmlenz
parents: 1
diff changeset
148
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
149 def set(self, name, value):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
150 for idx, (attr, _) in enumerate(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
151 if attr == name:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
152 self[idx] = (attr, value)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
153 break
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
154 else:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
155 self.append((QName(name), value))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
156
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
157
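The `Attributes` class above keeps attributes as an ordered list of `(name, value)` pairs and layers dictionary-style helpers on top. A rough Python 3 sketch of the same behaviour follows; it assumes plain string names instead of `QName` objects and is meant only as an illustration.

```python
class Attributes(list):
    """Ordered list of (name, value) pairs with dict-like helpers."""

    def __contains__(self, name):
        return any(attr == name for attr, _ in self)

    def get(self, name, default=None):
        for attr, value in self:
            if attr == name:
                return value
        return default

    def set(self, name, value):
        # Replace in place to keep the original position; append otherwise.
        for idx, (attr, _) in enumerate(self):
            if attr == name:
                self[idx] = (attr, value)
                break
        else:
            self.append((name, value))

    def remove(self, name):
        # Only the first matching attribute is removed.
        for idx, (attr, _) in enumerate(self):
            if attr == name:
                del self[idx]
                break


attrs = Attributes([('href', '/index'), ('class', 'nav')])
attrs.set('class', 'nav active')   # updates the existing pair in place
attrs.set('title', 'Home')         # appended, since it was not present
attrs.remove('href')
```

Using a list rather than a dict preserves the attribute order of the source markup, at the cost of linear-time lookups.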
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
158 class Markup(unicode):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
159 """Marks a string as being safe for inclusion in HTML/XML output without
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
160 needing to be escaped.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
161 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
162 def __new__(self, text='', *args):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
163 if args:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
164 text %= tuple([escape(arg) for arg in args])
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
165 return unicode.__new__(self, text)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
166
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
167 def __add__(self, other):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
168 return Markup(unicode(self) + Markup.escape(other))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
169
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
170 def __mod__(self, args):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
171 if not isinstance(args, (list, tuple)):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
172 args = [args]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
173 return Markup(unicode.__mod__(self,
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
174 tuple([escape(arg) for arg in args])))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
175
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
176 def __mul__(self, num):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
177 return Markup(unicode(self) * num)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
178
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
179 def __repr__(self):
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
180 return '<%s "%s">' % (self.__class__.__name__, self)
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
181
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
182 def join(self, seq):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
183 return Markup(unicode(self).join([Markup.escape(item) for item in seq]))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
184
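The operators above (`+`, `%`, and `join`) escape whatever is combined with a `Markup` instance, unless the operand is itself already `Markup`. The sketch below reproduces that rule in Python 3 under the assumption that `str` stands in for the Python 2 `unicode` type; it is a simplified illustration, not the class as shipped.

```python
class Markup(str):
    """Marks a string as already safe for inclusion in HTML/XML output."""

    @classmethod
    def escape(cls, text):
        if isinstance(text, cls):
            return text  # already safe, do not escape twice
        text = str(text)
        text = (text.replace('&', '&amp;')
                    .replace('<', '&lt;')
                    .replace('>', '&gt;')
                    .replace('"', '&#34;'))
        return cls(text)

    def __add__(self, other):
        return Markup(str(self) + str(Markup.escape(other)))

    def __mod__(self, args):
        if not isinstance(args, (list, tuple)):
            args = [args]
        escaped = tuple(Markup.escape(arg) for arg in args)
        return Markup(str.__mod__(self, escaped))

    def join(self, seq):
        return Markup(str(self).join(Markup.escape(item) for item in seq))


formatted = Markup('<b>%s</b>') % '<i>'
concatenated = Markup('<br>') + 'a & b'
joined = Markup(', ').join(['a<b', Markup('<i>x</i>')])
```

Here the template text is trusted and only the interpolated values are escaped, so `formatted` keeps its `<b>` tags while the untrusted `<i>` is neutralised.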
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
185 def stripentities(self, keepxmlentities=False):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
186 """Return a copy of the text with any character or numeric entities
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
187 replaced by the equivalent Unicode characters.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
188
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
189 If the `keepxmlentities` parameter is provided and evaluates to `True`,
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
190 the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are not
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
191 stripped.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
192 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
193 def _replace_entity(match):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
194 if match.group(1): # numeric entity
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
195 ref = match.group(1)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
196 if ref[0] in 'xX': # the pattern below also accepts an uppercase X
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
197 ref = int(ref[1:], 16)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
198 else:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
199 ref = int(ref, 10)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
200 return unichr(ref)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
201 else: # character entity
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
202 ref = match.group(2)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
203 if keepxmlentities and ref in ('amp', 'apos', 'gt', 'lt', 'quot'):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
204 return '&%s;' % ref
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
205 try:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
206 codepoint = htmlentitydefs.name2codepoint[ref]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
207 return unichr(codepoint)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
208 except KeyError:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
209 if keepxmlentities:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
210 return '&amp;%s;' % ref
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
211 else:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
212 return ref
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
213 return Markup(re.sub(r'&(?:#((?:\d+)|(?:[xX][0-9a-fA-F]+));?|(\w+);)',
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
214 _replace_entity, self))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
215
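The entity handling in `stripentities()` above can be tried in isolation. This is a Python 3 sketch that assumes `html.entities.name2codepoint` and `chr` as replacements for the Python 2 `htmlentitydefs` module and `unichr`; the standalone function form and the `XML_ENTITIES` name are illustrative choices, not part of the module.

```python
import re
from html.entities import name2codepoint

# Same pattern as above: numeric references (decimal or hex) or named entities.
_ENTITY_RE = re.compile(r'&(?:#((?:\d+)|(?:[xX][0-9a-fA-F]+));?|(\w+);)')
XML_ENTITIES = ('amp', 'apos', 'gt', 'lt', 'quot')


def stripentities(text, keepxmlentities=False):
    """Replace character and numeric entities with the characters they name."""
    def _replace_entity(match):
        if match.group(1):  # numeric entity
            ref = match.group(1)
            if ref[0] in 'xX':
                return chr(int(ref[1:], 16))
            return chr(int(ref, 10))
        ref = match.group(2)  # character entity
        if keepxmlentities and ref in XML_ENTITIES:
            return '&%s;' % ref
        try:
            return chr(name2codepoint[ref])
        except KeyError:
            # Unknown entity: keep it visible (escaped) or fall back to its name.
            return '&amp;%s;' % ref if keepxmlentities else ref
    return _ENTITY_RE.sub(_replace_entity, text)


plain = stripentities('1 &lt; 2 &amp;&nbsp;&#65;&#x42;&#X43;')
kept = stripentities('&lt;&eacute;', keepxmlentities=True)
```

With `keepxmlentities=True` the result is still safe to parse as XML, which is why `sanitize()` calls the method that way before handing the text to a parser.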
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
216 def striptags(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
217 """Return a copy of the text with all XML/HTML tags removed."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
218 return Markup(re.sub(r'<[^>]*?>', '', self))
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
219
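The regular expression in `striptags()` above removes anything between `<` and the next `>`. A short Python 3 sketch, written as a standalone function for illustration, shows both the normal case and one limitation of this purely textual approach:

```python
import re


def striptags(text):
    """Remove everything that looks like an XML/HTML tag."""
    return re.sub(r'<[^>]*?>', '', text)


clean = striptags('<p class="x">Hi <b>there</b></p>')
# A '>' inside an attribute value ends the match early, so part of the
# attribute leaks into the result.
leaky = striptags('<a title="a>b">x</a>')
```

Because the pattern does not parse attributes, input that may contain `>` inside attribute values is better handled by a real parser.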
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
220 def escape(cls, text, quotes=True):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
221 """Create a Markup instance from a string and escape special characters
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
222 it may contain (<, >, & and \").
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
223
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
224 If the `quotes` parameter is set to `False`, the \" character is left
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
225 as is. Escaping quotes is generally only required for strings that are
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
226 to be used in attribute values.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
227 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
228 if isinstance(text, cls):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
229 return text
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
230 text = unicode(text)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
231 if not text:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
232 return cls()
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
233 text = text.replace('&', '&amp;') \
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
234 .replace('<', '&lt;') \
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
235 .replace('>', '&gt;')
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
236 if quotes:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
237 text = text.replace('"', '&#34;')
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
238 return cls(text)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
239 escape = classmethod(escape)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
240
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
241 def unescape(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
242 """Reverse-escapes &, <, > and \" and returns a `unicode` object."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
243 if not self:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
244 return u''
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
245 return unicode(self).replace('&#34;', '"') \
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
246 .replace('&gt;', '>') \
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
247 .replace('&lt;', '<') \
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
248 .replace('&amp;', '&')
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
249
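The order of the replacements in `unescape` matters: `&amp;` is handled last. The sketch below illustrates why, using plain Python 3 strings instead of `Markup` objects; `unescape_wrong_order` is a hypothetical counter-example written for this illustration and is not part of the module.

```python
def unescape(text):
    """Same replacement order as the method above: '&amp;' goes last."""
    return (text.replace('&#34;', '"')
                .replace('&gt;', '>')
                .replace('&lt;', '<')
                .replace('&amp;', '&'))


def unescape_wrong_order(text):
    """Hypothetical variant that handles '&amp;' first (incorrect)."""
    return (text.replace('&amp;', '&')
                .replace('&#34;', '"')
                .replace('&gt;', '>')
                .replace('&lt;', '<'))


# The escaped form of the literal five-character text '&lt;'.
escaped = '&amp;lt;'

correct = unescape(escaped)             # one level of escaping removed
wrong = unescape_wrong_order(escaped)   # two levels removed by accident
```

With `&amp;` handled first, the freshly produced `&` combines with the following `lt;` and is unescaped a second time, so the round trip with `escape` no longer holds.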
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
250 def plaintext(self, keeplinebreaks=True):
6
71e8e645fe81 Simplified implementation of `py:content` directive.
cmlenz
parents: 5
diff changeset
251 """Returns the text as a `unicode` string with all entities and tags
71e8e645fe81 Simplified implementation of `py:content` directive.
cmlenz
parents: 5
diff changeset
252 removed. If `keeplinebreaks` is `False`, newlines are replaced by spaces.
71e8e645fe81 Simplified implementation of `py:content` directive.
cmlenz
parents: 5
diff changeset
253 """
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
254 text = unicode(self.striptags().stripentities())
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
255 if not keeplinebreaks:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
256 text = text.replace('\n', ' ')
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
257 return text
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
258
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
259 def sanitize(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
260 from markup.filters import HTMLSanitizer
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
261 from markup.input import HTMLParser
17
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
262 text = StringIO(self.stripentities(keepxmlentities=True))
74cc70129d04 Refactoring to address #6: all match templates are now processed by a single filter, which means that match templates added by included templates are properly applied. A side effect of this refactoring is that `Context` objects may not be reused across multiple template processing runs.
cmlenz
parents: 10
diff changeset
263 return Stream(HTMLSanitizer()(HTMLParser(text)))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
264
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
265
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
266 escape = Markup.escape
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
267
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
268 def unescape(text):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
269 """Reverse-escapes &, <, > and \" and returns a `unicode` object."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
270 if not isinstance(text, Markup):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
271 return text
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
272 return text.unescape()
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
273
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
274
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
275 class Namespace(object):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
276
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
277 def __init__(self, uri):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
278 self.uri = uri
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
279
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
280 def __getitem__(self, name):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
281 return QName(self.uri + '}' + name)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
282
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
283 __getattr__ = __getitem__
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
284
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
285 def __repr__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
286 return '<Namespace "%s">' % self.uri
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
287
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
288 def __str__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
289 return self.uri
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
290
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
291 def __unicode__(self):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
292 return unicode(self.uri)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
293
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
294
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
295 class QName(unicode):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
296 """A qualified element or attribute name.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
297
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
298 The unicode value of instances of this class contains the qualified name of
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
299 the element or attribute, in the form `{namespace}localname`. The namespace
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
300 URI can be obtained through the additional `namespace` attribute, while the
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
301 local name can be accessed through the `localname` attribute.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
302 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
303 __slots__ = ['namespace', 'localname']
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
304
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
305 def __new__(cls, qname):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
306 if isinstance(qname, QName):
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
307 return qname
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
308
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
309 parts = qname.split('}', 1)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
310 if qname.find('}') > 0:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
311 self = unicode.__new__(cls, '{' + qname)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
312 self.namespace = parts[0]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
313 self.localname = parts[1]
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
314 else:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
315 self = unicode.__new__(cls, qname)
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
316 self.namespace = None
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
317 self.localname = qname
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
318 return self
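How `Namespace` and `QName` cooperate can be sketched as follows. This is an illustrative Python 3 port under stated assumptions: `str` replaces `unicode`, `__slots__` and the `__repr__`/`__str__` helpers are left out for brevity, and the `xhtml` namespace object is just an example value.

```python
class QName(str):
    """Qualified name whose string value is '{namespace}localname'."""

    def __new__(cls, qname):
        if isinstance(qname, QName):
            return qname
        # Input is expected as 'uri}localname', without the leading '{'.
        parts = qname.split('}', 1)
        if qname.find('}') > 0:
            self = str.__new__(cls, '{' + qname)
            self.namespace = parts[0]
            self.localname = parts[1]
        else:
            self = str.__new__(cls, qname)
            self.namespace = None
            self.localname = qname
        return self


class Namespace(object):
    """Builds QName objects for one namespace URI."""

    def __init__(self, uri):
        self.uri = uri

    def __getitem__(self, name):
        return QName(self.uri + '}' + name)

    # Attribute access is an alias for item access: ns.body == ns['body'].
    __getattr__ = __getitem__


xhtml = Namespace('http://www.w3.org/1999/xhtml')
body = xhtml.body        # qualified name built by the namespace
plain = QName('div')     # no '}' in the input, so no namespace
```

Note that, as written in this revision, the constructor adds the opening `{` itself, so callers pass `uri}localname` rather than `{uri}localname`.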