annotate markup/output.py @ 221:c448cf114c30

Fix Python 2.3 incompatibility introduced in [276].
author cmlenz
date Tue, 05 Sep 2006 16:35:54 +0000
parents 0f897d319002
children
rev   line source
rev 1: 821114ec4f69 Initial import. (cmlenz)
1        1 # -*- coding: utf-8 -*-
1        2 #
rev 66: 822089ae65ce Switch copyright to Edgewall and URLs to markup.edgewall.org. (cmlenz, parents: 27)
66       3 # Copyright (C) 2006 Edgewall Software
1        4 # All rights reserved.
1        5 #
1        6 # This software is licensed as described in the file COPYING, which
1        7 # you should have received as part of this distribution. The terms
66       8 # are also available at http://markup.edgewall.org/wiki/License.
1        9 #
1       10 # This software consists of voluntary contributions made by many
1       11 # individuals. For the exact contribution history, see the revision
66      12 # history and logs, available at http://markup.edgewall.org/log/.
1       13
1       14 """This module provides different kinds of serialization methods for XML event
1       15 streams.
1       16 """
1       17
rev 123: 93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. (cmlenz, parents: 109)
123     18 from itertools import chain
1       19 try:
1       20     frozenset
1       21 except NameError:
1       22     from sets import ImmutableSet as frozenset
123     23 import re
1       24
rev 212: e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. (cmlenz, parents: 201)
212     25 from markup.core import escape, Markup, Namespace, QName, StreamEventKind
rev 145: 56d534eb53f9 * Fix error in expression evaluation when the expression evaluates to an iterable that does not produce event tuples. (cmlenz, parents: 143)
145     26 from markup.core import DOCTYPE, START, END, START_NS, TEXT, START_CDATA, \
145     27                         END_CDATA, PI, COMMENT, XML_NAMESPACE
1       28
rev 200: 50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41. (cmlenz, parents: 178)
200     29 __all__ = ['DocType', 'XMLSerializer', 'XHTMLSerializer', 'HTMLSerializer',
200     30            'TextSerializer']
1       31
1       32
rev 85: db8f2958c670 Improve handling of DOCTYPE declarations. (cmlenz, parents: 73)
85      33 class DocType(object):
85      34     """Defines a number of commonly used DOCTYPE declarations as constants."""
85      35
85      36     HTML_STRICT = ('html', '-//W3C//DTD HTML 4.01//EN',
85      37                    'http://www.w3.org/TR/html4/strict.dtd')
85      38     HTML_TRANSITIONAL = ('html', '-//W3C//DTD HTML 4.01 Transitional//EN',
85      39                          'http://www.w3.org/TR/html4/loose.dtd')
85      40     HTML = HTML_STRICT
85      41
85      42     XHTML_STRICT = ('html', '-//W3C//DTD XHTML 1.0 Strict//EN',
85      43                     'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
85      44     XHTML_TRANSITIONAL = ('html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
85      45                           'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd')
85      46     XHTML = XHTML_STRICT
85      47
85      48
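The `DocType` constants above are `(name, pubid, sysid)` tuples. As a rough illustration of how such a tuple maps to a declaration string, the following standalone sketch mirrors the DOCTYPE branch of `XMLSerializer.__call__` in this file. `render_doctype` is a hypothetical helper, not part of the module; it returns a plain string rather than a `Markup` instance and omits the trailing newline.

```python
def render_doctype(doctype):
    """Build a DOCTYPE declaration from a (name, pubid, sysid) tuple.

    Hypothetical helper for illustration only; the module itself does this
    inline inside the serializer loop.
    """
    name, pubid, sysid = doctype
    parts = ['<!DOCTYPE %s']
    if pubid:
        parts.append(' PUBLIC "%s"')
    elif sysid:
        # A system identifier without a public one needs the SYSTEM keyword.
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"')
    parts.append('>')
    # Only the non-empty fields are substituted into the template.
    values = tuple(value for value in doctype if value)
    return ''.join(parts) % values


HTML_STRICT = ('html', '-//W3C//DTD HTML 4.01//EN',
               'http://www.w3.org/TR/html4/strict.dtd')
print(render_doctype(HTML_STRICT))
```

Dropping empty fields before formatting keeps the template and the values in step, which is the same idea as the `filter(None, data)` call in the serializer.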
123     49 class XMLSerializer(object):
1       50     """Produces XML text from an event stream.
1       51
1       52     >>> from markup.builder import tag
rev 20: e3d3c1d8c98a Fix tests broken in [20]. (cmlenz, parents: 19)
20      53     >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123     54     >>> print ''.join(XMLSerializer()(elem.generate()))
1       55     <div><a href="foo"/><br/><hr noshade="True"/></div>
1       56     """
123     57
123     58     _PRESERVE_SPACE = frozenset()
123     59
123     60     def __init__(self, doctype=None, strip_whitespace=True):
85      61         """Initialize the XML serializer.
85      62
85      63         @param doctype: a `(name, pubid, sysid)` tuple that represents the
85      64             DOCTYPE declaration that should be included at the top of the
85      65             generated output
123     66         @param strip_whitespace: whether extraneous whitespace should be
123     67             stripped from the output
85      68         """
85      69         self.preamble = []
85      70         if doctype:
85      71             self.preamble.append((DOCTYPE, doctype, (None, -1, -1)))
212     72         self.filters = [EmptyTagFilter()]
123     73         if strip_whitespace:
123     74             self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
1       75
123     76     def __call__(self, stream):
1       77         ns_attrib = []
rev 141: b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24) (cmlenz, parents: 140)
141     78         ns_mapping = {XML_NAMESPACE.uri: 'xml'}
rev 143: ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. (cmlenz, parents: 141)
143     79         have_doctype = False
143     80         in_cdata = False
1       81
123     82         stream = chain(self.preamble, stream)
123     83         for filter_ in self.filters:
123     84             stream = filter_(stream)
1       85         for kind, data, pos in stream:
1       86
212     87             if kind is START or kind is EMPTY:
1       88                 tag, attrib = data
1       89
1       90                 tagname = tag.localname
123     91                 namespace = tag.namespace
123     92                 if namespace:
123     93                     if namespace in ns_mapping:
123     94                         prefix = ns_mapping[namespace]
1       95                         if prefix:
123     96                             tagname = '%s:%s' % (prefix, tagname)
123     97                     else:
123     98                         ns_attrib.append((QName('xmlns'), namespace))
rev 136: 636e0100fcaf Minor performance improvements in serialization. (cmlenz, parents: 123)
136     99                 buf = ['<', tagname]
1      100
123    101                 for attr, value in attrib + ns_attrib:
1      102                     attrname = attr.localname
1      103                     if attr.namespace:
rev 26: 039fc5b87405 * Split out the XPath tests into a separate `unittest`-based file. (cmlenz, parents: 20)
26     104                         prefix = ns_mapping.get(attr.namespace)
1      105                         if prefix:
rev 69: e9a3930f8823 A couple of minor performance improvements. (cmlenz, parents: 66)
69     106                             attrname = '%s:%s' % (prefix, attrname)
136    107                     buf += [' ', attrname, '="', escape(value), '"']
123    108                 ns_attrib = []
1      109
212    110                 if kind is EMPTY:
136    111                     buf += ['/>']
1      112                 else:
136    113                     buf += ['>']
1      114
1      115                 yield Markup(''.join(buf))
1      116
69     117             elif kind is END:
1      118                 tag = data
1      119                 tagname = tag.localname
1      120                 if tag.namespace:
26     121                     prefix = ns_mapping.get(tag.namespace)
26     122                     if prefix:
69     123                         tagname = '%s:%s' % (prefix, tag.localname)
1      124                 yield Markup('</%s>' % tagname)
1      125
69     126             elif kind is TEXT:
143    127                 if in_cdata:
143    128                     yield data
143    129                 else:
143    130                     yield escape(data, quotes=False)
1      131
rev 89: d4c7617900e3 Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. (cmlenz, parents: 85)
89     132             elif kind is COMMENT:
89     133                 yield Markup('<!--%s-->' % data)
89     134
136    135             elif kind is DOCTYPE and not have_doctype:
136    136                 name, pubid, sysid = data
136    137                 buf = ['<!DOCTYPE %s']
136    138                 if pubid:
136    139                     buf += [' PUBLIC "%s"']
136    140                 elif sysid:
136    141                     buf += [' SYSTEM']
136    142                 if sysid:
136    143                     buf += [' "%s"']
136    144                 buf += ['>\n']
136    145                 yield Markup(''.join(buf), *filter(None, data))
136    146                 have_doctype = True
rev 109: 2de3f9d84a1c Reorder the conditional branches in the serializers so that the more common event kinds are on top. (cmlenz, parents: 105)
109    147
109    148             elif kind is START_NS:
109    149                 prefix, uri = data
109    150                 if uri not in ns_mapping:
109    151                     ns_mapping[uri] = prefix
109    152                     if not prefix:
109    153                         ns_attrib.append((QName('xmlns'), uri))
109    154                     else:
109    155                         ns_attrib.append((QName('xmlns:%s' % prefix), uri))
109    156
143    157             elif kind is START_CDATA:
143    158                 yield Markup('<![CDATA[')
143    159                 in_cdata = True
143    160
143    161             elif kind is END_CDATA:
143    162                 yield Markup(']]>')
143    163                 in_cdata = False
143    164
rev 105: 334a338847af Include processing instructions in serialized streams. (cmlenz, parents: 96)
105    165             elif kind is PI:
105    166                 yield Markup('<?%s %s?>' % data)
105    167
1      168
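To make the text handling in `XMLSerializer.__call__` concrete: text events are escaped unless they fall between a START_CDATA and an END_CDATA event, in which case they pass through unchanged. The sketch below is a simplified, self-contained stand-in, not the module's code. It assumes two-element `(kind, data)` tuples with string kinds, and uses the standard library's `xml.sax.saxutils` in place of `markup.core.escape`; namespaces, filters, and event positions are left out.

```python
from xml.sax.saxutils import escape, quoteattr

# Stand-in event kinds (assumption: plain strings instead of StreamEventKind).
START, END, TEXT = 'START', 'END', 'TEXT'
START_CDATA, END_CDATA = 'START_CDATA', 'END_CDATA'


def serialize(stream):
    """Yield XML fragments for a stream of (kind, data) events."""
    in_cdata = False
    for kind, data in stream:
        if kind == START:
            tagname, attrs = data
            buf = ['<', tagname]
            for name, value in attrs:
                # quoteattr escapes the value and adds the surrounding quotes.
                buf += [' ', name, '=', quoteattr(value)]
            buf.append('>')
            yield ''.join(buf)
        elif kind == END:
            yield '</%s>' % data
        elif kind == TEXT:
            # Inside a CDATA section the text is emitted verbatim.
            yield data if in_cdata else escape(data)
        elif kind == START_CDATA:
            in_cdata = True
            yield '<![CDATA['
        elif kind == END_CDATA:
            in_cdata = False
            yield ']]>'


events = [
    (START, ('p', [('class', 'x')])),
    (TEXT, 'a < b'),
    (START_CDATA, None),
    (TEXT, 'c < d'),
    (END_CDATA, None),
    (END, 'p'),
]
print(''.join(serialize(events)))
```

Because the function is a generator, fragments can be written out as they are produced instead of being collected into one string first.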
rev 96: 35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. (cmlenz, parents: 89)
96     169 class XHTMLSerializer(XMLSerializer):
96     170     """Produces XHTML text from an event stream.
1      171
1      172     >>> from markup.builder import tag
20     173     >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123    174     >>> print ''.join(XHTMLSerializer()(elem.generate()))
96     175     <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
1      176     """
1      177
rev 18: 4cbebb15a834 Actually make use of the `markup.core.Namespace` class, and add a couple of doctests. (cmlenz, parents: 1)
18     178     NAMESPACE = Namespace('http://www.w3.org/1999/xhtml')
1      179
1      180     _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
1      181                               'hr', 'img', 'input', 'isindex', 'link', 'meta',
1      182                               'param'])
1      183     _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
1      184                                 'defer', 'disabled', 'ismap', 'multiple',
1      185                                 'nohref', 'noresize', 'noshade', 'nowrap'])
123    186     _PRESERVE_SPACE = frozenset([QName('pre'), QName('textarea')])
1      187
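The `_BOOLEAN_ATTRS` set drives the `noshade="noshade"` output shown in the doctest above: a boolean attribute with a true value is written with its own name as the value, and one with a false value is dropped. A minimal sketch of that rule follows. `render_attr` is a hypothetical helper, the set is shortened for the example, and `xml.sax.saxutils.quoteattr` stands in for the module's own escaping.

```python
from xml.sax.saxutils import quoteattr

# Shortened stand-in for the serializer's boolean attribute set.
BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'disabled', 'noshade'])


def render_attr(name, value):
    """Return the serialized form of one attribute, with a leading space."""
    if name in BOOLEAN_ATTRS:
        # True-ish values expand to name="name"; false-ish ones are omitted.
        if value:
            return ' %s="%s"' % (name, name)
        return ''
    return ' %s=%s' % (name, quoteattr(str(value)))


print(render_attr('noshade', True))
print(render_attr('href', 'foo'))
```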
123    188     def __call__(self, stream):
136    189         namespace = self.NAMESPACE
141    190         ns_attrib = []
141    191         ns_mapping = {XML_NAMESPACE.uri: 'xml'}
136    192         boolean_attrs = self._BOOLEAN_ATTRS
136    193         empty_elems = self._EMPTY_ELEMS
85     194         have_doctype = False
143    195         in_cdata = False
1      196
123    197         stream = chain(self.preamble, stream)
123    198         for filter_ in self.filters:
123    199             stream = filter_(stream)
1      200         for kind, data, pos in stream:
1      201
212    202             if kind is START or kind is EMPTY:
1      203                 tag, attrib = data
96     204
141    205                 tagname = tag.localname
rev 177: dbae9efe5704 * Minor fix for the XHTML serializer (the local namespace var got clobbered) (cmlenz, parents: 158)
177    206                 tagns = tag.namespace
177    207                 if tagns:
177    208                     if tagns in ns_mapping:
177    209                         prefix = ns_mapping[tagns]
141    210                         if prefix:
141    211                             tagname = '%s:%s' % (prefix, tagname)
141    212                     else:
177    213                         ns_attrib.append((QName('xmlns'), tagns))
141    214                 buf = ['<', tagname]
136    215
141    216                 for attr, value in attrib + ns_attrib:
141    217                     attrname = attr.localname
141    218                     if attr.namespace:
141    219                         prefix = ns_mapping.get(attr.namespace)
141    220                         if prefix:
141    221                             attrname = '%s:%s' % (prefix, attrname)
141    222                     if attrname in boolean_attrs:
141    223                         if value:
141    224                             buf += [' ', attrname, '="', attrname, '"']
141    225                     else:
141    226                         buf += [' ', attrname, '="', escape(value), '"']
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
227 ns_attrib = []
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
228
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
229 if kind is EMPTY:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
230 if (tagns and tagns != namespace.uri) \
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
231 or tag.localname in empty_elems:
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
232 buf += [' />']
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
233 else:
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
234 buf += ['></%s>' % tagname]
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
235 else:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
236 buf += ['>']
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
237
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
238 yield Markup(''.join(buf))
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
239
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
240 elif kind is END:
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
241 tag = data
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
242 tagname = tag.localname
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
243 if tag.namespace:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
244 prefix = ns_mapping.get(tag.namespace)
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
245 if prefix:
177
dbae9efe5704 * Minor fix for the XHTML serializer (the local namespace var got clobbered)
cmlenz
parents: 158
diff changeset
246 tagname = '%s:%s' % (prefix, tagname)
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
247 yield Markup('</%s>' % tagname)
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
248
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
249 elif kind is TEXT:
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
250 if in_cdata:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
251 yield data
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
252 else:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
253 yield escape(data, quotes=False)
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
254
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
255 elif kind is COMMENT:
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
256 yield Markup('<!--%s-->' % data)
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
257
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
258 elif kind is DOCTYPE and not have_doctype:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
259 name, pubid, sysid = data
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
260 buf = ['<!DOCTYPE %s']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
261 if pubid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
262 buf += [' PUBLIC "%s"']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
263 elif sysid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
264 buf += [' SYSTEM']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
265 if sysid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
266 buf += [' "%s"']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
267 buf += ['>\n']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
268 yield Markup(''.join(buf), *filter(None, data))
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
269 have_doctype = True
109
2de3f9d84a1c Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
270
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
271 elif kind is START_NS:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
272 prefix, uri = data
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
273 if uri not in ns_mapping:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
274 ns_mapping[uri] = prefix
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
275 if not prefix:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
276 ns_attrib.append((QName('xmlns'), uri))
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
277 else:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
278 ns_attrib.append((QName('xmlns:%s' % prefix), uri))
109
2de3f9d84a1c Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
279
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
280 elif kind is START_CDATA:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
281 yield Markup('<![CDATA[')
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
282 in_cdata = True
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
283
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
284 elif kind is END_CDATA:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
285 yield Markup(']]>')
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
286 in_cdata = False
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
287
105
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
288 elif kind is PI:
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
289 yield Markup('<?%s %s?>' % data)
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
290
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
291
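The DOCTYPE branch of the serializer above assembles a format template piece by piece, then fills it with only the non-empty members of the `(name, pubid, sysid)` tuple. The sketch below restates that logic in isolation. It is an illustration under assumptions, not the module's code: `format_doctype` is a hypothetical helper name, and plain `%` formatting on ordinary strings stands in for the `Markup` constructor used in the listing.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Hypothetical helper mirroring the serializer's DOCTYPE branch."""
    parts = ['<!DOCTYPE %s']
    if pubid:
        # A public identifier is announced with the PUBLIC keyword.
        parts.append(' PUBLIC "%s"')
    elif sysid:
        # SYSTEM is only emitted when there is no public identifier.
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"')
    parts.append('>\n')
    # Drop empty values so the argument count matches the placeholders
    # (the listing does the same with filter(None, data)).
    args = tuple(value for value in (name, pubid, sysid) if value)
    return ''.join(parts) % args


if __name__ == '__main__':
    print(format_doctype('html'), end='')
    print(format_doctype('foo', None, 'foo.dtd'), end='')
```

Because empty values are filtered out before substitution, the template and the arguments stay in step whether one, two, or all three parts of the tuple are present.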
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
292 class HTMLSerializer(XHTMLSerializer):
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
293 """Produces HTML text from an event stream.
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
294
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
295 >>> from markup.builder import tag
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
296 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
297 >>> print ''.join(HTMLSerializer()(elem.generate()))
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
298 <div><a href="foo"></a><br><hr noshade></div>
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
299 """
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
300
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
301 _NOESCAPE_ELEMS = frozenset([QName('script'), QName('style')])
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
302
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
303 def __init__(self, doctype=None, strip_whitespace=True):
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
304 """Initialize the HTML serializer.
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
305
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
306 @param doctype: a `(name, pubid, sysid)` tuple that represents the
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
307 DOCTYPE declaration that should be included at the top of the
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
308 generated output
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
309 @param strip_whitespace: whether extraneous whitespace should be
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
310 stripped from the output
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
311 """
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
312 super(HTMLSerializer, self).__init__(doctype, False)
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
313 if strip_whitespace:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
314 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
315 self._NOESCAPE_ELEMS, True))
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
316
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
317 def __call__(self, stream):
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
318 namespace = self.NAMESPACE
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
319 ns_mapping = {}
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
320 boolean_attrs = self._BOOLEAN_ATTRS
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
321 empty_elems = self._EMPTY_ELEMS
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
322 noescape_elems = self._NOESCAPE_ELEMS
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
323 have_doctype = False
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
324 noescape = False
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
325
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
326 stream = chain(self.preamble, stream)
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
327 for filter_ in self.filters:
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
328 stream = filter_(stream)
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
329 for kind, data, pos in stream:
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
330
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
331 if kind is START or kind is EMPTY:
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
332 tag, attrib = data
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
333 if not tag.namespace or tag in namespace:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
334 tagname = tag.localname
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
335 buf = ['<', tagname]
96
35d681a94763 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
336
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
337 for attr, value in attrib:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
338 attrname = attr.localname
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
339 if not attr.namespace or attr in namespace:
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
340 if attrname in boolean_attrs:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
341 if value:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
342 buf += [' ', attrname]
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
343 else:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
344 buf += [' ', attrname, '="', escape(value), '"']
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
345
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
346 buf += ['>']
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
347
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
348 if kind is EMPTY:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
349 if tagname not in empty_elems:
213
bafa1cc49c2f Store original message in exceptions as `msg` ivar.
cmlenz
parents: 212
diff changeset
350 buf += ['</%s>' % tagname]
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
351
140
a2edde90ad24 Fix bug in HTML serializer, plus some other minor tweaks.
cmlenz
parents: 136
diff changeset
352 yield Markup(''.join(buf))
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
353
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
354 if tagname in noescape_elems:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
355 noescape = True
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
356
69
e9a3930f8823 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
357 elif kind is END:
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
358 tag = data
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
359 if not tag.namespace or tag in namespace:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
360 yield Markup('</%s>' % tag.localname)
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
361
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
362 noescape = False
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
363
69
e9a3930f8823 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
364 elif kind is TEXT:
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
365 if noescape:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
366 yield data
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
367 else:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
368 yield escape(data, quotes=False)
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
369
89
d4c7617900e3 Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
370 elif kind is COMMENT:
d4c7617900e3 Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
371 yield Markup('<!--%s-->' % data)
d4c7617900e3 Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
372
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
373 elif kind is DOCTYPE and not have_doctype:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
374 name, pubid, sysid = data
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
375 buf = ['<!DOCTYPE %s']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
376 if pubid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
377 buf += [' PUBLIC "%s"']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
378 elif sysid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
379 buf += [' SYSTEM']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
380 if sysid:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
381 buf += [' "%s"']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
382 buf += ['>\n']
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
383 yield Markup(''.join(buf), *filter(None, data))
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
384 have_doctype = True
109
2de3f9d84a1c Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
385
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
386 elif kind is START_NS and data[1] not in ns_mapping:
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
387 ns_mapping[data[1]] = data[0]
109
2de3f9d84a1c Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
388
105
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
389 elif kind is PI:
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
390 yield Markup('<?%s %s?>' % data)
334a338847af Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
391
1
821114ec4f69 Initial import.
cmlenz
parents:
diff changeset
392
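The two serializers above differ in how they write boolean attributes: the XHTML serializer emits the expanded form (`noshade="noshade"`), while the HTML serializer emits the minimized form (`noshade`), and both omit the attribute when its value is false. A rough standalone sketch of that rule follows. Several names are assumptions made for illustration: `render_attr` and its `minimize` flag are hypothetical, `BOOLEAN_ATTRS` is only an illustrative subset, and the standard library's `html.escape` stands in for the module's own `escape` function (the two may encode quotes differently).

```python
from html import escape  # stdlib stand-in for the module's escape()

# Illustrative subset only; the real _BOOLEAN_ATTRS set is not shown here.
BOOLEAN_ATTRS = frozenset(['checked', 'disabled', 'noshade', 'selected'])


def render_attr(name, value, minimize):
    """Hypothetical helper: render one attribute as it appears in a start tag.

    minimize=True follows the HTML branch, minimize=False the XHTML branch.
    """
    if name in BOOLEAN_ATTRS:
        if not value:
            # A false boolean attribute is left out entirely.
            return ''
        if minimize:
            return ' ' + name
        return ' %s="%s"' % (name, name)
    # Ordinary attributes are always written with an escaped, quoted value.
    return ' %s="%s"' % (name, escape(str(value)))


if __name__ == '__main__':
    print('<hr%s>' % render_attr('noshade', True, minimize=True))
    print('<hr%s />' % render_attr('noshade', True, minimize=False))
```

Keeping the decision in one place like this is one way to avoid duplicating the attribute loop across serializers; the listing itself keeps a separate loop in each class.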
200
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
393 class TextSerializer(object):
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
394 """Produces plain text from an event stream.
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
395
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
396 Only text events are included in the output. Unlike the other serializers,
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
397 special XML characters are not escaped:
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
398
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
399 >>> from markup.builder import tag
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
400 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
401 >>> print elem
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
402 <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
403 >>> print ''.join(TextSerializer()(elem.generate()))
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
404 <Hello!>
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
405
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
406 If text events contain literal markup (instances of the `Markup` class),
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
407 tags and entities are stripped from the output:
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
408
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
409 >>> elem = tag.div(Markup('<a href="foo">Hello!</a><br/>'))
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
410 >>> print elem
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
411 <div><a href="foo">Hello!</a><br/></div>
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
412 >>> print ''.join(TextSerializer()(elem.generate()))
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
413 Hello!
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
414 """
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
415
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
416 def __call__(self, stream):
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
417 for kind, data, pos in stream:
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
418 if kind is TEXT:
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
419 if type(data) is Markup:
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
420 data = data.striptags().stripentities()
201
0f16c907077e The `TextSerializer` should produce `unicode` objects, not `Markup` objects.
cmlenz
parents: 200
diff changeset
421 yield unicode(data)
200
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
422
50eab0469148 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
423
212
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
424 class EmptyTagFilter(object):
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
425 """Combines `START` and `STOP` events into `EMPTY` events for elements that
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
426 have no contents.
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
427 """
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
428
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
429 EMPTY = StreamEventKind('EMPTY')
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
430
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
431 def __call__(self, stream):
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
432 prev = (None, None, None)
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
433 for kind, data, pos in stream:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
434 if prev[0] is START:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
435 if kind is END:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
436 prev = EMPTY, prev[1], prev[2]
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
437 yield prev
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
438 continue
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
439 else:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
440 yield prev
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
441 if kind is not START:
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
442 yield kind, data, pos
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
443 prev = kind, data, pos
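The hold-back logic of `EmptyTagFilter.__call__` above can be tried in isolation. The following is a minimal sketch, not the module's own code: the event kinds are plain strings standing in for the real `StreamEventKind` instances, and `combine_empty` is a hypothetical name for a function that mirrors the method body.

```python
# Stand-in event kinds (assumption: the real module uses StreamEventKind objects).
START, END, TEXT, EMPTY = 'START', 'END', 'TEXT', 'EMPTY'


def combine_empty(stream):
    """Yield events, merging a START directly followed by END into one EMPTY."""
    prev = (None, None, None)
    for kind, data, pos in stream:
        if prev[0] == START:
            if kind == END:
                # Adjacent START/END pair: emit a single EMPTY event instead.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            # The held-back START has content after all, so release it now.
            yield prev
        if kind != START:
            yield kind, data, pos
        # START events are held back until the following event is known.
        prev = (kind, data, pos)


events = [
    (START, ('div', {}), None),
    (START, ('br', {}), None),
    (END, 'br', None),
    (TEXT, 'hi', None),
    (END, 'div', None),
]
result = [kind for kind, _, _ in combine_empty(events)]
print(result)
```

In this sketch the `br` pair collapses into one `EMPTY` event while the `div` events pass through unchanged. As in the method above, a `START` that is the very last event of a stream is never released, which only matters for malformed streams.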
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
444
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
445
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
446 EMPTY = EmptyTagFilter.EMPTY
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
447
e8c43127d9a9 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
448
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
449 class WhitespaceFilter(object):
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
450 """A filter that removes extraneous ignorable white space from the
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
451 stream."""
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
452
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
453 def __init__(self, preserve=None, noescape=None, escape_cdata=False):
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
454 """Initialize the filter.
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
455
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
456 @param preserve: a set or sequence of tag names for which white-space
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
457 should be preserved.
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
458 @param noescape: a set or sequence of tag names for which text content
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
459 should not be escaped.
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
460
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
461 Both the `preserve` and `noescape` sets are expected to refer to
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
462 elements that cannot contain further child elements.
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
463 """
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
464 if preserve is None:
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
465 preserve = []
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
466 self.preserve = frozenset(preserve)
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
467 if noescape is None:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
468 noescape = []
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
469 self.noescape = frozenset(noescape)
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
470 self.escape_cdata = escape_cdata
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
471
219
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
472 def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
473 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
474 collapse_lines=re.compile('\n{2,}').sub):
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
475 mjoin = Markup('').join
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
476 preserve_elems = self.preserve
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
477 preserve = False
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
478 noescape_elems = self.noescape
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
479 noescape = False
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
480 escape_cdata = self.escape_cdata
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
481
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
482 textbuf = []
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
483 push_text = textbuf.append
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
484 pop_text = textbuf.pop
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
485 for kind, data, pos in chain(stream, [(None, None, None)]):
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
486 if kind is TEXT:
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
487 if noescape:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
488 data = Markup(data)
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
489 push_text(data)
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
490 else:
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
491 if textbuf:
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
492 if len(textbuf) > 1:
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
493 text = mjoin(textbuf, escape_quotes=False)
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
494 del textbuf[:]
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
495 else:
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
496 text = escape(pop_text(), quotes=False)
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
497 if not preserve:
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
498 text = collapse_lines('\n', trim_trailing_space('', text))
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
499 yield TEXT, Markup(text), pos
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
500
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
501 if kind is START:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
502 tag, attrib = data
219
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
503 if not preserve and (tag in preserve_elems or
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
504 attrib.get(space) == 'preserve'):
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
505 preserve = True
219
0f897d319002 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
506 if not noescape and tag in noescape_elems:
141
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
507 noescape = True
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
508
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
509 elif kind is END:
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
510 preserve = noescape = False
b3ceaa35fb6b * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
511
143
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
512 elif kind is START_CDATA and not escape_cdata:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
513 noescape = True
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
514
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
515 elif kind is END_CDATA and not escape_cdata:
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
516 noescape = False
ef761afcedff CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
517
136
636e0100fcaf Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
518 if kind:
123
93bbdcf9428b Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
519 yield kind, data, pos
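The whitespace normalization applied to text outside preserved elements comes down to the two regular expressions given as default arguments of `WhitespaceFilter.__call__`. A minimal sketch of that step on its own follows; `normalize` and the sample string are hypothetical and only illustrate the two substitutions.

```python
import re

# The same two patterns used as default arguments in the filter above.
trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
collapse_lines = re.compile('\n{2,}').sub


def normalize(text):
    """Strip spaces/tabs before newlines, then collapse runs of blank lines."""
    return collapse_lines('\n', trim_trailing_space('', text))


sample = 'foo  \n\n\nbar \t\n  baz'
cleaned = normalize(sample)
print(repr(cleaned))
```

Only trailing spaces and tabs that directly precede a newline are removed, so the leading indentation of `  baz` survives. Text inside elements listed in `preserve`, or carrying `xml:space="preserve"`, skips this step entirely.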