annotate genshi/output.py @ 368:94ff33bfe515 stable-0.3.x

Ported [425] to 0.3.x.

author	cmlenz
date	Wed, 22 Nov 2006 20:55:08 +0000
parents	65a46e008098
children

rev	line source
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import. cmlenz parents: diff changeset	2 #
66 59eb24184e9c Switch copyright to Edgewall and URLs to markup.edgewall.org. cmlenz parents: 27 diff changeset	3 # Copyright (C) 2006 Edgewall Software
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	4 # All rights reserved.
5479aae32f5a Initial import. cmlenz parents: diff changeset	5 #
5479aae32f5a Initial import. cmlenz parents: diff changeset	6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import. cmlenz parents: diff changeset	7 # you should have received as part of this distribution. The terms
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	8 # are also available at http://genshi.edgewall.org/wiki/License.
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	9 #
5479aae32f5a Initial import. cmlenz parents: diff changeset	10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import. cmlenz parents: diff changeset	11 # individuals. For the exact contribution history, see the revision
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	12 # history and logs, available at http://genshi.edgewall.org/log/.
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	13
5479aae32f5a Initial import. cmlenz parents: diff changeset	14 """This module provides different kinds of serialization methods for XML event
5479aae32f5a Initial import. cmlenz parents: diff changeset	15 streams.
5479aae32f5a Initial import. cmlenz parents: diff changeset	16 """
5479aae32f5a Initial import. cmlenz parents: diff changeset	17
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	18 from itertools import chain
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	19 try:
5479aae32f5a Initial import. cmlenz parents: diff changeset	20 frozenset
5479aae32f5a Initial import. cmlenz parents: diff changeset	21 except NameError:
5479aae32f5a Initial import. cmlenz parents: diff changeset	22 from sets import ImmutableSet as frozenset
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	23 import re
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	24
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	25 from genshi.core import escape, Markup, Namespace, QName, StreamEventKind
84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	26 from genshi.core import DOCTYPE, START, END, START_NS, TEXT, START_CDATA, \
145 47bbd9d2a5af * Fix error in expression evaluation when the expression evaluates to an iterable that does not produce event tuples. cmlenz parents: 143 diff changeset	27 END_CDATA, PI, COMMENT, XML_NAMESPACE
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	28
200 5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	29 __all__ = ['DocType', 'XMLSerializer', 'XHTMLSerializer', 'HTMLSerializer',
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	30 'TextSerializer']
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	31
5479aae32f5a Initial import. cmlenz parents: diff changeset	32
85 4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	33 class DocType(object):
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	34 """Defines a number of commonly used DOCTYPE declarations as constants."""
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	35
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	36 HTML_STRICT = ('html', '-//W3C//DTD HTML 4.01//EN',
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	37 'http://www.w3.org/TR/html4/strict.dtd')
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	38 HTML_TRANSITIONAL = ('html', '-//W3C//DTD HTML 4.01 Transitional//EN',
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	39 'http://www.w3.org/TR/html4/loose.dtd')
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	40 HTML = HTML_STRICT
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	41
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	42 XHTML_STRICT = ('html', '-//W3C//DTD XHTML 1.0 Strict//EN',
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	43 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	44 XHTML_TRANSITIONAL = ('html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	45 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd')
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	46 XHTML = XHTML_STRICT
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	47
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	48
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	49 class XMLSerializer(object):
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	50 """Produces XML text from an event stream.
5479aae32f5a Initial import. cmlenz parents: diff changeset	51
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	52 >>> from genshi.builder import tag
20 cc92d74ce9e5 Fix tests broken in [20]. cmlenz parents: 19 diff changeset	53 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	54 >>> print ''.join(XMLSerializer()(elem.generate()))
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	55 <div><a href="foo"/><br/><hr noshade="True"/></div>
5479aae32f5a Initial import. cmlenz parents: diff changeset	56 """
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	57
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	58 _PRESERVE_SPACE = frozenset()
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	59
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	60 def __init__(self, doctype=None, strip_whitespace=True):
85 4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	61 """Initialize the XML serializer.
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	62
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	63 @param doctype: a `(name, pubid, sysid)` tuple that represents the
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	64 DOCTYPE declaration that should be included at the top of the
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	65 generated output
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	66 @param strip_whitespace: whether extraneous whitespace should be
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	67 stripped from the output
85 4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	68 """
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	69 self.preamble = []
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	70 if doctype:
4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	71 self.preamble.append((DOCTYPE, doctype, (None, -1, -1)))
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	72 self.filters = [EmptyTagFilter()]
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	73 if strip_whitespace:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	74 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	75
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	76 def __call__(self, stream):
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	77 ns_attrib = []
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	78 ns_mapping = {XML_NAMESPACE.uri: 'xml'}
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	79 have_doctype = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	80 in_cdata = False
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	81
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	82 stream = chain(self.preamble, stream)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	83 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	84 stream = filter_(stream)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	85 for kind, data, pos in stream:
5479aae32f5a Initial import. cmlenz parents: diff changeset	86
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	87 if kind is START or kind is EMPTY:
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	88 tag, attrib = data
5479aae32f5a Initial import. cmlenz parents: diff changeset	89
5479aae32f5a Initial import. cmlenz parents: diff changeset	90 tagname = tag.localname
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	91 namespace = tag.namespace
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	92 if namespace:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	93 if namespace in ns_mapping:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	94 prefix = ns_mapping[namespace]
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	95 if prefix:
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	96 tagname = '%s:%s' % (prefix, tagname)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	97 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	98 ns_attrib.append((QName('xmlns'), namespace))
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	99 buf = ['<', tagname]
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	100
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	101 for attr, value in attrib + ns_attrib:
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	102 attrname = attr.localname
5479aae32f5a Initial import. cmlenz parents: diff changeset	103 if attr.namespace:
26 3c1a022be04c * Split out the XPath tests into a separate `unittest`-based file. cmlenz parents: 20 diff changeset	104 prefix = ns_mapping.get(attr.namespace)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	105 if prefix:
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	106 attrname = '%s:%s' % (prefix, attrname)
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	107 buf += [' ', attrname, '="', escape(value), '"']
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	108 ns_attrib = []
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	109
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	110 if kind is EMPTY:
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	111 buf += ['/>']
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	112 else:
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	113 buf += ['>']
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	114
5479aae32f5a Initial import. cmlenz parents: diff changeset	115 yield Markup(''.join(buf))
5479aae32f5a Initial import. cmlenz parents: diff changeset	116
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	117 elif kind is END:
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	118 tag = data
5479aae32f5a Initial import. cmlenz parents: diff changeset	119 tagname = tag.localname
5479aae32f5a Initial import. cmlenz parents: diff changeset	120 if tag.namespace:
26 3c1a022be04c * Split out the XPath tests into a separate `unittest`-based file. cmlenz parents: 20 diff changeset	121 prefix = ns_mapping.get(tag.namespace)
3c1a022be04c * Split out the XPath tests into a separate `unittest`-based file. cmlenz parents: 20 diff changeset	122 if prefix:
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	123 tagname = '%s:%s' % (prefix, tag.localname)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	124 yield Markup('</%s>' % tagname)
5479aae32f5a Initial import. cmlenz parents: diff changeset	125
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	126 elif kind is TEXT:
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	127 if in_cdata:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	128 yield data
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	129 else:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	130 yield escape(data, quotes=False)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	131
89 80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	132 elif kind is COMMENT:
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	133 yield Markup('<!--%s-->' % data)
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	134
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	135 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	136 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	137 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	138 if pubid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	139 buf += [' PUBLIC "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	140 elif sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	141 buf += [' SYSTEM']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	142 if sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	143 buf += [' "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	144 buf += ['>\n']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	145 yield Markup(''.join(buf), *filter(None, data))
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	146 have_doctype = True
109 230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	147
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	148 elif kind is START_NS:
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	149 prefix, uri = data
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	150 if uri not in ns_mapping:
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	151 ns_mapping[uri] = prefix
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	152 if not prefix:
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	153 ns_attrib.append((QName('xmlns'), uri))
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	154 else:
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	155 ns_attrib.append((QName('xmlns:%s' % prefix), uri))
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	156
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	157 elif kind is START_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	158 yield Markup('<![CDATA[')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	159 in_cdata = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	160
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	161 elif kind is END_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	162 yield Markup(']]>')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	163 in_cdata = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	164
105 71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	165 elif kind is PI:
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	166 yield Markup('<?%s %s?>' % data)
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	167
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	168
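The DOCTYPE branch of `XMLSerializer.__call__` above (source lines 135-146) can be read in isolation as the following sketch. It is not part of `genshi/output.py`: the helper name `format_doctype` is hypothetical, and it uses plain string formatting where the original passes the values through `Markup`, so this version does no escaping.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Render a (name, pubid, sysid) triple as a DOCTYPE declaration.

    Hypothetical helper mirroring the branch structure of the serializer:
    PUBLIC is emitted when a public identifier is given, SYSTEM only when
    there is a system identifier but no public one.
    """
    parts = ['<!DOCTYPE %s' % name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>\n')
    return ''.join(parts)


# Same tuple as DocType.XHTML_STRICT in the listing above.
XHTML_STRICT = ('html', '-//W3C//DTD XHTML 1.0 Strict//EN',
                'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
declaration = format_doctype(*XHTML_STRICT)
```

As in the original, a trailing newline follows the declaration, and a doctype with neither identifier collapses to just the root element name.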
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	169 class XHTMLSerializer(XMLSerializer):
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	170 """Produces XHTML text from an event stream.
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	171
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	172 >>> from genshi.builder import tag
20 cc92d74ce9e5 Fix tests broken in [20]. cmlenz parents: 19 diff changeset	173 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	174 >>> print ''.join(XHTMLSerializer()(elem.generate()))
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	175 <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	176 """
5479aae32f5a Initial import. cmlenz parents: diff changeset	177
18 5420cfe42d36 Actually make use of the `markup.core.Namespace` class, and add a couple of doctests. cmlenz parents: 1 diff changeset	178 NAMESPACE = Namespace('http://www.w3.org/1999/xhtml')
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	179
5479aae32f5a Initial import. cmlenz parents: diff changeset	180 _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
5479aae32f5a Initial import. cmlenz parents: diff changeset	181 'hr', 'img', 'input', 'isindex', 'link', 'meta',
5479aae32f5a Initial import. cmlenz parents: diff changeset	182 'param'])
5479aae32f5a Initial import. cmlenz parents: diff changeset	183 _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
5479aae32f5a Initial import. cmlenz parents: diff changeset	184 'defer', 'disabled', 'ismap', 'multiple',
5479aae32f5a Initial import. cmlenz parents: diff changeset	185 'nohref', 'noresize', 'noshade', 'nowrap'])
368 94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	186 _PRESERVE_SPACE = frozenset([
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	187 QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'),
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	188 QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea')
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	189 ])
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	190
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	191 def __call__(self, stream):
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	192 namespace = self.NAMESPACE
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	193 ns_attrib = []
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	194 ns_mapping = {XML_NAMESPACE.uri: 'xml'}
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	195 boolean_attrs = self._BOOLEAN_ATTRS
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	196 empty_elems = self._EMPTY_ELEMS
85 4938c310d904 Improve handling of DOCTYPE declarations. cmlenz parents: 73 diff changeset	197 have_doctype = False
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	198 in_cdata = False
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	199
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	200 stream = chain(self.preamble, stream)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	201 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	202 stream = filter_(stream)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	203 for kind, data, pos in stream:
5479aae32f5a Initial import. cmlenz parents: diff changeset	204
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	205 if kind is START or kind is EMPTY:
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	206 tag, attrib = data
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	207
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	208 tagname = tag.localname
177 553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	209 tagns = tag.namespace
553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	210 if tagns:
553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	211 if tagns in ns_mapping:
553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	212 prefix = ns_mapping[tagns]
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	213 if prefix:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	214 tagname = '%s:%s' % (prefix, tagname)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	215 else:
177 553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	216 ns_attrib.append((QName('xmlns'), tagns))
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	217 buf = ['<', tagname]
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	218
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	219 for attr, value in attrib + ns_attrib:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	220 attrname = attr.localname
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	221 if attr.namespace:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	222 prefix = ns_mapping.get(attr.namespace)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	223 if prefix:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	224 attrname = '%s:%s' % (prefix, attrname)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	225 if attrname in boolean_attrs:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	226 if value:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	227 buf += [' ', attrname, '="', attrname, '"']
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	228 else:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	229 buf += [' ', attrname, '="', escape(value), '"']
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	230 ns_attrib = []
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	231
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	232 if kind is EMPTY:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	233 if (tagns and tagns != namespace.uri) \
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	234 or tag.localname in empty_elems:
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	235 buf += [' />']
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	236 else:
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	237 buf += ['></%s>' % tagname]
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	238 else:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	239 buf += ['>']
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	240
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	241 yield Markup(''.join(buf))
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	242
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	243 elif kind is END:
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	244 tag = data
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	245 tagname = tag.localname
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	246 if tag.namespace:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	247 prefix = ns_mapping.get(tag.namespace)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	248 if prefix:
177 553866249cb0 * Minor fix for the XHTML serializer (the local namespace var got clobbered) cmlenz parents: 158 diff changeset	249 tagname = '%s:%s' % (prefix, tagname)
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	250 yield Markup('</%s>' % tagname)
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	251
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	252 elif kind is TEXT:
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	253 if in_cdata:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	254 yield data
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	255 else:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	256 yield escape(data, quotes=False)
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	257
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	258 elif kind is COMMENT:
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	259 yield Markup('<!--%s-->' % data)
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	260
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	261 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	262 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	263 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	264 if pubid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	265 buf += [' PUBLIC "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	266 elif sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	267 buf += [' SYSTEM']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	268 if sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	269 buf += [' "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	270 buf += ['>\n']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	271 yield Markup(''.join(buf), *filter(None, data))
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	272 have_doctype = True
109 230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	273
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	274 elif kind is START_NS:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	275 prefix, uri = data
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	276 if uri not in ns_mapping:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	277 ns_mapping[uri] = prefix
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	278 if not prefix:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	279 ns_attrib.append((QName('xmlns'), uri))
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	280 else:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	281 ns_attrib.append((QName('xmlns:%s' % prefix), uri))
109 230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	282
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	283 elif kind is START_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	284 yield Markup('<![CDATA[')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	285 in_cdata = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	286
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	287 elif kind is END_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	288 yield Markup(']]>')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	289 in_cdata = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	290
105 71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	291 elif kind is PI:
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	292 yield Markup('<?%s %s?>' % data)
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	293
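The DOCTYPE branch above assembles a format string piece by piece and then fills it with the non-empty members of the `(name, pubid, sysid)` tuple. Below is a minimal standalone sketch of that logic in plain Python 3, with no Genshi imports. The function name `format_doctype` is hypothetical, and plain `%` formatting stands in for the `Markup(...)` call of the original, so any escaping `Markup` applies to its arguments is not reproduced here.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Build a DOCTYPE declaration from a (name, pubid, sysid) triple.

    Hypothetical helper mirroring the branch logic in the listing above.
    """
    buf = ['<!DOCTYPE %s']
    if pubid:
        buf.append(' PUBLIC "%s"')
    elif sysid:
        # A system identifier without a public identifier needs the SYSTEM keyword.
        buf.append(' SYSTEM')
    if sysid:
        buf.append(' "%s"')
    buf.append('>\n')
    # Only the non-empty parts are substituted, like filter(None, data) in the listing.
    args = tuple(part for part in (name, pubid, sysid) if part)
    return ''.join(buf) % args


if __name__ == '__main__':
    print(format_doctype('html', '-//W3C//DTD HTML 4.01//EN',
                         'http://www.w3.org/TR/html4/strict.dtd'), end='')
    print(format_doctype('html'), end='')
```

Because the number of `%s` placeholders always equals the number of non-empty parts, the final substitution cannot fail for any combination of `pubid` and `sysid`.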
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	294
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	295 class HTMLSerializer(XHTMLSerializer):
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	296 """Produces HTML text from an event stream.
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	297
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	298 >>> from genshi.builder import tag
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	299 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	300 >>> print ''.join(HTMLSerializer()(elem.generate()))
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	301 <div><a href="foo"></a><br><hr noshade></div>
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	302 """
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	303
284 65a46e008098 Ported [338], [343] and [345:349/trunk] to 0.3.x stable branch. cmlenz parents: 230 diff changeset	304 _NOESCAPE_ELEMS = frozenset([QName('script'),
65a46e008098 Ported [338], [343] and [345:349/trunk] to 0.3.x stable branch. cmlenz parents: 230 diff changeset	305 QName('http://www.w3.org/1999/xhtml}script'),
65a46e008098 Ported [338], [343] and [345:349/trunk] to 0.3.x stable branch. cmlenz parents: 230 diff changeset	306 QName('style'),
65a46e008098 Ported [338], [343] and [345:349/trunk] to 0.3.x stable branch. cmlenz parents: 230 diff changeset	307 QName('http://www.w3.org/1999/xhtml}style')])
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	308
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	309 def __init__(self, doctype=None, strip_whitespace=True):
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	310 """Initialize the HTML serializer.
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	311
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	312 @param doctype: a `(name, pubid, sysid)` tuple that represents the
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	313 DOCTYPE declaration that should be included at the top of the
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	314 generated output
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	315 @param strip_whitespace: whether extraneous whitespace should be
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	316 stripped from the output
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	317 """
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	318 super(HTMLSerializer, self).__init__(doctype, False)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	319 if strip_whitespace:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	320 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	321 self._NOESCAPE_ELEMS, True))
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	322
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	323 def __call__(self, stream):
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	324 namespace = self.NAMESPACE
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	325 ns_mapping = {}
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	326 boolean_attrs = self._BOOLEAN_ATTRS
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	327 empty_elems = self._EMPTY_ELEMS
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	328 noescape_elems = self._NOESCAPE_ELEMS
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	329 have_doctype = False
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	330 noescape = False
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	331
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	332 stream = chain(self.preamble, stream)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	333 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	334 stream = filter_(stream)
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	335 for kind, data, pos in stream:
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	336
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	337 if kind is START or kind is EMPTY:
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	338 tag, attrib = data
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	339 if not tag.namespace or tag in namespace:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	340 tagname = tag.localname
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	341 buf = ['<', tagname]
96 fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module. cmlenz parents: 89 diff changeset	342
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	343 for attr, value in attrib:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	344 attrname = attr.localname
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	345 if not attr.namespace or attr in namespace:
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	346 if attrname in boolean_attrs:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	347 if value:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	348 buf += [' ', attrname]
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	349 else:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	350 buf += [' ', attrname, '="', escape(value), '"']
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	351
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	352 buf += ['>']
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	353
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	354 if kind is EMPTY:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	355 if tagname not in empty_elems:
213 13d2d4420628 Store original message in exceptions as `msg` ivar. cmlenz parents: 212 diff changeset	356 buf += ['</%s>' % tagname]
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	357
140 c1f4390d50f8 Fix bug in HTML serializer, plus some other minor tweaks. cmlenz parents: 136 diff changeset	358 yield Markup(''.join(buf))
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	359
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	360 if tagname in noescape_elems:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	361 noescape = True
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	362
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	363 elif kind is END:
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	364 tag = data
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	365 if not tag.namespace or tag in namespace:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	366 yield Markup('</%s>' % tag.localname)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	367
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	368 noescape = False
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	369
69 c40a5dcd2b55 A couple of minor performance improvements. cmlenz parents: 66 diff changeset	370 elif kind is TEXT:
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	371 if noescape:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	372 yield data
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	373 else:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	374 yield escape(data, quotes=False)
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	375
89 80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	376 elif kind is COMMENT:
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	377 yield Markup('<!--%s-->' % data)
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output. cmlenz parents: 85 diff changeset	378
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	379 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	380 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	381 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	382 if pubid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	383 buf += [' PUBLIC "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	384 elif sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	385 buf += [' SYSTEM']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	386 if sysid:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	387 buf += [' "%s"']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	388 buf += ['>\n']
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	389 yield Markup(''.join(buf), *filter(None, data))
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	390 have_doctype = True
109 230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	391
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	392 elif kind is START_NS and data[1] not in ns_mapping:
b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	393 ns_mapping[data[1]] = data[0]
109 230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top. cmlenz parents: 105 diff changeset	394
105 71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	395 elif kind is PI:
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	396 yield Markup('<?%s %s?>' % data)
71f3db26eecb Include processing instructions in serialized streams. cmlenz parents: 96 diff changeset	397
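The two `__call__` methods above differ mainly in how they write boolean attributes and empty elements: the XHTML path emits `noshade="noshade"` and ` />`, while the HTML path emits a bare `noshade` and no closing tag for void elements. The following standalone Python 3 sketch contrasts the two. It is an illustration under stated assumptions, not the module's code: `start_tag`, `BOOLEAN_ATTRS` and `EMPTY_ELEMS` are hypothetical names, the two sets are small illustrative subsets, namespace handling is left out entirely, and the standard library's `html.escape` stands in for Genshi's `escape`.

```python
from html import escape

# Illustrative subsets only; the real serializer keeps longer lists.
BOOLEAN_ATTRS = frozenset(['checked', 'disabled', 'noshade', 'selected'])
EMPTY_ELEMS = frozenset(['br', 'hr', 'img', 'input'])


def start_tag(tagname, attrs, method='html', empty=False):
    """Render a start tag (or a whole empty element) as HTML or XHTML text."""
    buf = ['<', tagname]
    for name, value in attrs:
        if name in BOOLEAN_ATTRS:
            if value:
                if method == 'xhtml':
                    # XHTML requires a value, so the name is repeated.
                    buf += [' ', name, '="', name, '"']
                else:
                    # HTML allows the minimized form.
                    buf += [' ', name]
        else:
            buf += [' ', name, '="', escape(str(value)), '"']
    if method == 'xhtml':
        if empty:
            # Void elements self-close; other empty elements get an explicit end tag.
            buf.append(' />' if tagname in EMPTY_ELEMS else '></%s>' % tagname)
        else:
            buf.append('>')
    else:
        buf.append('>')
        if empty and tagname not in EMPTY_ELEMS:
            buf.append('</%s>' % tagname)
    return ''.join(buf)


if __name__ == '__main__':
    print(start_tag('hr', [('noshade', True)], 'html', empty=True))
    print(start_tag('hr', [('noshade', True)], 'xhtml', empty=True))
```

A boolean attribute with a false value is dropped in both modes, which matches the `if value:` guards in the listing.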
1 5479aae32f5a Initial import. cmlenz parents: diff changeset	398
200 5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	399 class TextSerializer(object):
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	400 """Produces plain text from an event stream.
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	401
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	402 Only text events are included in the output. Unlike the other serializers,
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	403 special XML characters are not escaped:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	404
230 84168828b074 Renamed Markup to Genshi in repository. cmlenz parents: 219 diff changeset	405 >>> from genshi.builder import tag
200 5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	406 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	407 >>> print elem
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	408 <div><a href="foo"><Hello!></a><br/></div>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	409 >>> print ''.join(TextSerializer()(elem.generate()))
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	410 <Hello!>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	411
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	412 If text events contain literal markup (instances of the `Markup` class),
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	413 tags and entities are stripped from the output:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	414
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	415 >>> elem = tag.div(Markup('<a href="foo">Hello!</a><br/>'))
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	416 >>> print elem
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	417 <div><a href="foo">Hello!</a><br/></div>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	418 >>> print ''.join(TextSerializer()(elem.generate()))
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	419 Hello!
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	420 """
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	421
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	422 def __call__(self, stream):
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	423 for kind, data, pos in stream:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	424 if kind is TEXT:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	425 if type(data) is Markup:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	426 data = data.striptags().stripentities()
201 c5e0a1c86173 The `TextSerializer` should produce `unicode` objects, not `Markup` objects. cmlenz parents: 200 diff changeset	427 yield unicode(data)
200 5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	428
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41. cmlenz parents: 178 diff changeset	429
212 0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	430 class EmptyTagFilter(object):
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	431 """Combines `START` and `STOP` events into `EMPTY` events for elements that
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	432 have no contents.
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	433 """
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	434
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	435 EMPTY = StreamEventKind('EMPTY')
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	436
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	437 def __call__(self, stream):
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	438 prev = (None, None, None)
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	439 for kind, data, pos in stream:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	440 if prev[0] is START:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	441 if kind is END:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	442 prev = EMPTY, prev[1], prev[2]
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	443 yield prev
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	444 continue
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	445 else:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	446 yield prev
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	447 if kind is not START:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	448 yield kind, data, pos
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	449 prev = kind, data, pos
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	450
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	451
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	452 EMPTY = EmptyTagFilter.EMPTY
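
The look-behind logic of `EmptyTagFilter.__call__` above can be tried in isolation. The following is a minimal sketch, not the library's code: it assumes plain strings stand in for Genshi's `StreamEventKind` constants, and the name `combine_empty` and the sample events are hypothetical.

```python
# Stand-in event kinds (assumption: Genshi uses StreamEventKind
# instances; plain strings are used here so the sketch runs alone).
START, END, TEXT, EMPTY = 'START', 'END', 'TEXT', 'EMPTY'


def combine_empty(stream):
    """Merge a START that is immediately followed by an END into EMPTY."""
    prev = (None, None, None)
    for kind, data, pos in stream:
        if prev[0] == START:
            if kind == END:
                # Adjacent START/END pair: emit one EMPTY event that
                # carries the START event's data and position.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            # The held-back START has content after all; release it.
            yield prev
        if kind != START:
            yield kind, data, pos
        # A START is held back until the next event has been seen.
        prev = (kind, data, pos)


# Hypothetical sample stream: <div><br></br>hi</div>
events = [
    (START, ('div', {}), None),
    (START, ('br', {}), None),
    (END, 'br', None),
    (TEXT, 'hi', None),
    (END, 'div', None),
]
kinds = [kind for kind, data, pos in combine_empty(events)]
```

Here `kinds` lists the `div` START, a single EMPTY for `br`, the TEXT event, and the `div` END. Each START is held back for one iteration, so a stream that ends on a START event would never emit it; well-formed streams always close with an END.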
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	453
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator. cmlenz parents: 201 diff changeset	454
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	455 class WhitespaceFilter(object):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	456 """A filter that removes extraneous ignorable white space from the
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	457 stream."""
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	458
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	459 def __init__(self, preserve=None, noescape=None, escape_cdata=False):
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	460 """Initialize the filter.
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	461
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	462 @param preserve: a set or sequence of tag names for which white-space
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	463 should be preserved.
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	464 @param noescape: a set or sequence of tag names for which text content
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	465 should not be escaped.
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	466
368 94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	467 The `noescape` set is expected to refer to elements that cannot contain
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	468 further child elements (such as <style> or <script> in HTML documents).
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	469 """
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	470 if preserve is None:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	471 preserve = []
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	472 self.preserve = frozenset(preserve)
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	473 if noescape is None:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	474 noescape = []
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	475 self.noescape = frozenset(noescape)
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	476 self.escape_cdata = escape_cdata
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	477
219 ebceef564b79 Minor improvements to `WhitespaceFilter`. cmlenz parents: 213 diff changeset	478 def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
ebceef564b79 Minor improvements to `WhitespaceFilter`. cmlenz parents: 213 diff changeset	479 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
ebceef564b79 Minor improvements to `WhitespaceFilter`. cmlenz parents: 213 diff changeset	480 collapse_lines=re.compile('\n{2,}').sub):
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	481 mjoin = Markup('').join
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	482 preserve_elems = self.preserve
368 94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	483 preserve = 0
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	484 noescape_elems = self.noescape
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	485 noescape = False
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	486 escape_cdata = self.escape_cdata
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	487
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	488 textbuf = []
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	489 push_text = textbuf.append
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	490 pop_text = textbuf.pop
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	491 for kind, data, pos in chain(stream, [(None, None, None)]):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	492 if kind is TEXT:
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	493 if noescape:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	494 data = Markup(data)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	495 push_text(data)
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	496 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	497 if textbuf:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	498 if len(textbuf) > 1:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	499 text = mjoin(textbuf, escape_quotes=False)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	500 del textbuf[:]
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	501 else:
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	502 text = escape(pop_text(), quotes=False)
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	503 if not preserve:
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	504 text = collapse_lines('\n', trim_trailing_space('', text))
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	505 yield TEXT, Markup(text), pos
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	506
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	507 if kind is START:
368 94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	508 tag, attrs = data
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	509 if preserve or (tag in preserve_elems or
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	510 attrs.get(space) == 'preserve'):
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	511 preserve += 1
219 ebceef564b79 Minor improvements to `WhitespaceFilter`. cmlenz parents: 213 diff changeset	512 if not noescape and tag in noescape_elems:
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	513 noescape = True
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	514
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	515 elif kind is END:
368 94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	516 noescape = False
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	517 if preserve:
94ff33bfe515 Ported [425] to 0.3.x. cmlenz parents: 284 diff changeset	518 preserve -= 1
141 520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24) cmlenz parents: 140 diff changeset	519
143 3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	520 elif kind is START_CDATA and not escape_cdata:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	521 noescape = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	522
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	523 elif kind is END_CDATA and not escape_cdata:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	524 noescape = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24. cmlenz parents: 141 diff changeset	525
136 b86f496f6035 Minor performance improvements in serialization. cmlenz parents: 123 diff changeset	526 if kind:
123 10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved. cmlenz parents: 109 diff changeset	527 yield kind, data, pos
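
When `preserve` is zero, the filter normalizes each buffered text run with the two regular expressions bound as default arguments of `__call__`. The sketch below copies those two patterns from the listing and applies them to a sample string; the helper name `normalize` and the sample input are assumptions made for illustration.

```python
import re

# Same patterns as the default arguments of WhitespaceFilter.__call__.
trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
collapse_lines = re.compile('\n{2,}').sub


def normalize(text):
    """Strip spaces and tabs before newlines, then collapse blank lines."""
    return collapse_lines('\n', trim_trailing_space('', text))


# Hypothetical sample: trailing blanks and a run of empty lines.
sample = 'a  \n\n\n  b \t\nc'
result = normalize(sample)
```

Only whitespace directly before a newline is removed, so the leading indentation of `  b` survives, and any run of two or more newlines becomes a single newline.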
