annotate genshi/output.py @ 831:7a422be6f6a6 trunk

Follow-up fix for [1038].
author cmlenz
date Fri, 13 Mar 2009 21:05:19 +0000
parents 6e46513e1c5c
children 07f4339fecb0
rev   line source
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
2 #
719
4bc6741b2811 Fix copyright years.
cmlenz
parents: 713
diff changeset
3 # Copyright (C) 2006-2008 Edgewall Software
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
4 # All rights reserved.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
5 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
8 # are also available at http://genshi.edgewall.org/wiki/License.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
9 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
12 # history and logs, available at http://genshi.edgewall.org/log/.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
13
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
14 """This module provides different kinds of serialization methods for XML event
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
15 streams.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
16 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
17
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
18 from itertools import chain
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
19 import re
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
20
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
21 from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
22 from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \
402
c199e9b95884 Fix output of namespace declarations for namespace URLs appearing more than once in a stream. Thanks to Jeff Cutsinger for reporting the problem.
cmlenz
parents: 397
diff changeset
23 START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
24
462
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
25 __all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer',
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
26 'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer']
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
27 __docformat__ = 'restructuredtext en'
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
28
688
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
29 def encode(iterator, method='xml', encoding='utf-8', out=None):
462
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
30 """Encode serializer output into a string.
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
31
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
32 :param iterator: the iterator returned from serializing a stream (basically
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
33 any iterator that yields unicode objects)
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
34 :param method: the serialization method; determines how characters not
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
35 representable in the specified encoding are treated
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
36 :param encoding: how the output string should be encoded; if set to `None`,
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
37 this method returns a `unicode` object
688
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
38 :param out: a file-like object that the output should be written to
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
39 instead of being returned as one big string; note that if
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
40 this is a file or socket (or similar), the `encoding` must
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
41 not be `None` (that is, the output must be encoded)
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
42 :return: a `str` or `unicode` object (depending on the `encoding`
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
43 parameter), or `None` if the `out` parameter is provided
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
44
462
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
45 :since: version 0.4.1
688
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
46 :note: Changed in 0.5: added the `out` parameter
462
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
47 """
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
48 if encoding is not None:
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
49 errors = 'replace'
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
50 if method != 'text' and not isinstance(method, TextSerializer):
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
51 errors = 'xmlcharrefreplace'
688
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
52 _encode = lambda string: string.encode(encoding, errors)
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
53 else:
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
54 _encode = lambda string: string
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
55 if out is None:
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
56 return _encode(u''.join(iterator))
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
57 for chunk in iterator:
d8571da25bc5 The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents: 672
diff changeset
58 out.write(_encode(chunk))
462
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
59
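The error-handler choice made in `encode` can be tried in isolation. What follows is a minimal Python 3 sketch, not Genshi's code: `encode_chunks` is a hypothetical stand-in that accepts any iterable of `str` chunks and a plain `method` string (it does not accept serializer instances), and `io.BytesIO` stands in for a file or socket.

```python
import io


def encode_chunks(chunks, method='xml', encoding='utf-8', out=None):
    """Hypothetical stand-in mirroring the logic of ``encode`` above."""
    if encoding is not None:
        # Plain text has no character references, so unencodable characters
        # become '?'; markup output can fall back to '&#NNN;' instead.
        errors = 'replace' if method == 'text' else 'xmlcharrefreplace'

        def _encode(string):
            return string.encode(encoding, errors)
    else:
        def _encode(string):
            return string
    if out is None:
        # No target given: build one string and return it.
        return _encode(''.join(chunks))
    # Target given: write chunk by chunk and return None.
    for chunk in chunks:
        out.write(_encode(chunk))


markup = encode_chunks(['<p>caf', '\u00e9</p>'], method='xml', encoding='ascii')
text = encode_chunks(['caf', '\u00e9'], method='text', encoding='ascii')

buf = io.BytesIO()
streamed = encode_chunks(['<p>caf', '\u00e9</p>'], encoding='ascii', out=buf)

print(markup)          # b'<p>caf&#233;</p>'
print(text)            # b'caf?'
print(buf.getvalue())  # b'<p>caf&#233;</p>'
print(streamed)        # None
```

Encoding each chunk separately gives the same bytes as encoding the joined string here, because both error handlers work character by character.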
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
60 def get_serializer(method='xml', **kwargs):
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
61 """Return a serializer object for the given method.
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
62
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
63 :param method: the serialization method; can be either "xml", "xhtml",
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
64 "html", "text", or a custom serializer class
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
65
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
66 Any additional keyword arguments are passed to the serializer, and thus
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
67 depend on the `method` parameter value.
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
68
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
69 :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer`
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
70 :since: version 0.4.1
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
71 """
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
72 if isinstance(method, basestring):
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
73 method = {'xml': XMLSerializer,
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
74 'xhtml': XHTMLSerializer,
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
75 'html': HTMLSerializer,
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
76 'text': TextSerializer}[method.lower()]
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
77 return method(**kwargs)
d5e2a7b58116 Add lower-level serialization functions.
cmlenz
parents: 460
diff changeset
78
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
79
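The lookup in `get_serializer` is a string-to-class dispatch that also accepts a class (or any callable) directly. A self-contained Python 3 sketch of that pattern follows; `XMLLike`, `TextLike` and `pick_serializer` are hypothetical stand-ins rather than Genshi names, and `str` takes the place of the Python 2 `basestring`.

```python
class XMLLike:
    """Hypothetical stand-in for a serializer class."""

    def __init__(self, **options):
        self.options = options


class TextLike(XMLLike):
    """Second hypothetical stand-in, to make the dispatch visible."""


_BY_NAME = {'xml': XMLLike, 'text': TextLike}


def pick_serializer(method='xml', **kwargs):
    # A name is resolved case-insensitively; an unknown name raises KeyError.
    if isinstance(method, str):
        method = _BY_NAME[method.lower()]
    # Anything else is assumed to be a class or factory and is called as-is.
    return method(**kwargs)


by_name = pick_serializer('TEXT', strip_whitespace=False)
by_class = pick_serializer(XMLLike)
print(type(by_name).__name__, by_name.options)
print(type(by_class).__name__, by_class.options)
```

Because the mapping is indexed rather than queried with `.get()`, an unrecognized method name fails immediately with `KeyError` instead of returning `None`.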
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
80 class DocType(object):
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
81 """Defines a number of commonly used DOCTYPE declarations as constants."""
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
82
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
83 HTML_STRICT = (
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
84 'html', '-//W3C//DTD HTML 4.01//EN',
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
85 'http://www.w3.org/TR/html4/strict.dtd'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
86 )
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
87 HTML_TRANSITIONAL = (
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
88 'html', '-//W3C//DTD HTML 4.01 Transitional//EN',
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
89 'http://www.w3.org/TR/html4/loose.dtd'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
90 )
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
91 HTML_FRAMESET = (
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
92 'html', '-//W3C//DTD HTML 4.01 Frameset//EN',
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
93 'http://www.w3.org/TR/html4/frameset.dtd'
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
94 )
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
95 HTML = HTML_STRICT
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
96
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
97 HTML5 = ('html', None, None)
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
98
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
99 XHTML_STRICT = (
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
100 'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
101 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
102 )
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
103 XHTML_TRANSITIONAL = (
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
104 'html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
105 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
106 )
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
107 XHTML_FRAMESET = (
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
108 'html', '-//W3C//DTD XHTML 1.0 Frameset//EN',
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
109 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd'
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
110 )
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
111 XHTML = XHTML_STRICT
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
112
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
113 XHTML11 = (
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
114 'html', '-//W3C//DTD XHTML 1.1//EN',
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
115 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
116 )
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
117
663
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
118 SVG_FULL = (
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
119 'svg', '-//W3C//DTD SVG 1.1//EN',
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
120 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
121 )
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
122 SVG_BASIC = (
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
123 'svg', '-//W3C//DTD SVG Basic 1.1//EN',
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
124 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd'
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
125 )
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
126 SVG_TINY = (
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
127 'svg', '-//W3C//DTD SVG Tiny 1.1//EN',
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
128 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd'
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
129 )
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
130 SVG = SVG_FULL
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
131
822
70fddd2262f5 Get rid of some Python 2.3 legacy that's no longer needed now that 2.4 is the baseline.
cmlenz
parents: 750
diff changeset
132 @classmethod
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
133 def get(cls, name):
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
134 """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE``
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
135 declaration for the specified name.
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
136
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
137 The following names are recognized in this version:
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
138 * "html" or "html-strict" for the HTML 4.01 strict DTD
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
139 * "html-transitional" for the HTML 4.01 transitional DTD
745
74b5c5476ddb Preparing for [milestone:0.5] release.
cmlenz
parents: 740
diff changeset
140 * "html-frameset" for the HTML 4.01 frameset DTD
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
141 * "html5" for the ``DOCTYPE`` proposed for HTML5
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
142 * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
143 * "xhtml-transitional" for the XHTML 1.0 transitional DTD
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
144 * "xhtml-frameset" for the XHTML 1.0 frameset DTD
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
145 * "xhtml11" for the XHTML 1.1 DTD
663
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
146 * "svg" or "svg-full" for the SVG 1.1 DTD
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
147 * "svg-basic" for the SVG Basic 1.1 DTD
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
148 * "svg-tiny" for the SVG Tiny 1.1 DTD
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
149
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
150 :param name: the name of the ``DOCTYPE``
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
151 :return: the ``(name, pubid, sysid)`` tuple for the requested
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
152 ``DOCTYPE``, or ``None`` if the name is not recognized
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
153 :since: version 0.4.1
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
154 """
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
155 return {
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
156 'html': cls.HTML, 'html-strict': cls.HTML_STRICT,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
157 'html-transitional': cls.HTML_TRANSITIONAL,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
158 'html-frameset': cls.HTML_FRAMESET,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
159 'html5': cls.HTML5,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
160 'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
161 'xhtml-transitional': cls.XHTML_TRANSITIONAL,
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
162 'xhtml-frameset': cls.XHTML_FRAMESET,
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
163 'xhtml11': cls.XHTML11,
663
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
164 'svg': cls.SVG, 'svg-full': cls.SVG_FULL,
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
165 'svg-basic': cls.SVG_BASIC,
c50d2705016e Add SVG DTDs to `DocType` class. Closes #161.
cmlenz
parents: 658
diff changeset
166 'svg-tiny': cls.SVG_TINY
464
2f13c5fc4a4d Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents: 462
diff changeset
167 }.get(name.lower())
448
1154f2aadb6c Add support for HTML5 doctype.
cmlenz
parents: 437
diff changeset
168
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
169
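The behaviour of `DocType.get` (case-insensitive name, `None` for unknown names, constants read through the class) can be illustrated with a cut-down Python 3 sketch. `MiniDocType` and `LegacyDocType` are hypothetical classes written for this example; only the two tuples in `MiniDocType` are copied from the listing above.

```python
class MiniDocType:
    """Hypothetical cut-down version of the class above."""

    HTML5 = ('html', None, None)
    XHTML11 = (
        'html', '-//W3C//DTD XHTML 1.1//EN',
        'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd',
    )

    @classmethod
    def get(cls, name):
        # dict.get() returns None for names that are not in the mapping.
        return {'html5': cls.HTML5, 'xhtml11': cls.XHTML11}.get(name.lower())


class LegacyDocType(MiniDocType):
    # Hypothetical override; entries read through ``cls`` pick it up.
    HTML5 = ('html', None, 'about:legacy-compat')


print(MiniDocType.get('XHTML11'))
print(MiniDocType.get('docbook'))
print(LegacyDocType.get('html5'))
```

An entry written with the class name spelled out instead of `cls` would keep returning the base-class tuple even when called on a subclass.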
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
170 class XMLSerializer(object):
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
171 """Produces XML text from an event stream.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
172
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
173 >>> from genshi.builder import tag
20
cc92d74ce9e5 Fix tests broken in [20].
cmlenz
parents: 19
diff changeset
174 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
175 >>> print ''.join(XMLSerializer()(elem.generate()))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
176 <div><a href="foo"/><br/><hr noshade="True"/></div>
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
177 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
178
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
179 _PRESERVE_SPACE = frozenset()
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
180
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
181 def __init__(self, doctype=None, strip_whitespace=True,
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
182 namespace_prefixes=None, cache=True):
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
183 """Initialize the XML serializer.
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
184
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
185 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
186 DOCTYPE declaration that should be included at the top
494
942d73ba938c The `doctype` parameter for serializers can now be a string.
cmlenz
parents: 464
diff changeset
187 of the generated output, or the name of a DOCTYPE as
942d73ba938c The `doctype` parameter for serializers can now be a string.
cmlenz
parents: 464
diff changeset
188 defined in `DocType.get`
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
189 :param strip_whitespace: whether extraneous whitespace should be
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
190 stripped from the output
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
191 :param cache: whether to cache the text output per event, which
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
192 improves performance for repetitive markup
494
942d73ba938c The `doctype` parameter for serializers can now be a string.
cmlenz
parents: 464
diff changeset
193 :note: Changed in 0.4.2: The `doctype` parameter can now be a string.
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
194 :note: Changed in 0.6: The `cache` parameter was added
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
195 """
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
196 self.filters = [EmptyTagFilter()]
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
197 if strip_whitespace:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
198 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
199 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
200 cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
201 if doctype:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
202 self.filters.append(DocTypeInserter(doctype))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
203 self.cache = cache
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
204
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
205 def __call__(self, stream):
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
206 have_decl = have_doctype = False
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
207 in_cdata = False
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
208
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
209 cache = {}
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
210 cache_get = cache.get
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
211 if self.cache:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
212 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
213 cache[kind, input] = output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
214 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
215 else:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
216 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
217 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
218
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
219 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
220 stream = filter_(stream)
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
221 for kind, data, pos in stream:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
222 cached = cache_get((kind, data))
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
223 if cached is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
224 yield cached
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
225
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
226 elif kind is START or kind is EMPTY:
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
227 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
228 buf = ['<', tag]
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
229 for attr, value in attrib:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
230 buf += [' ', attr, '="', escape(value), '"']
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
231 buf.append(kind is EMPTY and '/>' or '>')
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
232 yield _emit(kind, data, Markup(u''.join(buf)))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
233
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
234 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
235 yield _emit(kind, data, Markup('</%s>' % data))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
236
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
237 elif kind is TEXT:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
238 if in_cdata:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
239 yield _emit(kind, data, data)
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
240 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
241 yield _emit(kind, data, escape(data, quotes=False))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
242
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
243 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
244 yield _emit(kind, data, Markup('<!--%s-->' % data))
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
245
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
246 elif kind is XML_DECL and not have_decl:
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
247 version, encoding, standalone = data
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
248 buf = ['<?xml version="%s"' % version]
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
249 if encoding:
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
250 buf.append(' encoding="%s"' % encoding)
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
251 if standalone != -1:
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
252 standalone = standalone and 'yes' or 'no'
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
253 buf.append(' standalone="%s"' % standalone)
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
254 buf.append('?>\n')
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
255 yield Markup(u''.join(buf))
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
256 have_decl = True
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
diff changeset
257
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
258 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
259 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
260 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
261 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
262 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
263 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
264 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
265 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
266 buf.append(' "%s"')
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
267 buf.append('>\n')
713
5420fe9d99a9 The `Markup` class now supports mappings for right hand of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as that of its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents: 689
diff changeset
268 yield Markup(u''.join(buf)) % filter(None, data)
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
269 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
270
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
271 elif kind is START_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
272 yield Markup('<![CDATA[')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
273 in_cdata = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
274
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
275 elif kind is END_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
276 yield Markup(']]>')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
277 in_cdata = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
278
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
279 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
280 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
281
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
282
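The caching added in changeset 829 follows a simple per-call memoization pattern: each rendered fragment is stored under its `(kind, data)` key, so repeated events skip the string building. The following is a minimal standalone sketch of that pattern, not the Genshi implementation. The `serialize` function, the `use_cache` flag, and the string event kinds are hypothetical stand-ins, and the standard library's `html.escape` is assumed in place of Genshi's own `escape`. CDATA, comments, and declarations are left out.

```python
from html import escape

# Hypothetical stand-ins for Genshi's event kind constants.
START, END, TEXT = 'START', 'END', 'TEXT'


def serialize(events, use_cache=True):
    """Yield markup strings for (kind, data) events, memoizing repeats."""
    cache = {}
    if use_cache:
        def _emit(kind, data, output):
            cache[kind, data] = output  # remember the rendered fragment
            return output
    else:
        def _emit(kind, data, output):
            return output  # caching disabled: pass the fragment through

    for kind, data in events:
        cached = cache.get((kind, data))
        if cached is not None:
            yield cached
        elif kind == START:
            tag, attrs = data
            buf = ['<', tag]
            for name, value in attrs:
                buf += [' ', name, '="', escape(value), '"']
            buf.append('>')
            yield _emit(kind, data, ''.join(buf))
        elif kind == END:
            yield _emit(kind, data, '</%s>' % data)
        elif kind == TEXT:
            yield _emit(kind, data, escape(data, quote=False))


# Attributes are given as tuples so that the event data is hashable.
events = [
    (START, ('li', (('class', 'item'),))),
    (TEXT, 'a < b'),
    (END, 'li'),
    (START, ('li', (('class', 'item'),))),
    (TEXT, 'a < b'),
    (END, 'li'),
]
result = ''.join(serialize(events))
```

Because the dictionary is created inside the call, nothing is shared between streams, and the output is the same with or without caching. The event data must be hashable for the lookup to work, which is why the sketch uses tuples for attributes.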
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
283 class XHTMLSerializer(XMLSerializer):
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
284 """Produces XHTML text from an event stream.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
285
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
286 >>> from genshi.builder import tag
20
cc92d74ce9e5 Fix tests broken in [20].
cmlenz
parents: 19
diff changeset
287 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
288 >>> print ''.join(XHTMLSerializer()(elem.generate()))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
289 <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
290 """
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
291
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
292 _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
293 'hr', 'img', 'input', 'isindex', 'link', 'meta',
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
294 'param'])
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
295 _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
296 'defer', 'disabled', 'ismap', 'multiple',
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
297 'nohref', 'noresize', 'noshade', 'nowrap'])
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
298 _PRESERVE_SPACE = frozenset([
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
299 QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'),
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
300 QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea')
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
301 ])
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
302
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
303 def __init__(self, doctype=None, strip_whitespace=True,
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
304 namespace_prefixes=None, drop_xml_decl=True, cache=True):
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
305 super(XHTMLSerializer, self).__init__(doctype, False)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
306 self.filters = [EmptyTagFilter()]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
307 if strip_whitespace:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
308 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
309 namespace_prefixes = namespace_prefixes or {}
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
310 namespace_prefixes['http://www.w3.org/1999/xhtml'] = ''
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
311 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
312 cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
313 if doctype:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
314 self.filters.append(DocTypeInserter(doctype))
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
315 self.drop_xml_decl = drop_xml_decl
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
316 self.cache = cache
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
317
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
318 def __call__(self, stream):
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
319 boolean_attrs = self._BOOLEAN_ATTRS
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
320 empty_elems = self._EMPTY_ELEMS
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
321 drop_xml_decl = self.drop_xml_decl
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
322 have_decl = have_doctype = False
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
323 in_cdata = False
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
324
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
325 cache = {}
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
326 cache_get = cache.get
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
327 if self.cache:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
328 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
329 cache[kind, input] = output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
330 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
331 else:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
332 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
333 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
334
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
335 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
336 stream = filter_(stream)
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
337 for kind, data, pos in stream:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
338 cached = cache_get((kind, data))
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
339 if cached is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
340 yield cached
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
341
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
342 elif kind is START or kind is EMPTY:
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
343 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
344 buf = ['<', tag]
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
345 for attr, value in attrib:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
346 if attr in boolean_attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
347 value = attr
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
348 elif attr == u'xml:lang' and u'lang' not in attrib:
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
349 buf += [' lang="', escape(value), '"']
689
3881a602048a The XHTML serializer now strips `xml:space` attributes as they are only allowed on very few tags.
cmlenz
parents: 688
diff changeset
350 elif attr == u'xml:space':
3881a602048a The XHTML serializer now strips `xml:space` attributes as they are only allowed on very few tags.
cmlenz
parents: 688
diff changeset
351 continue
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
352 buf += [' ', attr, '="', escape(value), '"']
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
353 if kind is EMPTY:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
354 if tag in empty_elems:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
355 buf.append(' />')
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
356 else:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
357 buf.append('></%s>' % tag)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
358 else:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
359 buf.append('>')
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
360 yield _emit(kind, data, Markup(u''.join(buf)))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
361
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
362 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
363 yield _emit(kind, data, Markup('</%s>' % data))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
364
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
365 elif kind is TEXT:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
366 if in_cdata:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
367 yield _emit(kind, data, data)
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
368 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
369 yield _emit(kind, data, escape(data, quotes=False))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
370
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
371 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
372 yield _emit(kind, data, Markup('<!--%s-->' % data))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
373
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
374 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
375 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
376 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
377 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
378 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
379 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
380 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
381 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
382 buf.append(' "%s"')
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
383 buf.append('>\n')
713
5420fe9d99a9 The `Markup` class now supports mappings for right hand of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as that of its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents: 689
diff changeset
384 yield Markup(u''.join(buf)) % filter(None, data)
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
385 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
386
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
387 elif kind is XML_DECL and not have_decl and not drop_xml_decl:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
388 version, encoding, standalone = data
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
389 buf = ['<?xml version="%s"' % version]
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
390 if encoding:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
391 buf.append(' encoding="%s"' % encoding)
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
392 if standalone != -1:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
393 standalone = standalone and 'yes' or 'no'
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
394 buf.append(' standalone="%s"' % standalone)
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
395 buf.append('?>\n')
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
396 yield Markup(u''.join(buf))
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
397 have_decl = True
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
398
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
399 elif kind is START_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
400 yield Markup('<![CDATA[')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
401 in_cdata = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
402
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
403 elif kind is END_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
404 yield Markup(']]>')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
405 in_cdata = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
406
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
407 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
408 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
409
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
410
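The DOCTYPE branch above builds a format template piece by piece and then fills it with whichever of `name`, `pubid`, and `sysid` are non-empty. The following is a minimal standalone sketch of that pattern in plain Python 3. `format_doctype` is a hypothetical helper, not part of Genshi, and unlike the `Markup` `%` operator used in the listing it does not escape the substituted values.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Render a DOCTYPE declaration from a (name, pubid, sysid) triple.

    Hypothetical helper for illustration only; it mirrors the branch
    structure of the serializer but is not Genshi code.
    """
    buf = ['<!DOCTYPE %s']
    if pubid:
        buf.append(' PUBLIC "%s"')
    elif sysid:
        # A system identifier without a public one needs the SYSTEM keyword.
        buf.append(' SYSTEM')
    if sysid:
        buf.append(' "%s"')
    buf.append('>\n')
    # Drop the empty parts so the values line up with the %s placeholders.
    values = tuple(part for part in (name, pubid, sysid) if part)
    return ''.join(buf) % values


print(format_doctype('html'), end='')
print(format_doctype('foo', None, 'foo.dtd'), end='')
```

Building the template first keeps the placeholder count in step with the number of non-empty values, which is why the empty entries are filtered out before substitution.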
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
411 class HTMLSerializer(XHTMLSerializer):
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
412 """Produces HTML text from an event stream.
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
413
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
414 >>> from genshi.builder import tag
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
415 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
416 >>> print ''.join(HTMLSerializer()(elem.generate()))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
417 <div><a href="foo"></a><br><hr noshade></div>
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
418 """
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
419
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
420 _NOESCAPE_ELEMS = frozenset([
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
421 QName('script'), QName('http://www.w3.org/1999/xhtml}script'),
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
422 QName('style'), QName('http://www.w3.org/1999/xhtml}style')
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
423 ])
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
424
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
425 def __init__(self, doctype=None, strip_whitespace=True, cache=True):
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
426 """Initialize the HTML serializer.
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
427
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
428 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
429 DOCTYPE declaration that should be included at the top
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
430 of the generated output
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
431 :param strip_whitespace: whether extraneous whitespace should be
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
432 stripped from the output
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
433 :param cache: whether to cache the text output per event, which
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
434 improves performance for repetitive markup
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
435 :note: Changed in 0.6: The `cache` parameter was added
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
436 """
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
437 super(HTMLSerializer, self).__init__(doctype, False)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
438 self.filters = [EmptyTagFilter()]
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
439 if strip_whitespace:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
440 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
diff changeset
441 self._NOESCAPE_ELEMS))
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
442 self.filters.append(NamespaceFlattener(prefixes={
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
443 'http://www.w3.org/1999/xhtml': ''
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
444 }, cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
445 if doctype:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
446 self.filters.append(DocTypeInserter(doctype))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
447 self.cache = cache
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
448
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
449 def __call__(self, stream):
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
450 boolean_attrs = self._BOOLEAN_ATTRS
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
451 empty_elems = self._EMPTY_ELEMS
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
452 noescape_elems = self._NOESCAPE_ELEMS
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
453 have_doctype = False
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
454 noescape = False
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
455
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
456 cache = {}
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
457 cache_get = cache.get
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
458 if self.cache:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
459 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
460 cache[kind, input] = output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
461 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
462 else:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
463 def _emit(kind, input, output):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
464 return output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
465
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
466 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
467 stream = filter_(stream)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
468 for kind, data, _ in stream:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
469 output = cache_get((kind, data))
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
470 if output is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
471 yield output
831
7a422be6f6a6 Follow-up fix for [1038].
cmlenz
parents: 829
diff changeset
472 if (kind is START or kind is EMPTY) \
7a422be6f6a6 Follow-up fix for [1038].
cmlenz
parents: 829
diff changeset
473 and data[0] in noescape_elems:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
474 noescape = True
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
475 elif kind is END:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
476 noescape = False
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
477
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
478 elif kind is START or kind is EMPTY:
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
479 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
480 buf = ['<', tag]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
481 for attr, value in attrib:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
482 if attr in boolean_attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
483 if value:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
484 buf += [' ', attr]
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
485 elif ':' in attr:
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
486 if attr == 'xml:lang' and u'lang' not in attrib:
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
487 buf += [' lang="', escape(value), '"']
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
488 elif attr != 'xmlns':
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
489 buf += [' ', attr, '="', escape(value), '"']
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
490 buf.append('>')
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
491 if kind is EMPTY:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
492 if tag not in empty_elems:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
493 buf.append('</%s>' % tag)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
494 yield _emit(kind, data, Markup(u''.join(buf)))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
495 if tag in noescape_elems:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
496 noescape = True
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
497
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
498 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
499 yield _emit(kind, data, Markup('</%s>' % data))
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
500 noescape = False
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
501
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
502 elif kind is TEXT:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
503 if noescape:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
504 yield _emit(kind, data, data)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
505 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
506 yield _emit(kind, data, escape(data, quotes=False))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
507
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
508 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
509 yield _emit(kind, data, Markup('<!--%s-->' % data))
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
510
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
511 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
512 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
513 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
514 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
515 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
516 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
517 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
518 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
519 buf.append(' "%s"')
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
520 buf.append('>\n')
713
5420fe9d99a9 The `Markup` class now supports mappings for right hand of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escape. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as that of its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents: 689
diff changeset
521 yield Markup(u''.join(buf)) % filter(None, data)
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
522 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
523
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
524 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
525 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
526
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
527
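The `__call__` method above keeps a per-call dictionary keyed on `(kind, data)` so that an event seen before can be emitted without being rendered again. The following is a rough sketch of that idea in plain Python 3, under simplifying assumptions: events are `(kind, data)` pairs rather than Genshi's three-element events, attributes are passed as a tuple of `(name, value)` pairs so the key is hashable, and only tags are cached. The names `serialize`, `emit`, and `use_cache` are made up for the example.

```python
from html import escape

START, END, TEXT = 'START', 'END', 'TEXT'


def serialize(events, use_cache=True):
    """Yield markup strings for (kind, data) events, reusing rendered tags."""
    cache = {}

    def emit(kind, data, output):
        # Remember the rendered text so an identical event can skip rendering.
        if use_cache:
            cache[kind, data] = output
        return output

    for kind, data in events:
        cached = cache.get((kind, data))
        if cached is not None:
            yield cached
        elif kind == START:
            tag, attrs = data  # attrs: tuple of (name, value) pairs
            parts = ['<', tag]
            for name, value in attrs:
                parts += [' ', name, '="', escape(value), '"']
            parts.append('>')
            yield emit(kind, data, ''.join(parts))
        elif kind == END:
            yield emit(kind, data, '</%s>' % data)
        elif kind == TEXT:
            # Text is deliberately left uncached in this sketch.
            yield escape(data, quote=False)


row = [(START, ('li', (('class', 'x'),))), (TEXT, 'a & b'), (END, 'li')]
print(''.join(serialize(row * 2)))
```

The cache pays off when the same start and end tags recur many times, for example in long lists or tables. The output is the same with `use_cache=False`; only the amount of repeated string building changes.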
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
528 class TextSerializer(object):
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
529 """Produces plain text from an event stream.
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
530
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
531 Only text events are included in the output. Unlike the other serializers,
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
532 special XML characters are not escaped:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
533
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
534 >>> from genshi.builder import tag
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
535 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
536 >>> print elem
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
537 <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
538 >>> print ''.join(TextSerializer()(elem.generate()))
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
539 <Hello!>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
540
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
541 If text events contain literal markup (instances of the `Markup` class),
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
542 that markup is by default passed through unchanged:
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
543
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
544 >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
545 >>> print elem.generate().render(TextSerializer)
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
546 <a href="foo">Hello &amp; Bye!</a><br/>
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
547
740
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
548 You can use the ``strip_markup`` option to change this behavior, so that tags and
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
549 entities are stripped from the output (or in the case of entities,
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
550 replaced with the equivalent character):
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
551
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
552 >>> print elem.generate().render(TextSerializer, strip_markup=True)
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
553 Hello & Bye!
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
554 """
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
555
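The `strip_markup` behavior described in the docstring above (tags removed, entities replaced with the equivalent character) can be approximated with the standard library alone. This is only an illustrative sketch in Python 3: `strip_markup` here is a hypothetical free function, the regular expression is a simplification, and Genshi's own helpers may treat comments and malformed markup differently.

```python
import re
from html import unescape

# Simplified pattern: anything between '<' and the next '>' counts as a tag.
_TAG_RE = re.compile(r'<[^>]*>')


def strip_markup(text):
    """Remove tags, then replace entities with the characters they stand for."""
    return unescape(_TAG_RE.sub('', text))


print(strip_markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
```

Tags are stripped before entities are decoded so that an encoded `&lt;` in the text is not mistaken for the start of a tag.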
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
556 def __init__(self, strip_markup=False):
740
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
557 """Create the serializer.
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
558
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
559 :param strip_markup: whether markup (tags and encoded characters) found
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
560 in the text should be removed
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
561 """
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
562 self.strip_markup = strip_markup
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
563
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
564 def __call__(self, stream):
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
565 strip_markup = self.strip_markup
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
566 for event in stream:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
567 if event[0] is TEXT:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
568 data = event[1]
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
569 if strip_markup and type(data) is Markup:
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
570 data = data.striptags().stripentities()
201
c5e0a1c86173 The `TextSerializer` should produce `unicode` objects, not `Markup` objects.
cmlenz
parents: 200
diff changeset
571 yield unicode(data)
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
572
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
573
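The `strip_markup` branch of `TextSerializer.__call__` can be illustrated with a standalone sketch. It does not import Genshi: `MarkupText` is a hypothetical stand-in for `Markup`, and the regular expression plus `html.unescape` only approximate what `striptags()` and `stripentities()` do. Treat it as an illustration of the control flow, not as the library's implementation.

```python
import re
from html import unescape


class MarkupText(str):
    """Hypothetical stand-in for Genshi's Markup: a string known to hold markup."""


def serialize_text(chunks, strip_markup=False):
    """Yield plain strings for each text chunk, as the serializer does for TEXT events."""
    for data in chunks:
        # Only chunks explicitly marked as markup are stripped; ordinary
        # strings pass through untouched, even if they contain '<' or '&'.
        if strip_markup and type(data) is MarkupText:
            data = unescape(re.sub(r"<[^>]*>", "", data))
        yield str(data)


chunk = MarkupText('<a href="foo">Hello &amp; Bye!</a><br/>')
kept = "".join(serialize_text([chunk]))
stripped = "".join(serialize_text([chunk], strip_markup=True))
plain = "".join(serialize_text(["<Hello!>"], strip_markup=True))
```

With `strip_markup` left at its default the markup chunk is emitted verbatim, which is what keeps `escape` usable in text templates.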
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
574 class EmptyTagFilter(object):
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
575 """Combines `START` and `STOP` events into `EMPTY` events for elements that
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
576 have no contents.
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
577 """
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
578
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
579 EMPTY = StreamEventKind('EMPTY')
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
580
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
581 def __call__(self, stream):
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
582 prev = (None, None, None)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
583 for ev in stream:
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
584 if prev[0] is START:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
585 if ev[0] is END:
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
586 prev = EMPTY, prev[1], prev[2]
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
587 yield prev
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
588 continue
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
589 else:
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
590 yield prev
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
591 if ev[0] is not START:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
592 yield ev
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
593 prev = ev
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
594
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
595
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
596 EMPTY = EmptyTagFilter.EMPTY
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
597
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
598
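The look-behind technique used by `EmptyTagFilter` can be tried in isolation. The sketch below mirrors the loop above but uses plain strings for the event kinds instead of Genshi's `StreamEventKind` objects, and `combine_empty` is a hypothetical name chosen for this example.

```python
START, END, TEXT, EMPTY = "START", "END", "TEXT", "EMPTY"


def combine_empty(stream):
    """Merge a START that is immediately followed by END into one EMPTY event."""
    prev = (None, None, None)
    for ev in stream:
        if prev[0] == START:
            if ev[0] == END:
                # The held-back START had no content: emit it as EMPTY
                # and swallow the END event.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            # The element has content after all: release the held START.
            yield prev
        if ev[0] != START:
            yield ev
        # START events are held back until the next event is seen.
        prev = ev


events = [
    (START, ("div", []), None),
    (START, ("br", []), None),
    (END, "br", None),
    (TEXT, "hi", None),
    (END, "div", None),
]
result = list(combine_empty(events))
kinds = [kind for kind, _, _ in result]
```

Holding back only one event keeps the filter streaming: nothing beyond the most recent `START` is ever buffered.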
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
599 class NamespaceFlattener(object):
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
600 r"""Output stream filter that removes namespace information from the stream,
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
601 instead adding namespace attributes and prefixes as needed.
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
602
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
603 :param prefixes: optional mapping of namespace URIs to prefixes
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
604
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
605 >>> from genshi.input import XML
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
606 >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2">
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
607 ... <two:item/>
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
608 ... </doc>''')
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
609 >>> for kind, data, pos in NamespaceFlattener()(xml):
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
610 ... print kind, repr(data)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
611 START (u'doc', Attrs([(u'xmlns', u'NS1'), (u'xmlns:two', u'NS2')]))
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
612 TEXT u'\n '
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
613 START (u'two:item', Attrs())
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
614 END u'two:item'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
615 TEXT u'\n'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
616 END u'doc'
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
617 """
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
618
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
619 def __init__(self, prefixes=None, cache=True):
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
620 self.prefixes = {XML_NAMESPACE.uri: 'xml'}
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
621 if prefixes is not None:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
622 self.prefixes.update(prefixes)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
623 self.cache = cache
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
624
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
625 def __call__(self, stream):
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
626 cache = {}
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
627 cache_get = cache.get
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
628 if self.cache:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
629 def _emit(kind, input, output, pos):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
630 cache[kind, input] = output
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
631 return kind, output, pos
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
632 else:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
633 def _emit(kind, input, output, pos):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
634 return kind, output, pos
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
635
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
636 prefixes = dict([(v, [k]) for k, v in self.prefixes.items()])
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
637 namespaces = {XML_NAMESPACE.uri: ['xml']}
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
638 def _push_ns(prefix, uri):
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
639 namespaces.setdefault(uri, []).append(prefix)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
640 prefixes.setdefault(prefix, []).append(uri)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
641 cache.clear()
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
642 def _pop_ns(prefix):
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
643 uris = prefixes.get(prefix)
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
644 uri = uris.pop()
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
645 if not uris:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
646 del prefixes[prefix]
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
647 if uri not in uris or uri != uris[-1]:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
648 uri_prefixes = namespaces[uri]
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
649 uri_prefixes.pop()
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
650 if not uri_prefixes:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
651 del namespaces[uri]
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
652 cache.clear()
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
653 return uri
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
654
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
655 ns_attrs = []
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
656 _push_ns_attr = ns_attrs.append
437
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
657 def _make_ns_attr(prefix, uri):
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
658 return u'xmlns%s' % (prefix and ':%s' % prefix or ''), uri
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
659
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
660 def _gen_prefix():
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
661 val = 0
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
662 while 1:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
663 val += 1
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
664 yield 'ns%d' % val
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
665 _gen_prefix = _gen_prefix().next
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
666
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
667 for kind, data, pos in stream:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
668 output = cache_get((kind, data))
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
669 if output is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
670 yield kind, output, pos
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
671
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
672 elif kind is START or kind is EMPTY:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
673 tag, attrs = data
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
674
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
675 tagname = tag.localname
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
676 tagns = tag.namespace
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
677 if tagns:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
678 if tagns in namespaces:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
679 prefix = namespaces[tagns][-1]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
680 if prefix:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
681 tagname = u'%s:%s' % (prefix, tagname)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
682 else:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
683 _push_ns_attr((u'xmlns', tagns))
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
684 _push_ns('', tagns)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
685
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
686 new_attrs = []
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
687 for attr, value in attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
688 attrname = attr.localname
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
689 attrns = attr.namespace
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
690 if attrns:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
691 if attrns not in namespaces:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
692 prefix = _gen_prefix()
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
693 _push_ns(prefix, attrns)
412
bd51adc20a67 Actually write xmlns declarations for generated attribute namespace prefixes.
cmlenz
parents: 410
diff changeset
694 _push_ns_attr(('xmlns:%s' % prefix, attrns))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
695 else:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
696 prefix = namespaces[attrns][-1]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
697 if prefix:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
698 attrname = u'%s:%s' % (prefix, attrname)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
699 new_attrs.append((attrname, value))
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
700
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
701 yield _emit(kind, data, (tagname, Attrs(ns_attrs + new_attrs)), pos)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
702 del ns_attrs[:]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
703
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
704 elif kind is END:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
705 tagname = data.localname
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
706 tagns = data.namespace
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
707 if tagns:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
708 prefix = namespaces[tagns][-1]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
709 if prefix:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
710 tagname = u'%s:%s' % (prefix, tagname)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
711 yield _emit(kind, data, tagname, pos)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
712
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
713 elif kind is START_NS:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
714 prefix, uri = data
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
715 if uri not in namespaces:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
716 prefix = prefixes.get(uri, [prefix])[-1]
437
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
717 _push_ns_attr(_make_ns_attr(prefix, uri))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
718 _push_ns(prefix, uri)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
719
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
720 elif kind is END_NS:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
721 if data in prefixes:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
722 uri = _pop_ns(data)
437
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
723 if ns_attrs:
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
724 attr = _make_ns_attr(data, uri)
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
725 if attr in ns_attrs:
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
diff changeset
726 ns_attrs.remove(attr)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
727
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
728 else:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
729 yield kind, data, pos
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
730
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
731
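The core bookkeeping of `NamespaceFlattener` is a pair of stacks: one from namespace URI to the prefixes currently bound to it, and one from prefix to URIs. The following is a reduced sketch of that idea, assuming a simplified event format of `(kind, data)` pairs with qualified names given as `(uri, localname)` tuples. It omits caching, attribute namespaces, generated `nsN` prefixes and the preferred-prefix mapping, and `flatten_names` is a hypothetical name.

```python
def flatten_names(events):
    """Replace (uri, localname) names with prefixed names and xmlns attributes."""
    namespaces = {}  # uri -> stack of prefixes bound to it
    prefixes = {}    # prefix -> stack of uris it is bound to
    pending = []     # xmlns attributes waiting for the next START event
    for kind, data in events:
        if kind == "START_NS":
            prefix, uri = data
            if uri not in namespaces:
                # First binding of this URI: declare it on the next element.
                pending.append(("xmlns:%s" % prefix if prefix else "xmlns", uri))
            namespaces.setdefault(uri, []).append(prefix)
            prefixes.setdefault(prefix, []).append(uri)
        elif kind == "END_NS":
            uri = prefixes[data].pop()
            namespaces[uri].pop()
            if not namespaces[uri]:
                del namespaces[uri]
        elif kind in ("START", "END"):
            uri, local = data
            prefix = namespaces[uri][-1] if uri else ""
            name = "%s:%s" % (prefix, local) if prefix else local
            if kind == "START":
                yield kind, (name, list(pending))
                del pending[:]
            else:
                yield kind, name
        else:
            yield kind, data


events = [
    ("START_NS", ("", "NS1")),
    ("START_NS", ("two", "NS2")),
    ("START", ("NS1", "doc")),
    ("START", ("NS2", "item")),
    ("END", ("NS2", "item")),
    ("END", ("NS1", "doc")),
    ("END_NS", "two"),
    ("END_NS", ""),
]
flat = list(flatten_names(events))
```

The namespace events themselves never reach the output; they only influence the names and `xmlns` attributes of the element events, matching the doctest in the class docstring.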
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
732 class WhitespaceFilter(object):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
733 """A filter that removes extraneous ignorable white space from the
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
734 stream.
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
735 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
736
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
diff changeset
737 def __init__(self, preserve=None, noescape=None):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
738 """Initialize the filter.
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
739
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
740 :param preserve: a set or sequence of tag names for which white-space
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
741 should be preserved
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
742 :param noescape: a set or sequence of tag names for which text content
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
743 should not be escaped
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
744
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
745 The `noescape` set is expected to refer to elements that cannot contain
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
746 further child elements (such as ``<style>`` or ``<script>`` in HTML
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
747 documents).
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
748 """
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
749 if preserve is None:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
750 preserve = []
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
751 self.preserve = frozenset(preserve)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
752 if noescape is None:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
753 noescape = []
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
754 self.noescape = frozenset(noescape)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
755
219
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
756 def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
757 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
758 collapse_lines=re.compile('\n{2,}').sub):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
759 mjoin = Markup('').join
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
760 preserve_elems = self.preserve
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
761 preserve = 0
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
762 noescape_elems = self.noescape
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
763 noescape = False
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
764
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
765 textbuf = []
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
766 push_text = textbuf.append
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
767 pop_text = textbuf.pop
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
768 for kind, data, pos in chain(stream, [(None, None, None)]):
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
769
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
770 if kind is TEXT:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
771 if noescape:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
772 data = Markup(data)
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
773 push_text(data)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
774 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
775 if textbuf:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
776 if len(textbuf) > 1:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
777 text = mjoin(textbuf, escape_quotes=False)
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
778 del textbuf[:]
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
779 else:
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
780 text = escape(pop_text(), quotes=False)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
781 if not preserve:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
782 text = collapse_lines('\n', trim_trailing_space('', text))
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
783 yield TEXT, Markup(text), pos
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
784
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
785 if kind is START:
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
786 tag, attrs = data
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
787 if preserve or (tag in preserve_elems or
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
788 attrs.get(space) == 'preserve'):
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
789 preserve += 1
219
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
diff changeset
790 if not noescape and tag in noescape_elems:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
791 noescape = True
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
792
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
793 elif kind is END:
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
794 noescape = False
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
795 if preserve:
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
diff changeset
796 preserve -= 1
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
797
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
diff changeset
798 elif kind is START_CDATA:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
799 noescape = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
800
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
diff changeset
801 elif kind is END_CDATA:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
802 noescape = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
803
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
804 if kind:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
805 yield kind, data, pos
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
806
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
807
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
808 class DocTypeInserter(object):
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
809 """A filter that inserts the DOCTYPE declaration in the correct location,
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
810 after the XML declaration.
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
811 """
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
812 def __init__(self, doctype):
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
813 """Initialize the filter.
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
814
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
815 :param doctype: DOCTYPE as a string or DocType object.
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
816 """
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
817 if isinstance(doctype, basestring):
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
818 doctype = DocType.get(doctype)
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
819 self.doctype_event = (DOCTYPE, doctype, (None, -1, -1))
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
820
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
821 def __call__(self, stream):
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
822 doctype_inserted = False
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
823 for kind, data, pos in stream:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
824 if not doctype_inserted:
672
571226acaeff XML_DECL must be the absolute first item, so don't bother buffering whitespace.
athomas
parents: 671
diff changeset
825 doctype_inserted = True
571226acaeff XML_DECL must be the absolute first item, so don't bother buffering whitespace.
athomas
parents: 671
diff changeset
826 if kind is XML_DECL:
571226acaeff XML_DECL must be the absolute first item, so don't bother buffering whitespace.
athomas
parents: 671
diff changeset
827 yield (kind, data, pos)
571226acaeff XML_DECL must be the absolute first item, so don't bother buffering whitespace.
athomas
parents: 671
diff changeset
828 yield self.doctype_event
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
829 continue
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
830 yield self.doctype_event
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
831
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
832 yield (kind, data, pos)
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
833
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
834 if not doctype_inserted:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
835 yield self.doctype_event
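The placement rule in `DocTypeInserter.__call__` above can be tried in isolation. The following is a minimal sketch, not part of `genshi/output.py`: the event kinds are plain-string stand-ins for the constants Genshi defines, `insert_doctype` is a hypothetical free-function version of the `__call__` method, and the event payloads are illustrative placeholders.

```python
# Stand-in event kinds (assumption: plain strings replace Genshi's constants).
XML_DECL = "XML_DECL"
DOCTYPE = "DOCTYPE"
START = "START"
END = "END"


def insert_doctype(stream, doctype_event):
    """Yield the events of ``stream`` with ``doctype_event`` added exactly once.

    Hypothetical helper mirroring the loop in the listing: the DOCTYPE goes
    right after a leading XML declaration, or in front of the first event
    when the stream does not start with one.
    """
    doctype_inserted = False
    for kind, data, pos in stream:
        if not doctype_inserted:
            doctype_inserted = True
            if kind == XML_DECL:
                # The XML declaration must stay the very first event.
                yield (kind, data, pos)
                yield doctype_event
                continue
            yield doctype_event
        yield (kind, data, pos)
    # An empty stream still receives the DOCTYPE.
    if not doctype_inserted:
        yield doctype_event


NO_POS = (None, -1, -1)  # same placeholder position the listing uses
DOCTYPE_EVENT = (DOCTYPE, ("html", None, None), NO_POS)  # illustrative payload

with_decl = [
    (XML_DECL, ("1.0", "utf-8", -1), NO_POS),  # illustrative payload
    (START, ("html", ()), NO_POS),
    (END, "html", NO_POS),
]
without_decl = with_decl[1:]

kinds_with_decl = [e[0] for e in insert_doctype(with_decl, DOCTYPE_EVENT)]
kinds_without_decl = [e[0] for e in insert_doctype(without_decl, DOCTYPE_EVENT)]
kinds_empty = [e[0] for e in insert_doctype([], DOCTYPE_EVENT)]
```

Only the first event is inspected, which matches the changeset note that `XML_DECL` must be the absolute first item: a declaration appearing later in the stream would not move the DOCTYPE.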