annotate genshi/output.py @ 750:52219748e5c1 trunk
Remove some cruft for supporting Python 2.3.
author | cmlenz |
---|---|
date | Mon, 09 Jun 2008 15:19:59 +0000 |
parents | 74b5c5476ddb |
children | 70fddd2262f5 |
rev | line source |
---|---|
1 | 1 # -*- coding: utf-8 -*- |
2 # | |
719 | 3 # Copyright (C) 2006-2008 Edgewall Software |
1 | 4 # All rights reserved. |
5 # | |
6 # This software is licensed as described in the file COPYING, which | |
7 # you should have received as part of this distribution. The terms | |
230 | 8 # are also available at http://genshi.edgewall.org/wiki/License. |
1 | 9 # |
10 # This software consists of voluntary contributions made by many | |
11 # individuals. For the exact contribution history, see the revision | |
230 | 12 # history and logs, available at http://genshi.edgewall.org/log/. |
1 | 13 |
14 """This module provides different kinds of serialization methods for XML event | |
15 streams. | |
16 """ | |
17 | |
123 | 18 from itertools import chain |
19 import re | |
1 | 20 |
410 | 21 from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind |
460 | 22 from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \ |
402 | 23 START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE |
1 | 24 |
462 | 25 __all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer', |
26 'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer'] | |
425 | 27 __docformat__ = 'restructuredtext en' |
1 | 28 |
688 | 29 def encode(iterator, method='xml', encoding='utf-8', out=None): |
462 | 30 """Encode serializer output into a string. |
31 | |
32 :param iterator: the iterator returned from serializing a stream (basically | |
33 any iterator that yields unicode objects) | |
34 :param method: the serialization method; determines how characters not | |
35 representable in the specified encoding are treated | |
36 :param encoding: how the output string should be encoded; if set to `None`, | |
37 this method returns a `unicode` object | |
688 | 38 :param out: a file-like object that the output should be written to |
39 instead of being returned as one big string; note that if | |
40 this is a file or socket (or similar), the `encoding` must | |
41 not be `None` (that is, the output must be encoded) | |
42 :return: a `str` or `unicode` object (depending on the `encoding` | |
43 parameter), or `None` if the `out` parameter is provided | |
44 | |
462 | 45 :since: version 0.4.1 |
688 | 46 :note: Changed in 0.5: added the `out` parameter |
462 | 47 """ |
48 if encoding is not None: | |
49 errors = 'replace' | |
50 if method != 'text' and not isinstance(method, TextSerializer): | |
51 errors = 'xmlcharrefreplace' | |
688 | 52 _encode = lambda string: string.encode(encoding, errors) |
53 else: | |
54 _encode = lambda string: string | |
55 if out is None: | |
56 return _encode(u''.join(list(iterator))) | |
57 for chunk in iterator: | |
58 out.write(_encode(chunk)) | |
462 | 59 |
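The error handler chosen in `encode` only matters when the target encoding cannot represent a character: markup output falls back to numeric character references, while plain-text output gets a replacement character. The following is a minimal standalone sketch of that behaviour in Python 3 (the listing itself is Python 2). `encode_chunks` is a hypothetical name, it does not import Genshi, and it only handles string method names, not serializer instances.

```python
import io

def encode_chunks(chunks, method="xml", encoding="utf-8", out=None):
    """Hypothetical stand-in that mirrors the control flow of encode()."""
    if encoding is not None:
        # Markup can carry unencodable characters as &#NNN; references;
        # plain text cannot, so it degrades to '?'.
        errors = "replace" if method == "text" else "xmlcharrefreplace"
        _encode = lambda s: s.encode(encoding, errors)
    else:
        # No encoding requested: hand back text unchanged.
        _encode = lambda s: s
    if out is None:
        return _encode("".join(chunks))
    # Streaming mode: write chunk by chunk, return nothing.
    for chunk in chunks:
        out.write(_encode(chunk))

print(encode_chunks(["<p>", "caf\u00e9", "</p>"], encoding="ascii"))  # b'<p>caf&#233;</p>'
print(encode_chunks(["caf\u00e9"], method="text", encoding="ascii"))  # b'caf?'

buf = io.BytesIO()
encode_chunks(["<p>", "ok", "</p>"], out=buf)
print(buf.getvalue())  # b'<p>ok</p>'
```

Writing to `out` avoids building the whole document in memory, which is why the docstring requires a real encoding when the target is a file or socket.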
60 def get_serializer(method='xml', **kwargs): | |
61 """Return a serializer object for the given method. | |
62 | |
63 :param method: the serialization method; can be either "xml", "xhtml", | |
64 "html", "text", or a custom serializer class | |
65 | |
66 Any additional keyword arguments are passed to the serializer, and thus | |
67 depend on the `method` parameter value. | |
68 | |
69 :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer` | |
70 :since: version 0.4.1 | |
71 """ | |
72 if isinstance(method, basestring): | |
73 method = {'xml': XMLSerializer, | |
74 'xhtml': XHTMLSerializer, | |
75 'html': HTMLSerializer, | |
76 'text': TextSerializer}[method.lower()] | |
77 return method(**kwargs) | |
78 | |
1 | 79 |
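`get_serializer` accepts either a method name or a serializer class and forwards any keyword arguments to the constructor. A minimal Python 3 sketch of that dispatch pattern follows; `XmlLike`, `TextLike`, `_REGISTRY` and `pick_serializer` are hypothetical stand-ins, not Genshi names.

```python
class XmlLike:
    """Hypothetical serializer stand-in; just records its options."""
    def __init__(self, **options):
        self.options = options

class TextLike:
    """Second hypothetical stand-in, used to show class-based dispatch."""
    def __init__(self, **options):
        self.options = options

_REGISTRY = {"xml": XmlLike, "text": TextLike}

def pick_serializer(method="xml", **kwargs):
    # A string is looked up case-insensitively; anything else is assumed
    # to be a callable (typically a class) and is instantiated directly.
    if isinstance(method, str):
        method = _REGISTRY[method.lower()]
    return method(**kwargs)

print(type(pick_serializer("XML")).__name__)           # XmlLike
print(pick_serializer(TextLike, strip=True).options)   # {'strip': True}
```

As in the listing, an unknown method name is not handled specially: the dictionary lookup raises `KeyError`.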
85 | 80 class DocType(object): |
81 """Defines a number of commonly used DOCTYPE declarations as constants.""" | |
82 | |
410 | 83 HTML_STRICT = ( |
84 'html', '-//W3C//DTD HTML 4.01//EN', | |
85 'http://www.w3.org/TR/html4/strict.dtd' | |
86 ) | |
87 HTML_TRANSITIONAL = ( | |
88 'html', '-//W3C//DTD HTML 4.01 Transitional//EN', | |
89 'http://www.w3.org/TR/html4/loose.dtd' | |
90 ) | |
464 | 91 HTML_FRAMESET = ( |
92 'html', '-//W3C//DTD HTML 4.01 Frameset//EN', | |
93 'http://www.w3.org/TR/html4/frameset.dtd' | |
94 ) | |
85 | 95 HTML = HTML_STRICT |
96 | |
464 | 97 HTML5 = ('html', None, None) |
98 | |
410 | 99 XHTML_STRICT = ( |
100 'html', '-//W3C//DTD XHTML 1.0 Strict//EN', | |
101 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' | |
102 ) | |
103 XHTML_TRANSITIONAL = ( | |
104 'html', '-//W3C//DTD XHTML 1.0 Transitional//EN', | |
105 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' | |
106 ) | |
464 | 107 XHTML_FRAMESET = ( |
108 'html', '-//W3C//DTD XHTML 1.0 Frameset//EN', | |
109 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd' | |
110 ) | |
85 | 111 XHTML = XHTML_STRICT |
112 | |
729 | 113 XHTML11 = ( |
114 'html', '-//W3C//DTD XHTML 1.1//EN', | |
115 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd' | |
116 ) | |
117 | |
663 | 118 SVG_FULL = ( |
119 'svg', '-//W3C//DTD SVG 1.1//EN', | |
120 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd' | |
121 ) | |
122 SVG_BASIC = ( | |
123 'svg', '-//W3C//DTD SVG Basic 1.1//EN', | |
124 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd' | |
125 ) | |
126 SVG_TINY = ( | |
127 'svg', '-//W3C//DTD SVG Tiny 1.1//EN', | |
128 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd' | |
129 ) | |
130 SVG = SVG_FULL | |
131 | |
464 | 132 def get(cls, name): |
133 """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE`` | |
134 declaration for the specified name. | |
135 | |
136 The following names are recognized in this version: | |
137 * "html" or "html-strict" for the HTML 4.01 strict DTD | |
138 * "html-transitional" for the HTML 4.01 transitional DTD | |
745 | 139 * "html-frameset" for the HTML 4.01 frameset DTD |
464 | 140 * "html5" for the ``DOCTYPE`` proposed for HTML5 |
141 * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD | |
142 * "xhtml-transitional" for the XHTML 1.0 transitional DTD | |
143 * "xhtml-frameset" for the XHTML 1.0 frameset DTD | |
729 | 144 * "xhtml11" for the XHTML 1.1 DTD |
663 | 145 * "svg" or "svg-full" for the SVG 1.1 DTD |
146 * "svg-basic" for the SVG Basic 1.1 DTD | |
147 * "svg-tiny" for the SVG Tiny 1.1 DTD | |
464 | 148 |
149 :param name: the name of the ``DOCTYPE`` | |
150 :return: the ``(name, pubid, sysid)`` tuple for the requested | |
151 ``DOCTYPE``, or ``None`` if the name is not recognized | |
152 :since: version 0.4.1 | |
153 """ | |
154 return { | |
155 'html': cls.HTML, 'html-strict': cls.HTML_STRICT, | |
156 'html-transitional': DocType.HTML_TRANSITIONAL, | |
157 'html-frameset': DocType.HTML_FRAMESET, | |
158 'html5': cls.HTML5, | |
159 'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT, | |
160 'xhtml-transitional': cls.XHTML_TRANSITIONAL, | |
161 'xhtml-frameset': cls.XHTML_FRAMESET, | |
729 | 162 'xhtml11': cls.XHTML11, |
663 | 163 'svg': cls.SVG, 'svg-full': cls.SVG_FULL, |
164 'svg-basic': cls.SVG_BASIC, | |
165 'svg-tiny': cls.SVG_TINY | |
464 | 166 }.get(name.lower()) |
167 get = classmethod(get) | |
448 | 168 |
85 | 169 |
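`DocType.get` is a case-insensitive lookup from a short name to a `(name, pubid, sysid)` tuple that returns `None` for unknown names. A trimmed-down Python 3 sketch, assuming only three of the constants from the listing: `DocTypeSketch` is a hypothetical name, and the `@classmethod` decorator is used in place of the listing's `get = classmethod(get)` assignment.

```python
class DocTypeSketch:
    """Hypothetical, shortened copy of the constants in the listing."""
    HTML_STRICT = (
        "html", "-//W3C//DTD HTML 4.01//EN",
        "http://www.w3.org/TR/html4/strict.dtd",
    )
    HTML = HTML_STRICT          # alias, same tuple object
    HTML5 = ("html", None, None)

    @classmethod
    def get(cls, name):
        # dict.get returns None for names that are not registered.
        return {
            "html": cls.HTML,
            "html-strict": cls.HTML_STRICT,
            "html5": cls.HTML5,
        }.get(name.lower())

print(DocTypeSketch.get("HTML5"))     # ('html', None, None)
print(DocTypeSketch.get("unknown"))   # None
```

Because unknown names yield `None` rather than an exception, a caller that passes the result on as a `doctype` argument simply gets no DOCTYPE in the output.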
123 | 170 class XMLSerializer(object): |
1 | 171 """Produces XML text from an event stream. |
172 | |
230 | 173 >>> from genshi.builder import tag |
20 | 174 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123 | 175 >>> print ''.join(XMLSerializer()(elem.generate())) |
1 | 176 <div><a href="foo"/><br/><hr noshade="True"/></div> |
177 """ | |
123 | 178 |
179 _PRESERVE_SPACE = frozenset() | |
180 | |
410 | 181 def __init__(self, doctype=None, strip_whitespace=True, |
182 namespace_prefixes=None): | |
85 | 183 """Initialize the XML serializer. |
184 | |
425 | 185 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the |
186 DOCTYPE declaration that should be included at the top | |
494 | 187 of the generated output, or the name of a DOCTYPE as |
188 defined in `DocType.get` | |
425 | 189 :param strip_whitespace: whether extraneous whitespace should be |
190 stripped from the output | |
494 | 191 :note: Changed in 0.4.2: The `doctype` parameter can now be a string. |
85 | 192 """ |
212 | 193 self.filters = [EmptyTagFilter()] |
123 | 194 if strip_whitespace: |
195 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE)) | |
410 | 196 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes)) |
671 | 197 if doctype: |
198 self.filters.append(DocTypeInserter(doctype)) | |
1 | 199 |
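The `filters` list built in `__init__` is a pipeline of callables: each one takes an event stream and returns a new stream, and they are applied in list order. The Python 3 sketch below illustrates the pattern with plain generators. It is not Genshi's implementation: events are simplified to `(kind, data)` pairs, and `merge_empty`, `upper_text` and `run` are hypothetical names. `merge_empty` shows one way to combine an adjacent start/end pair into a single empty-tag event.

```python
START, END, EMPTY, TEXT = "START", "END", "EMPTY", "TEXT"

def merge_empty(stream):
    """Collapse a START immediately followed by an END into one EMPTY event."""
    pending = None  # a START event held back until we see what follows it
    for kind, data in stream:
        if pending is not None:
            if kind == END:
                yield EMPTY, pending[1]
                pending = None
                continue
            yield pending
            pending = None
        if kind == START:
            pending = (kind, data)
        else:
            yield kind, data
    if pending is not None:
        yield pending

def upper_text(stream):
    """Toy second filter: upper-case text events, pass the rest through."""
    for kind, data in stream:
        yield (kind, data.upper()) if kind == TEXT else (kind, data)

def run(stream, filters):
    # Same shape as the loop in __call__: wrap the stream once per filter.
    for filter_ in filters:
        stream = filter_(stream)
    return list(stream)

events = [(START, "div"), (START, "br"), (END, "br"), (TEXT, "hi"), (END, "div")]
print(run(events, [merge_empty, upper_text]))
# [('START', 'div'), ('EMPTY', 'br'), ('TEXT', 'HI'), ('END', 'div')]
```

Because each filter is lazy, the whole chain processes one event at a time instead of materializing intermediate lists.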
123 | 200 def __call__(self, stream): |
460 | 201 have_decl = have_doctype = False |
143 | 202 in_cdata = False |
1 | 203 |
123 | 204 for filter_ in self.filters: |
205 stream = filter_(stream) | |
1 | 206 for kind, data, pos in stream: |
207 | |
212 | 208 if kind is START or kind is EMPTY: |
1 | 209 tag, attrib = data |
410 | 210 buf = ['<', tag] |
397 | 211 for attr, value in attrib: |
410 | 212 buf += [' ', attr, '="', escape(value), '"'] |
397 | 213 buf.append(kind is EMPTY and '/>' or '>') |
214 yield Markup(u''.join(buf)) | |
1 | 215 |
69 | 216 elif kind is END: |
410 | 217 yield Markup('</%s>' % data) |
1 | 218 |
69 | 219 elif kind is TEXT: |
143 | 220 if in_cdata: |
221 yield data | |
222 else: | |
223 yield escape(data, quotes=False) | |
1 | 224 |
89 | 225 elif kind is COMMENT: |
226 yield Markup('<!--%s-->' % data) | |
227 | |
460 | 228 elif kind is XML_DECL and not have_decl: |
229 version, encoding, standalone = data | |
230 buf = ['<?xml version="%s"' % version] | |
231 if encoding: | |
232 buf.append(' encoding="%s"' % encoding) | |
233 if standalone != -1: | |
234 standalone = standalone and 'yes' or 'no' | |
235 buf.append(' standalone="%s"' % standalone) | |
236 buf.append('?>\n') | |
237 yield Markup(u''.join(buf)) | |
238 have_decl = True | |
239 | |
136 | 240 elif kind is DOCTYPE and not have_doctype: |
241 name, pubid, sysid = data | |
242 buf = ['<!DOCTYPE %s'] | |
243 if pubid: | |
397 | 244 buf.append(' PUBLIC "%s"') |
136 | 245 elif sysid: |
397 | 246 buf.append(' SYSTEM') |
136 | 247 if sysid: |
397 | 248 buf.append(' "%s"') |
249 buf.append('>\n') | |
713 | 250 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 251 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
252 |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
253 elif kind is START_CDATA: |
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
254 yield Markup('<![CDATA[') |
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
255 in_cdata = True |
256 |
257 elif kind is END_CDATA: |
258 yield Markup(']]>') |
259 in_cdata = False |
260 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
261 elif kind is PI: |
262 yield Markup('<?%s %s?>' % data) |
263 |
1 | 264 |
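The DOCTYPE and CDATA branches that close the serializer loop above can be tried outside Genshi with a small standalone sketch. It is written for Python 3 and is not Genshi's API: `format_doctype`, `serialize_text`, and the lowercase string event kinds are hypothetical stand-ins for the `DOCTYPE`, `START_CDATA`, `END_CDATA`, and `TEXT` events, and plain strings stand in for `Markup`. Unlike the listing, the DOCTYPE fields are not escaped here.

```python
from html import escape


def format_doctype(name, pubid=None, sysid=None):
    """Build a DOCTYPE declaration from a (name, pubid, sysid) triple."""
    parts = ['<!DOCTYPE %s' % name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        # A system identifier without a public one needs the SYSTEM keyword.
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>\n')
    return ''.join(parts)


def serialize_text(events):
    """Yield output chunks, leaving text inside CDATA sections unescaped."""
    in_cdata = False
    for kind, data in events:
        if kind == 'start_cdata':
            in_cdata = True
            yield '<![CDATA['
        elif kind == 'end_cdata':
            in_cdata = False
            yield ']]>'
        elif kind == 'text':
            # Outside CDATA, escape &, < and > but leave quotes alone.
            yield data if in_cdata else escape(data, quote=False)
```

The `in_cdata` flag plays the same role as in the listing: it is the only state the loop needs in order to decide whether a text chunk must be escaped.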
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
265 class XHTMLSerializer(XMLSerializer): |
266 """Produces XHTML text from an event stream. |
1 | 267 |
230 | 268 >>> from genshi.builder import tag |
20 | 269 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
270 >>> print ''.join(XHTMLSerializer()(elem.generate())) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
271 <div><a href="foo"></a><br /><hr noshade="noshade" /></div> |
1 | 272 """ |
273 | |
274 _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame', | |
275 'hr', 'img', 'input', 'isindex', 'link', 'meta', | |
276 'param']) | |
277 _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare', | |
278 'defer', 'disabled', 'ismap', 'multiple', | |
279 'nohref', 'noresize', 'noshade', 'nowrap']) | |
346
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
280 _PRESERVE_SPACE = frozenset([ |
281 QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'), |
282 QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea') |
283 ]) |
1 | 284 |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
285 def __init__(self, doctype=None, strip_whitespace=True, |
729 | 286 namespace_prefixes=None, drop_xml_decl=True): |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
287 super(XHTMLSerializer, self).__init__(doctype, False) |
288 self.filters = [EmptyTagFilter()] |
289 if strip_whitespace: |
290 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE)) |
291 namespace_prefixes = namespace_prefixes or {} |
292 namespace_prefixes['http://www.w3.org/1999/xhtml'] = '' |
293 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes)) |
671
8a9a7a8e9478
Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents:
669
diff
changeset
|
294 if doctype: |
295 self.filters.append(DocTypeInserter(doctype)) |
729 | 296 self.drop_xml_decl = drop_xml_decl |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
297 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
298 def __call__(self, stream): |
136 | 299 boolean_attrs = self._BOOLEAN_ATTRS |
300 empty_elems = self._EMPTY_ELEMS | |
729 | 301 drop_xml_decl = self.drop_xml_decl |
302 have_decl = have_doctype = False | |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
303 in_cdata = False |
1 | 304 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
305 for filter_ in self.filters: |
306 stream = filter_(stream) |
1 | 307 for kind, data, pos in stream: |
308 | |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
309 if kind is START or kind is EMPTY: |
1 | 310 tag, attrib = data |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
311 buf = ['<', tag] |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
312 for attr, value in attrib: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
313 if attr in boolean_attrs: |
314 value = attr |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
315 elif attr == u'xml:lang' and u'lang' not in attrib: |
316 buf += [' lang="', escape(value), '"'] |
689
3881a602048a
The XHTML serializer now strips `xml:space` attributes as they are only allowed on very few tags.
cmlenz
parents:
688
diff
changeset
|
317 elif attr == u'xml:space': |
318 continue |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
319 buf += [' ', attr, '="', escape(value), '"'] |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
320 if kind is EMPTY: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
321 if tag in empty_elems: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
322 buf.append(' />') |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
323 else: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
324 buf.append('></%s>' % tag) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
325 else: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
326 buf.append('>') |
327 yield Markup(u''.join(buf)) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
328 |
329 elif kind is END: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
330 yield Markup('</%s>' % data) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
331 |
332 elif kind is TEXT: |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
333 if in_cdata: |
334 yield data |
335 else: |
336 yield escape(data, quotes=False) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
337 |
338 elif kind is COMMENT: |
339 yield Markup('<!--%s-->' % data) |
340 |
136 | 341 elif kind is DOCTYPE and not have_doctype: |
342 name, pubid, sysid = data | |
343 buf = ['<!DOCTYPE %s'] | |
344 if pubid: | |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
345 buf.append(' PUBLIC "%s"') |
136 | 346 elif sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
347 buf.append(' SYSTEM') |
136 | 348 if sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
349 buf.append(' "%s"') |
350 buf.append('>\n') |
713
5420fe9d99a9
The `Markup` class now supports mappings on the right-hand side of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor that took positional arguments to be substituted was removed. Thus the `Markup` class now supports the same arguments as its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents:
689
diff
changeset
|
351 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 352 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
353 |
729 | 354 elif kind is XML_DECL and not have_decl and not drop_xml_decl: |
355 version, encoding, standalone = data | |
356 buf = ['<?xml version="%s"' % version] | |
357 if encoding: | |
358 buf.append(' encoding="%s"' % encoding) | |
359 if standalone != -1: | |
360 standalone = standalone and 'yes' or 'no' | |
361 buf.append(' standalone="%s"' % standalone) | |
362 buf.append('?>\n') | |
363 yield Markup(u''.join(buf)) | |
364 have_decl = True | |
365 | |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
366 elif kind is START_CDATA: |
367 yield Markup('<![CDATA[') |
368 in_cdata = True |
369 |
370 elif kind is END_CDATA: |
371 yield Markup(']]>') |
372 in_cdata = False |
373 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
374 elif kind is PI: |
375 yield Markup('<?%s %s?>' % data) |
376 |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
377 |
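The start-tag handling in `XHTMLSerializer.__call__` above combines four rules: boolean attributes are expanded to `name="name"`, `xml:lang` gains a plain `lang` twin when none is present, `xml:space` is dropped, and only known empty elements are self-closed. A minimal standalone sketch of those rules, assuming Python 3 and plain strings instead of Genshi's `Markup` and `Attrs`, could look as follows. The function name `xhtml_start_tag` and its `empty` flag are hypothetical; the two element and attribute sets are copied from the listing.

```python
from html import escape

EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
                         'hr', 'img', 'input', 'isindex', 'link', 'meta',
                         'param'])
BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
                           'defer', 'disabled', 'ismap', 'multiple',
                           'nohref', 'noresize', 'noshade', 'nowrap'])


def xhtml_start_tag(tag, attrs, empty=False):
    """Serialize one start tag; `attrs` is a list of (name, value) pairs."""
    names = {name for name, _ in attrs}
    buf = ['<', tag]
    for name, value in attrs:
        if name in BOOLEAN_ATTRS:
            # XHTML has no minimized attributes: noshade -> noshade="noshade"
            value = name
        elif name == 'xml:lang' and 'lang' not in names:
            # Emit a plain lang attribute next to xml:lang.
            buf += [' lang="', escape(value), '"']
        elif name == 'xml:space':
            continue
        buf += [' ', name, '="', escape(value), '"']
    if empty:
        # Self-close only elements that are empty by definition.
        buf.append(' />' if tag in EMPTY_ELEMS else '></%s>' % tag)
    else:
        buf.append('>')
    return ''.join(buf)
```

For the `hr` element of the docstring example, `xhtml_start_tag('hr', [('noshade', True)], empty=True)` produces `<hr noshade="noshade" />`, matching the output shown in the class docstring.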
378 class HTMLSerializer(XHTMLSerializer): |
379 """Produces HTML text from an event stream. |
380 |
230 | 381 >>> from genshi.builder import tag |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
382 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
383 >>> print ''.join(HTMLSerializer()(elem.generate())) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
384 <div><a href="foo"></a><br><hr noshade></div> |
385 """ |
386 |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
387 _NOESCAPE_ELEMS = frozenset([ |
388 QName('script'), QName('http://www.w3.org/1999/xhtml}script'), |
389 QName('style'), QName('http://www.w3.org/1999/xhtml}style') |
390 ]) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
391 |
392 def __init__(self, doctype=None, strip_whitespace=True): |
393 """Initialize the HTML serializer. |
394 |
425
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents:
412
diff
changeset
|
395 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the |
396 DOCTYPE declaration that should be included at the top |
397 of the generated output |
398 :param strip_whitespace: whether extraneous whitespace should be |
399 stripped from the output |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
400 """ |
401 super(HTMLSerializer, self).__init__(doctype, False) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
402 self.filters = [EmptyTagFilter()] |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
403 if strip_whitespace: |
404 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE, |
305 | 405 self._NOESCAPE_ELEMS)) |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
406 self.filters.append(NamespaceFlattener(prefixes={ |
407 'http://www.w3.org/1999/xhtml': '' |
408 })) |
671
8a9a7a8e9478
Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents:
669
diff
changeset
|
409 if doctype: |
410 self.filters.append(DocTypeInserter(doctype)) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
411 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
412 def __call__(self, stream): |
136 | 413 boolean_attrs = self._BOOLEAN_ATTRS |
414 empty_elems = self._EMPTY_ELEMS | |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
415 noescape_elems = self._NOESCAPE_ELEMS |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
416 have_doctype = False |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
417 noescape = False |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
418 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
419 for filter_ in self.filters: |
420 stream = filter_(stream) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
421 for kind, data, pos in stream: |
422 |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
423 if kind is START or kind is EMPTY: |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
424 tag, attrib = data |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
425 buf = ['<', tag] |
426 for attr, value in attrib: |
427 if attr in boolean_attrs: |
428 if value: |
429 buf += [' ', attr] |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
430 elif ':' in attr: |
431 if attr == 'xml:lang' and u'lang' not in attrib: |
432 buf += [' lang="', escape(value), '"'] |
433 elif attr != 'xmlns': |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
434 buf += [' ', attr, '="', escape(value), '"'] |
435 buf.append('>') |
436 if kind is EMPTY: |
437 if tag not in empty_elems: |
438 buf.append('</%s>' % tag) |
439 yield Markup(u''.join(buf)) |
440 if tag in noescape_elems: |
441 noescape = True |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
442 |
69 | 443 elif kind is END: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
444 yield Markup('</%s>' % data) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
445 noescape = False |
446 |
69 | 447 elif kind is TEXT: |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
448 if noescape: |
449 yield data |
450 else: |
451 yield escape(data, quotes=False) |
1 | 452 |
89
80386d62814f
Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents:
85
diff
changeset
|
453 elif kind is COMMENT: |
454 yield Markup('<!--%s-->' % data) |
455 |
136 | 456 elif kind is DOCTYPE and not have_doctype: |
457 name, pubid, sysid = data | |
458 buf = ['<!DOCTYPE %s'] | |
459 if pubid: | |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
460 buf.append(' PUBLIC "%s"') |
136 | 461 elif sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
462 buf.append(' SYSTEM') |
136 | 463 if sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
464 buf.append(' "%s"') |
465 buf.append('>\n') |
713
5420fe9d99a9
The `Markup` class now supports mappings on the right-hand side of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor that took positional arguments to be substituted was removed. Thus the `Markup` class now supports the same arguments as its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents:
689
diff
changeset
|
466 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 467 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
468 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
469 elif kind is PI: |
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
470 yield Markup('<?%s %s?>' % data) |
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
471 |
1 | 472 |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
473 class TextSerializer(object): |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
474 """Produces plain text from an event stream. |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
475 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
476 Only text events are included in the output. Unlike the other serializers, |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
477 special XML characters are not escaped: |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
478 |
230 | 479 >>> from genshi.builder import tag |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
480 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br) |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
481 >>> print elem |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
482 <div><a href="foo">&lt;Hello!&gt;</a><br/></div> |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
483 >>> print ''.join(TextSerializer()(elem.generate())) |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
484 <Hello!> |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
485 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
486 If text events contain literal markup (instances of the `Markup` class), |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
487 that markup is by default passed through unchanged: |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
488 |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
489 >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>')) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
490 >>> print elem.generate().render(TextSerializer) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
491 <a href="foo">Hello &amp; Bye!</a><br/> |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
492 |
740
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
493 You can use the ``strip_markup`` option to change this behavior, so that tags and |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
494 entities are stripped from the output (or in the case of entities, |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
495 replaced with the equivalent character): |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
496 |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
497 >>> print elem.generate().render(TextSerializer, strip_markup=True) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
498 Hello & Bye! |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
499 """ |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
500 |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
501 def __init__(self, strip_markup=False): |
740
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
502 """Create the serializer. |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
503 |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
504 :param strip_markup: whether markup (tags and encoded characters) found |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
505 in the text should be removed |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
506 """ |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
507 self.strip_markup = strip_markup |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
508 |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
509 def __call__(self, stream): |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
510 strip_markup = self.strip_markup |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
511 for event in stream: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
512 if event[0] is TEXT: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
513 data = event[1] |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
514 if strip_markup and type(data) is Markup: |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
515 data = data.striptags().stripentities() |
201
c5e0a1c86173
The `TextSerializer` should produce `unicode` objects, not `Markup` objects.
cmlenz
parents:
200
diff
changeset
|
516 yield unicode(data) |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
517 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
518 |
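The `strip_markup` behaviour described in the `TextSerializer` docstring can be sketched without Genshi installed. This is a minimal Python 3 approximation, not the library's implementation: the `Markup` class here is a hypothetical stand-in for `genshi.core.Markup`, event kinds are plain strings, and `re.sub` plus `html.unescape` stand in for `Markup.striptags()` and `Markup.stripentities()`.

```python
import re
from html import unescape


class Markup(str):
    """Hypothetical stand-in for genshi.core.Markup (a marked string type)."""


def serialize_text(stream, strip_markup=False):
    """Yield only the text events of an event stream as plain strings."""
    for kind, data, pos in stream:
        if kind == "TEXT":
            # Only literal markup is stripped; ordinary text is left alone.
            if strip_markup and type(data) is Markup:
                # Assumption: a simple regex approximates striptags(), and
                # html.unescape approximates stripentities().
                data = unescape(re.sub(r"<[^>]*>", "", data))
            yield str(data)


events = [
    ("START", ("div", []), None),
    ("TEXT", Markup('<a href="foo">Hello &amp; Bye!</a><br/>'), None),
    ("END", "div", None),
]
plain = "".join(serialize_text(events))
stripped = "".join(serialize_text(events, strip_markup=True))
```

With the default settings the markup passes through unchanged; with `strip_markup=True` the tags are dropped and `&amp;` becomes `&`.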
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
519 class EmptyTagFilter(object): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
520 """Combines `START` and `END` events into `EMPTY` events for elements that |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
521 have no contents. |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
522 """ |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
523 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
524 EMPTY = StreamEventKind('EMPTY') |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
525 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
526 def __call__(self, stream): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
527 prev = (None, None, None) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
528 for ev in stream: |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
529 if prev[0] is START: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
530 if ev[0] is END: |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
531 prev = EMPTY, prev[1], prev[2] |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
532 yield prev |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
533 continue |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
534 else: |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
535 yield prev |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
536 if ev[0] is not START: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
537 yield ev |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
538 prev = ev |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
539 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
540 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
541 EMPTY = EmptyTagFilter.EMPTY |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
542 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
543 |
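The one-event lookbehind used by `EmptyTagFilter` can be tried in isolation. The following is a minimal Python 3 sketch that mirrors the control flow of `__call__` above, with the assumption that plain strings stand in for Genshi's `StreamEventKind` objects (so kinds are compared with `==` rather than `is`); `merge_empty` is a hypothetical name, not part of Genshi.

```python
# Stand-in event kinds; Genshi itself uses StreamEventKind instances.
START, END, TEXT, EMPTY = "START", "END", "TEXT", "EMPTY"


def merge_empty(stream):
    """Collapse a START immediately followed by an END into one EMPTY event."""
    prev = (None, None, None)
    for ev in stream:
        if prev[0] == START:
            if ev[0] == END:
                # Adjacent START/END pair: emit a single EMPTY event instead.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            # The element has content, so release the held-back START as-is.
            yield prev
        # START events are held back until the next event has been seen.
        if ev[0] != START:
            yield ev
        prev = ev


events = [
    (START, ("div", []), None),
    (START, ("br", []), None),
    (END, "br", None),
    (TEXT, "hi", None),
    (END, "div", None),
]
kinds = [kind for kind, data, pos in merge_empty(events)]
```

Only the `br` element collapses to `EMPTY`, because the `div` has content between its start and end events.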
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
544 class NamespaceFlattener(object): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
545 r"""Output stream filter that removes namespace information from the stream, |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
546 instead adding namespace attributes and prefixes as needed. |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
547 |
425
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents:
412
diff
changeset
|
548 :param prefixes: optional mapping of namespace URIs to prefixes |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
549 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
550 >>> from genshi.input import XML |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
551 >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2"> |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
552 ... <two:item/> |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
553 ... </doc>''') |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
554 >>> for kind, data, pos in NamespaceFlattener()(xml): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
555 ... print kind, repr(data) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
556 START (u'doc', Attrs([(u'xmlns', u'NS1'), (u'xmlns:two', u'NS2')])) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
557 TEXT u'\n ' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
558 START (u'two:item', Attrs()) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
559 END u'two:item' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
560 TEXT u'\n' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
561 END u'doc' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
562 """ |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
563 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
564 def __init__(self, prefixes=None): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
565 self.prefixes = {XML_NAMESPACE.uri: 'xml'} |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
566 if prefixes is not None: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
567 self.prefixes.update(prefixes) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
568 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
569 def __call__(self, stream): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
570 prefixes = dict([(v, [k]) for k, v in self.prefixes.items()]) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
571 namespaces = {XML_NAMESPACE.uri: ['xml']} |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
572 def _push_ns(prefix, uri): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
573 namespaces.setdefault(uri, []).append(prefix) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
574 prefixes.setdefault(prefix, []).append(uri) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
575 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
576 ns_attrs = [] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
577 _push_ns_attr = ns_attrs.append |
437 | 578 def _make_ns_attr(prefix, uri): |
579 return u'xmlns%s' % (prefix and ':%s' % prefix or ''), uri | |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
580 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
581 def _gen_prefix(): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
582 val = 0 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
583 while 1: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
584 val += 1 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
585 yield 'ns%d' % val |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
586 _gen_prefix = _gen_prefix().next |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
587 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
588 for kind, data, pos in stream: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
589 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
590 if kind is START or kind is EMPTY: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
591 tag, attrs = data |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
592 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
593 tagname = tag.localname |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
594 tagns = tag.namespace |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
595 if tagns: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
596 if tagns in namespaces: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
597 prefix = namespaces[tagns][-1] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
598 if prefix: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
599 tagname = u'%s:%s' % (prefix, tagname) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
600 else: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
601 _push_ns_attr((u'xmlns', tagns)) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
602 _push_ns('', tagns) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
603 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
604 new_attrs = [] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
605 for attr, value in attrs: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
606 attrname = attr.localname |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
607 attrns = attr.namespace |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
608 if attrns: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
609 if attrns not in namespaces: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
610 prefix = _gen_prefix() |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
611 _push_ns(prefix, attrns) |
412
bd51adc20a67
Actually write xmlns declaratons for generated attribute namespace prefixes.
cmlenz
parents:
410
diff
changeset
|
612 _push_ns_attr(('xmlns:%s' % prefix, attrns)) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
613 else: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
614 prefix = namespaces[attrns][-1] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
615 if prefix: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
616 attrname = u'%s:%s' % (prefix, attrname) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
617 new_attrs.append((attrname, value)) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
618 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
619 yield kind, (tagname, Attrs(ns_attrs + new_attrs)), pos |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
620 del ns_attrs[:] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
621 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
622 elif kind is END: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
623 tagname = data.localname |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
624 tagns = data.namespace |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
625 if tagns: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
626 prefix = namespaces[tagns][-1] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
627 if prefix: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
628 tagname = u'%s:%s' % (prefix, tagname) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
629 yield kind, tagname, pos |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
630 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
631 elif kind is START_NS: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
632 prefix, uri = data |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
633 if uri not in namespaces: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
634 prefix = prefixes.get(uri, [prefix])[-1] |
437 | 635 _push_ns_attr(_make_ns_attr(prefix, uri)) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
636 _push_ns(prefix, uri) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
637 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
638 elif kind is END_NS: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
639 if data in prefixes: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
640 uris = prefixes.get(data) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
641 uri = uris.pop() |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
642 if not uris: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
643 del prefixes[data] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
644 if uri not in uris or uri != uris[-1]: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
645 uri_prefixes = namespaces[uri] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
646 uri_prefixes.pop() |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
647 if not uri_prefixes: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
648 del namespaces[uri] |
437 | 649 if ns_attrs: |
650 attr = _make_ns_attr(data, uri) | |
651 if attr in ns_attrs: | |
652 ns_attrs.remove(attr) | |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
653 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
654 else: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
655 yield kind, data, pos |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
656 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
657 |
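The per-URI prefix stacks that `NamespaceFlattener` maintains can be illustrated with a much smaller example. This Python 3 sketch covers only the attribute-naming branch (look up the most recent prefix for a URI, generating an `nsN` prefix when none is in scope); it omits `xmlns` attribute emission and `END_NS` handling, and `qualify_attr` is a hypothetical helper name rather than part of Genshi.

```python
from itertools import count

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Maps each namespace URI to a stack of prefixes; the last entry is in scope.
namespaces = {XML_NS: ["xml"]}
_counter = count(1)


def gen_prefix():
    """Return the next generated prefix: ns1, ns2, ..."""
    return "ns%d" % next(_counter)


def push_ns(prefix, uri):
    """Bring a prefix into scope for the given namespace URI."""
    namespaces.setdefault(uri, []).append(prefix)


def qualify_attr(localname, uri):
    """Build the flattened name of an attribute, as in the attrs loop above."""
    if not uri:
        return localname
    if uri not in namespaces:
        # No declaration in scope: invent a prefix for this namespace.
        push_ns(gen_prefix(), uri)
    prefix = namespaces[uri][-1]
    # An empty prefix means the default namespace, so no colon is added.
    return "%s:%s" % (prefix, localname) if prefix else localname


push_ns("", "NS1")      # corresponds to xmlns="NS1"
push_ns("two", "NS2")   # corresponds to xmlns:two="NS2"
names = [
    qualify_attr("doc", "NS1"),
    qualify_attr("item", "NS2"),
    qualify_attr("id", "NS3"),
    qualify_attr("lang", XML_NS),
]
```

Keeping a stack per URI lets an inner declaration shadow an outer one and be popped again when its scope ends, which is what the `END_NS` branch does in the real filter.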
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
658 class WhitespaceFilter(object): |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
659 """A filter that removes extraneous ignorable white space from the |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
660 stream. |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
661 """ |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
662 |
305 | 663     def __init__(self, preserve=None, noescape=None): |
123 | 664         """Initialize the filter. |
123 | 665 |
425 | 666         :param preserve: a set or sequence of tag names for which white-space |
425 | 667                          should be preserved |
425 | 668         :param noescape: a set or sequence of tag names for which text content |
425 | 669                          should not be escaped |
141 | 670 |
346 | 671         The `noescape` set is expected to refer to elements that cannot contain |
425 | 672         further child elements (such as ``<style>`` or ``<script>`` in HTML |
425 | 673         documents). |
123 | 674         """ |
123 | 675         if preserve is None: |
123 | 676             preserve = [] |
123 | 677         self.preserve = frozenset(preserve) |
141 | 678         if noescape is None: |
141 | 679             noescape = [] |
141 | 680         self.noescape = frozenset(noescape) |
123 | 681 |
219 | 682     def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'], |
219 | 683                  trim_trailing_space=re.compile('[ \t]+(?=\n)').sub, |
219 | 684                  collapse_lines=re.compile('\n{2,}').sub): |
123 | 685         mjoin = Markup('').join |
141 | 686         preserve_elems = self.preserve |
346 | 687         preserve = 0 |
141 | 688         noescape_elems = self.noescape |
141 | 689         noescape = False |
123 | 690 |
123 | 691         textbuf = [] |
141 | 692         push_text = textbuf.append |
136 | 693         pop_text = textbuf.pop |
123 | 694         for kind, data, pos in chain(stream, [(None, None, None)]): |
410 | 695 |
123 | 696             if kind is TEXT: |
141 | 697                 if noescape: |
141 | 698                     data = Markup(data) |
141 | 699                 push_text(data) |
123 | 700             else: |
123 | 701                 if textbuf: |
123 | 702                     if len(textbuf) > 1: |
123 | 703                         text = mjoin(textbuf, escape_quotes=False) |
123 | 704                         del textbuf[:] |
123 | 705                     else: |
136 | 706                         text = escape(pop_text(), quotes=False) |
141 | 707                     if not preserve: |
123 | 708                         text = collapse_lines('\n', trim_trailing_space('', text)) |
123 | 709                     yield TEXT, Markup(text), pos |
141 | 710 |
141 | 711                 if kind is START: |
346 | 712                     tag, attrs = data |
346 | 713                     if preserve or (tag in preserve_elems or |
346 | 714                                     attrs.get(space) == 'preserve'): |
346 | 715                         preserve += 1 |
219 | 716                     if not noescape and tag in noescape_elems: |
141 | 717                         noescape = True |
141 | 718 |
141 | 719                 elif kind is END: |
346 | 720                     noescape = False |
346 | 721                     if preserve: |
346 | 722                         preserve -= 1 |
141 | 723 |
305 | 724                 elif kind is START_CDATA: |
143 | 725                     noescape = True |
143 | 726 |
305 | 727                 elif kind is END_CDATA: |
143 | 728                     noescape = False |
143 | 729 |
136 | 730                 if kind: |
123 | 731                     yield kind, data, pos |
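The buffering and depth-counting logic of `WhitespaceFilter.__call__` can be illustrated with a minimal standalone sketch. It is not Genshi's implementation: the string constants stand in for the `genshi.core` event kinds, events are simplified to `(kind, data)` pairs with the bare tag name as `START` data, and escaping, `noescape` handling and `xml:space` are left out. The function name `strip_whitespace` is hypothetical. The two regular expressions are the ones used as default arguments in the listing above.

```python
import re
from itertools import chain

# Stand-ins for the genshi.core event kinds (assumption: plain strings).
START, END, TEXT = 'START', 'END', 'TEXT'

# Same patterns as the default arguments of WhitespaceFilter.__call__.
trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
collapse_lines = re.compile('\n{2,}').sub


def strip_whitespace(events, preserve_elems=frozenset(['pre'])):
    """Hypothetical, simplified take on the filter's main loop."""
    depth = 0   # > 0 while inside a whitespace-preserving element
    buf = []    # adjacent TEXT events are merged before normalizing
    # A trailing sentinel event flushes any text left in the buffer.
    for kind, data in chain(events, [(None, None)]):
        if kind == TEXT:
            buf.append(data)
            continue
        if buf:
            text = ''.join(buf)
            del buf[:]
            if not depth:
                text = collapse_lines('\n', trim_trailing_space('', text))
            yield TEXT, text
        if kind == START:
            # Nested elements inside a preserved element also count,
            # so the matching END events bring the depth back to zero.
            if depth or data in preserve_elems:
                depth += 1
        elif kind == END and depth:
            depth -= 1
        if kind:
            yield kind, data


events = [
    (START, 'p'), (TEXT, 'a  \n'), (TEXT, '\n\nb'), (END, 'p'),
    (START, 'pre'), (TEXT, 'x  \n\n\ny'), (END, 'pre'),
]
result = list(strip_whitespace(events))
```

Text is flushed before the non-text event that follows it is examined, so the text inside `<pre>` is still emitted under the depth that was in effect while it was buffered.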
671 | 732 |
671 | 733 |
671 | 734 class DocTypeInserter(object): |
671 | 735     """A filter that inserts the DOCTYPE declaration in the correct location, |
671 | 736     after the XML declaration. |
671 | 737     """ |
671 | 738     def __init__(self, doctype): |
671 | 739         """Initialize the filter. |
671 | 740 |
671 | 741         :param doctype: DOCTYPE as a string or DocType object. |
671 | 742         """ |
671 | 743         if isinstance(doctype, basestring): |
671 | 744             doctype = DocType.get(doctype) |
671 | 745         self.doctype_event = (DOCTYPE, doctype, (None, -1, -1)) |
671 | 746 |
671 | 747     def __call__(self, stream): |
671 | 748         doctype_inserted = False |
671 | 749         for kind, data, pos in stream: |
671 | 750             if not doctype_inserted: |
672 | 751                 doctype_inserted = True |
672 | 752                 if kind is XML_DECL: |
672 | 753                     yield (kind, data, pos) |
672 | 754                     yield self.doctype_event |
671 | 755                     continue |
671 | 756                 yield self.doctype_event |
671 | 757 |
671 | 758             yield (kind, data, pos) |
671 | 759 |
671 | 760         if not doctype_inserted: |
671 | 761             yield self.doctype_event |
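The placement rule in `DocTypeInserter.__call__` has three cases: the stream starts with an XML declaration, it starts with anything else, or it is empty. The sketch below replays that rule on plain tuples. It is an illustration only: the string constants stand in for the `genshi.core` event kinds, the function name `insert_doctype` is hypothetical, and the DOCTYPE payload is a placeholder string rather than a real `DocType` value.

```python
# Stand-ins for the genshi.core event kinds (assumption: plain strings).
XML_DECL, DOCTYPE, START = 'XML_DECL', 'DOCTYPE', 'START'


def insert_doctype(events, doctype_event):
    """Hypothetical helper following the same control flow as __call__."""
    inserted = False
    for event in events:
        if not inserted:
            inserted = True
            if event[0] == XML_DECL:
                # The XML declaration has to stay first, so the DOCTYPE
                # is emitted immediately after it.
                yield event
                yield doctype_event
                continue
            # No XML declaration: the DOCTYPE leads the stream.
            yield doctype_event
        yield event
    if not inserted:
        # Empty stream: the DOCTYPE is the only event produced.
        yield doctype_event


doctype = (DOCTYPE, 'placeholder-doctype', (None, -1, -1))
decl = (XML_DECL, ('1.0', 'utf-8', -1), (None, 1, 0))
root = (START, 'html', (None, 2, 0))

with_decl = list(insert_doctype([decl, root], doctype))
without_decl = list(insert_doctype([root], doctype))
empty = list(insert_doctype([], doctype))
```

Only the first event is inspected, in line with the note on changeset 672 that `XML_DECL` must be the absolute first item.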