annotate genshi/output.py @ 831:7a422be6f6a6 trunk
Follow-up fix for [1038].
author | cmlenz |
---|---|
date | Fri, 13 Mar 2009 21:05:19 +0000 |
parents | 6e46513e1c5c |
children | 07f4339fecb0 |
rev | line source |
---|---|
1 | 1 # -*- coding: utf-8 -*- |
2 # | |
719 | 3 # Copyright (C) 2006-2008 Edgewall Software |
1 | 4 # All rights reserved. |
5 # | |
6 # This software is licensed as described in the file COPYING, which | |
7 # you should have received as part of this distribution. The terms | |
230 | 8 # are also available at http://genshi.edgewall.org/wiki/License. |
1 | 9 # |
10 # This software consists of voluntary contributions made by many | |
11 # individuals. For the exact contribution history, see the revision | |
230 | 12 # history and logs, available at http://genshi.edgewall.org/log/. |
1 | 13 |
14 """This module provides different kinds of serialization methods for XML event | |
15 streams. | |
16 """ | |
17 | |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
18 from itertools import chain |
19 import re |
1 | 20 |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
21 from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind |
460
75425671b437
Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents:
448
diff
changeset
|
22 from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \ |
402
c199e9b95884
Fix output of namespace declarations for namespace URLs appearing more than once in a stream. Thanks to Jeff Cutsinger for reporting the problem.
cmlenz
parents:
397
diff
changeset
|
23 START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE |
1 | 24 |
462 | 25 __all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer', |
26 'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer'] | |
425
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents:
412
diff
changeset
|
27 __docformat__ = 'restructuredtext en' |
1 | 28 |
688
d8571da25bc5
The `Stream.render` now accepts an optional `out` parameter that can be used to pass in a writable file-like object to use for assembling the output, instead of building a big string and returning it.
cmlenz
parents:
672
diff
changeset
|
29 def encode(iterator, method='xml', encoding='utf-8', out=None): |
462 | 30 """Encode serializer output into a string. |
31 | |
32 :param iterator: the iterator returned from serializing a stream (basically | |
33 any iterator that yields unicode objects) | |
34 :param method: the serialization method; determines how characters not | |
35 representable in the specified encoding are treated | |
36 :param encoding: how the output string should be encoded; if set to `None`, | |
37 this method returns a `unicode` object | |
688 | 38 :param out: a file-like object that the output should be written to |
39 instead of being returned as one big string; note that if |
40 this is a file or socket (or similar), the `encoding` must |
41 not be `None` (that is, the output must be encoded) |
42 :return: a `str` or `unicode` object (depending on the `encoding` |
43 parameter), or `None` if the `out` parameter is provided |
44 |
462 | 45 :since: version 0.4.1 |
688 | 46 :note: Changed in 0.5: added the `out` parameter |
462 | 47 """ |
48 if encoding is not None: | |
49 errors = 'replace' | |
50 if method != 'text' and not isinstance(method, TextSerializer): | |
51 errors = 'xmlcharrefreplace' | |
688 | 52 _encode = lambda string: string.encode(encoding, errors) |
53 else: |
54 _encode = lambda string: string |
55 if out is None: |
56 return _encode(u''.join(list(iterator))) |
57 for chunk in iterator: |
58 out.write(_encode(chunk)) |
462 | 59 |
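The error-handling choice made in `encode` can be tried in isolation with a small Python 3 sketch. This is not Genshi's code: `encode_chunks` is a hypothetical stand-in that restates the logic listed above, and it simplifies the check to the method name only (the original also tests `isinstance(method, TextSerializer)`).

```python
import io


def encode_chunks(chunks, method="xml", encoding="utf-8", out=None):
    """Hypothetical Python 3 restatement of the `encode` logic above."""
    if encoding is not None:
        # Markup output can fall back to numeric character references;
        # plain text has no such escape, so unencodable characters become "?".
        errors = "replace" if method == "text" else "xmlcharrefreplace"
        _encode = lambda s: s.encode(encoding, errors)
    else:
        _encode = lambda s: s
    if out is None:
        # No sink given: join everything and return one string.
        return _encode("".join(chunks))
    # Sink given: write chunk by chunk and return nothing.
    for chunk in chunks:
        out.write(_encode(chunk))


markup = encode_chunks(["<p>", "caf\u00e9", "</p>"], encoding="ascii")
text = encode_chunks(["caf\u00e9"], method="text", encoding="ascii")

buf = io.BytesIO()
result = encode_chunks(["<p>", "x", "</p>"], out=buf)
```

Writing to `out` avoids building the whole document in memory, which is why the docstring requires an `encoding` when the target is a file or socket: such targets expect bytes.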
60 def get_serializer(method='xml', **kwargs): | |
61 """Return a serializer object for the given method. | |
62 | |
63 :param method: the serialization method; can be either "xml", "xhtml", | |
64 "html", "text", or a custom serializer class | |
65 | |
66 Any additional keyword arguments are passed to the serializer, and thus | |
67 depend on the `method` parameter value. | |
68 | |
69 :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer` | |
70 :since: version 0.4.1 | |
71 """ | |
72 if isinstance(method, basestring): | |
73 method = {'xml': XMLSerializer, | |
74 'xhtml': XHTMLSerializer, | |
75 'html': HTMLSerializer, | |
76 'text': TextSerializer}[method.lower()] | |
77 return method(**kwargs) | |
78 | |
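The dispatch in `get_serializer` accepts either a method name or a serializer class. A minimal Python 3 sketch of that pattern follows; `XMLSketch`, `TextSketch`, `REGISTRY` and `pick_serializer` are hypothetical names used only for illustration and are not part of Genshi.

```python
class XMLSketch:
    """Hypothetical stand-in for a serializer class."""

    def __init__(self, **kwargs):
        # Keep the keyword arguments so the example can show they were passed on.
        self.options = kwargs


class TextSketch(XMLSketch):
    """Second hypothetical stand-in, to show that the name selects the class."""


REGISTRY = {"xml": XMLSketch, "text": TextSketch}


def pick_serializer(method="xml", **kwargs):
    # A string is looked up case-insensitively; anything else is assumed to be
    # a serializer class (or other callable) and is called as-is.
    if isinstance(method, str):
        method = REGISTRY[method.lower()]
    return method(**kwargs)


by_name = pick_serializer("TEXT", strip_whitespace=False)
by_class = pick_serializer(XMLSketch, doctype="html5")
```

Because the lookup uses plain indexing, an unrecognized name raises `KeyError` rather than falling back to a default.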
1 | 79 |
85 | 80 class DocType(object): |
81 """Defines a number of commonly used DOCTYPE declarations as constants.""" | |
82 | |
410 | 83 HTML_STRICT = ( |
84 'html', '-//W3C//DTD HTML 4.01//EN', |
85 'http://www.w3.org/TR/html4/strict.dtd' |
86 ) |
87 HTML_TRANSITIONAL = ( |
88 'html', '-//W3C//DTD HTML 4.01 Transitional//EN', |
89 'http://www.w3.org/TR/html4/loose.dtd' |
90 ) |
464
2f13c5fc4a4d
Move the mapping of doctype names to tuples out of the plugin into the `DocType` class.
cmlenz
parents:
462
diff
changeset
|
91 HTML_FRAMESET = ( |
92 'html', '-//W3C//DTD HTML 4.01 Frameset//EN', |
93 'http://www.w3.org/TR/html4/frameset.dtd' |
94 ) |
85 | 95 HTML = HTML_STRICT |
96 | |
464 | 97 HTML5 = ('html', None, None) |
98 |
410 | 99 XHTML_STRICT = ( |
100 'html', '-//W3C//DTD XHTML 1.0 Strict//EN', |
101 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' |
102 ) |
103 XHTML_TRANSITIONAL = ( |
104 'html', '-//W3C//DTD XHTML 1.0 Transitional//EN', |
105 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' |
106 ) |
464 | 107 XHTML_FRAMESET = ( |
108 'html', '-//W3C//DTD XHTML 1.0 Frameset//EN', |
109 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd' |
110 ) |
85 | 111 XHTML = XHTML_STRICT |
112 | |
729 | 113 XHTML11 = ( |
114 'html', '-//W3C//DTD XHTML 1.1//EN', | |
115 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd' | |
116 ) | |
117 | |
663 | 118 SVG_FULL = ( |
119 'svg', '-//W3C//DTD SVG 1.1//EN', | |
120 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd' | |
121 ) | |
122 SVG_BASIC = ( | |
123 'svg', '-//W3C//DTD SVG Basic 1.1//EN', | |
124 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd' | |
125 ) | |
126 SVG_TINY = ( | |
127 'svg', '-//W3C//DTD SVG Tiny 1.1//EN', | |
128 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd' | |
129 ) | |
130 SVG = SVG_FULL | |
131 | |
822
70fddd2262f5
Get rid of some Python 2.3 legacy that's no longer needed now that 2.4 is the baseline.
cmlenz
parents:
750
diff
changeset
|
132 @classmethod |
464 | 133 def get(cls, name): |
134 """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE`` |
135 declaration for the specified name. |
136 |
137 The following names are recognized in this version: |
138 * "html" or "html-strict" for the HTML 4.01 strict DTD |
139 * "html-transitional" for the HTML 4.01 transitional DTD |
745 | 140 * "html-frameset" for the HTML 4.01 frameset DTD |
464 | 141 * "html5" for the ``DOCTYPE`` proposed for HTML5 |
142 * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD |
143 * "xhtml-transitional" for the XHTML 1.0 transitional DTD |
144 * "xhtml-frameset" for the XHTML 1.0 frameset DTD |
729 | 145 * "xhtml11" for the XHTML 1.1 DTD |
663 | 146 * "svg" or "svg-full" for the SVG 1.1 DTD |
147 * "svg-basic" for the SVG Basic 1.1 DTD | |
148 * "svg-tiny" for the SVG Tiny 1.1 DTD | |
464 | 149 |
150 :param name: the name of the ``DOCTYPE`` |
151 :return: the ``(name, pubid, sysid)`` tuple for the requested |
152 ``DOCTYPE``, or ``None`` if the name is not recognized |
153 :since: version 0.4.1 |
154 """ |
155 return { |
156 'html': cls.HTML, 'html-strict': cls.HTML_STRICT, |
157 'html-transitional': DocType.HTML_TRANSITIONAL, |
158 'html-frameset': DocType.HTML_FRAMESET, |
159 'html5': cls.HTML5, |
160 'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT, |
161 'xhtml-transitional': cls.XHTML_TRANSITIONAL, |
162 'xhtml-frameset': cls.XHTML_FRAMESET, |
729 | 163 'xhtml11': cls.XHTML11, |
663 | 164 'svg': cls.SVG, 'svg-full': cls.SVG_FULL, |
165 'svg-basic': cls.SVG_BASIC, | |
166 'svg-tiny': cls.SVG_TINY | |
464 | 167 }.get(name.lower()) |
448 | 168 |
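The lookup performed by `DocType.get` can be sketched in a few lines of Python 3. The `DOCTYPES` table and `lookup_doctype` function below are hypothetical names for illustration; the two tuples are copied from the `HTML5` and `XHTML11` constants listed above.

```python
# Subset of the name -> (name, pubid, sysid) mapping, copied from the listing.
DOCTYPES = {
    "html5": ("html", None, None),
    "xhtml11": (
        "html",
        "-//W3C//DTD XHTML 1.1//EN",
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
    ),
}


def lookup_doctype(name):
    # Case-insensitive lookup; dict.get returns None for unknown names
    # instead of raising KeyError.
    return DOCTYPES.get(name.lower())


name, pubid, sysid = lookup_doctype("XHTML11")
unknown = lookup_doctype("html6")
```

Note the contrast with `get_serializer` in the same file: that function indexes its dictionary directly, so an unknown method name raises `KeyError`, whereas an unknown doctype name quietly yields `None` and callers need to check for it.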
85 | 169 |
123 | 170 class XMLSerializer(object): |
1 | 171 """Produces XML text from an event stream. |
172 | |
230 | 173 >>> from genshi.builder import tag |
20 | 174 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123 | 175 >>> print ''.join(XMLSerializer()(elem.generate())) |
1 | 176 <div><a href="foo"/><br/><hr noshade="True"/></div> |
177 """ | |
123 | 178 |
179 _PRESERVE_SPACE = frozenset() |
180 |
410 | 181 def __init__(self, doctype=None, strip_whitespace=True, |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
182 namespace_prefixes=None, cache=True): |
85 | 183 """Initialize the XML serializer. |
184 | |
425 | 185 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the |
186 DOCTYPE declaration that should be included at the top |
494
942d73ba938c
The `doctype` parameter for serializers can now be a string.
cmlenz
parents:
464
diff
changeset
|
187 of the generated output, or the name of a DOCTYPE as |
188 defined in `DocType.get` |
425 | 189 :param strip_whitespace: whether extraneous whitespace should be |
190 stripped from the output |
829 | 191 :param cache: whether to cache the text output per event, which |
192 improves performance for repetitive markup |
494 | 193 :note: Changed in 0.4.2: The `doctype` parameter can now be a string. |
829 | 194 :note: Changed in 0.6: The `cache` parameter was added |
85 | 195 """ |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
196 self.filters = [EmptyTagFilter()] |
123 | 197 if strip_whitespace: |
198 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE)) |
829 | 199 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes, |
200 cache=cache)) |
671
8a9a7a8e9478
Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents:
669
diff
changeset
|
201 if doctype: |
202 self.filters.append(DocTypeInserter(doctype)) |
829 | 203 self.cache = cache |
1 | 204 |
123 | 205 def __call__(self, stream): |
460 | 206 have_decl = have_doctype = False |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
207 in_cdata = False |
1 | 208 |
829 | 209 cache = {} |
210 cache_get = cache.get |
211 if self.cache: |
212 def _emit(kind, input, output): |
213 cache[kind, input] = output |
214 return output |
215 else: |
216 def _emit(kind, input, output): |
217 return output |
218 |
123 | 219 for filter_ in self.filters: |
220 stream = filter_(stream) |
1 | 221 for kind, data, pos in stream: |
829 | 222 cached = cache_get((kind, data)) |
223 if cached is not None: |
224 yield cached |
1 | 225 |
829 | 226 elif kind is START or kind is EMPTY: |
1 | 227 tag, attrib = data |
410 | 228 buf = ['<', tag] |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
229 for attr, value in attrib: |
410 | 230 buf += [' ', attr, '="', escape(value), '"'] |
397 | 231 buf.append(kind is EMPTY and '/>' or '>') |
829 | 232 yield _emit(kind, data, Markup(u''.join(buf))) |
1 | 233 |
69 | 234 elif kind is END: |
829 | 235 yield _emit(kind, data, Markup('</%s>' % data)) |
1 | 236 |
69 | 237 elif kind is TEXT: |
143 | 238 if in_cdata: |
829 | 239 yield _emit(kind, data, data) |
143 | 240 else: |
829 | 241 yield _emit(kind, data, escape(data, quotes=False)) |
1 | 242 |
89
80386d62814f
Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents:
85
diff
changeset
|
243 elif kind is COMMENT: |
829 | 244 yield _emit(kind, data, Markup('<!--%s-->' % data)) |
89 | 245 |
460 | 246 elif kind is XML_DECL and not have_decl: |
247 version, encoding, standalone = data |
248 buf = ['<?xml version="%s"' % version] |
249 if encoding: |
250 buf.append(' encoding="%s"' % encoding) |
251 if standalone != -1: |
252 standalone = standalone and 'yes' or 'no' |
253 buf.append(' standalone="%s"' % standalone) |
254 buf.append('?>\n') |
255 yield Markup(u''.join(buf)) |
256 have_decl = True |
257 |
136 | 258 elif kind is DOCTYPE and not have_doctype: |
259 name, pubid, sysid = data | |
260 buf = ['<!DOCTYPE %s'] | |
261 if pubid: | |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
262 buf.append(' PUBLIC "%s"') |
136 | 263 elif sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
264 buf.append(' SYSTEM') |
136 | 265 if sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
266 buf.append(' "%s"') |
267 buf.append('>\n') |
713
5420fe9d99a9
The `Markup` class now supports mappings for the right-hand side of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents:
689
diff
changeset
|
268 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 269 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
270 |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
271 elif kind is START_CDATA: |
272 yield Markup('<![CDATA[') |
273 in_cdata = True |
274 |
275 elif kind is END_CDATA: |
276 yield Markup(']]>') |
277 in_cdata = False |
278 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
279 elif kind is PI: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
280 yield _emit(kind, data, Markup('<?%s %s?>' % data)) |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
281 |
1 | 282 |
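The `_emit` helper and the `cache_get` lookup used in the serializer loop amount to per-call memoization keyed on `(kind, data)`. The following is a minimal standalone sketch of that pattern in Python 3, not Genshi's own code: the `render` and `serialize` names, the string event kinds, and the `stats` counter are assumptions made for illustration, and `html.escape` stands in for `genshi.core.escape`.

```python
from html import escape


def render(kind, data):
    """Render one simplified event to text (hypothetical helper)."""
    if kind == "START":
        return "<%s>" % data
    if kind == "END":
        return "</%s>" % data
    if kind == "TEXT":
        return escape(data, quote=False)
    raise ValueError("unsupported event kind: %r" % (kind,))


def serialize(events, use_cache=True, stats=None):
    """Yield the text for each (kind, data) event, memoizing per call."""
    cache = {}  # lives for one call only, like the local dict in __call__
    for kind, data in events:
        key = (kind, data)
        output = cache.get(key) if use_cache else None
        if output is None:
            output = render(kind, data)
            if stats is not None:
                # Count how often the non-cached rendering path runs.
                stats["rendered"] = stats.get("rendered", 0) + 1
            if use_cache:
                cache[key] = output
        yield output


# Three identical list items: only the first one is actually rendered.
events = [("START", "li"), ("TEXT", "a & b"), ("END", "li")] * 3
stats = {}
print("".join(serialize(events, stats=stats)))
print(stats["rendered"])
```

A cache keyed this way is only safe when the output depends on `(kind, data)` alone; any state that changes the rendering of an event would have to become part of the key.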
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
283 class XHTMLSerializer(XMLSerializer): |
284 """Produces XHTML text from an event stream. |
1 | 285 |
230 | 286 >>> from genshi.builder import tag |
20 | 287 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
288 >>> print ''.join(XHTMLSerializer()(elem.generate())) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
289 <div><a href="foo"></a><br /><hr noshade="noshade" /></div> |
1 | 290 """ |
291 | |
292 _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame', | |
293 'hr', 'img', 'input', 'isindex', 'link', 'meta', | |
294 'param']) | |
295 _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare', | |
296 'defer', 'disabled', 'ismap', 'multiple', | |
297 'nohref', 'noresize', 'noshade', 'nowrap']) | |
346
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
298 _PRESERVE_SPACE = frozenset([ |
299 QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'), |
300 QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea') |
301 ]) |
1 | 302 |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
303 def __init__(self, doctype=None, strip_whitespace=True, |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
304 namespace_prefixes=None, drop_xml_decl=True, cache=True): |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
305 super(XHTMLSerializer, self).__init__(doctype, False) |
306 self.filters = [EmptyTagFilter()] |
307 if strip_whitespace: |
308 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE)) |
309 namespace_prefixes = namespace_prefixes or {} |
310 namespace_prefixes['http://www.w3.org/1999/xhtml'] = '' |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
311 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes, |
312 cache=cache)) |
671
8a9a7a8e9478
Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents:
669
diff
changeset
|
313 if doctype: |
314 self.filters.append(DocTypeInserter(doctype)) |
729 | 315 self.drop_xml_decl = drop_xml_decl |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
316 self.cache = cache |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
317 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
318 def __call__(self, stream): |
136 | 319 boolean_attrs = self._BOOLEAN_ATTRS |
320 empty_elems = self._EMPTY_ELEMS | |
729 | 321 drop_xml_decl = self.drop_xml_decl |
322 have_decl = have_doctype = False | |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
323 in_cdata = False |
1 | 324 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
325 cache = {} |
326 cache_get = cache.get |
327 if self.cache: |
328 def _emit(kind, input, output): |
329 cache[kind, input] = output |
330 return output |
331 else: |
332 def _emit(kind, input, output): |
333 return output |
334 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
335 for filter_ in self.filters: |
336 stream = filter_(stream) |
1 | 337 for kind, data, pos in stream: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
338 cached = cache_get((kind, data)) |
339 if cached is not None: |
340 yield cached |
1 | 341 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
342 elif kind is START or kind is EMPTY: |
1 | 343 tag, attrib = data |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
344 buf = ['<', tag] |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
345 for attr, value in attrib: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
346 if attr in boolean_attrs: |
347 value = attr |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
348 elif attr == u'xml:lang' and u'lang' not in attrib: |
349 buf += [' lang="', escape(value), '"'] |
689
3881a602048a
The XHTML serializer now strips `xml:space` attributes as they are only allowed on very few tags.
cmlenz
parents:
688
diff
changeset
|
350 elif attr == u'xml:space': |
351 continue |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
352 buf += [' ', attr, '="', escape(value), '"'] |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
353 if kind is EMPTY: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
354 if tag in empty_elems: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
355 buf.append(' />') |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
356 else: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
357 buf.append('></%s>' % tag) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
358 else: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
359 buf.append('>') |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
360 yield _emit(kind, data, Markup(u''.join(buf))) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
361 |
362 elif kind is END: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
363 yield _emit(kind, data, Markup('</%s>' % data)) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
364 |
365 elif kind is TEXT: |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
366 if in_cdata: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
367 yield _emit(kind, data, data) |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
368 else: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
369 yield _emit(kind, data, escape(data, quotes=False)) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
370 |
371 elif kind is COMMENT: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
372 yield _emit(kind, data, Markup('<!--%s-->' % data)) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
373 |
136 | 374 elif kind is DOCTYPE and not have_doctype: |
375 name, pubid, sysid = data | |
376 buf = ['<!DOCTYPE %s'] | |
377 if pubid: | |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
378 buf.append(' PUBLIC "%s"') |
136 | 379 elif sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
380 buf.append(' SYSTEM') |
136 | 381 if sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
382 buf.append(' "%s"') |
383 buf.append('>\n') |
713
5420fe9d99a9
The `Markup` class now supports mappings for the right-hand side of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents:
689
diff
changeset
|
384 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 385 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
386 |
729 | 387 elif kind is XML_DECL and not have_decl and not drop_xml_decl: |
388 version, encoding, standalone = data | |
389 buf = ['<?xml version="%s"' % version] | |
390 if encoding: | |
391 buf.append(' encoding="%s"' % encoding) | |
392 if standalone != -1: | |
393 standalone = standalone and 'yes' or 'no' | |
394 buf.append(' standalone="%s"' % standalone) | |
395 buf.append('?>\n') | |
396 yield Markup(u''.join(buf)) | |
397 have_decl = True | |
398 | |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
399 elif kind is START_CDATA: |
400 yield Markup('<![CDATA[') |
401 in_cdata = True |
402 |
403 elif kind is END_CDATA: |
404 yield Markup(']]>') |
405 in_cdata = False |
406 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
407 elif kind is PI: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
408 yield _emit(kind, data, Markup('<?%s %s?>' % data)) |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
409 |
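The start-tag branch of `XHTMLSerializer.__call__` applies three attribute rules: boolean attributes are written as `name="name"`, `xml:lang` is mirrored into a plain `lang` attribute when none is present, and `xml:space` is dropped. The sketch below restates those rules as a standalone Python 3 function. It is an illustration under stated assumptions, not the library's implementation: `xhtml_start_tag` is a hypothetical name, the two frozensets are shortened subsets of `_EMPTY_ELEMS` and `_BOOLEAN_ATTRS`, attributes are passed as a plain list of pairs, and `html.escape` replaces `genshi.core.escape` (the two may encode quotes differently).

```python
from html import escape

# Shortened subsets of the class-level sets, for illustration only.
EMPTY_ELEMS = frozenset(["br", "hr", "img", "input", "link", "meta"])
BOOLEAN_ATTRS = frozenset(["checked", "disabled", "noshade", "selected"])


def xhtml_start_tag(tag, attrs, empty=False):
    """Build the text of a START (or EMPTY, if empty=True) event."""
    names = {name for name, _ in attrs}
    buf = ["<", tag]
    for name, value in attrs:
        if name in BOOLEAN_ATTRS:
            # Minimized attributes are not allowed in XHTML.
            value = name
        elif name == "xml:lang" and "lang" not in names:
            # Emit a plain lang attribute too; xml:lang itself follows below.
            buf += [' lang="', escape(value), '"']
        elif name == "xml:space":
            continue
        buf += [" ", name, '="', escape(value), '"']
    if empty:
        # Only elements with an empty content model use the short form.
        buf.append(" />" if tag in EMPTY_ELEMS else "></%s>" % tag)
    else:
        buf.append(">")
    return "".join(buf)


print(xhtml_start_tag("hr", [("noshade", "True")], empty=True))
print(xhtml_start_tag("a", [("href", "foo")], empty=True))
print(xhtml_start_tag("p", [("xml:lang", "en"), ("xml:space", "preserve")]))
```

The first two calls correspond to the `<hr noshade="noshade" />` and `<a href="foo"></a>` fragments in the `XHTMLSerializer` docstring.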
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
410 |
411 class HTMLSerializer(XHTMLSerializer): |
412 """Produces HTML text from an event stream. |
413 |
230 | 414 >>> from genshi.builder import tag |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
415 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True)) |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
416 >>> print ''.join(HTMLSerializer()(elem.generate())) |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
417 <div><a href="foo"></a><br><hr noshade></div> |
418 """ |
419 |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
420 _NOESCAPE_ELEMS = frozenset([ |
421 QName('script'), QName('http://www.w3.org/1999/xhtml}script'), |
422 QName('style'), QName('http://www.w3.org/1999/xhtml}style') |
423 ]) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
424 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
425 def __init__(self, doctype=None, strip_whitespace=True, cache=True): |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
426 """Initialize the HTML serializer. |
427 |
425
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents:
412
diff
changeset
|
428 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the |
429 DOCTYPE declaration that should be included at the top |
430 of the generated output |
431 :param strip_whitespace: whether extraneous whitespace should be |
432 stripped from the output |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
433 :param cache: whether to cache the text output per event, which |
434 improves performance for repetitive markup |
435 :note: Changed in 0.6: The `cache` parameter was added |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
436 """ |
437 super(HTMLSerializer, self).__init__(doctype, False) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
438 self.filters = [EmptyTagFilter()] |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
439 if strip_whitespace: |
440 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE, |
305 | 441 self._NOESCAPE_ELEMS)) |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
442 self.filters.append(NamespaceFlattener(prefixes={ |
443 'http://www.w3.org/1999/xhtml': '' |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
444 }, cache=cache)) |
671
8a9a7a8e9478
Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents:
669
diff
changeset
|
445 if doctype: |
446 self.filters.append(DocTypeInserter(doctype)) |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
447 self.cache = cache |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
448 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
449 def __call__(self, stream): |
136 | 450 boolean_attrs = self._BOOLEAN_ATTRS |
451 empty_elems = self._EMPTY_ELEMS | |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
452 noescape_elems = self._NOESCAPE_ELEMS |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
453 have_doctype = False |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
454 noescape = False |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
455 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
456 cache = {} |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
457 cache_get = cache.get |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
458 if self.cache: |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
459 def _emit(kind, input, output): |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
460 cache[kind, input] = output |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
461 return output |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
462 else: |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
463 def _emit(kind, input, output): |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
464 return output |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
465 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
466 for filter_ in self.filters: |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
467 stream = filter_(stream) |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
468 for kind, data, _ in stream: |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
469 output = cache_get((kind, data)) |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
470 if output is not None: |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
471 yield output |
831 | 472 if (kind is START or kind is EMPTY) \ |
473 and data[0] in noescape_elems: | |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
474 noescape = True |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
475 elif kind is END: |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
476 noescape = False |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
477 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
478 elif kind is START or kind is EMPTY: |
96
fa08aef181a2
Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents:
89
diff
changeset
|
479 tag, attrib = data |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
480 buf = ['<', tag] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
481 for attr, value in attrib: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
482 if attr in boolean_attrs: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
483 if value: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
484 buf += [' ', attr] |
524
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
485 elif ':' in attr: |
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
486 if attr == 'xml:lang' and u'lang' not in attrib: |
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
487 buf += [' lang="', escape(value), '"'] |
7553760b58af
Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents:
494
diff
changeset
|
488 elif attr != 'xmlns': |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
489 buf += [' ', attr, '="', escape(value), '"'] |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
490 buf.append('>') |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
491 if kind is EMPTY: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
492 if tag not in empty_elems: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
493 buf.append('</%s>' % tag) |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
494 yield _emit(kind, data, Markup(u''.join(buf))) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
495 if tag in noescape_elems: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
496 noescape = True |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
497 |
69 | 498 elif kind is END: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
499 yield _emit(kind, data, Markup('</%s>' % data)) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
500 noescape = False |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
501 |
69 | 502 elif kind is TEXT: |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
503 if noescape: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
504 yield _emit(kind, data, data) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
505 else: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
506 yield _emit(kind, data, escape(data, quotes=False)) |
1 | 507 |
89
80386d62814f
Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents:
85
diff
changeset
|
508 elif kind is COMMENT: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
509 yield _emit(kind, data, Markup('<!--%s-->' % data)) |
89
80386d62814f
Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents:
85
diff
changeset
|
510 |
136 | 511 elif kind is DOCTYPE and not have_doctype: |
512 name, pubid, sysid = data | |
513 buf = ['<!DOCTYPE %s'] | |
514 if pubid: | |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
515 buf.append(' PUBLIC "%s"') |
136 | 516 elif sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
517 buf.append(' SYSTEM') |
136 | 518 if sysid: |
397
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
519 buf.append(' "%s"') |
31742fe6d47e
* Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents:
346
diff
changeset
|
520 buf.append('>\n') |
713
5420fe9d99a9
The `Markup` class now supports mappings for right hand of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escape. Also, the special constructor which took positional arguments that would be substituted was removed. Thus the `Markup` class now supports the same arguments as that of its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
cmlenz
parents:
689
diff
changeset
|
521 yield Markup(u''.join(buf)) % filter(None, data) |
136 | 522 have_doctype = True |
109
230ee6a2c6b2
Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents:
105
diff
changeset
|
523 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
524 elif kind is PI: |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
525 yield _emit(kind, data, Markup('<?%s %s?>' % data)) |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
526 |
1 | 527 |
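The `cache` / `_emit` pattern in `HTMLSerializer.__call__` above can be looked at in isolation. The following is only a minimal sketch under simplifying assumptions, not Genshi's code: `serialize_events` is a hypothetical name, events are plain `(kind, data)` tuples with string kinds, and the `noescape` state, attributes and doctype handling are left out. It shows the idea of keying rendered output on `(kind, data)` so that repeated events skip the formatting work.

```python
def serialize_events(events, use_cache=True):
    """Yield one output string per (kind, data) event.

    Hypothetical stand-in for the serializer loop: rendered output is
    memoized per (kind, data) pair when use_cache is true.
    """
    cache = {}
    cache_get = cache.get
    if use_cache:
        def _emit(kind, data, output):
            # Remember the rendered form so identical events are reused.
            cache[kind, data] = output
            return output
    else:
        def _emit(kind, data, output):
            # Caching disabled: the dict stays empty, lookups always miss.
            return output

    for kind, data in events:
        output = cache_get((kind, data))
        if output is not None:
            yield output
        elif kind == 'START':
            yield _emit(kind, data, '<%s>' % data)
        elif kind == 'END':
            yield _emit(kind, data, '</%s>' % data)
        elif kind == 'TEXT':
            escaped = (data.replace('&', '&amp;')
                           .replace('<', '&lt;')
                           .replace('>', '&gt;'))
            yield _emit(kind, data, escaped)


events = [('START', 'li'), ('TEXT', 'a < b'), ('END', 'li'),
          ('START', 'li'), ('TEXT', 'a < b'), ('END', 'li')]
result = ''.join(serialize_events(events))
```

In this sketch the second `li` element is served entirely from the cache. Disabling the cache changes only the amount of work done, not the output.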
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
528 class TextSerializer(object): |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
529 """Produces plain text from an event stream. |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
530 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
531 Only text events are included in the output. Unlike the other serializers, |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
532 special XML characters are not escaped: |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
533 |
230 | 534 >>> from genshi.builder import tag |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
535 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br) |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
536 >>> print elem |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
537 <div><a href="foo">&lt;Hello!&gt;</a><br/></div> |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
538 >>> print ''.join(TextSerializer()(elem.generate())) |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
539 <Hello!> |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
540 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
541 If text events contain literal markup (instances of the `Markup` class), |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
542 that markup is by default passed through unchanged: |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
543 |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
544 >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>')) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
545 >>> print elem.generate().render(TextSerializer) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
546 <a href="foo">Hello &amp; Bye!</a><br/> |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
547 |
740
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
548 You can use the ``strip_markup`` option to change this behavior, so that tags and |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
549 entities are stripped from the output (or in the case of entities, |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
550 replaced with the equivalent character): |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
551 |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
552 >>> print elem.generate().render(TextSerializer, strip_markup=True) |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
553 Hello & Bye! |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
554 """ |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
555 |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
556 def __init__(self, strip_markup=False): |
740
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
557 """Create the serializer. |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
558 |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
559 :param strip_markup: whether markup (tags and encoded characters) found |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
560 in the text should be removed |
0c3a2d7bf9a1
Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents:
729
diff
changeset
|
561 """ |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
562 self.strip_markup = strip_markup |
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
563 |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
564 def __call__(self, stream): |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
565 strip_markup = self.strip_markup |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
566 for event in stream: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
567 if event[0] is TEXT: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
568 data = event[1] |
658
5df08e5195b8
The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents:
524
diff
changeset
|
569 if strip_markup and type(data) is Markup: |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
570 data = data.striptags().stripentities() |
201
c5e0a1c86173
The `TextSerializer` should produce `unicode` objects, not `Markup` objects.
cmlenz
parents:
200
diff
changeset
|
571 yield unicode(data) |
200
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
572 |
5861f4446c26
Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents:
178
diff
changeset
|
573 |
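The effect of `strip_markup` described in the `TextSerializer` docstring can be illustrated without Genshi installed. This is a rough sketch under stated assumptions: the `Markup` class here is a bare stand-in marker type, `text_only` is a hypothetical function name, and tag and entity removal is approximated with a regular expression and the standard library's `html.unescape` rather than Genshi's own `striptags()` and `stripentities()` methods.

```python
import html
import re


class Markup(str):
    """Stand-in marker type for text that already contains markup."""


# Crude tag matcher, good enough for the illustration only.
_TAG_RE = re.compile(r'<[^>]*>')


def text_only(events, strip_markup=False):
    """Yield plain strings for TEXT events, ignoring every other kind."""
    for kind, data in events:
        if kind == 'TEXT':
            if strip_markup and type(data) is Markup:
                # Drop tags, then turn entities back into characters.
                data = html.unescape(_TAG_RE.sub('', data))
            yield str(data)


events = [('START', 'div'),
          ('TEXT', Markup('<a href="foo">Hello &amp; Bye!</a><br/>')),
          ('END', 'div')]
kept = ''.join(text_only(events))
stripped = ''.join(text_only(events, strip_markup=True))
```

With the default setting the literal markup passes through untouched. With `strip_markup=True` only the readable text remains.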
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
574 class EmptyTagFilter(object): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
575 """Combines `START` and `END` events into `EMPTY` events for elements that |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
576 have no contents. |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
577 """ |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
578 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
579 EMPTY = StreamEventKind('EMPTY') |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
580 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
581 def __call__(self, stream): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
582 prev = (None, None, None) |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
583 for ev in stream: |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
584 if prev[0] is START: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
585 if ev[0] is END: |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
586 prev = EMPTY, prev[1], prev[2] |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
587 yield prev |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
588 continue |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
589 else: |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
590 yield prev |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
591 if ev[0] is not START: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
592 yield ev |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
593 prev = ev |
212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
594 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
595 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
596 EMPTY = EmptyTagFilter.EMPTY |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
597 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
598 |
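The one-event lookbehind used by `EmptyTagFilter.__call__` above can be tried out on a simplified stream. This is a sketch, not the library's code: `merge_empty` is a hypothetical name, event kinds are plain strings, and events are `(kind, data)` pairs without attributes or position information. A `START` event is held back until the next event shows whether it is immediately closed.

```python
START, END, TEXT, EMPTY = 'START', 'END', 'TEXT', 'EMPTY'


def merge_empty(stream):
    """Collapse an adjacent START/END pair into a single EMPTY event."""
    prev = (None, None)
    for ev in stream:
        if prev[0] == START:
            if ev[0] == END:
                # START directly followed by END: emit one EMPTY event.
                prev = (EMPTY, prev[1])
                yield prev
                continue
            # The element has content, so release the held START event.
            yield prev
        if ev[0] != START:
            yield ev
        # START events are held back until the next event is seen.
        prev = ev


events = [(START, 'div'), (START, 'br'), (END, 'br'),
          (TEXT, 'hi'), (END, 'div')]
merged = list(merge_empty(events))
```

As in the original loop, a `START` event that ends the stream with nothing after it is never released. That does not occur in well-formed streams.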
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
599 class NamespaceFlattener(object): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
600 r"""Output stream filter that removes namespace information from the stream, |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
601 instead adding namespace attributes and prefixes as needed. |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
602 |
425
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents:
412
diff
changeset
|
603 :param prefixes: optional mapping of namespace URIs to prefixes |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
604 |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
605 >>> from genshi.input import XML |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
606 >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2"> |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
607 ... <two:item/> |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
608 ... </doc>''') |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
609 >>> for kind, data, pos in NamespaceFlattener()(xml): |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
610 ... print kind, repr(data) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
611 START (u'doc', Attrs([(u'xmlns', u'NS1'), (u'xmlns:two', u'NS2')])) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
612 TEXT u'\n ' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
613 START (u'two:item', Attrs()) |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
614 END u'two:item' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
615 TEXT u'\n' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
616 END u'doc' |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
617 """ |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
618 |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
619 def __init__(self, prefixes=None, cache=True): |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
620 self.prefixes = {XML_NAMESPACE.uri: 'xml'} |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
621 if prefixes is not None: |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
622 self.prefixes.update(prefixes) |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
623 self.cache = cache |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
624 |
    def __call__(self, stream):
        cache = {}
        cache_get = cache.get
        if self.cache:
            def _emit(kind, input, output, pos):
                cache[kind, input] = output
                return kind, output, pos
        else:
            def _emit(kind, input, output, pos):
                # Return a full event tuple, as the caching variant does,
                # because the result is yielded directly into the stream.
                return kind, output, pos

        prefixes = dict([(v, [k]) for k, v in self.prefixes.items()])
        namespaces = {XML_NAMESPACE.uri: ['xml']}
        def _push_ns(prefix, uri):
            namespaces.setdefault(uri, []).append(prefix)
            prefixes.setdefault(prefix, []).append(uri)
            cache.clear()
        def _pop_ns(prefix):
            uris = prefixes.get(prefix)
            uri = uris.pop()
            if not uris:
                del prefixes[prefix]
            if uri not in uris or uri != uris[-1]:
                uri_prefixes = namespaces[uri]
                uri_prefixes.pop()
                if not uri_prefixes:
                    del namespaces[uri]
            cache.clear()
            return uri

        ns_attrs = []
        _push_ns_attr = ns_attrs.append
        def _make_ns_attr(prefix, uri):
            return u'xmlns%s' % (prefix and ':%s' % prefix or ''), uri

        def _gen_prefix():
            val = 0
            while 1:
                val += 1
                yield 'ns%d' % val
        _gen_prefix = _gen_prefix().next

        for kind, data, pos in stream:
            output = cache_get((kind, data))
            if output is not None:
                yield kind, output, pos

            elif kind is START or kind is EMPTY:
                tag, attrs = data

                tagname = tag.localname
                tagns = tag.namespace
                if tagns:
                    if tagns in namespaces:
                        prefix = namespaces[tagns][-1]
                        if prefix:
                            tagname = u'%s:%s' % (prefix, tagname)
                    else:
                        _push_ns_attr((u'xmlns', tagns))
                        _push_ns('', tagns)

                new_attrs = []
                for attr, value in attrs:
                    attrname = attr.localname
                    attrns = attr.namespace
                    if attrns:
                        if attrns not in namespaces:
                            prefix = _gen_prefix()
                            _push_ns(prefix, attrns)
                            _push_ns_attr(('xmlns:%s' % prefix, attrns))
                        else:
                            prefix = namespaces[attrns][-1]
                        if prefix:
                            attrname = u'%s:%s' % (prefix, attrname)
                    new_attrs.append((attrname, value))

                yield _emit(kind, data, (tagname, Attrs(ns_attrs + new_attrs)), pos)
                del ns_attrs[:]

            elif kind is END:
                tagname = data.localname
                tagns = data.namespace
                if tagns:
                    prefix = namespaces[tagns][-1]
                    if prefix:
                        tagname = u'%s:%s' % (prefix, tagname)
                yield _emit(kind, data, tagname, pos)

            elif kind is START_NS:
                prefix, uri = data
                if uri not in namespaces:
                    prefix = prefixes.get(uri, [prefix])[-1]
                    _push_ns_attr(_make_ns_attr(prefix, uri))
                _push_ns(prefix, uri)

            elif kind is END_NS:
                if data in prefixes:
                    uri = _pop_ns(data)
                    if ns_attrs:
                        attr = _make_ns_attr(data, uri)
                        if attr in ns_attrs:
                            ns_attrs.remove(attr)

            else:
                yield kind, data, pos


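The prefix bookkeeping done by `_push_ns` and `_pop_ns` above can be illustrated in isolation. The following is a minimal sketch rather than Genshi's implementation: the `NamespaceScopes` class and its method names are hypothetical, it does not depend on Genshi, and its `pop` assumes well-nested scopes instead of reproducing the exact condition used in `_pop_ns`.

```python
class NamespaceScopes(object):
    """Hypothetical helper that tracks which prefix is in scope for a URI."""

    def __init__(self):
        self.namespaces = {}  # namespace URI -> stack of prefixes
        self.prefixes = {}    # prefix -> stack of namespace URIs

    def push(self, prefix, uri):
        # Entering a scope: the newest binding goes on top of both stacks.
        self.namespaces.setdefault(uri, []).append(prefix)
        self.prefixes.setdefault(prefix, []).append(uri)

    def pop(self, prefix):
        # Leaving a scope: drop the newest binding and forget empty stacks.
        uris = self.prefixes[prefix]
        uri = uris.pop()
        if not uris:
            del self.prefixes[prefix]
        uri_prefixes = self.namespaces[uri]
        uri_prefixes.pop()
        if not uri_prefixes:
            del self.namespaces[uri]
        return uri

    def qualify(self, uri, localname):
        # The innermost prefix wins; an empty prefix means the default
        # namespace, so the local name is used unchanged.
        prefix = self.namespaces[uri][-1]
        if prefix:
            return '%s:%s' % (prefix, localname)
        return localname


scopes = NamespaceScopes()
scopes.push('', 'NS1')
scopes.push('two', 'NS2')
qualified = [scopes.qualify('NS1', 'doc'), scopes.qualify('NS2', 'item')]
```

With the bindings from the docstring example (`NS1` as the default namespace, `NS2` bound to `two`), `qualified` holds `doc` and `two:item`, which matches the tag names shown there.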
class WhitespaceFilter(object):
    """A filter that removes extraneous ignorable white space from the
    stream.
    """

    def __init__(self, preserve=None, noescape=None):
        """Initialize the filter.

        :param preserve: a set or sequence of tag names for which white-space
                         should be preserved
        :param noescape: a set or sequence of tag names for which text content
                         should not be escaped

        The `noescape` set is expected to refer to elements that cannot contain
        further child elements (such as ``<style>`` or ``<script>`` in HTML
        documents).
        """
        if preserve is None:
            preserve = []
        self.preserve = frozenset(preserve)
        if noescape is None:
            noescape = []
        self.noescape = frozenset(noescape)

    def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
                 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
                 collapse_lines=re.compile('\n{2,}').sub):
        mjoin = Markup('').join
        preserve_elems = self.preserve
        preserve = 0
        noescape_elems = self.noescape
        noescape = False

        textbuf = []
        push_text = textbuf.append
        pop_text = textbuf.pop
        for kind, data, pos in chain(stream, [(None, None, None)]):

            if kind is TEXT:
                if noescape:
                    data = Markup(data)
                push_text(data)
            else:
                if textbuf:
                    if len(textbuf) > 1:
                        text = mjoin(textbuf, escape_quotes=False)
                        del textbuf[:]
                    else:
                        text = escape(pop_text(), quotes=False)
                    if not preserve:
                        text = collapse_lines('\n', trim_trailing_space('', text))
                    yield TEXT, Markup(text), pos

                if kind is START:
                    tag, attrs = data
                    if preserve or (tag in preserve_elems or
                                    attrs.get(space) == 'preserve'):
                        preserve += 1
                    if not noescape and tag in noescape_elems:
                        noescape = True

                elif kind is END:
                    noescape = False
                    if preserve:
                        preserve -= 1

                elif kind is START_CDATA:
                    noescape = True

                elif kind is END_CDATA:
                    noescape = False

                if kind:
                    yield kind, data, pos
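
The two regular expressions that `WhitespaceFilter.__call__` takes as default arguments do the actual whitespace clean-up. Here is a small standalone sketch of that step applied to a plain string. The function name `strip_ignorable_whitespace` is made up for illustration, and the sketch leaves out the escaping, buffering and `preserve` handling of the real filter.

```python
import re

# Same patterns as the default arguments of WhitespaceFilter.__call__.
_trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
_collapse_lines = re.compile('\n{2,}').sub


def strip_ignorable_whitespace(text):
    """Drop spaces and tabs before a newline, then merge runs of newlines."""
    return _collapse_lines('\n', _trim_trailing_space('', text))


cleaned = strip_ignorable_whitespace('a  \n\t\n\nb \n')
```

Because the lookahead only matches whitespace that is followed by a newline, trailing spaces at the very end of a text node (with no newline after them) are left alone.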


class DocTypeInserter(object):
    """A filter that inserts the DOCTYPE declaration in the correct location,
    after the XML declaration.
    """
    def __init__(self, doctype):
        """Initialize the filter.

        :param doctype: DOCTYPE as a string or DocType object.
        """
        if isinstance(doctype, basestring):
            doctype = DocType.get(doctype)
        self.doctype_event = (DOCTYPE, doctype, (None, -1, -1))

    def __call__(self, stream):
        doctype_inserted = False
        for kind, data, pos in stream:
            if not doctype_inserted:
                doctype_inserted = True
                if kind is XML_DECL:
                    yield (kind, data, pos)
                    yield self.doctype_event
                    continue
                yield self.doctype_event

            yield (kind, data, pos)

        if not doctype_inserted:
            yield self.doctype_event
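
The placement rule used by `DocTypeInserter.__call__` can be tried out without Genshi. The sketch below is an illustration under stated assumptions: event kinds are plain strings instead of Genshi's `StreamEventKind` objects, and `insert_doctype` is a hypothetical free function, not part of the module.

```python
# Stand-in event kinds (assumption: the real filter compares kind objects
# with "is"; plain strings compared with "==" are enough for this sketch).
XML_DECL = 'XML_DECL'
DOCTYPE = 'DOCTYPE'
START = 'START'


def insert_doctype(stream, doctype_event):
    """Yield the events of `stream` with `doctype_event` placed right after
    a leading XML declaration, or first when there is no declaration."""
    inserted = False
    for event in stream:
        if not inserted:
            inserted = True
            if event[0] == XML_DECL:
                yield event
                yield doctype_event
                continue
            yield doctype_event
        yield event
    if not inserted:
        # Empty stream: the DOCTYPE is still emitted on its own.
        yield doctype_event


decl = (XML_DECL, ('1.0', 'utf-8', -1), None)
root = (START, 'html', None)
doctype = (DOCTYPE, 'html', None)

with_decl = list(insert_doctype([decl, root], doctype))
without_decl = list(insert_doctype([root], doctype))
empty = list(insert_doctype([], doctype))
```

Only the first event is inspected, which is sufficient because an XML declaration, when present, has to be the first item in the stream.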