annotate genshi/output.py @ 958:6fc92535c888 experimental-performance-improvement-exploration
Be more careful about what is passed into streams as events and remove many uses of `_ensure` as a result. An `ATTRS` event is added for handling `Attributes` returned by `genshi.path.select()`.
author | hodgestar |
---|---|
date | Tue, 13 Mar 2012 03:03:02 +0000 |
parents | f15334b65cf8 |
```python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2006-2009 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://genshi.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://genshi.edgewall.org/log/.

"""This module provides different kinds of serialization methods for XML event
streams.
"""

from itertools import chain
import re

from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind
from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \
                        START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE, ATTRS

__all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer',
           'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer']
__docformat__ = 'restructuredtext en'
```
```python
def encode(iterator, method='xml', encoding=None, out=None):
    """Encode serializer output into a string.
    
    :param iterator: the iterator returned from serializing a stream (basically
                     any iterator that yields unicode objects)
    :param method: the serialization method; determines how characters not
                   representable in the specified encoding are treated
    :param encoding: how the output string should be encoded; if set to `None`,
                     this method returns a `unicode` object
    :param out: a file-like object that the output should be written to
                instead of being returned as one big string; note that if
                this is a file or socket (or similar), the `encoding` must
                not be `None` (that is, the output must be encoded)
    :return: a `str` or `unicode` object (depending on the `encoding`
             parameter), or `None` if the `out` parameter is provided
    
    :since: version 0.4.1
    :note: Changed in 0.5: added the `out` parameter
    """
    if encoding is not None:
        errors = 'replace'
        if method != 'text' and not isinstance(method, TextSerializer):
            errors = 'xmlcharrefreplace'
        _encode = lambda string: string.encode(encoding, errors)
    else:
        _encode = lambda string: string
    if out is None:
        return _encode(''.join(list(iterator)))
    for chunk in iterator:
        out.write(_encode(chunk))
```
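The branching in `encode` can be sketched in Python 3 as follows. `encode_chunks` is a hypothetical name and is not part of Genshi. The sketch keeps the same error-handling policy: `'replace'` for text output and `'xmlcharrefreplace'` for markup output, so a character the target encoding cannot represent becomes a numeric character reference such as `&#233;`. It also keeps the choice between returning one joined string and writing chunk by chunk to `out`. Unlike the original, it only recognizes the method name `"text"` and does not treat `TextSerializer` instances specially.

```python
import io
from typing import IO, Iterable, Optional, Union


def encode_chunks(chunks: Iterable[str], method: str = "xml",
                  encoding: Optional[str] = None,
                  out: Optional[IO[bytes]] = None) -> Union[str, bytes, None]:
    """Join (or stream) serializer output, optionally encoding it."""
    # Plain text degrades unencodable characters to '?'; markup can use
    # numeric character references, which remain valid XML/HTML.
    errors = "replace" if method == "text" else "xmlcharrefreplace"

    def convert(text: str) -> Union[str, bytes]:
        if encoding is None:
            return text  # no encoding requested: keep the text as-is
        return text.encode(encoding, errors)

    if out is None:
        return convert("".join(chunks))
    for chunk in chunks:
        out.write(convert(chunk))  # stream chunk by chunk, no big string
    return None


buffer = io.BytesIO()
encode_chunks(["<p>", "caf\u00e9", "</p>"], encoding="ascii", out=buffer)
print(buffer.getvalue())  # b'<p>caf&#233;</p>'
```

As the `encode` docstring warns, writing to a binary file or socket only works when an encoding is given. Without one, the chunks stay `str` and a bytes sink would reject them.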
```python
def get_serializer(method='xml', **kwargs):
    """Return a serializer object for the given method.
    
    :param method: the serialization method; can be either "xml", "xhtml",
                   "html", "text", or a custom serializer class

    Any additional keyword arguments are passed to the serializer, and thus
    depend on the `method` parameter value.
    
    :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer`
    :since: version 0.4.1
    """
    if isinstance(method, basestring):
        method = {'xml': XMLSerializer,
                  'xhtml': XHTMLSerializer,
                  'html': HTMLSerializer,
                  'text': TextSerializer}[method.lower()]
    return method(**kwargs)
```
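`get_serializer` accepts either a method name or a serializer class. A name is looked up case-insensitively and then instantiated with the keyword arguments. The sketch below shows the same dispatch in Python 3, where `basestring` no longer exists and the check is made against `str`. The classes `XmlOut` and `TextOut` and the function `make_serializer` are hypothetical stand-ins, not Genshi APIs.

```python
from typing import Type, Union


class XmlOut:
    """Hypothetical serializer standing in for XMLSerializer."""

    def __init__(self, **options):
        self.options = options  # keyword arguments are passed straight through


class TextOut(XmlOut):
    """Hypothetical serializer standing in for TextSerializer."""


_SERIALIZERS = {"xml": XmlOut, "text": TextOut}


def make_serializer(method: Union[str, Type[XmlOut]] = "xml", **kwargs) -> XmlOut:
    # Names are looked up case-insensitively; anything else is assumed to be
    # a class and is instantiated directly. Unknown names raise KeyError.
    if isinstance(method, str):
        method = _SERIALIZERS[method.lower()]
    return method(**kwargs)


print(type(make_serializer("TEXT")).__name__)  # TextOut
```

As in the original, an unrecognized name surfaces as a `KeyError` from the dictionary lookup rather than as a dedicated error.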
```python
def _prepare_cache(use_cache=True):
    """Prepare a private token serialization cache.

    :param use_cache: boolean indicating whether a real cache should
                      be used or not. If not, the returned functions
                      are no-ops.

    :return: emit and get functions, for storing and retrieving
             serialized values from the cache.
    """
    cache = {}
    if use_cache:
        def _emit(kind, input, output):
            cache[kind, input] = output
            return output
        _get = cache.get
    else:
        def _emit(kind, input, output):
            return output
        def _get(key):
            pass
    return _emit, _get, cache
```
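`_prepare_cache` actually returns three values, `(emit, get, cache)`, although its docstring mentions only the first two. `XMLSerializer._prepare_cache` keeps just the first two. The sketch below reproduces the same closure pattern in Python 3 and adds `render_start_tags`, a hypothetical loop that is not part of Genshi. The loop consults `get` before doing any work and stores new results with `emit`, counting how often the output had to be computed.

```python
def prepare_cache(use_cache=True):
    """Return (emit, get, cache); with use_cache=False both functions are no-ops."""
    cache = {}
    if use_cache:
        def emit(kind, value, output):
            cache[kind, value] = output  # key is the (kind, value) tuple
            return output
        get = cache.get
    else:
        def emit(kind, value, output):
            return output  # nothing is stored

        def get(key):
            return None  # always a cache miss
    return emit, get, cache


def render_start_tags(names, use_cache=True):
    """Hypothetical serializer loop: returns (markup, number of cache misses)."""
    emit, get, _ = prepare_cache(use_cache)
    misses = 0
    parts = []
    for name in names:
        output = get(("START", name))
        if output is None:
            misses += 1
            output = emit("START", name, "<%s>" % name)
        parts.append(output)
    return "".join(parts), misses


print(render_start_tags(["li", "li", "li"]))         # ('<li><li><li>', 1)
print(render_start_tags(["li", "li", "li"], False))  # ('<li><li><li>', 3)
```

Because `emit` returns its `output` argument, storing a value and yielding it fit in one expression. This is why the no-op variant must still return `output` rather than `None`.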
```python
class DocType(object):
    """Defines a number of commonly used DOCTYPE declarations as constants."""

    HTML_STRICT = (
        'html', '-//W3C//DTD HTML 4.01//EN',
        'http://www.w3.org/TR/html4/strict.dtd'
    )
    HTML_TRANSITIONAL = (
        'html', '-//W3C//DTD HTML 4.01 Transitional//EN',
        'http://www.w3.org/TR/html4/loose.dtd'
    )
    HTML_FRAMESET = (
        'html', '-//W3C//DTD HTML 4.01 Frameset//EN',
        'http://www.w3.org/TR/html4/frameset.dtd'
    )
    HTML = HTML_STRICT

    HTML5 = ('html', None, None)

    XHTML_STRICT = (
        'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
    )
    XHTML_TRANSITIONAL = (
        'html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    )
    XHTML_FRAMESET = (
        'html', '-//W3C//DTD XHTML 1.0 Frameset//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd'
    )
    XHTML = XHTML_STRICT

    XHTML11 = (
        'html', '-//W3C//DTD XHTML 1.1//EN',
        'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'
    )

    SVG_FULL = (
        'svg', '-//W3C//DTD SVG 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'
    )
    SVG_BASIC = (
        'svg', '-//W3C//DTD SVG Basic 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd'
    )
    SVG_TINY = (
        'svg', '-//W3C//DTD SVG Tiny 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd'
    )
    SVG = SVG_FULL

    @classmethod
    def get(cls, name):
        """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE``
        declaration for the specified name.
        
        The following names are recognized in this version:
         * "html" or "html-strict" for the HTML 4.01 strict DTD
         * "html-transitional" for the HTML 4.01 transitional DTD
         * "html-frameset" for the HTML 4.01 frameset DTD
         * "html5" for the ``DOCTYPE`` proposed for HTML5
         * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD
         * "xhtml-transitional" for the XHTML 1.0 transitional DTD
         * "xhtml-frameset" for the XHTML 1.0 frameset DTD
         * "xhtml11" for the XHTML 1.1 DTD
         * "svg" or "svg-full" for the SVG 1.1 DTD
         * "svg-basic" for the SVG Basic 1.1 DTD
         * "svg-tiny" for the SVG Tiny 1.1 DTD
        
        :param name: the name of the ``DOCTYPE``
        :return: the ``(name, pubid, sysid)`` tuple for the requested
                 ``DOCTYPE``, or ``None`` if the name is not recognized
        :since: version 0.4.1
        """
        return {
            'html': cls.HTML, 'html-strict': cls.HTML_STRICT,
            'html-transitional': DocType.HTML_TRANSITIONAL,
            'html-frameset': DocType.HTML_FRAMESET,
            'html5': cls.HTML5,
            'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT,
            'xhtml-transitional': cls.XHTML_TRANSITIONAL,
            'xhtml-frameset': cls.XHTML_FRAMESET,
            'xhtml11': cls.XHTML11,
            'svg': cls.SVG, 'svg-full': cls.SVG_FULL,
            'svg-basic': cls.SVG_BASIC,
            'svg-tiny': cls.SVG_TINY
        }.get(name.lower())
```
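`DocType.get` maps a case-insensitive name to a `(name, pubid, sysid)` tuple and returns `None` for unknown names. To show what such a tuple stands for, the sketch below pairs a similar lookup with `format_doctype`, a hypothetical helper that is not Genshi's serializer code. It assumes standard XML `DOCTYPE` syntax: `PUBLIC "pubid" "sysid"` when a public identifier is present, `SYSTEM "sysid"` when only a system identifier is present, and the bare `<!DOCTYPE html>` form when both are `None`, as in `HTML5`. `DOCTYPES` copies only three of the entries listed above.

```python
from typing import Dict, Optional, Tuple

# (name, pubid, sysid), as in the DocType constants.
DoctypeTuple = Tuple[str, Optional[str], Optional[str]]

# Hypothetical subset of the name -> tuple mapping used by DocType.get.
DOCTYPES: Dict[str, DoctypeTuple] = {
    "html": ("html", "-//W3C//DTD HTML 4.01//EN",
             "http://www.w3.org/TR/html4/strict.dtd"),
    "html5": ("html", None, None),
    "svg": ("svg", "-//W3C//DTD SVG 1.1//EN",
            "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"),
}


def lookup_doctype(name: str) -> Optional[DoctypeTuple]:
    # Case-insensitive lookup; unknown names give None rather than raising.
    return DOCTYPES.get(name.lower())


def format_doctype(doctype: DoctypeTuple) -> str:
    name, pubid, sysid = doctype
    parts = ["<!DOCTYPE ", name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        parts.append(" SYSTEM")
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append(">")
    return "".join(parts)


print(format_doctype(lookup_doctype("HTML5")))  # <!DOCTYPE html>
```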
```python
class XMLSerializer(object):
    """Produces XML text from an event stream.
    
    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print(''.join(XMLSerializer()(elem.generate())))
    <div><a href="foo"/><br/><hr noshade="True"/></div>
    """

    _PRESERVE_SPACE = frozenset()

    def __init__(self, doctype=None, strip_whitespace=True,
                 namespace_prefixes=None, cache=True):
        """Initialize the XML serializer.
        
        :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
                        DOCTYPE declaration that should be included at the top
                        of the generated output, or the name of a DOCTYPE as
                        defined in `DocType.get`
        :param strip_whitespace: whether extraneous whitespace should be
                                 stripped from the output
        :param cache: whether to cache the text output per event, which
                      improves performance for repetitive markup
        :note: Changed in 0.4.2: The `doctype` parameter can now be a string.
        :note: Changed in 0.6: The `cache` parameter was added
        """
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
        self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
                                               cache=cache))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))
        self.cache = cache
```
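`__init__` builds an ordered list of filters. Each filter is a callable that takes an event stream and returns a new one, and `__call__` wraps the stream with them in list order. The sketch below models that pipeline with two hypothetical generator filters over `(kind, data, pos)` tuples, using plain strings for event kinds instead of Genshi's `StreamEventKind` constants. `merge_empty_tags` is modelled on what `EmptyTagFilter` does: a `START` immediately followed by `END` becomes one `EMPTY` event. `drop_blank_text` removes whitespace-only text. Neither is Genshi's implementation.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, object, object]  # (kind, data, pos); kinds are plain strings here


def merge_empty_tags(stream: Iterable[Event]) -> Iterator[Event]:
    """Collapse a START immediately followed by END into one EMPTY event."""
    pending = None
    for event in stream:
        if pending is not None:
            if event[0] == "END":
                yield ("EMPTY",) + pending[1:]
                pending = None
                continue
            yield pending
            pending = None
        if event[0] == "START":
            pending = event  # hold back until we see what follows
        else:
            yield event
    if pending is not None:
        yield pending


def drop_blank_text(stream: Iterable[Event]) -> Iterator[Event]:
    """Remove TEXT events that contain only whitespace."""
    for kind, data, pos in stream:
        if kind == "TEXT" and not str(data).strip():
            continue
        yield kind, data, pos


def apply_filters(stream: Iterable[Event], filters) -> List[Event]:
    # Each filter lazily wraps the previous stream, as in XMLSerializer.__call__.
    for filter_ in filters:
        stream = filter_(stream)
    return list(stream)


events = [("START", ("br", ()), None), ("TEXT", " ", None), ("END", "br", None)]
print([e[0] for e in apply_filters(events, [merge_empty_tags, drop_blank_text])])
# ['START', 'END']
print([e[0] for e in apply_filters(events, [drop_blank_text, merge_empty_tags])])
# ['EMPTY']
```

In this sketch, order matters: blank text between a start and end tag blocks merging unless the whitespace filter runs first. In the listing, `EmptyTagFilter` is placed first in `self.filters`.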
```python
    def _prepare_cache(self):
        return _prepare_cache(self.cache)[:2]
```
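The cache returned by `_prepare_cache` is keyed on `(kind, data)`. Genshi's `Markup` is a subclass of `unicode`, so a `Markup` instance hashes and compares equal to a plain string with the same characters. A cached result for one can therefore be returned for the other, which is why `__call__` yields `TEXT` events carrying `Markup` directly, before the cache is consulted. The sketch below reproduces the collision in Python 3. `Markup` here is a hypothetical `str` subclass standing in for `genshi.core.Markup`, and `escape_text` is a simplified escape, not Genshi's `escape`.

```python
class Markup(str):
    """Hypothetical stand-in for genshi.core.Markup: text that is already markup."""


def escape_text(text):
    # Markup passes through untouched; plain text gets the basic XML escapes.
    if isinstance(text, Markup):
        return str(text)
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def serialize_text(chunks, skip_markup=True):
    """Serialize TEXT data through a (kind, data)-keyed cache."""
    cache = {}
    parts = []
    for data in chunks:
        if skip_markup and isinstance(data, Markup):
            parts.append(str(data))  # bypass the cache entirely
            continue
        key = ("TEXT", data)
        if key not in cache:
            cache[key] = escape_text(data)
        parts.append(cache[key])
    return "".join(parts)


chunks = [Markup("<b>"), "<b>"]
print(serialize_text(chunks, skip_markup=False))  # <b><b>  (plain text left unescaped)
print(serialize_text(chunks))                     # <b>&lt;b&gt;
```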
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
234 def __call__(self, stream): |
460
75425671b437
Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents:
448
diff
changeset
|
235 have_decl = have_doctype = False |
143
3d4c214c979a
CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents:
141
diff
changeset
|
236 in_cdata = False |
938
8d0f693081b5
Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents:
932
diff
changeset
|
237 _emit, _get = self._prepare_cache() |
829
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents:
822
diff
changeset
|
238 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
239 for filter_ in self.filters: |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
240 stream = filter_(stream) |
        for kind, data, pos in stream:
            if kind is TEXT and isinstance(data, Markup):
                yield data
                continue
            cached = _get((kind, data))
            if cached is not None:
                yield cached
            elif kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    buf += [' ', attr, '="', escape(value), '"']
                buf.append(kind is EMPTY and '/>' or '>')
                yield _emit(kind, data, Markup(''.join(buf)))

            elif kind is END:
                yield _emit(kind, data, Markup('</%s>' % data))

            elif kind is TEXT:
                if in_cdata:
                    yield _emit(kind, data, data)
                else:
                    yield _emit(kind, data, escape(data, quotes=False))

            elif kind is COMMENT:
                yield _emit(kind, data, Markup('<!--%s-->' % data))

            elif kind is XML_DECL and not have_decl:
                version, encoding, standalone = data
                buf = ['<?xml version="%s"' % version]
                if encoding:
                    buf.append(' encoding="%s"' % encoding)
                if standalone != -1:
                    standalone = standalone and 'yes' or 'no'
                    buf.append(' standalone="%s"' % standalone)
                buf.append('?>\n')
                yield Markup(''.join(buf))
                have_decl = True

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(''.join(buf)) % tuple([p for p in data if p])
                have_doctype = True

            elif kind is START_CDATA:
                yield Markup('<![CDATA[')
                in_cdata = True

            elif kind is END_CDATA:
                yield Markup(']]>')
                in_cdata = False

            elif kind is PI:
                yield _emit(kind, data, Markup('<?%s %s?>' % data))

            elif kind is ATTRS:
                # this is specifically to support the rendering of
                # streams generated by genshi.path.select() and provides
                # backwards compatibility with genshi < 0.7
                yield data.concatenate_values()

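# Editorial sketch, not part of Genshi: a minimal model of why
# XMLSerializer.__call__ yields (TEXT, Markup) events before consulting
# its (kind, data)-keyed string cache. Markup is a str subclass that keeps
# str's __eq__ and __hash__, so Markup('<b>') and the plain string '<b>'
# map to the same cache key. Once the plain string has been cached in
# escaped form, a later Markup event with the same text would receive that
# escaped output instead of being emitted verbatim. _SketchMarkup,
# _sketch_serialize and the 'TEXT' kind string are hypothetical stand-ins
# for Markup, the serializer loop and genshi.core.TEXT.
from xml.sax.saxutils import escape as _sketch_escape


class _SketchMarkup(str):
    """Hypothetical stand-in for Markup: a str subclass for safe markup."""


def _sketch_serialize(events, bypass_markup=True):
    """Serialize (kind, text) pairs through a memoising cache (sketch)."""
    cache = {}
    out = []
    for kind, data in events:
        safe = kind == 'TEXT' and isinstance(data, _SketchMarkup)
        if bypass_markup and safe:
            # Already-safe markup is emitted verbatim and never cached.
            out.append(str(data))
            continue
        key = (kind, data)
        if key not in cache:
            # Plain text is escaped once and reused for equal keys.
            cache[key] = _sketch_escape(data)
        out.append(cache[key])
    return ''.join(out)

# With bypass_markup=False, [('TEXT', '<b>'), ('TEXT', _SketchMarkup('<b>'))]
# serializes to '&lt;b&gt;&lt;b&gt;': the second event hits the first
# event's cache entry.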

class XHTMLSerializer(XMLSerializer):
    """Produces XHTML text from an event stream.

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print(''.join(XHTMLSerializer()(elem.generate())))
    <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
    """

    _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
                              'hr', 'img', 'input', 'isindex', 'link', 'meta',
                              'param'])
    _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
                                'defer', 'disabled', 'ismap', 'multiple',
                                'nohref', 'noresize', 'noshade', 'nowrap'])
    _PRESERVE_SPACE = frozenset([
        QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'),
        QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea')
    ])

    def __init__(self, doctype=None, strip_whitespace=True,
                 namespace_prefixes=None, drop_xml_decl=True, cache=True):
        super(XHTMLSerializer, self).__init__(doctype, False)
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
        namespace_prefixes = namespace_prefixes or {}
        namespace_prefixes['http://www.w3.org/1999/xhtml'] = ''
        self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
                                               cache=cache))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))
        self.drop_xml_decl = drop_xml_decl
        self.cache = cache

    def __call__(self, stream):
        boolean_attrs = self._BOOLEAN_ATTRS
        empty_elems = self._EMPTY_ELEMS
        drop_xml_decl = self.drop_xml_decl
        have_decl = have_doctype = False
        in_cdata = False
        _emit, _get = self._prepare_cache()

        for filter_ in self.filters:
            stream = filter_(stream)
        for kind, data, pos in stream:
            if kind is TEXT and isinstance(data, Markup):
                yield data
                continue
            cached = _get((kind, data))
            if cached is not None:
                yield cached

            elif kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    if attr in boolean_attrs:
                        value = attr
                    elif attr == 'xml:lang' and 'lang' not in attrib:
                        buf += [' lang="', escape(value), '"']
                    elif attr == 'xml:space':
                        continue
                    buf += [' ', attr, '="', escape(value), '"']
                if kind is EMPTY:
                    if tag in empty_elems:
                        buf.append(' />')
                    else:
                        buf.append('></%s>' % tag)
                else:
                    buf.append('>')
                yield _emit(kind, data, Markup(''.join(buf)))

            elif kind is END:
                yield _emit(kind, data, Markup('</%s>' % data))

            elif kind is TEXT:
                if in_cdata:
                    yield _emit(kind, data, data)
                else:
                    yield _emit(kind, data, escape(data, quotes=False))

            elif kind is COMMENT:
                yield _emit(kind, data, Markup('<!--%s-->' % data))

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(''.join(buf)) % tuple([p for p in data if p])
                have_doctype = True

            elif kind is XML_DECL and not have_decl and not drop_xml_decl:
                version, encoding, standalone = data
                buf = ['<?xml version="%s"' % version]
                if encoding:
                    buf.append(' encoding="%s"' % encoding)
                if standalone != -1:
                    standalone = standalone and 'yes' or 'no'
                    buf.append(' standalone="%s"' % standalone)
                buf.append('?>\n')
                yield Markup(''.join(buf))
                have_decl = True

            elif kind is START_CDATA:
                yield Markup('<![CDATA[')
                in_cdata = True

            elif kind is END_CDATA:
                yield Markup(']]>')
                in_cdata = False

            elif kind is PI:
                yield _emit(kind, data, Markup('<?%s %s?>' % data))


class HTMLSerializer(XHTMLSerializer):
    """Produces HTML text from an event stream.

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print(''.join(HTMLSerializer()(elem.generate())))
    <div><a href="foo"></a><br><hr noshade></div>
    """

    _NOESCAPE_ELEMS = frozenset([
        QName('script'), QName('http://www.w3.org/1999/xhtml}script'),
        QName('style'), QName('http://www.w3.org/1999/xhtml}style')
    ])

    def __init__(self, doctype=None, strip_whitespace=True, cache=True):
        """Initialize the HTML serializer.

        :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
                        DOCTYPE declaration that should be included at the top
                        of the generated output
        :param strip_whitespace: whether extraneous whitespace should be
                                 stripped from the output
        :param cache: whether to cache the text output per event, which
                      improves performance for repetitive markup
        :note: Changed in 0.6: The `cache` parameter was added
        """
        super(HTMLSerializer, self).__init__(doctype, False)
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
                                                 self._NOESCAPE_ELEMS))
        self.filters.append(NamespaceFlattener(prefixes={
            'http://www.w3.org/1999/xhtml': ''
        }, cache=cache))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))
        self.cache = cache

    def __call__(self, stream):
        boolean_attrs = self._BOOLEAN_ATTRS
        empty_elems = self._EMPTY_ELEMS
        noescape_elems = self._NOESCAPE_ELEMS
        have_doctype = False
        noescape = False
        _emit, _get = self._prepare_cache()

        for filter_ in self.filters:
            stream = filter_(stream)
        for kind, data, _ in stream:
            if kind is TEXT and isinstance(data, Markup):
                yield data
                continue
            output = _get((kind, data))
            if output is not None:
                yield output
                if (kind is START or kind is EMPTY) \
                        and data[0] in noescape_elems:
                    noescape = True
                elif kind is END:
                    noescape = False

            elif kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    if attr in boolean_attrs:
                        if value:
                            buf += [' ', attr]
                    elif ':' in attr:
                        if attr == 'xml:lang' and 'lang' not in attrib:
                            buf += [' lang="', escape(value), '"']
                    elif attr != 'xmlns':
                        buf += [' ', attr, '="', escape(value), '"']
                buf.append('>')
                if kind is EMPTY:
                    if tag not in empty_elems:
                        buf.append('</%s>' % tag)
                yield _emit(kind, data, Markup(''.join(buf)))
                if tag in noescape_elems:
                    noescape = True

            elif kind is END:
                yield _emit(kind, data, Markup('</%s>' % data))
                noescape = False

            elif kind is TEXT:
                if noescape:
                    yield _emit(kind, data, data)
                else:
                    yield _emit(kind, data, escape(data, quotes=False))

            elif kind is COMMENT:
                yield _emit(kind, data, Markup('<!--%s-->' % data))

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(''.join(buf)) % tuple([p for p in data if p])
                have_doctype = True

            elif kind is PI:
                yield _emit(kind, data, Markup('<?%s %s?>' % data))


class TextSerializer(object):
    """Produces plain text from an event stream.

    Only text events are included in the output. Unlike the other serializers,
    special XML characters are not escaped:

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
    >>> print(elem)
    <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
    >>> print(''.join(TextSerializer()(elem.generate())))
    <Hello!>

    If text events contain literal markup (instances of the `Markup` class),
    that markup is by default passed through unchanged:

    >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
    >>> print(elem.generate().render(TextSerializer, encoding=None))
    <a href="foo">Hello &amp; Bye!</a><br/>

    You can use the ``strip_markup`` option to change this behavior, so that
    tags and entities are stripped from the output (or in the case of
    entities, replaced with the equivalent character):

    >>> print(elem.generate().render(TextSerializer, strip_markup=True,
    ...                              encoding=None))
    Hello & Bye!
    """

    def __init__(self, strip_markup=False):
        """Create the serializer.

        :param strip_markup: whether markup (tags and encoded characters) found
                             in the text should be removed
        """
        self.strip_markup = strip_markup

    def __call__(self, stream):
        strip_markup = self.strip_markup
        for event in stream:
            if event[0] is TEXT:
                data = event[1]
                if strip_markup and type(data) is Markup:
                    data = data.striptags().stripentities()
                yield unicode(data)


212
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
591 class EmptyTagFilter(object): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
592 """Combines `START` and `STOP` events into `EMPTY` events for elements that |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
593 have no contents. |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
594 """ |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
595 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
596 EMPTY = StreamEventKind('EMPTY') |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
597 |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents:
201
diff
changeset
|
598 def __call__(self, stream): |
0141f45c18e1
Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
599         prev = (None, None, None) |
410 | 600         for ev in stream: |
212 | 601             if prev[0] is START: |
410 | 602                 if ev[0] is END: |
212 | 603                     prev = EMPTY, prev[1], prev[2] |
604                     yield prev |
605                     continue |
606                 else: |
607                     yield prev |
410 | 608             if ev[0] is not START: |
609                 yield ev |
610             prev = ev |
212 | 611 |
612 |
613 EMPTY = EmptyTagFilter.EMPTY |
614 |
615 |
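Rows 599-610 implement a one-event look-behind. Each `START` event is held in `prev` until the next event arrives. If that next event is the matching `END`, the pair is replaced by a single `EMPTY` event, which the serializers in this module can then write as one self-closing tag.

Below is a standalone sketch of the same technique. It makes these assumptions:
- Plain strings stand in for Genshi's `StreamEventKind` singletons, so the sketch compares with `==` rather than `is`.
- The function name `combine_empty_tags` is hypothetical.

```python
# Plain-string stand-ins for genshi.core's StreamEventKind singletons (assumption).
START, END, TEXT, EMPTY = 'START', 'END', 'TEXT', 'EMPTY'


def combine_empty_tags(stream):
    """Merge every START that is immediately followed by END into one EMPTY."""
    prev = (None, None, None)
    for ev in stream:
        if prev[0] == START:
            if ev[0] == END:
                # No content in between: reuse the START's data and position.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            # Something else followed, so release the held-back START first.
            yield prev
        if ev[0] != START:
            yield ev
        # A START is held back here and released (or merged) next iteration.
        prev = ev


events = [
    (START, ('br', ()), 1), (END, 'br', 1),
    (START, ('p', ()), 2), (TEXT, 'hi', 2), (END, 'p', 2),
]
print([kind for kind, data, pos in combine_empty_tags(events)])
# ['EMPTY', 'START', 'TEXT', 'END']
```

Like the listing above, the sketch never releases a `START` that is the very last event of the stream. A well-formed stream always closes its elements, so this case does not arise there.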
410 | 616 class NamespaceFlattener(object): |
617     r"""Output stream filter that removes namespace information from the stream, |
618     instead adding namespace attributes and prefixes as needed. |
619 |
073640758a42
Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
425 | 620     :param prefixes: optional mapping of namespace URIs to prefixes |
410 | 621 |
622     >>> from genshi.input import XML |
623     >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2"> |
624     ...   <two:item/> |
625     ... </doc>''') |
626     >>> for kind, data, pos in NamespaceFlattener()(xml): |
f33ecf3c319e
Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
853 | 627     ...     print('%s %r' % (kind, data)) |
07f4339fecb0
Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
852 | 628     START (u'doc', Attrs([('xmlns', u'NS1'), (u'xmlns:two', u'NS2')])) |
410 | 629     TEXT u'\n  ' |
630     START (u'two:item', Attrs()) |
631     END u'two:item' |
632     TEXT u'\n' |
633     END u'doc' |
634     """ |
635 |
6e46513e1c5c
Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
829 | 636     def __init__(self, prefixes=None, cache=True): |
410 | 637         self.prefixes = {XML_NAMESPACE.uri: 'xml'} |
638         if prefixes is not None: |
639             self.prefixes.update(prefixes) |
829 | 640         self.cache = cache |
410 | 641 |
642     def __call__(self, stream): |
643         prefixes = dict([(v, [k]) for k, v in self.prefixes.items()]) |
644         namespaces = {XML_NAMESPACE.uri: ['xml']} |
8d0f693081b5
Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
938 | 645         _emit, _get, cache = _prepare_cache(self.cache) |
410 | 646         def _push_ns(prefix, uri): |
647             namespaces.setdefault(uri, []).append(prefix) |
648             prefixes.setdefault(prefix, []).append(uri) |
829 | 649             cache.clear() |
650         def _pop_ns(prefix): |
651             uris = prefixes.get(prefix) |
652             uri = uris.pop() |
653             if not uris: |
654                 del prefixes[prefix] |
655             if uri not in uris or uri != uris[-1]: |
656                 uri_prefixes = namespaces[uri] |
657                 uri_prefixes.pop() |
658                 if not uri_prefixes: |
659                     del namespaces[uri] |
660             cache.clear() |
661             return uri |
410 | 662 |
663         ns_attrs = [] |
664         _push_ns_attr = ns_attrs.append |
437 | 665         def _make_ns_attr(prefix, uri): |
852 | 666             return 'xmlns%s' % (prefix and ':%s' % prefix or ''), uri |
410 | 667 |
668         def _gen_prefix(): |
669             val = 0 |
670             while 1: |
671                 val += 1 |
672                 yield 'ns%d' % val |
673         _gen_prefix = _gen_prefix().next |
674 |
675         for kind, data, pos in stream: |
f15334b65cf8
Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
939 | 676             if kind is TEXT and isinstance(data, Markup): |
677                 yield kind, data, pos |
678                 continue |
938 | 679             output = _get((kind, data)) |
829 | 680             if output is not None: |
681                 yield kind, output, pos |
410 | 682 |
829 | 683             elif kind is START or kind is EMPTY: |
410 | 684                 tag, attrs = data |
685 |
686                 tagname = tag.localname |
687                 tagns = tag.namespace |
688                 if tagns: |
689                     if tagns in namespaces: |
690                         prefix = namespaces[tagns][-1] |
691                         if prefix: |
852 | 692                             tagname = '%s:%s' % (prefix, tagname) |
410 | 693                     else: |
852 | 694                         _push_ns_attr(('xmlns', tagns)) |
410 | 695                         _push_ns('', tagns) |
696 |
697                 new_attrs = [] |
698                 for attr, value in attrs: |
699                     attrname = attr.localname |
700                     attrns = attr.namespace |
701                     if attrns: |
702                         if attrns not in namespaces: |
703                             prefix = _gen_prefix() |
704                             _push_ns(prefix, attrns) |
bd51adc20a67
Actually write xmlns declaratons for generated attribute namespace prefixes.
cmlenz
parents: 410
412 | 705                             _push_ns_attr(('xmlns:%s' % prefix, attrns)) |
410 | 706                         else: |
707                             prefix = namespaces[attrns][-1] |
708                         if prefix: |
852 | 709                             attrname = '%s:%s' % (prefix, attrname) |
410 | 710                     new_attrs.append((attrname, value)) |
711 |
938 | 712                 data = _emit(kind, data, (tagname, Attrs(ns_attrs + new_attrs))) |
713                 yield kind, data, pos |
410 | 714                 del ns_attrs[:] |
715 |
716             elif kind is END: |
717                 tagname = data.localname |
718                 tagns = data.namespace |
719                 if tagns: |
720                     prefix = namespaces[tagns][-1] |
721                     if prefix: |
852 | 722                         tagname = '%s:%s' % (prefix, tagname) |
938 | 723                 yield kind, _emit(kind, data, tagname), pos |
410 | 724 |
725             elif kind is START_NS: |
726                 prefix, uri = data |
727                 if uri not in namespaces: |
728                     prefix = prefixes.get(uri, [prefix])[-1] |
437 | 729                     _push_ns_attr(_make_ns_attr(prefix, uri)) |
410 | 730                 _push_ns(prefix, uri) |
731 |
732             elif kind is END_NS: |
733                 if data in prefixes: |
829 | 734                     uri = _pop_ns(data) |
437 | 735                     if ns_attrs: |
736                         attr = _make_ns_attr(data, uri) |
737                         if attr in ns_attrs: |
738                             ns_attrs.remove(attr) |
410 | 739 |
740             else: |
741                 yield kind, data, pos |
742 |
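Rows 643-673 keep two maps of stacks:
- `prefixes` maps each prefix to the URIs bound to it.
- `namespaces` maps each URI to the prefixes bound to it.

Because these are stacks, a prefix re-bound in a nested element is restored when that element's scope ends. Rows 702-705 invent `ns1`, `ns2`, and so on for attribute namespaces that have no declared prefix, and they emit the matching `xmlns:` attribute.

The sketch below isolates that bookkeeping. It differs from the listing in three ways:
- The class `NamespaceScopes` and its methods are hypothetical.
- Its pop step is simplified: it removes the prefix's latest binding rather than replicating the exact condition on row 655.
- `itertools.count` replaces the Python 2 `generator.next` idiom on row 673.

```python
import itertools


class NamespaceScopes(object):
    """Hypothetical helper mirroring the prefix/URI stacks of NamespaceFlattener."""

    def __init__(self):
        self.prefixes = {}    # prefix -> stack of URIs bound to that prefix
        self.namespaces = {}  # URI -> stack of prefixes bound to that URI
        self._numbers = itertools.count(1)

    def push(self, prefix, uri):
        self.namespaces.setdefault(uri, []).append(prefix)
        self.prefixes.setdefault(prefix, []).append(uri)

    def pop(self, prefix):
        """Undo the latest binding of prefix and return the URI it pointed to."""
        uris = self.prefixes[prefix]
        uri = uris.pop()
        if not uris:
            del self.prefixes[prefix]
        uri_prefixes = self.namespaces[uri]
        # Remove the most recent occurrence of this prefix for the URI.
        last = len(uri_prefixes) - 1 - uri_prefixes[::-1].index(prefix)
        del uri_prefixes[last]
        if not uri_prefixes:
            del self.namespaces[uri]
        return uri

    def lookup(self, uri):
        """Innermost prefix for uri ('' for a default namespace), else None."""
        stack = self.namespaces.get(uri)
        return stack[-1] if stack else None

    def qualify(self, uri, localname):
        """Return (qualified name, new xmlns attribute or None) for an attribute."""
        prefix = self.lookup(uri)
        new_attr = None
        if prefix is None:
            # No binding in scope: generate ns1, ns2, ... and declare it.
            prefix = 'ns%d' % next(self._numbers)
            self.push(prefix, uri)
            new_attr = ('xmlns:%s' % prefix, uri)
        name = '%s:%s' % (prefix, localname) if prefix else localname
        return name, new_attr


scopes = NamespaceScopes()
scopes.push('', 'NS1')     # <doc xmlns="NS1" ...
scopes.push('two', 'NS2')  #      xmlns:two="NS2">
print(scopes.qualify('NS2', 'item'))  # ('two:item', None)
print(scopes.qualify('NS3', 'lang'))  # ('ns1:lang', ('xmlns:ns1', 'NS3'))
```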
743 |
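Rows 676-678 pass `TEXT` events that carry `Markup` straight through instead of consulting the cache. As the message of changeset 939 notes, a `Markup` instance compares equal to a plain string with the same characters. A cache keyed on `(kind, data)` can therefore return output that was computed for the other type.

The collision is easiest to see in a stage that escapes text, as a serializer does. The sketch below reproduces it under these assumptions:
- `Markup` is a hypothetical stand-in: a bare `str` subclass. Genshi's own `Markup` subclasses `unicode` and adds behaviour.
- The cache is a plain dict.
- `render` and `to_output` are illustrative names, not Genshi's `_prepare_cache` API.

```python
from html import escape


class Markup(str):
    """Hypothetical stand-in for genshi.core.Markup: text that is already safe."""


def to_output(data):
    # Markup is emitted verbatim; plain text has its special characters escaped.
    return str(data) if isinstance(data, Markup) else escape(data, quote=False)


def render(texts, bypass_markup):
    cache = {}
    out = []
    for data in texts:
        if bypass_markup and isinstance(data, Markup):
            # The guard from rows 676-678: Markup never touches the cache.
            out.append(str(data))
            continue
        key = ('TEXT', data)  # Markup('<b>') and '<b>' produce equal keys
        if key not in cache:
            cache[key] = to_output(data)
        out.append(cache[key])
    return ''.join(out)


print(render(['<b>', Markup('<b>')], bypass_markup=False))  # &lt;b&gt;&lt;b&gt;
print(render([Markup('<b>'), '<b>'], bypass_markup=False))  # <b><b>
print(render([Markup('<b>'), '<b>'], bypass_markup=True))   # <b>&lt;b&gt;
```

The second call is the more dangerous direction: once `Markup('<b>')` has populated the cache, a plain, untrusted `'<b>'` comes back unescaped.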
123 | 744 class WhitespaceFilter(object): |
745     """A filter that removes extraneous ignorable white space from the |
410 | 746     stream. |
747     """ |
123 | 748 |
305 | 749     def __init__(self, preserve=None, noescape=None): |
123 | 750         """Initialize the filter. |
751 |
425 | 752         :param preserve: a set or sequence of tag names for which white-space |
753                          should be preserved |
754         :param noescape: a set or sequence of tag names for which text content |
755                          should not be escaped |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
141 | 756 |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
346 | 757         The `noescape` set is expected to refer to elements that cannot contain |
425 | 758         further child elements (such as ``<style>`` or ``<script>`` in HTML |
759         documents). |
123 | 760         """ |
761         if preserve is None: |
762             preserve = [] |
763         self.preserve = frozenset(preserve) |
141 | 764         if noescape is None: |
765             noescape = [] |
766         self.noescape = frozenset(noescape) |
123 | 767 |
219 | 768 def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'], |
769 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub, | |
770 collapse_lines=re.compile('\n{2,}').sub): | |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
771 mjoin = Markup('').join |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
772 preserve_elems = self.preserve |
346
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
773 preserve = 0 |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
774 noescape_elems = self.noescape |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
775 noescape = False |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
776 |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
777 textbuf = [] |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
778 push_text = textbuf.append |
136 | 779 pop_text = textbuf.pop |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
780 for kind, data, pos in chain(stream, [(None, None, None)]): |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
781 |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
782 if kind is TEXT: |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
783 if noescape: |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
784 data = Markup(data) |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
785 push_text(data) |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
786 else: |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
787 if textbuf: |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
788 if len(textbuf) > 1: |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
789 text = mjoin(textbuf, escape_quotes=False) |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
790 del textbuf[:] |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
791 else: |
136 | 792 text = escape(pop_text(), quotes=False) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
793 if not preserve: |
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
794 text = collapse_lines('\n', trim_trailing_space('', text)) |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
795 yield TEXT, Markup(text), pos |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
796 |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
797 if kind is START: |
346
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
798 tag, attrs = data |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
799 if preserve or (tag in preserve_elems or |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
800 attrs.get(space) == 'preserve'): |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
801 preserve += 1 |
219 | 802 if not noescape and tag in noescape_elems: |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
803 noescape = True |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
804 |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
805 elif kind is END: |
346
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
806 noescape = False |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
807 if preserve: |
96882a191686
Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents:
345
diff
changeset
|
808 preserve -= 1 |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
809 |
                elif kind is START_CDATA:
                    noescape = True

                elif kind is END_CDATA:
                    noescape = False

                if kind:
                    yield kind, data, pos


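The tail of the filter above maintains two pieces of state while walking the stream: `preserve`, a nesting counter that stays non-zero inside whitespace-sensitive elements such as `<pre>` and `<textarea>`, and `noescape`, a flag that is set inside `<script>`/`<style>` elements or a CDATA section and cleared on every END and END_CDATA event. The standalone sketch below isolates just that bookkeeping. It is an illustration under assumptions, not Genshi code: event kinds are plain strings, START data is only a tag name, the element sets are hypothetical defaults, and the START branch (not part of this excerpt) is assumed to increment `preserve` for a preserving element and for anything nested inside one.

```python
from typing import Iterable, List, Tuple

# Hypothetical string stand-ins for Genshi's event kinds.
START, END, TEXT = "START", "END", "TEXT"
START_CDATA, END_CDATA = "START_CDATA", "END_CDATA"

# Assumed element sets, for illustration only.
PRESERVE_ELEMS = frozenset({"pre", "textarea"})
NOESCAPE_ELEMS = frozenset({"script", "style"})


def text_state(events: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, bool]]:
    """Return (text, preserve_depth, noescape) for every TEXT event."""
    preserve = 0
    noescape = False
    result = []
    for kind, data in events:
        if kind == TEXT:
            result.append((data, preserve, noescape))
        elif kind == START:
            # Assumed START handling: entering a preserving element, or any
            # element nested inside one, deepens the preserve count.
            if preserve or data in PRESERVE_ELEMS:
                preserve += 1
            if not noescape and data in NOESCAPE_ELEMS:
                noescape = True
        elif kind == END:
            # Mirrors the END branch: any closing tag re-enables escaping
            # and unwinds one level of whitespace preservation.
            noescape = False
            if preserve:
                preserve -= 1
        elif kind == START_CDATA:
            noescape = True
        elif kind == END_CDATA:
            noescape = False
    return result


if __name__ == "__main__":
    events = [
        (START, "pre"), (START, "b"), (TEXT, "  x  "), (END, "b"),
        (TEXT, "y"), (END, "pre"), (TEXT, "z"),
    ]
    print(text_state(events))
    # [('  x  ', 2, False), ('y', 1, False), ('z', 0, False)]
```

The `y` text still carries a non-zero depth although it follows a nested closing tag, which is what lets whitespace survive in a `<pre>` that contains other elements. Resetting `noescape` on any END, rather than only on the element that set it, keeps that state down to a single flag.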
class DocTypeInserter(object):
    """A filter that inserts the DOCTYPE declaration in the correct location:
    immediately after the XML declaration if the stream starts with one,
    otherwise as the first event of the stream.
    """
    def __init__(self, doctype):
        """Initialize the filter.

        :param doctype: DOCTYPE as a name string (resolved through
                        `DocType.get`) or as a doctype tuple such as
                        `DocType.HTML_STRICT`.
        """
        if isinstance(doctype, basestring):
            doctype = DocType.get(doctype)
        self.doctype_event = (DOCTYPE, doctype, (None, -1, -1))

    def __call__(self, stream):
        doctype_inserted = False
        for kind, data, pos in stream:
            if not doctype_inserted:
                doctype_inserted = True
                # An XML declaration can only ever be the first event, so
                # the DOCTYPE goes right after it; otherwise it goes first.
                if kind is XML_DECL:
                    yield (kind, data, pos)
                    yield self.doctype_event
                    continue
                yield self.doctype_event

            yield (kind, data, pos)

        # An empty stream still gets the DOCTYPE.
        if not doctype_inserted:
            yield self.doctype_event
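The placement rule in `__call__` covers three cases: a stream that opens with an XML declaration, a stream that does not, and an empty stream. The generator below restates that logic outside the class so the cases can be exercised without Genshi installed; the string event kinds, the doctype tuple, and the name `insert_doctype` are assumptions made for the illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical string stand-ins for Genshi's event kinds.
XML_DECL, DOCTYPE, START = "XML_DECL", "DOCTYPE", "START"

Event = Tuple[str, object, tuple]


def insert_doctype(stream: Iterable[Event], doctype: tuple) -> Iterator[Event]:
    """Yield the stream with a DOCTYPE event placed after a leading XML
    declaration, or first if there is none."""
    doctype_event = (DOCTYPE, doctype, (None, -1, -1))
    inserted = False
    for event in stream:
        if not inserted:
            inserted = True
            if event[0] == XML_DECL:
                # The declaration stays first; the DOCTYPE follows it.
                yield event
                yield doctype_event
                continue
            yield doctype_event
        yield event
    if not inserted:
        # An empty stream still yields the DOCTYPE.
        yield doctype_event


if __name__ == "__main__":
    doctype = ("html", None, None)  # hypothetical doctype tuple
    decl = (XML_DECL, ("1.0", None, -1), (None, 1, 0))
    root = (START, "html", (None, 2, 0))
    print([kind for kind, _, _ in insert_doctype([decl, root], doctype)])
    # ['XML_DECL', 'DOCTYPE', 'START']
    print([kind for kind, _, _ in insert_doctype([root], doctype)])
    # ['DOCTYPE', 'START']
```

Only the first event is inspected, so nothing needs to be buffered; the approach relies on an XML declaration never appearing anywhere but at the very start of the stream.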