annotate genshi/output.py @ 713:5420fe9d99a9 trunk
The `Markup` class now supports mappings on the right-hand side of the `%` (modulo) operator in the same way the Python string classes do, except that the substituted values are escaped. Also, the special constructor that took positional arguments to be substituted was removed, so the `Markup` class now accepts the same arguments as its `unicode` base class. Closes #211. Many thanks to Christian Boos for the patch!
author | cmlenz |
---|---|
date | Tue, 08 Apr 2008 18:18:18 +0000 |
parents | 3881a602048a |
children | 4bc6741b2811 |
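The commit message describes `%` formatting on `Markup` that escapes the substituted values, including when the right-hand operand is a mapping. Purely as an illustration, here is a minimal Python 3 sketch of that behaviour. It uses a hypothetical `SafeMarkup` subclass of `str` and the standard library's `html.escape`. It is not genshi's implementation, and leaving numbers unescaped (so `%d` keeps working) is an assumption of the sketch.

```python
import html


class SafeMarkup(str):
    """Hypothetical stand-in for a Markup-like string that is already safe."""

    def __mod__(self, args):
        if isinstance(args, dict):
            # Mapping on the right-hand side: escape every value, keep keys.
            args = {key: _escape(value) for key, value in args.items()}
        elif isinstance(args, (list, tuple)):
            # Positional values: escape each one.
            args = tuple(_escape(value) for value in args)
        else:
            # A single value.
            args = _escape(args)
        return SafeMarkup(str.__mod__(self, args))


def _escape(value):
    """Escape a substituted value unless it is already markup."""
    if isinstance(value, SafeMarkup):
        return value
    if isinstance(value, (int, float)):
        return value  # assumption: numbers are left alone so %d/%f still work
    return SafeMarkup(html.escape(str(value), quote=True))


print(SafeMarkup('<b>%(name)s</b>') % {'name': '<script>'})
# <b>&lt;script&gt;</b>
```

Values that are already `SafeMarkup` pass through untouched, which mirrors the idea that trusted markup should not be escaped twice.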
```python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2006-2007 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://genshi.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://genshi.edgewall.org/log/.

"""This module provides different kinds of serialization methods for XML event
streams.
"""

from itertools import chain
try:
    frozenset
except NameError:
    from sets import ImmutableSet as frozenset
import re

from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind
from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \
                        START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE

__all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer',
           'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer']
__docformat__ = 'restructuredtext en'
```
```python
def encode(iterator, method='xml', encoding='utf-8', out=None):
    """Encode serializer output into a string.
    
    :param iterator: the iterator returned from serializing a stream (basically
                     any iterator that yields unicode objects)
    :param method: the serialization method; determines how characters not
                   representable in the specified encoding are treated
    :param encoding: how the output string should be encoded; if set to `None`,
                     this method returns a `unicode` object
    :param out: a file-like object that the output should be written to
                instead of being returned as one big string; note that if
                this is a file or socket (or similar), the `encoding` must
                not be `None` (that is, the output must be encoded)
    :return: a `str` or `unicode` object (depending on the `encoding`
             parameter), or `None` if the `out` parameter is provided
    
    :since: version 0.4.1
    :note: Changed in 0.5: added the `out` parameter
    """
    if encoding is not None:
        errors = 'replace'
        if method != 'text' and not isinstance(method, TextSerializer):
            errors = 'xmlcharrefreplace'
        _encode = lambda string: string.encode(encoding, errors)
    else:
        _encode = lambda string: string
    if out is None:
        return _encode(u''.join(list(iterator)))
    for chunk in iterator:
        out.write(_encode(chunk))
```
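`encode` either joins all chunks and returns a single string, or writes each chunk to `out` as it arrives, and it picks a different codec error handler for markup than for plain text. The Python 3 sketch below restates that logic so it can run without genshi. The name `encode_chunks` is hypothetical, and unlike the original it only accepts method names, not serializer instances.

```python
import io
from typing import IO, Iterable, Optional, Union


def encode_chunks(chunks: Iterable[str], method: str = 'xml',
                  encoding: Optional[str] = 'utf-8',
                  out: Optional[IO] = None):
    """Hypothetical Python 3 restatement of the encode() logic above."""
    # Markup output can fall back to numeric character references such as
    # '&#233;'; plain text cannot, so it falls back to '?' instead.
    errors = 'replace' if method == 'text' else 'xmlcharrefreplace'

    def convert(chunk: str) -> Union[str, bytes]:
        if encoding is None:
            return chunk  # no encoding requested: keep the text as str
        return chunk.encode(encoding, errors)

    if out is None:
        # No output object: assemble everything in memory and return it.
        return convert(''.join(chunks))
    for chunk in chunks:
        # Stream chunk by chunk into the caller's file-like object.
        out.write(convert(chunk))
    return None


buffer = io.BytesIO()
encode_chunks(['<p>', 'caf\u00e9', '</p>'], encoding='ascii', out=buffer)
print(buffer.getvalue())  # b'<p>caf&#233;</p>'
```

Streaming to `out` avoids holding the whole document in memory, which is the motivation the 0.5 change gives for the `out` parameter.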
```python
def get_serializer(method='xml', **kwargs):
    """Return a serializer object for the given method.
    
    :param method: the serialization method; can be either "xml", "xhtml",
                   "html", "text", or a custom serializer class

    Any additional keyword arguments are passed to the serializer, and thus
    depend on the `method` parameter value.
    
    :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer`
    :since: version 0.4.1
    """
    if isinstance(method, basestring):
        method = {'xml':   XMLSerializer,
                  'xhtml': XHTMLSerializer,
                  'html':  HTMLSerializer,
                  'text':  TextSerializer}[method.lower()]
    return method(**kwargs)
```
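`get_serializer` accepts either a method name, looked up case-insensitively in a fixed table, or a class that it calls directly, and forwards all other keyword arguments to the constructor. A minimal Python 3 sketch of the same dispatch follows. The classes `TextWriter` and `XMLWriter`, the `WRITERS` table and `make_serializer` are all hypothetical names used only for illustration.

```python
from typing import Any, Dict, Union


class TextWriter:
    """Hypothetical serializer class, used only in this sketch."""

    def __init__(self, strip_whitespace: bool = True) -> None:
        self.strip_whitespace = strip_whitespace


class XMLWriter(TextWriter):
    """Second hypothetical serializer class."""


# Hypothetical registry mapping method names to serializer classes.
WRITERS: Dict[str, type] = {'text': TextWriter, 'xml': XMLWriter}


def make_serializer(method: Union[str, type] = 'xml', **kwargs: Any):
    # A string is looked up case-insensitively; an unknown name raises
    # KeyError, just as the dictionary lookup in get_serializer does.
    if isinstance(method, str):
        method = WRITERS[method.lower()]
    # Anything else is assumed to be a class (or factory) and is called
    # with the remaining keyword arguments.
    return method(**kwargs)


writer = make_serializer('XML', strip_whitespace=False)
print(type(writer).__name__, writer.strip_whitespace)  # XMLWriter False
```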
```python
class DocType(object):
    """Defines a number of commonly used DOCTYPE declarations as constants."""

    HTML_STRICT = (
        'html', '-//W3C//DTD HTML 4.01//EN',
        'http://www.w3.org/TR/html4/strict.dtd'
    )
    HTML_TRANSITIONAL = (
        'html', '-//W3C//DTD HTML 4.01 Transitional//EN',
        'http://www.w3.org/TR/html4/loose.dtd'
    )
    HTML_FRAMESET = (
        'html', '-//W3C//DTD HTML 4.01 Frameset//EN',
        'http://www.w3.org/TR/html4/frameset.dtd'
    )
    HTML = HTML_STRICT

    HTML5 = ('html', None, None)

    XHTML_STRICT = (
        'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
    )
    XHTML_TRANSITIONAL = (
        'html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    )
    XHTML_FRAMESET = (
        'html', '-//W3C//DTD XHTML 1.0 Frameset//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd'
    )
    XHTML = XHTML_STRICT

    SVG_FULL = (
        'svg', '-//W3C//DTD SVG 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'
    )
    SVG_BASIC = (
        'svg', '-//W3C//DTD SVG Basic 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd'
    )
    SVG_TINY = (
        'svg', '-//W3C//DTD SVG Tiny 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd'
    )
    SVG = SVG_FULL

    def get(cls, name):
        """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE``
        declaration for the specified name.
        
        The following names are recognized in this version:
         * "html" or "html-strict" for the HTML 4.01 strict DTD
         * "html-transitional" for the HTML 4.01 transitional DTD
         * "html-frameset" for the HTML 4.01 frameset DTD
         * "html5" for the ``DOCTYPE`` proposed for HTML5
         * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD
         * "xhtml-transitional" for the XHTML 1.0 transitional DTD
         * "xhtml-frameset" for the XHTML 1.0 frameset DTD
         * "svg" or "svg-full" for the SVG 1.1 DTD
         * "svg-basic" for the SVG Basic 1.1 DTD
         * "svg-tiny" for the SVG Tiny 1.1 DTD
        
        :param name: the name of the ``DOCTYPE``
        :return: the ``(name, pubid, sysid)`` tuple for the requested
                 ``DOCTYPE``, or ``None`` if the name is not recognized
        :since: version 0.4.1
        """
        return {
            'html': cls.HTML, 'html-strict': cls.HTML_STRICT,
            'html-transitional': DocType.HTML_TRANSITIONAL,
            'html-frameset': DocType.HTML_FRAMESET,
            'html5': cls.HTML5,
            'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT,
            'xhtml-transitional': cls.XHTML_TRANSITIONAL,
            'xhtml-frameset': cls.XHTML_FRAMESET,
            'svg': cls.SVG, 'svg-full': cls.SVG_FULL,
            'svg-basic': cls.SVG_BASIC,
            'svg-tiny': cls.SVG_TINY
        }.get(name.lower())
    get = classmethod(get)
```
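Each `DocType` constant is a `(name, pubid, sysid)` tuple, and `DocType.get` maps a case-insensitive name to one of them or to `None`. The Python 3 sketch below pairs a trimmed-down lookup with a formatter that follows the same PUBLIC/SYSTEM rules as the DOCTYPE branch of `XMLSerializer.__call__`. The names `lookup`, `format_doctype` and `_BY_NAME` are hypothetical, the table holds only a subset of the names, and unlike genshi the sketch does not escape the identifiers (it assumes they are trusted).

```python
from typing import Dict, Optional, Tuple

# (name, pubid, sysid), the same shape as the DocType constants.
DocTypeTuple = Tuple[str, Optional[str], Optional[str]]

HTML5: DocTypeTuple = ('html', None, None)
XHTML_STRICT: DocTypeTuple = (
    'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
)

# Hypothetical subset of the name table used by DocType.get.
_BY_NAME: Dict[str, DocTypeTuple] = {
    'html5': HTML5,
    'xhtml': XHTML_STRICT,
    'xhtml-strict': XHTML_STRICT,
}


def lookup(name: str) -> Optional[DocTypeTuple]:
    # Case-insensitive; unknown names give None rather than raising.
    return _BY_NAME.get(name.lower())


def format_doctype(doctype: DocTypeTuple) -> str:
    name, pubid, sysid = doctype
    parts = ['<!DOCTYPE ', name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        parts.append(' SYSTEM')  # a system id without a public id
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>\n')
    return ''.join(parts)


print(format_doctype(lookup('HTML5')), end='')  # <!DOCTYPE html>
```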
```python
class XMLSerializer(object):
    """Produces XML text from an event stream.
    
    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print ''.join(XMLSerializer()(elem.generate()))
    <div><a href="foo"/><br/><hr noshade="True"/></div>
    """

    _PRESERVE_SPACE = frozenset()

    def __init__(self, doctype=None, strip_whitespace=True,
                 namespace_prefixes=None):
        """Initialize the XML serializer.
        
        :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
                        DOCTYPE declaration that should be included at the top
                        of the generated output, or the name of a DOCTYPE as
                        defined in `DocType.get`
        :param strip_whitespace: whether extraneous whitespace should be
                                 stripped from the output
        :note: Changed in 0.4.2: The `doctype` parameter can now be a string.
        """
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
        self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))

    def __call__(self, stream):
        have_decl = have_doctype = False
        in_cdata = False

        for filter_ in self.filters:
            stream = filter_(stream)
        for kind, data, pos in stream:

            if kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    buf += [' ', attr, '="', escape(value), '"']
                buf.append(kind is EMPTY and '/>' or '>')
                yield Markup(u''.join(buf))

            elif kind is END:
                yield Markup('</%s>' % data)

            elif kind is TEXT:
                if in_cdata:
                    yield data
                else:
                    yield escape(data, quotes=False)

            elif kind is COMMENT:
                yield Markup('<!--%s-->' % data)

            elif kind is XML_DECL and not have_decl:
                version, encoding, standalone = data
                buf = ['<?xml version="%s"' % version]
                if encoding:
                    buf.append(' encoding="%s"' % encoding)
                if standalone != -1:
                    standalone = standalone and 'yes' or 'no'
                    buf.append(' standalone="%s"' % standalone)
                buf.append('?>\n')
                yield Markup(u''.join(buf))
                have_decl = True

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(u''.join(buf)) % filter(None, data)
                have_doctype = True

            elif kind is START_CDATA:
                yield Markup('<![CDATA[')
                in_cdata = True

            elif kind is END_CDATA:
                yield Markup(']]>')
                in_cdata = False
```
cmlenz
parents:
141
diff
changeset
|
257 |
105
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
258 elif kind is PI: |
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
259 yield Markup('<?%s %s?>' % data) |
71f3db26eecb
Include processing instructions in serialized streams.
cmlenz
parents:
96
diff
changeset
|
260 |
1 | 261 |
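# Illustrative sketch, not part of Genshi's API (the name is hypothetical).
# The DOCTYPE branch above adds one "%s" placeholder per non-empty field and
# then fills them from ``filter(None, data)``. That call drops the empty
# ``pubid``/``sysid`` entries so the placeholders and values line up. This
# standalone version uses plain ``%`` formatting, so unlike the ``Markup``
# version above it does NOT escape the substituted values.
def _doctype_sketch(name, pubid=None, sysid=None):
    buf = ['<!DOCTYPE %s']
    if pubid:
        buf.append(' PUBLIC "%s"')
    elif sysid:
        # a system identifier without a public one needs the SYSTEM keyword
        buf.append(' SYSTEM')
    if sysid:
        buf.append(' "%s"')
    buf.append('>\n')
    # keep only the fields that have a placeholder, like filter(None, data)
    values = tuple([value for value in (name, pubid, sysid) if value])
    return ''.join(buf) % values

# For example, _doctype_sketch('html', sysid='about:legacy-compat') returns
# '<!DOCTYPE html SYSTEM "about:legacy-compat">\n'.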
class XHTMLSerializer(XMLSerializer):
    """Produces XHTML text from an event stream.

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print ''.join(XHTMLSerializer()(elem.generate()))
    <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
    """

    _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
                              'hr', 'img', 'input', 'isindex', 'link', 'meta',
                              'param'])
    _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
                                'defer', 'disabled', 'ismap', 'multiple',
                                'nohref', 'noresize', 'noshade', 'nowrap'])
    _PRESERVE_SPACE = frozenset([
        QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'),
        QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea')
    ])

    def __init__(self, doctype=None, strip_whitespace=True,
                 namespace_prefixes=None):
        super(XHTMLSerializer, self).__init__(doctype, False)
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
        namespace_prefixes = namespace_prefixes or {}
        namespace_prefixes['http://www.w3.org/1999/xhtml'] = ''
        self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))

    def __call__(self, stream):
        boolean_attrs = self._BOOLEAN_ATTRS
        empty_elems = self._EMPTY_ELEMS
        have_doctype = False
        in_cdata = False

        for filter_ in self.filters:
            stream = filter_(stream)
        for kind, data, pos in stream:

            if kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    if attr in boolean_attrs:
                        value = attr
                    elif attr == u'xml:lang' and u'lang' not in attrib:
                        buf += [' lang="', escape(value), '"']
                    elif attr == u'xml:space':
                        continue
                    buf += [' ', attr, '="', escape(value), '"']
                if kind is EMPTY:
                    if tag in empty_elems:
                        buf.append(' />')
                    else:
                        buf.append('></%s>' % tag)
                else:
                    buf.append('>')
                yield Markup(u''.join(buf))

            elif kind is END:
                yield Markup('</%s>' % data)

            elif kind is TEXT:
                if in_cdata:
                    yield data
                else:
                    yield escape(data, quotes=False)

            elif kind is COMMENT:
                yield Markup('<!--%s-->' % data)

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(u''.join(buf)) % filter(None, data)
                have_doctype = True

            elif kind is START_CDATA:
                yield Markup('<![CDATA[')
                in_cdata = True

            elif kind is END_CDATA:
                yield Markup(']]>')
                in_cdata = False

            elif kind is PI:
                yield Markup('<?%s %s?>' % data)

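# Illustrative sketch, not part of Genshi's API (all names are hypothetical).
# This is a standalone version of the start-tag logic in
# XHTMLSerializer.__call__. It assumes ``attrs`` is a plain list of
# (name, value) pairs standing in for Genshi's ``Attrs``. The escaping helper
# only approximates ``genshi.core.escape``.
def _xhtml_escape_attr_sketch(value):
    text = '%s' % (value,)
    # '&' must be replaced first so the other entities are not re-escaped
    text = text.replace('&', '&amp;').replace('<', '&lt;')
    return text.replace('>', '&gt;').replace('"', '&#34;')

def _xhtml_start_tag_sketch(tag, attrs, is_empty,
                            empty_elems=frozenset(['br', 'hr', 'img']),
                            boolean_attrs=frozenset(['checked', 'noshade'])):
    names = [name for name, value in attrs]
    buf = ['<', tag]
    for attr, value in attrs:
        if attr in boolean_attrs:
            # XHTML has no minimized attributes: noshade="noshade"
            value = attr
        elif attr == 'xml:lang' and 'lang' not in names:
            # add an HTML ``lang`` copy; xml:lang itself is still written below
            buf += [' lang="', _xhtml_escape_attr_sketch(value), '"']
        elif attr == 'xml:space':
            continue
        buf += [' ', attr, '="', _xhtml_escape_attr_sketch(value), '"']
    if is_empty:
        if tag in empty_elems:
            buf.append(' />')  # e.g. <br />
        else:
            buf.append('></%s>' % tag)  # e.g. <div></div>, never <div />
    else:
        buf.append('>')
    return ''.join(buf)

# For example, _xhtml_start_tag_sketch('hr', [('noshade', True)], True)
# returns '<hr noshade="noshade" />'. Writing non-void elements as
# '<div></div>' keeps the output usable by browsers that parse it as HTML.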

class HTMLSerializer(XHTMLSerializer):
    """Produces HTML text from an event stream.

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
    >>> print ''.join(HTMLSerializer()(elem.generate()))
    <div><a href="foo"></a><br><hr noshade></div>
    """

    _NOESCAPE_ELEMS = frozenset([
        QName('script'), QName('http://www.w3.org/1999/xhtml}script'),
        QName('style'), QName('http://www.w3.org/1999/xhtml}style')
    ])

    def __init__(self, doctype=None, strip_whitespace=True):
        """Initialize the HTML serializer.

        :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
                        DOCTYPE declaration that should be included at the top
                        of the generated output
        :param strip_whitespace: whether extraneous whitespace should be
                                 stripped from the output
        """
        super(HTMLSerializer, self).__init__(doctype, False)
        self.filters = [EmptyTagFilter()]
        if strip_whitespace:
            self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
                                                 self._NOESCAPE_ELEMS))
        self.filters.append(NamespaceFlattener(prefixes={
            'http://www.w3.org/1999/xhtml': ''
        }))
        if doctype:
            self.filters.append(DocTypeInserter(doctype))

    def __call__(self, stream):
        boolean_attrs = self._BOOLEAN_ATTRS
        empty_elems = self._EMPTY_ELEMS
        noescape_elems = self._NOESCAPE_ELEMS
        have_doctype = False
        noescape = False

        for filter_ in self.filters:
            stream = filter_(stream)
        for kind, data, pos in stream:

            if kind is START or kind is EMPTY:
                tag, attrib = data
                buf = ['<', tag]
                for attr, value in attrib:
                    if attr in boolean_attrs:
                        if value:
                            buf += [' ', attr]
                    elif ':' in attr:
                        if attr == 'xml:lang' and u'lang' not in attrib:
                            buf += [' lang="', escape(value), '"']
                    elif attr != 'xmlns':
                        buf += [' ', attr, '="', escape(value), '"']
                buf.append('>')
                if kind is EMPTY:
                    if tag not in empty_elems:
                        buf.append('</%s>' % tag)
                yield Markup(u''.join(buf))
                # Only a START event opens a <script>/<style> body. An EMPTY
                # event is already closed and no END event follows it, so it
                # must not disable escaping of the text that comes after it.
                if kind is START and tag in noescape_elems:
                    noescape = True

69 | 426 elif kind is END: |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
427 yield Markup('</%s>' % data) |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
428 noescape = False |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
429 |
69 | 430 elif kind is TEXT: |
141
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
431 if noescape: |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
432 yield data |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
433 else: |
520a5b7dd6d2
* No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents:
140
diff
changeset
|
434 yield escape(data, quotes=False) |
1 | 435 |
89
80386d62814f
            elif kind is COMMENT:
                yield Markup('<!--%s-->' % data)

            elif kind is DOCTYPE and not have_doctype:
                name, pubid, sysid = data
                buf = ['<!DOCTYPE %s']
                if pubid:
                    buf.append(' PUBLIC "%s"')
                elif sysid:
                    buf.append(' SYSTEM')
                if sysid:
                    buf.append(' "%s"')
                buf.append('>\n')
                yield Markup(u''.join(buf)) % filter(None, data)
                have_doctype = True

            elif kind is PI:
                yield Markup('<?%s %s?>' % data)

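The DOCTYPE branch above assembles a format string piece by piece and then substitutes only the non-empty parts of `(name, pubid, sysid)`, relying on `Markup`'s `%` operator to escape the substituted values. A minimal Python 3 sketch of the same technique without Genshi follows. The function name `format_doctype` is hypothetical, and `xml.sax.saxutils.escape` (with an extra `&quot;` mapping) stands in for Genshi's own escaping, so the exact entity produced for quotes may differ.

```python
from xml.sax.saxutils import escape


def format_doctype(name, pubid=None, sysid=None):
    """Hypothetical helper: build a DOCTYPE declaration like the branch above."""
    parts = ['<!DOCTYPE %s']
    if pubid:
        parts.append(' PUBLIC "%s"')
    elif sysid:
        parts.append(' SYSTEM')  # a system id without a public id needs SYSTEM
    if sysid:
        parts.append(' "%s"')
    parts.append('>\n')
    # Substitute only the non-empty values, escaped (stand-in for Markup's %).
    values = tuple(escape(v, {'"': '&quot;'}) for v in (name, pubid, sysid) if v)
    return ''.join(parts) % values


print(format_doctype('doc', None, 'doc.dtd'), end='')  # <!DOCTYPE doc SYSTEM "doc.dtd">
```

Because the number of `%s` slots grows with exactly the same conditions that keep a value non-empty, the placeholder count always matches the number of substituted values.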

class TextSerializer(object):
    """Produces plain text from an event stream.

    Only text events are included in the output. Unlike the other serializers,
    special XML characters are not escaped:

    >>> from genshi.builder import tag
    >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
    >>> print elem
    <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
    >>> print ''.join(TextSerializer()(elem.generate()))
    <Hello!>

    If text events contain literal markup (instances of the `Markup` class),
    that markup is by default passed through unchanged:

    >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
    >>> print elem.generate().render(TextSerializer)
    <a href="foo">Hello &amp; Bye!</a><br/>

    You can use the `strip_markup` option to change this behavior, so that tags
    and entities are stripped from the output (or, in the case of entities,
    replaced with the equivalent character):

    >>> print elem.generate().render(TextSerializer, strip_markup=True)
    Hello & Bye!
    """

    def __init__(self, strip_markup=False):
        self.strip_markup = strip_markup

    def __call__(self, stream):
        strip_markup = self.strip_markup
        for event in stream:
            if event[0] is TEXT:
                data = event[1]
                if strip_markup and type(data) is Markup:
                    data = data.striptags().stripentities()
                yield unicode(data)

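To illustrate the `strip_markup` switch outside of Genshi, here is a small Python 3 sketch over `(kind, data)` pairs. `Markup` here is a hypothetical `str` subclass standing in for Genshi's class, the event kinds are plain strings, and the tag-stripping regex plus `html.unescape` only approximate `Markup.striptags()` and `Markup.stripentities()`.

```python
import html
import re

TEXT, START, END = 'TEXT', 'START', 'END'  # hypothetical event kinds
# Comments first, then ordinary tags (an approximation of striptags()).
_TAG_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.S)


class Markup(str):
    """Hypothetical stand-in for a string type marking literal markup."""


def serialize_text(events, strip_markup=False):
    """Yield plain strings for the TEXT events in *events*."""
    for kind, data in events:
        if kind != TEXT:
            continue  # tags, comments, etc. contribute nothing to the text
        if strip_markup and type(data) is Markup:
            # Drop tags, then turn entities such as &amp; into characters.
            data = html.unescape(_TAG_RE.sub('', data))
        yield str(data)  # always hand back a plain str, never a Markup
```

Ordinary strings are passed through untouched even with `strip_markup=True`; only values explicitly marked as markup are stripped, which mirrors the `type(data) is Markup` check above.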
class EmptyTagFilter(object):
    """Combines `START` and `END` events into `EMPTY` events for elements that
    have no contents.
    """

    EMPTY = StreamEventKind('EMPTY')

    def __call__(self, stream):
        prev = (None, None, None)
        for ev in stream:
            if prev[0] is START:
                if ev[0] is END:
                    prev = EMPTY, prev[1], prev[2]
                    yield prev
                    continue
                else:
                    yield prev
            if ev[0] is not START:
                yield ev
            prev = ev


EMPTY = EmptyTagFilter.EMPTY

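The pairing logic in `EmptyTagFilter.__call__` can be tried on plain tuples. This Python 3 sketch assumes string event kinds and a hypothetical function name. It also flushes a `START` event left pending at the end of the stream, which the class above does not do, since it assumes a well-formed stream in which every `START` is followed by a matching `END`.

```python
START, END, TEXT, EMPTY = 'START', 'END', 'TEXT', 'EMPTY'  # hypothetical kinds


def empty_tag_filter(stream):
    """Merge each START immediately followed by END into one EMPTY event."""
    prev = (None, None, None)
    for ev in stream:
        if prev[0] == START:
            if ev[0] == END:
                # Keep the START's data (tag, attrs) and position.
                prev = (EMPTY, prev[1], prev[2])
                yield prev
                continue
            yield prev  # the element has content: emit the held-back START
        if ev[0] != START:
            yield ev  # a START is held back until the next event is known
        prev = ev
    if prev[0] == START:
        yield prev  # sketch-only addition: don't lose a trailing START
```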
class NamespaceFlattener(object):
    r"""Output stream filter that removes namespace information from the stream,
    instead adding namespace attributes and prefixes as needed.

    :param prefixes: optional mapping of namespace URIs to prefixes

    >>> from genshi.input import XML
    >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2">
    ...   <two:item/>
    ... </doc>''')
    >>> for kind, data, pos in NamespaceFlattener()(xml):
    ...     print kind, repr(data)
    START (u'doc', Attrs([(u'xmlns', u'NS1'), (u'xmlns:two', u'NS2')]))
    TEXT u'\n  '
    START (u'two:item', Attrs())
    END u'two:item'
    TEXT u'\n'
    END u'doc'
    """

    def __init__(self, prefixes=None):
        self.prefixes = {XML_NAMESPACE.uri: 'xml'}
        if prefixes is not None:
            self.prefixes.update(prefixes)

    def __call__(self, stream):
        prefixes = dict([(v, [k]) for k, v in self.prefixes.items()])
        namespaces = {XML_NAMESPACE.uri: ['xml']}
        def _push_ns(prefix, uri):
            namespaces.setdefault(uri, []).append(prefix)
            prefixes.setdefault(prefix, []).append(uri)

        ns_attrs = []
        _push_ns_attr = ns_attrs.append
        def _make_ns_attr(prefix, uri):
            return u'xmlns%s' % (prefix and ':%s' % prefix or ''), uri

        def _gen_prefix():
            val = 0
            while 1:
                val += 1
                yield 'ns%d' % val
        _gen_prefix = _gen_prefix().next

        for kind, data, pos in stream:

            if kind is START or kind is EMPTY:
                tag, attrs = data

                tagname = tag.localname
                tagns = tag.namespace
                if tagns:
                    if tagns in namespaces:
                        prefix = namespaces[tagns][-1]
                        if prefix:
                            tagname = u'%s:%s' % (prefix, tagname)
                    else:
                        _push_ns_attr((u'xmlns', tagns))
                        _push_ns('', tagns)

                new_attrs = []
                for attr, value in attrs:
                    attrname = attr.localname
                    attrns = attr.namespace
                    if attrns:
                        if attrns not in namespaces:
                            prefix = _gen_prefix()
                            _push_ns(prefix, attrns)
                            _push_ns_attr(('xmlns:%s' % prefix, attrns))
                        else:
                            prefix = namespaces[attrns][-1]
                        if prefix:
                            attrname = u'%s:%s' % (prefix, attrname)
                    new_attrs.append((attrname, value))

                yield kind, (tagname, Attrs(ns_attrs + new_attrs)), pos
                del ns_attrs[:]

            elif kind is END:
                tagname = data.localname
                tagns = data.namespace
                if tagns:
                    prefix = namespaces[tagns][-1]
                    if prefix:
                        tagname = u'%s:%s' % (prefix, tagname)
                yield kind, tagname, pos

            elif kind is START_NS:
                prefix, uri = data
                if uri not in namespaces:
                    prefix = prefixes.get(uri, [prefix])[-1]
                    _push_ns_attr(_make_ns_attr(prefix, uri))
                _push_ns(prefix, uri)

            elif kind is END_NS:
                if data in prefixes:
                    uris = prefixes.get(data)
                    uri = uris.pop()
                    if not uris:
                        del prefixes[data]
                    if uri not in uris or uri != uris[-1]:
                        uri_prefixes = namespaces[uri]
                        uri_prefixes.pop()
                        if not uri_prefixes:
                            del namespaces[uri]
                    if ns_attrs:
                        attr = _make_ns_attr(data, uri)
                        if attr in ns_attrs:
                            ns_attrs.remove(attr)

            else:
                yield kind, data, pos

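The core of `NamespaceFlattener` is bookkeeping: a stack of prefixes per namespace URI and a stack of URIs per prefix, so that nested declarations can be undone when their scope ends, plus generated `ns1`, `ns2`, ... prefixes for namespaces that were never declared. The following Python 3 sketch isolates that bookkeeping; the class `PrefixStack` and its methods are hypothetical. It only shows the generated-prefix path (the filter above declares an undeclared element namespace as the default namespace instead), and, as its own design choice, it skips a prefix that an inner scope has rebound to a different URI and avoids generating a prefix that is already bound.

```python
import itertools

XML_NS = 'http://www.w3.org/XML/1998/namespace'


class PrefixStack:
    """Hypothetical helper: scoped prefix <-> namespace URI bindings."""

    def __init__(self):
        # Each URI maps to a stack of prefixes and each prefix to a stack of
        # URIs, so an inner rebinding can be undone when its scope ends.
        self._by_uri = {XML_NS: ['xml']}
        self._by_prefix = {'xml': [XML_NS]}
        self._counter = itertools.count(1)

    def push(self, prefix, uri):
        """Bind prefix to uri (like a START_NS event); '' is the default."""
        self._by_uri.setdefault(uri, []).append(prefix)
        self._by_prefix.setdefault(prefix, []).append(uri)

    def pop(self, prefix):
        """Undo the latest binding of prefix (like END_NS); return its URI."""
        uris = self._by_prefix[prefix]
        uri = uris.pop()
        if not uris:
            del self._by_prefix[prefix]
        bound = self._by_uri[uri]
        # Remove the most recent occurrence of this prefix for that URI.
        del bound[len(bound) - 1 - bound[::-1].index(prefix)]
        if not bound:
            del self._by_uri[uri]
        return uri

    def qualify(self, uri, localname):
        """Return (qualified name, xmlns attribute to emit or None)."""
        if not uri:
            return localname, None
        for prefix in reversed(self._by_uri.get(uri, [])):
            # Skip prefixes that an inner scope has rebound to another URI.
            if self._by_prefix[prefix][-1] == uri:
                return (prefix + ':' + localname if prefix else localname), None
        # No usable binding: invent an unused prefix (ns1, ns2, ...) and
        # declare it. Generated bindings are never popped in this sketch.
        prefix = 'ns%d' % next(self._counter)
        while prefix in self._by_prefix:
            prefix = 'ns%d' % next(self._counter)
        self.push(prefix, uri)
        return prefix + ':' + localname, ('xmlns:' + prefix, uri)
```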
123
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
636 class WhitespaceFilter(object): |
10279d2eeec9
Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents:
109
diff
changeset
|
637 """A filter that removes extraneous ignorable white space from the |
410
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
638 stream. |
d14d89995c29
Improve the handling of namespaces in serialization.
cmlenz
parents:
408
diff
changeset
|
639 """ |

    def __init__(self, preserve=None, noescape=None):
        """Initialize the filter.

        :param preserve: a set or sequence of tag names for which white-space
                         should be preserved
        :param noescape: a set or sequence of tag names for which text content
                         should not be escaped

        The `noescape` set is expected to refer to elements that cannot contain
        further child elements (such as ``<style>`` or ``<script>`` in HTML
        documents).
        """
        if preserve is None:
            preserve = []
        self.preserve = frozenset(preserve)
        if noescape is None:
            noescape = []
        self.noescape = frozenset(noescape)

    def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
                 trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
                 collapse_lines=re.compile('\n{2,}').sub):
        mjoin = Markup('').join
        preserve_elems = self.preserve
        preserve = 0
        noescape_elems = self.noescape
        noescape = False

        textbuf = []
        push_text = textbuf.append
        pop_text = textbuf.pop
        for kind, data, pos in chain(stream, [(None, None, None)]):

            if kind is TEXT:
                if noescape:
                    data = Markup(data)
                push_text(data)
            else:
                if textbuf:
                    if len(textbuf) > 1:
                        text = mjoin(textbuf, escape_quotes=False)
                        del textbuf[:]
                    else:
                        text = escape(pop_text(), quotes=False)
                    if not preserve:
                        text = collapse_lines('\n', trim_trailing_space('', text))
                    yield TEXT, Markup(text), pos

                if kind is START:
                    tag, attrs = data
                    if preserve or (tag in preserve_elems or
                                    attrs.get(space) == 'preserve'):
                        preserve += 1
                    if not noescape and tag in noescape_elems:
                        noescape = True

                elif kind is END:
                    noescape = False
                    if preserve:
                        preserve -= 1

                elif kind is START_CDATA:
                    noescape = True

                elif kind is END_CDATA:
                    noescape = False

                if kind:
                    yield kind, data, pos
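

# Illustrative sketch, not part of Genshi's API: both helpers below are
# hypothetical and work on plain strings and ('start' | 'end', tag) tuples
# instead of real stream events. The first applies the same two regular
# expressions that WhitespaceFilter.__call__ binds as default arguments, i.e.
# what happens to buffered text outside of preserved elements. The second
# mirrors the ``preserve`` depth counter, which keeps white-space intact for
# every element nested inside a preserving one (such as markup inside
# ``<pre>``); for brevity it ignores the xml:space="preserve" attribute check.
import re


def _sketch_normalize_text(text):
    """Strip spaces and tabs that precede a newline, then replace each run of
    two or more newlines with a single newline."""
    trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
    collapse_lines = re.compile('\n{2,}').sub
    return collapse_lines('\n', trim_trailing_space('', text))


def _sketch_preserve_depths(events, preserve_tags):
    """Return the preserve depth after each ('start' | 'end', tag) event;
    text would be left untouched while the depth is greater than zero."""
    depth = 0
    depths = []
    for kind, tag in events:
        if kind == 'start':
            # Once inside a preserving element, every nested start counts,
            # so the matching end events bring the depth back to zero.
            if depth or tag in preserve_tags:
                depth += 1
        elif kind == 'end' and depth:
            depth -= 1
        depths.append(depth)
    return depths

# Hypothetical usage:
#   _sketch_normalize_text('a  \n\n\n  b\t\n')  ->  'a\n  b\n'
#   _sketch_preserve_depths([('start', 'div'), ('start', 'pre'),
#                            ('start', 'b'), ('end', 'b'),
#                            ('end', 'pre'), ('end', 'div')], set(['pre']))
#   ->  [0, 1, 2, 1, 0, 0]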


class DocTypeInserter(object):
    """A filter that inserts the DOCTYPE declaration in the correct location:
    directly after the XML declaration if the stream starts with one, and at
    the very beginning of the stream otherwise.
    """
    def __init__(self, doctype):
        """Initialize the filter.

        :param doctype: DOCTYPE as a string naming a predefined doctype
                        (looked up with `DocType.get`) or as a `DocType`
                        object.
        """
        if isinstance(doctype, basestring):
            doctype = DocType.get(doctype)
        self.doctype_event = (DOCTYPE, doctype, (None, -1, -1))

    def __call__(self, stream):
        doctype_inserted = False
        for kind, data, pos in stream:
            if not doctype_inserted:
                doctype_inserted = True
                if kind is XML_DECL:
                    yield (kind, data, pos)
                    yield self.doctype_event
                    continue
                yield self.doctype_event

            yield (kind, data, pos)

        if not doctype_inserted:
            yield self.doctype_event
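

# Illustrative sketch, not part of Genshi's API: the hypothetical helper below
# applies the placement rule of DocTypeInserter.__call__ to plain
# (kind, data) tuples, using the strings 'XML_DECL' and 'DOCTYPE' in place of
# Genshi's stream event kinds and dropping the position element.
def _sketch_insert_doctype(events, doctype):
    """Yield *events* with a ('DOCTYPE', doctype) event placed directly after
    a leading XML declaration, or before the first event otherwise. An empty
    input yields only the DOCTYPE event."""
    doctype_event = ('DOCTYPE', doctype)
    inserted = False
    for event in events:
        if not inserted:
            inserted = True
            if event[0] == 'XML_DECL':
                # The XML declaration must remain the very first item, so it
                # is emitted before the DOCTYPE.
                yield event
                yield doctype_event
                continue
            yield doctype_event
        yield event
    if not inserted:
        yield doctype_event

# Hypothetical usage:
#   list(_sketch_insert_doctype([('XML_DECL', '1.0'), ('START', 'html')],
#                               'html'))
#   ->  [('XML_DECL', '1.0'), ('DOCTYPE', 'html'), ('START', 'html')]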