annotate genshi/output.py @ 958:6fc92535c888 experimental-performance-improvement-exploration

Be more careful about what is passed into streams as events and remove many uses of _ensure as a result. An ATTRS event is added for handling Attributes returned by gensh.path.select().
author hodgestar
date Tue, 13 Mar 2012 03:03:02 +0000
parents f15334b65cf8
children

# -*- coding: utf-8 -*-
#
# Copyright (C) 2006-2009 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://genshi.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://genshi.edgewall.org/log/.

"""This module provides different kinds of serialization methods for XML event
streams.
"""

from itertools import chain
import re

from genshi.core import escape, Attrs, Markup, Namespace, QName, StreamEventKind
from genshi.core import START, END, TEXT, XML_DECL, DOCTYPE, START_NS, END_NS, \
                        START_CDATA, END_CDATA, PI, COMMENT, XML_NAMESPACE, ATTRS

__all__ = ['encode', 'get_serializer', 'DocType', 'XMLSerializer',
           'XHTMLSerializer', 'HTMLSerializer', 'TextSerializer']
__docformat__ = 'restructuredtext en'


def encode(iterator, method='xml', encoding=None, out=None):
    """Encode serializer output into a string.

    :param iterator: the iterator returned from serializing a stream (basically
                     any iterator that yields unicode objects)
    :param method: the serialization method; determines how characters not
                   representable in the specified encoding are treated
    :param encoding: how the output string should be encoded; if set to `None`,
                     this method returns a `unicode` object
    :param out: a file-like object that the output should be written to
                instead of being returned as one big string; note that if
                this is a file or socket (or similar), the `encoding` must
                not be `None` (that is, the output must be encoded)
    :return: a `str` or `unicode` object (depending on the `encoding`
             parameter), or `None` if the `out` parameter is provided

    :since: version 0.4.1
    :note: Changed in 0.5: added the `out` parameter
    """
    if encoding is not None:
        errors = 'replace'
        if method != 'text' and not isinstance(method, TextSerializer):
            errors = 'xmlcharrefreplace'
        _encode = lambda string: string.encode(encoding, errors)
    else:
        _encode = lambda string: string
    if out is None:
        return _encode(''.join(list(iterator)))
    for chunk in iterator:
        out.write(_encode(chunk))

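The subtle part of `encode()` is how it handles encoding errors. Markup output can fall back to numeric character references, but plain text has no equivalent and must degrade lossily. Below is a minimal standalone Python 3 sketch of the same logic for seeing what ends up in the output. It is not Genshi's API: the name `encode_sketch` is hypothetical, and a plain `method` string stands in for the `TextSerializer` instance check.

```python
import io


def encode_sketch(chunks, method='xml', encoding=None, out=None):
    """Hypothetical Python 3 re-creation of the encode() logic above."""
    if encoding is not None:
        # Markup can represent unencodable characters as &#NNN; references;
        # plain text has no escape syntax, so they degrade to '?'.
        errors = 'replace' if method == 'text' else 'xmlcharrefreplace'

        def _encode(text):
            return text.encode(encoding, errors)
    else:
        def _encode(text):
            return text  # no encoding requested: keep str

    if out is None:
        return _encode(''.join(chunks))
    # Streaming mode: write chunk by chunk instead of building one big string.
    for chunk in chunks:
        out.write(_encode(chunk))
    return None


print(encode_sketch(['<p>', 'caf\u00e9', '</p>'], encoding='ascii'))  # b'<p>caf&#233;</p>'

buf = io.BytesIO()
encode_sketch(['<br/>', '\u2603'], encoding='latin-1', out=buf)
print(buf.getvalue())  # b'<br/>&#9731;'
```

As in the original, passing `out` only makes sense with an encoding set, since a binary file or socket expects bytes.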

def get_serializer(method='xml', **kwargs):
    """Return a serializer object for the given method.

    :param method: the serialization method; can be either "xml", "xhtml",
                   "html", "text", or a custom serializer class

    Any additional keyword arguments are passed to the serializer, and thus
    depend on the `method` parameter value.

    :see: `XMLSerializer`, `XHTMLSerializer`, `HTMLSerializer`, `TextSerializer`
    :since: version 0.4.1
    """
    if isinstance(method, basestring):
        method = {'xml': XMLSerializer,
                  'xhtml': XHTMLSerializer,
                  'html': HTMLSerializer,
                  'text': TextSerializer}[method.lower()]
    return method(**kwargs)

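`get_serializer()` accepts either a method name or a serializer class. Names are matched case-insensitively against a fixed table. Any extra keyword arguments go to the constructor. Below is a minimal Python 3 sketch of that dispatch pattern. The two stand-in classes are hypothetical placeholders for the real serializers, and `str` takes the place of Python 2's `basestring`.

```python
class XMLStandIn(object):
    """Hypothetical placeholder for a serializer class."""

    def __init__(self, **kwargs):
        self.options = kwargs  # record the forwarded keyword arguments


class TextStandIn(XMLStandIn):
    """Another hypothetical placeholder."""


METHODS = {'xml': XMLStandIn, 'text': TextStandIn}


def get_serializer_sketch(method='xml', **kwargs):
    if isinstance(method, str):
        # Names are looked up case-insensitively; unknown names raise KeyError.
        method = METHODS[method.lower()]
    # A class passed in directly is used as-is.
    return method(**kwargs)


s = get_serializer_sketch('XML', strip_whitespace=False)
print(type(s).__name__, s.options)  # XMLStandIn {'strip_whitespace': False}
```

The lookup is a plain dictionary index, so an unrecognized name such as "json" surfaces as a `KeyError`.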

def _prepare_cache(use_cache=True):
    """Prepare a private token serialization cache.

    :param use_cache: boolean indicating whether a real cache should
                      be used or not. If not, the returned functions
                      are no-ops.

    :return: emit and get functions, for storing and retrieving
             serialized values from the cache.
    """
    cache = {}
    if use_cache:
        def _emit(kind, input, output):
            cache[kind, input] = output
            return output
        _get = cache.get
    else:
        def _emit(kind, input, output):
            return output
        def _get(key):
            pass
    return _emit, _get, cache

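The `(emit, get, cache)` triple returned by `_prepare_cache()` is meant as a lookup-then-store pair around an expensive formatting step. The sketch below re-creates the function in Python 3 and drives it with a hypothetical `serialize_texts()` helper. Two details are assumptions for illustration only: `xml.sax.saxutils.escape` stands in for the expensive step, and `'TEXT'` is used as the kind label.

```python
from xml.sax.saxutils import escape


def prepare_cache(use_cache=True):
    # Python 3 copy of _prepare_cache() above.
    cache = {}
    if use_cache:
        def _emit(kind, input, output):
            cache[kind, input] = output  # remember the formatted result
            return output
        _get = cache.get
    else:
        def _emit(kind, input, output):
            return output  # pass-through: nothing is stored
        def _get(key):
            return None  # always a miss
    return _emit, _get, cache


def serialize_texts(texts, use_cache=True):
    """Hypothetical consumer: escape each text, reusing cached results."""
    _emit, _get, cache = prepare_cache(use_cache)
    out = []
    for text in texts:
        cached = _get(('TEXT', text))
        if cached is None:
            cached = _emit('TEXT', text, escape(text))
        out.append(cached)
    return out, cache


out, cache = serialize_texts(['a<b', 'a<b', 'c'])
print(out)         # ['a&lt;b', 'a&lt;b', 'c']
print(len(cache))  # 2
```

With `use_cache=False` the same loop still works. `_get` always misses and `_emit` simply passes the value through, so the dictionary stays empty.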

class DocType(object):
    """Defines a number of commonly used DOCTYPE declarations as constants."""

    HTML_STRICT = (
        'html', '-//W3C//DTD HTML 4.01//EN',
        'http://www.w3.org/TR/html4/strict.dtd'
    )
    HTML_TRANSITIONAL = (
        'html', '-//W3C//DTD HTML 4.01 Transitional//EN',
        'http://www.w3.org/TR/html4/loose.dtd'
    )
    HTML_FRAMESET = (
        'html', '-//W3C//DTD HTML 4.01 Frameset//EN',
        'http://www.w3.org/TR/html4/frameset.dtd'
    )
    HTML = HTML_STRICT

    HTML5 = ('html', None, None)

    XHTML_STRICT = (
        'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
    )
    XHTML_TRANSITIONAL = (
        'html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    )
    XHTML_FRAMESET = (
        'html', '-//W3C//DTD XHTML 1.0 Frameset//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd'
    )
    XHTML = XHTML_STRICT

    XHTML11 = (
        'html', '-//W3C//DTD XHTML 1.1//EN',
        'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'
    )

    SVG_FULL = (
        'svg', '-//W3C//DTD SVG 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'
    )
    SVG_BASIC = (
        'svg', '-//W3C//DTD SVG Basic 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd'
    )
    SVG_TINY = (
        'svg', '-//W3C//DTD SVG Tiny 1.1//EN',
        'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-tiny.dtd'
    )
    SVG = SVG_FULL

    @classmethod
    def get(cls, name):
        """Return the ``(name, pubid, sysid)`` tuple of the ``DOCTYPE``
        declaration for the specified name.

        The following names are recognized in this version:
         * "html" or "html-strict" for the HTML 4.01 strict DTD
         * "html-transitional" for the HTML 4.01 transitional DTD
         * "html-frameset" for the HTML 4.01 frameset DTD
         * "html5" for the ``DOCTYPE`` proposed for HTML5
         * "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD
         * "xhtml-transitional" for the XHTML 1.0 transitional DTD
         * "xhtml-frameset" for the XHTML 1.0 frameset DTD
         * "xhtml11" for the XHTML 1.1 DTD
         * "svg" or "svg-full" for the SVG 1.1 DTD
         * "svg-basic" for the SVG Basic 1.1 DTD
         * "svg-tiny" for the SVG Tiny 1.1 DTD

        :param name: the name of the ``DOCTYPE``
        :return: the ``(name, pubid, sysid)`` tuple for the requested
                 ``DOCTYPE``, or ``None`` if the name is not recognized
        :since: version 0.4.1
        """
        return {
            'html': cls.HTML, 'html-strict': cls.HTML_STRICT,
            'html-transitional': DocType.HTML_TRANSITIONAL,
            'html-frameset': DocType.HTML_FRAMESET,
            'html5': cls.HTML5,
            'xhtml': cls.XHTML, 'xhtml-strict': cls.XHTML_STRICT,
            'xhtml-transitional': cls.XHTML_TRANSITIONAL,
            'xhtml-frameset': cls.XHTML_FRAMESET,
            'xhtml11': cls.XHTML11,
            'svg': cls.SVG, 'svg-full': cls.SVG_FULL,
            'svg-basic': cls.SVG_BASIC,
            'svg-tiny': cls.SVG_TINY
        }.get(name.lower())

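Each `DocType` constant is only a `(name, pubid, sysid)` tuple; turning it into markup is left to the serializer. Below is a minimal sketch of one plausible rendering rule. The function `format_doctype` is hypothetical and not taken from this file. It emits `PUBLIC` when a public identifier exists and `SYSTEM` when only a system identifier exists. For HTML5's `('html', None, None)` it emits only the element name.

```python
def format_doctype(doctype):
    """Render a (name, pubid, sysid) tuple as a DOCTYPE declaration."""
    name, pubid, sysid = doctype
    parts = ['<!DOCTYPE ', name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        # A system identifier without a public one needs the SYSTEM keyword.
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>')
    return ''.join(parts)


# Tuples copied from the DocType class above.
HTML5 = ('html', None, None)
XHTML_STRICT = (
    'html', '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'
)

print(format_doctype(HTML5))  # <!DOCTYPE html>
print(format_doctype(XHTML_STRICT))
```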
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
diff changeset
195
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
196 class XMLSerializer(object):
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
197 """Produces XML text from an event stream.
198
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
199 >>> from genshi.builder import tag
20
cc92d74ce9e5 Fix tests broken in [20].
cmlenz
parents: 19
200 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
201 >>> print(''.join(XMLSerializer()(elem.generate())))
1
5479aae32f5a Initial import.
cmlenz
parents:
202 <div><a href="foo"/><br/><hr noshade="True"/></div>
203 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
204
205 _PRESERVE_SPACE = frozenset()
206
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
207 def __init__(self, doctype=None, strip_whitespace=True,
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
208 namespace_prefixes=None, cache=True):
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
209 """Initialize the XML serializer.
210
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
211 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
212 DOCTYPE declaration that should be included at the top
494
942d73ba938c The `doctype` parameter for serializers can now be a string.
cmlenz
parents: 464
213 of the generated output, or the name of a DOCTYPE as
214 defined in `DocType.get`
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
215 :param strip_whitespace: whether extraneous whitespace should be
216 stripped from the output
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
217 :param cache: whether to cache the text output per event, which
218 improves performance for repetitive markup
494
942d73ba938c The `doctype` parameter for serializers can now be a string.
cmlenz
parents: 464
219 :note: Changed in 0.4.2: The `doctype` parameter can now be a string.
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
220 :note: Changed in 0.6: The `cache` parameter was added
85
4938c310d904 Improve handling of DOCTYPE declarations.
cmlenz
parents: 73
221 """
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
222 self.filters = [EmptyTagFilter()]
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
223 if strip_whitespace:
224 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
225 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
226 cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
227 if doctype:
228 self.filters.append(DocTypeInserter(doctype))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
229 self.cache = cache
1
5479aae32f5a Initial import.
cmlenz
parents:
230
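`__init__` assembles `self.filters` in a fixed order (`EmptyTagFilter`, then `WhitespaceFilter` when `strip_whitespace` is true, then `NamespaceFlattener`, then `DocTypeInserter` when a doctype is given), and `__call__` wraps the event stream in each filter in turn. Because every filter is a callable that takes a stream and returns a new iterable, the chain stays lazy. A minimal sketch of that composition pattern, using two hypothetical generator filters over `(kind, data, pos)` tuples rather than Genshi's real filter classes:

```python
# Two hypothetical stream filters; each takes an iterable of
# (kind, data, pos) events and lazily yields a transformed stream.
def drop_blank_text(stream):
    for kind, data, pos in stream:
        if kind == 'TEXT' and not data.strip():
            continue  # skip whitespace-only text events
        yield kind, data, pos


def collapse_spaces(stream):
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = ' '.join(data.split())  # squeeze runs of whitespace
        yield kind, data, pos


def apply_filters(stream, filters):
    # Same shape as the loop in XMLSerializer.__call__: each filter
    # wraps the previous stream, so no work happens until iteration.
    for filter_ in filters:
        stream = filter_(stream)
    return stream


events = [('START', 'p', None), ('TEXT', '  a   b ', None),
          ('TEXT', '   ', None), ('END', 'p', None)]
print(list(apply_filters(events, [drop_blank_text, collapse_spaces])))
# [('START', 'p', None), ('TEXT', 'a b', None), ('END', 'p', None)]
```

Order matters in such a chain: each filter only sees what earlier filters let through.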
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
231 def _prepare_cache(self):
232 return _prepare_cache(self.cache)[:2]
233
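With `cache=True`, output is memoized per `(kind, data)` key, and the loop in `__call__` deliberately skips the cache for `TEXT` events whose data is already `Markup` (the #429 fix noted in the annotations). The reason is that a `Markup` instance compares and hashes equal to a plain string with the same characters, so the two would share one cache slot. The sketch below reproduces that collision with a hypothetical `Markup` subclass of `str` and a plain dict cache; it is not Genshi's `_prepare_cache`, whose internals are not shown here, and `html.escape` stands in for Genshi's `escape`:

```python
import html


class Markup(str):
    """Hypothetical marker type for text that is already safe markup."""


def render_text(data, cache, bypass_markup=True):
    # Pre-escaped markup is passed through untouched and never cached.
    if bypass_markup and isinstance(data, Markup):
        return str(data)
    key = ('TEXT', data)
    if key in cache:
        return cache[key]
    out = html.escape(data, quote=False)
    cache[key] = out
    return out


# Without the bypass, Markup('<b>') hits the entry cached for the plain
# string '<b>', because the two keys compare and hash equal.
naive = {}
render_text('<b>', naive, bypass_markup=False)
print(render_text(Markup('<b>'), naive, bypass_markup=False))  # &lt;b&gt;

# With the bypass, the markup is emitted as-is.
safe = {}
render_text('<b>', safe)
print(render_text(Markup('<b>'), safe))  # <b>
```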
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
234 def __call__(self, stream):
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
235 have_decl = have_doctype = False
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
236 in_cdata = False
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
237 _emit, _get = self._prepare_cache()
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
238
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
239 for filter_ in self.filters:
240 stream = filter_(stream)
1
5479aae32f5a Initial import.
cmlenz
parents:
241 for kind, data, pos in stream:
939
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
242 if kind is TEXT and isinstance(data, Markup):
243 yield data
244 continue
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
245 cached = _get((kind, data))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
246 if cached is not None:
247 yield cached
248 elif kind is START or kind is EMPTY:
1
5479aae32f5a Initial import.
cmlenz
parents:
249 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
250 buf = ['<', tag]
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
251 for attr, value in attrib:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
252 buf += [' ', attr, '="', escape(value), '"']
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
253 buf.append(kind is EMPTY and '/>' or '>')
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
254 yield _emit(kind, data, Markup(''.join(buf)))
1
5479aae32f5a Initial import.
cmlenz
parents:
255
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
256 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
257 yield _emit(kind, data, Markup('</%s>' % data))
1
5479aae32f5a Initial import.
cmlenz
parents:
258
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
259 elif kind is TEXT:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
260 if in_cdata:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
261 yield _emit(kind, data, data)
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
262 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
263 yield _emit(kind, data, escape(data, quotes=False))
1
5479aae32f5a Initial import.
cmlenz
parents:
264
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
265 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
266 yield _emit(kind, data, Markup('<!--%s-->' % data))
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
267
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
268 elif kind is XML_DECL and not have_decl:
269 version, encoding, standalone = data
270 buf = ['<?xml version="%s"' % version]
271 if encoding:
272 buf.append(' encoding="%s"' % encoding)
273 if standalone != -1:
274 standalone = standalone and 'yes' or 'no'
275 buf.append(' standalone="%s"' % standalone)
276 buf.append('?>\n')
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
277 yield Markup(''.join(buf))
460
75425671b437 Apply patch by Alec Thomas for processing XML declarations (#111). Thanks!
cmlenz
parents: 448
278 have_decl = True
279
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
280 elif kind is DOCTYPE and not have_doctype:
281 name, pubid, sysid = data
282 buf = ['<!DOCTYPE %s']
283 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
284 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
285 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
286 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
287 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
288 buf.append(' "%s"')
289 buf.append('>\n')
854
4d9bef447df9 More work on reducing the size of the diff produced by 2to3.
cmlenz
parents: 853
290 yield Markup(''.join(buf)) % tuple([p for p in data if p])
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
291 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
292
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
293 elif kind is START_CDATA:
294 yield Markup('<![CDATA[')
295 in_cdata = True
296
297 elif kind is END_CDATA:
298 yield Markup(']]>')
299 in_cdata = False
300
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
301 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
302 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
303
958
6fc92535c888 Be more careful about what is passed into streams as events and remove many uses of _ensure as a result. An ATTRS event is added for handling Attributes returned by gensh.path.select().
hodgestar
parents: 939
304 elif kind is ATTRS:
305 # this is specifically to support the rendering of
306 # streams generated by genshi.path.select() and provides
307 # backwards compatibility with genshi < 0.7
308 yield data.concatenate_values()
309
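The `XML_DECL` branch of `XMLSerializer.__call__` builds the declaration from a `(version, encoding, standalone)` tuple. `encoding` is written only when it is truthy, and `standalone` uses `-1` as a sentinel for "not specified"; otherwise a truthy value becomes `yes` and a falsy one `no`. Only the first declaration in a stream is emitted, tracked by `have_decl`. The same formatting as a standalone function (`format_xml_decl` is a hypothetical name, not part of Genshi):

```python
def format_xml_decl(version, encoding=None, standalone=-1):
    """Format an XML declaration the way the XML_DECL branch does."""
    buf = ['<?xml version="%s"' % version]
    if encoding:
        buf.append(' encoding="%s"' % encoding)
    if standalone != -1:  # -1 means the attribute was not given
        buf.append(' standalone="%s"' % ('yes' if standalone else 'no'))
    buf.append('?>\n')
    return ''.join(buf)


print(format_xml_decl('1.0', 'utf-8'), end='')
# <?xml version="1.0" encoding="utf-8"?>
print(format_xml_decl('1.0', None, True), end='')
# <?xml version="1.0" standalone="yes"?>
```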
1
5479aae32f5a Initial import.
cmlenz
parents:
310
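The `DOCTYPE` branch of `XMLSerializer.__call__` writes `PUBLIC "pubid"` when a public identifier exists, falls back to the bare keyword `SYSTEM` when only a system identifier exists, and then fills the `%s` placeholders with the non-empty members of `(name, pubid, sysid)` in order. A sketch of that logic as a plain function (`format_doctype` is hypothetical). Genshi builds the template as `Markup`, so its `%` substitution also escapes the identifiers; this sketch skips that step:

```python
def format_doctype(name, pubid=None, sysid=None):
    """Format a DOCTYPE declaration the way the DOCTYPE branch does."""
    buf = ['<!DOCTYPE %s']
    if pubid:
        buf.append(' PUBLIC "%s"')
    elif sysid:
        buf.append(' SYSTEM')
    if sysid:
        buf.append(' "%s"')
    buf.append('>\n')
    # Fill the placeholders with the non-empty parts, in order.
    parts = tuple(p for p in (name, pubid, sysid) if p)
    return ''.join(buf) % parts


print(format_doctype('html'), end='')                     # <!DOCTYPE html>
print(format_doctype('svg', None, 'svg.dtd'), end='')     # <!DOCTYPE svg SYSTEM "svg.dtd">
```

Counting only non-empty parts is what keeps the number of substituted values equal to the number of `%s` placeholders in every branch.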
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
311 class XHTMLSerializer(XMLSerializer):
312 """Produces XHTML text from an event stream.
1
5479aae32f5a Initial import.
cmlenz
parents:
313
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
314 >>> from genshi.builder import tag
20
cc92d74ce9e5 Fix tests broken in [20].
cmlenz
parents: 19
315 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
316 >>> print(''.join(XHTMLSerializer()(elem.generate())))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
317 <div><a href="foo"></a><br /><hr noshade="noshade" /></div>
1
5479aae32f5a Initial import.
cmlenz
parents:
318 """
319
320 _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
321 'hr', 'img', 'input', 'isindex', 'link', 'meta',
322 'param'])
323 _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
324 'defer', 'disabled', 'ismap', 'multiple',
325 'nohref', 'noresize', 'noshade', 'nowrap'])
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
326 _PRESERVE_SPACE = frozenset([
327 QName('pre'), QName('http://www.w3.org/1999/xhtml}pre'),
328 QName('textarea'), QName('http://www.w3.org/1999/xhtml}textarea')
329 ])
1
5479aae32f5a Initial import.
cmlenz
parents:
330
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
331 def __init__(self, doctype=None, strip_whitespace=True,
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
332 namespace_prefixes=None, drop_xml_decl=True, cache=True):
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
333 super(XHTMLSerializer, self).__init__(doctype, False)
334 self.filters = [EmptyTagFilter()]
335 if strip_whitespace:
336 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE))
337 namespace_prefixes = namespace_prefixes or {}
338 namespace_prefixes['http://www.w3.org/1999/xhtml'] = ''
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
339 self.filters.append(NamespaceFlattener(prefixes=namespace_prefixes,
340 cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
341 if doctype:
342 self.filters.append(DocTypeInserter(doctype))
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
343 self.drop_xml_decl = drop_xml_decl
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
344 self.cache = cache
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
345
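The attribute loop in `XHTMLSerializer.__call__` handles three special cases. Attributes listed in `_BOOLEAN_ATTRS` are written out in full as `name="name"`, because XHTML does not allow minimized attributes. `xml:lang` is mirrored into a plain `lang` attribute unless the element already has one, and the `xml:lang` attribute itself is still written. `xml:space` is dropped. A sketch of that loop over a list of `(name, value)` pairs: `render_attrs` is a hypothetical helper, `BOOLEAN_ATTRS` is an abbreviated subset of `_BOOLEAN_ATTRS`, and `html.escape` stands in for Genshi's `escape` (the two encode quotes differently):

```python
import html

# Abbreviated, hypothetical subset of XHTMLSerializer._BOOLEAN_ATTRS.
BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'disabled', 'noshade'])


def render_attrs(attrib):
    """Render (name, value) pairs roughly as XHTMLSerializer does."""
    names = {name for name, _ in attrib}
    buf = []
    for attr, value in attrib:
        if attr in BOOLEAN_ATTRS:
            value = attr  # no minimization in XHTML: checked="checked"
        elif attr == 'xml:lang' and 'lang' not in names:
            buf += [' lang="', html.escape(value), '"']  # then fall through
        elif attr == 'xml:space':
            continue  # dropped from XHTML output
        buf += [' ', attr, '="', html.escape(value), '"']
    return ''.join(buf)


print(repr(render_attrs([('noshade', 'True'), ('xml:lang', 'en'),
                         ('xml:space', 'preserve')])))
# ' noshade="noshade" lang="en" xml:lang="en"'
```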
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
346 def __call__(self, stream):
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
347 boolean_attrs = self._BOOLEAN_ATTRS
348 empty_elems = self._EMPTY_ELEMS
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
349 drop_xml_decl = self.drop_xml_decl
350 have_decl = have_doctype = False
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
351 in_cdata = False
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
352 _emit, _get = self._prepare_cache()
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
353
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
354 for filter_ in self.filters:
355 stream = filter_(stream)
1
5479aae32f5a Initial import.
cmlenz
parents:
356 for kind, data, pos in stream:
939
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
357 if kind is TEXT and isinstance(data, Markup):
358 yield data
359 continue
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
360 cached = _get((kind, data))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
361 if cached is not None:
362 yield cached
1
5479aae32f5a Initial import.
cmlenz
parents:
363
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
364 elif kind is START or kind is EMPTY:
1
5479aae32f5a Initial import.
cmlenz
parents:
365 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
366 buf = ['<', tag]
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
367 for attr, value in attrib:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
368 if attr in boolean_attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
369 value = attr
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
370 elif attr == 'xml:lang' and 'lang' not in attrib:
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
371 buf += [' lang="', escape(value), '"']
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
372 elif attr == 'xml:space':
689
3881a602048a The XHTML serializer now strips `xml:space` attributes as they are only allowed on very few tags.
cmlenz
parents: 688
diff changeset
373 continue
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
374 buf += [' ', attr, '="', escape(value), '"']
212
0141f45c18e1 Refactored the handling of empty tags in the serializer: use an `EmptyTagFilter` that combines adjacent start/end events, instead of the generic pushback-iterator.
cmlenz
parents: 201
diff changeset
375 if kind is EMPTY:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
376 if tag in empty_elems:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
377 buf.append(' />')
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
378 else:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
379 buf.append('></%s>' % tag)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
380 else:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
381 buf.append('>')
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
382 yield _emit(kind, data, Markup(''.join(buf)))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
383
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
384 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
385 yield _emit(kind, data, Markup('</%s>' % data))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
386
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
387 elif kind is TEXT:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
388 if in_cdata:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
389 yield _emit(kind, data, data)
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
390 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
391 yield _emit(kind, data, escape(data, quotes=False))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
392
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
393 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
394 yield _emit(kind, data, Markup('<!--%s-->' % data))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
395
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
396 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
397 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
398 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
399 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
400 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
401 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
402 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
403 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
404 buf.append(' "%s"')
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
405 buf.append('>\n')
854
4d9bef447df9 More work on reducing the size of the diff produced by 2to3.
cmlenz
parents: 853
diff changeset
406 yield Markup(''.join(buf)) % tuple([p for p in data if p])
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
407 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
408
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
409 elif kind is XML_DECL and not have_decl and not drop_xml_decl:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
410 version, encoding, standalone = data
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
411 buf = ['<?xml version="%s"' % version]
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
412 if encoding:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
413 buf.append(' encoding="%s"' % encoding)
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
414 if standalone != -1:
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
415 standalone = standalone and 'yes' or 'no'
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
416 buf.append(' standalone="%s"' % standalone)
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
417 buf.append('?>\n')
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
418 yield Markup(''.join(buf))
729
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
419 have_decl = True
be0b4a7b2fd4 * Add XHTML 1.1 doctype (closes #228).
cmlenz
parents: 719
diff changeset
420
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
421 elif kind is START_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
422 yield Markup('<![CDATA[')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
423 in_cdata = True
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
424
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
425 elif kind is END_CDATA:
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
426 yield Markup(']]>')
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
427 in_cdata = False
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
diff changeset
428
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
429 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
430 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
431
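The XHTML START/EMPTY branch above can be approximated in isolation. This is a minimal sketch, not Genshi's code. It makes several substitutions: plain strings stand in for `QName`/`Attrs`, and a list of `(name, value)` pairs holds the attributes. A hypothetical `escape_attr` helper replaces `genshi.core.escape`, and small hypothetical subsets stand in for `_BOOLEAN_ATTRS` and `_EMPTY_ELEMS`.

```python
# Hypothetical subsets of the serializer's _BOOLEAN_ATTRS and _EMPTY_ELEMS.
BOOLEAN_ATTRS = frozenset(['checked', 'disabled', 'noshade', 'selected'])
EMPTY_ELEMS = frozenset(['br', 'hr', 'img', 'input'])


def escape_attr(value):
    # Hypothetical stand-in for escape(value): make a value safe inside "...".
    return (str(value).replace('&', '&amp;').replace('<', '&lt;')
            .replace('>', '&gt;').replace('"', '&#34;'))


def xhtml_tag(tag, attrib, empty=False):
    """Render an XHTML start tag (or a whole empty element) from a tag name
    and a list of (name, value) attribute pairs."""
    names = set(name for name, _ in attrib)
    buf = ['<', tag]
    for attr, value in attrib:
        if attr in BOOLEAN_ATTRS:
            value = attr  # XHTML spells booleans out: noshade="noshade"
        elif attr == 'xml:lang' and 'lang' not in names:
            # Add a plain lang attribute; xml:lang itself is still written below.
            buf += [' lang="', escape_attr(value), '"']
        elif attr == 'xml:space':
            continue  # xml:space is stripped from XHTML output
        buf += [' ', attr, '="', escape_attr(value), '"']
    if empty:
        # Void elements self-close; any other element gets an explicit end tag.
        buf.append(' />' if tag in EMPTY_ELEMS else '></%s>' % tag)
    else:
        buf.append('>')
    return ''.join(buf)
```

For example, `xhtml_tag('div', [('xml:lang', 'en')], empty=True)` yields `<div lang="en" xml:lang="en"></div>`. The branch does not `continue` after adding `lang`, so both attributes are kept. Only `xml:space` is dropped.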
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
432
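The `isinstance(data, Markup)` guard at the top of the serialization loop is needed because the `(kind, data)` cache key cannot distinguish a `Markup` string from a plain string with the same characters. As the commit message for #429 notes, the two compare equal. The sketch below reproduces the problem under stated assumptions:

- `Markup` here is a hypothetical bare `str` subclass, so it shares `str` equality and hashing.
- A plain dict stands in for whatever `_prepare_cache()` returns.
- `serialize_text` and `skip_markup` are illustrative names only.

```python
class Markup(str):
    """Hypothetical stand-in for genshi.core.Markup: marks text as already
    safe, but inherits str's __eq__/__hash__, so Markup('<b>') == '<b>'."""


TEXT = 'TEXT'


def escape_text(text):
    # Text-node escaping; quotes are left alone, as with escape(..., quotes=False).
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def serialize_text(events, skip_markup=True):
    """Serialize (kind, data) TEXT events through a per-call output cache.

    skip_markup=False consults the cache for Markup data too, which
    reproduces the ticket #429 problem."""
    cache = {}
    out = []
    for kind, data in events:
        if skip_markup and kind is TEXT and isinstance(data, Markup):
            out.append(data)  # already safe: emit directly, never cache
            continue
        cached = cache.get((kind, data))
        if cached is not None:
            out.append(cached)  # may belong to an equal string of the other type
            continue
        rendered = data if isinstance(data, Markup) else escape_text(data)
        cache[(kind, data)] = rendered
        out.append(rendered)
    return ''.join(out)
```

With the guard disabled, whichever of the two equal strings is seen first decides the output for both. If the plain string comes first, the markup is wrongly escaped. If the `Markup` comes first, the plain text is emitted unescaped, which is the more dangerous case.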
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
433 class HTMLSerializer(XHTMLSerializer):
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
434 """Produces HTML text from an event stream.
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
435
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
436 >>> from genshi.builder import tag
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
437 >>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
diff changeset
438 >>> print(''.join(HTMLSerializer()(elem.generate())))
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
439 <div><a href="foo"></a><br><hr noshade></div>
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
440 """
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
441
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
442 _NOESCAPE_ELEMS = frozenset([
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
443 QName('script'), QName('http://www.w3.org/1999/xhtml}script'),
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
444 QName('style'), QName('http://www.w3.org/1999/xhtml}style')
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
445 ])
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
446
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
447 def __init__(self, doctype=None, strip_whitespace=True, cache=True):
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
448 """Initialize the HTML serializer.
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
449
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
450 :param doctype: a ``(name, pubid, sysid)`` tuple that represents the
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
451 DOCTYPE declaration that should be included at the top
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
452 of the generated output
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
453 :param strip_whitespace: whether extraneous whitespace should be
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
diff changeset
454 stripped from the output
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
455 :param cache: whether to cache the text output per event, which
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
456 improves performance for repetitive markup
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
457 :note: Changed in 0.6: The `cache` parameter was added
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
458 """
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
459 super(HTMLSerializer, self).__init__(doctype, False)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
460 self.filters = [EmptyTagFilter()]
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
461 if strip_whitespace:
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
462 self.filters.append(WhitespaceFilter(self._PRESERVE_SPACE,
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
diff changeset
463 self._NOESCAPE_ELEMS))
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
464 self.filters.append(NamespaceFlattener(prefixes={
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
465 'http://www.w3.org/1999/xhtml': ''
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
466 }, cache=cache))
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
467 if doctype:
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
diff changeset
468 self.filters.append(DocTypeInserter(doctype))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
469 self.cache = cache
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
470
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
471 def __call__(self, stream):
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
472 boolean_attrs = self._BOOLEAN_ATTRS
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
473 empty_elems = self._EMPTY_ELEMS
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
474 noescape_elems = self._NOESCAPE_ELEMS
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
475 have_doctype = False
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
476 noescape = False
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
diff changeset
477 _emit, _get = self._prepare_cache()
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
478
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
479 for filter_ in self.filters:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
diff changeset
480 stream = filter_(stream)
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
481 for kind, data, _ in stream:
939
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
482 if kind is TEXT and isinstance(data, Markup):
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
483 yield data
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
484 continue
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
diff changeset
485 output = _get((kind, data))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
486 if output is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
487 yield output
831
7a422be6f6a6 Follow-up fix for [1038].
cmlenz
parents: 829
diff changeset
488 if (kind is START or kind is EMPTY) \
7a422be6f6a6 Follow-up fix for [1038].
cmlenz
parents: 829
diff changeset
489 and data[0] in noescape_elems:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
490 noescape = True
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
491 elif kind is END:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
492 noescape = False
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
493
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
494 elif kind is START or kind is EMPTY:
96
fa08aef181a2 Add an XHTML serialization method. Now really need to get rid of some code duplication in the `markup.output` module.
cmlenz
parents: 89
diff changeset
495 tag, attrib = data
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
496 buf = ['<', tag]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
497 for attr, value in attrib:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
498 if attr in boolean_attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
499 if value:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
500 buf += [' ', attr]
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
501 elif ':' in attr:
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
502 if attr == 'xml:lang' and 'lang' not in attrib:
524
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
503 buf += [' lang="', escape(value), '"']
7553760b58af Add special handling for `xml:lang` to HTML/XHTML serialization.
cmlenz
parents: 494
diff changeset
504 elif attr != 'xmlns':
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
505 buf += [' ', attr, '="', escape(value), '"']
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
506 buf.append('>')
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
507 if kind is EMPTY:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
508 if tag not in empty_elems:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
509 buf.append('</%s>' % tag)
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
510 yield _emit(kind, data, Markup(''.join(buf)))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
511 if tag in noescape_elems:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
512 noescape = True
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
513
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
514 elif kind is END:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
515 yield _emit(kind, data, Markup('</%s>' % data))
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
516 noescape = False
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
517
69
c40a5dcd2b55 A couple of minor performance improvements.
cmlenz
parents: 66
diff changeset
518 elif kind is TEXT:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
519 if noescape:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
520 yield _emit(kind, data, data)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
diff changeset
521 else:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
522 yield _emit(kind, data, escape(data, quotes=False))
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
523
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
524 elif kind is COMMENT:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
525 yield _emit(kind, data, Markup('<!--%s-->' % data))
89
80386d62814f Support comments in templates that are not included in the output, in the same way Kid does: if the comment text starts with a `!` character, it is stripped from the output.
cmlenz
parents: 85
diff changeset
526
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
527 elif kind is DOCTYPE and not have_doctype:
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
528 name, pubid, sysid = data
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
529 buf = ['<!DOCTYPE %s']
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
530 if pubid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
531 buf.append(' PUBLIC "%s"')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
532 elif sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
533 buf.append(' SYSTEM')
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
534 if sysid:
397
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
535 buf.append(' "%s"')
31742fe6d47e * Moved some utility functions from `genshi.core` to `genshi.util` (backwards compatibility preserved via imports)
cmlenz
parents: 346
diff changeset
536 buf.append('>\n')
854
4d9bef447df9 More work on reducing the size of the diff produced by 2to3.
cmlenz
parents: 853
diff changeset
537 yield Markup(''.join(buf)) % tuple([p for p in data if p])
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
diff changeset
538 have_doctype = True
109
230ee6a2c6b2 Reorder the conditional branches in the serializers so that the more common event kinds are on top.
cmlenz
parents: 105
diff changeset
539
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
540 elif kind is PI:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
541 yield _emit(kind, data, Markup('<?%s %s?>' % data))
105
71f3db26eecb Include processing instructions in serialized streams.
cmlenz
parents: 96
diff changeset
542
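The HTML START/EMPTY branch differs from the XHTML one in three ways. It minimizes boolean attributes, it drops namespaced attributes, and it keeps a `noescape` flag so that `<script>` and `<style>` content is not escaped.

The sketch below models this on simplified events. The string event kinds, the attribute lists, the `escape` helper and the constant subsets are all assumptions for illustration. One intentional difference from the code above: the sketch only switches escaping off on a START event, since an EMPTY `<script>` has no text content of its own.

```python
# Hypothetical subsets of _BOOLEAN_ATTRS, _EMPTY_ELEMS and _NOESCAPE_ELEMS.
BOOLEAN_ATTRS = frozenset(['checked', 'disabled', 'noshade', 'selected'])
EMPTY_ELEMS = frozenset(['br', 'hr', 'img', 'input'])
NOESCAPE_ELEMS = frozenset(['script', 'style'])


def escape(text, quotes=True):
    # Hypothetical stand-in for genshi.core.escape.
    text = str(text).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    return text.replace('"', '&#34;') if quotes else text


def html_events(events):
    """Render simplified events as HTML.

    kind is 'START', 'EMPTY', 'END' or 'TEXT'; START/EMPTY data is
    (tag, [(name, value), ...]), END data is the tag name, TEXT data a str.
    """
    noescape = False
    out = []
    for kind, data in events:
        if kind in ('START', 'EMPTY'):
            tag, attrib = data
            names = set(name for name, _ in attrib)
            buf = ['<', tag]
            for attr, value in attrib:
                if attr in BOOLEAN_ATTRS:
                    if value:
                        buf += [' ', attr]  # minimized form: <hr noshade>
                elif ':' in attr:
                    # Namespaced attributes are dropped; xml:lang becomes
                    # lang unless the element already has a lang attribute.
                    if attr == 'xml:lang' and 'lang' not in names:
                        buf += [' lang="', escape(value), '"']
                elif attr != 'xmlns':
                    buf += [' ', attr, '="', escape(value), '"']
            buf.append('>')
            if kind == 'EMPTY' and tag not in EMPTY_ELEMS:
                buf.append('</%s>' % tag)  # e.g. <a href="foo"></a>
            out.append(''.join(buf))
            if kind == 'START' and tag in NOESCAPE_ELEMS:
                noescape = True
        elif kind == 'END':
            out.append('</%s>' % data)
            noescape = False
        elif kind == 'TEXT':
            # Text inside <script>/<style> is written verbatim.
            out.append(data if noescape else escape(data, quotes=False))
    return ''.join(out)
```

Feeding it the events for the `HTMLSerializer` docstring example produces the same `<div><a href="foo"></a><br><hr noshade></div>` output.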
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
543
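The DOCTYPE branch of both serializers, and the XML_DECL branch of `XHTMLSerializer`, assemble their output piece by piece. The helpers below are a hypothetical standalone restatement of that logic, using plain `str` rather than `Markup`. Following the `standalone != -1` check above, a `standalone` value of `-1` is taken to mean "not specified".

```python
def format_doctype(name, pubid=None, sysid=None):
    """Build a DOCTYPE declaration the way the DOCTYPE branch does."""
    buf = ['<!DOCTYPE %s']
    if pubid:
        buf.append(' PUBLIC "%s"')
    elif sysid:
        buf.append(' SYSTEM')
    if sysid:
        buf.append(' "%s"')
    buf.append('>\n')
    # Only the non-empty parts fill the %s placeholders, in order.
    return ''.join(buf) % tuple(p for p in (name, pubid, sysid) if p)


def format_xml_decl(version, encoding=None, standalone=-1):
    """Build an XML declaration; standalone=-1 means 'not specified'."""
    buf = ['<?xml version="%s"' % version]
    if encoding:
        buf.append(' encoding="%s"' % encoding)
    if standalone != -1:
        buf.append(' standalone="%s"' % ('yes' if standalone else 'no'))
    buf.append('?>\n')
    return ''.join(buf)
```

Filtering out empty parts before the `%` substitution keeps the number of placeholders equal to the number of values. This is why a DOCTYPE with only a name renders as `<!DOCTYPE html>`.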
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
544 class TextSerializer(object):
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
545 """Produces plain text from an event stream.
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
546
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
547 Only text events are included in the output. Unlike the other serializers,
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
548 special XML characters are not escaped:
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
549
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 219
diff changeset
550 >>> from genshi.builder import tag
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
551 >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
diff changeset
552 >>> print(elem)
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
553 <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 852
diff changeset
554 >>> print(''.join(TextSerializer()(elem.generate())))
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
555 <Hello!>
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
556
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
557 If text events contain literal markup (instances of the `Markup` class),
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
558 that markup is by default passed through unchanged:
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
559
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
560 >>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
863
869ca3cc2f4c Make the output tests skip the encoding step.
cmlenz
parents: 854
diff changeset
561 >>> print(elem.generate().render(TextSerializer, encoding=None))
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
562 <a href="foo">Hello &amp; Bye!</a><br/>
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
563
740
0c3a2d7bf9a1 Fix a bad reference in the `TextSerializer` docstring.
cmlenz
parents: 729
diff changeset
564 You can use the ``strip_markup`` to change this behavior, so that tags and
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
565 entities are stripped from the output (or in the case of entities,
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
566 replaced with the equivalent character):
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
567
863
869ca3cc2f4c Make the output tests skip the encoding step.
cmlenz
parents: 854
diff changeset
568 >>> print(elem.generate().render(TextSerializer, strip_markup=True,
869ca3cc2f4c Make the output tests skip the encoding step.
cmlenz
parents: 854
diff changeset
569 ... encoding=None))
658
5df08e5195b8 The `TextSerializer` class no longer strips all markup in text by default, so that it is still possible to use the Genshi `escape` function even with text templates. The old behavior is available via the `strip_markup` option of the serializer. Closes #146.
cmlenz
parents: 524
diff changeset
570 Hello & Bye!
200
5861f4446c26 Add serialization to plain text, based on cboos' patch. Closes #41.
cmlenz
parents: 178
diff changeset
571 """

    def __init__(self, strip_markup=False):
        """Create the serializer.

        :param strip_markup: whether markup (tags and encoded characters) found
                             in the text should be removed
        """
        self.strip_markup = strip_markup

    def __call__(self, stream):
        strip_markup = self.strip_markup
        for event in stream:
            if event[0] is TEXT:
                data = event[1]
                if strip_markup and type(data) is Markup:
                    data = data.striptags().stripentities()
                yield unicode(data)


class EmptyTagFilter(object):
    """Combines `START` and `END` events into `EMPTY` events for elements that
    have no contents.
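
    For example, the ``<br>`` element below has no content, so its `START`
    and `END` events are combined into a single `EMPTY` event. The ``<p>``
    element that contains it keeps its separate `START` and `END` events:

    >>> from genshi.builder import tag
    >>> for kind, data, pos in EmptyTagFilter()(tag.p(tag.br).generate()):
    ...     print(kind)
    START
    EMPTY
    END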
    """

    EMPTY = StreamEventKind('EMPTY')

    def __call__(self, stream):
        prev = (None, None, None)
        for ev in stream:
            if prev[0] is START:
                if ev[0] is END:
                    prev = EMPTY, prev[1], prev[2]
                    yield prev
                    continue
                else:
                    yield prev
            if ev[0] is not START:
                yield ev
            prev = ev


EMPTY = EmptyTagFilter.EMPTY


class NamespaceFlattener(object):
    r"""Output stream filter that removes namespace information from the stream,
    instead adding namespace attributes and prefixes as needed.

    :param prefixes: optional mapping of namespace URIs to prefixes

    >>> from genshi.input import XML
    >>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2">
    ...   <two:item/>
    ... </doc>''')
    >>> for kind, data, pos in NamespaceFlattener()(xml):
    ...     print('%s %r' % (kind, data))
    START (u'doc', Attrs([('xmlns', u'NS1'), (u'xmlns:two', u'NS2')]))
    TEXT u'\n  '
    START (u'two:item', Attrs())
    END u'two:item'
    TEXT u'\n'
    END u'doc'
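
    An attribute whose namespace has no prefix bound yet gets a generated
    prefix (``ns1``, ``ns2``, and so on). A matching ``xmlns`` declaration is
    added to the same element. The namespace URI below is only illustrative:

    >>> from genshi.core import Attrs, QName, START, END
    >>> pos = (None, -1, -1)
    >>> lang = QName('{http://example.org/}lang')
    >>> events = [(START, (QName('doc'), Attrs([(lang, 'en')])), pos),
    ...           (END, QName('doc'), pos)]
    >>> for kind, data, pos in NamespaceFlattener()(events):
    ...     if kind is START:
    ...         print(' '.join([name for name, value in data[1]]))
    xmlns:ns1 ns1:lang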
    """

    def __init__(self, prefixes=None, cache=True):
        self.prefixes = {XML_NAMESPACE.uri: 'xml'}
        if prefixes is not None:
            self.prefixes.update(prefixes)
        self.cache = cache

    def __call__(self, stream):
        # Both maps hold stacks, so that a binding redeclared in a nested
        # scope can be restored when that scope ends:
        #   prefixes:   prefix -> stack of namespace URIs bound to that prefix
        #   namespaces: namespace URI -> stack of prefixes bound to that URI
        prefixes = dict([(v, [k]) for k, v in self.prefixes.items()])
        namespaces = {XML_NAMESPACE.uri: ['xml']}
        _emit, _get, cache = _prepare_cache(self.cache)
        # Cached output depends on the prefixes currently in scope, so the
        # cache is cleared whenever a binding is pushed or popped.
        def _push_ns(prefix, uri):
            namespaces.setdefault(uri, []).append(prefix)
            prefixes.setdefault(prefix, []).append(uri)
            cache.clear()
        def _pop_ns(prefix):
            uris = prefixes.get(prefix)
            uri = uris.pop()
            if not uris:
                del prefixes[prefix]
            if uri not in uris or uri != uris[-1]:
                uri_prefixes = namespaces[uri]
                uri_prefixes.pop()
                if not uri_prefixes:
                    del namespaces[uri]
            cache.clear()
            return uri

        # xmlns attributes to be added to the next START/EMPTY event
        ns_attrs = []
        _push_ns_attr = ns_attrs.append
        def _make_ns_attr(prefix, uri):
            return 'xmlns%s' % (prefix and ':%s' % prefix or ''), uri

        # Generates fresh prefixes (ns1, ns2, ...) for attribute namespaces
        # that have no prefix bound yet
        def _gen_prefix():
            val = 0
            while 1:
                val += 1
                yield 'ns%d' % val
        _gen_prefix = _gen_prefix().next

d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
675 for kind, data, pos in stream:
939
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
676 if kind is TEXT and isinstance(data, Markup):
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
677 yield kind, data, pos
f15334b65cf8 Don't cache (TEXT, Markup) events in serializers. This is not needed and since Markup instances compare equal to the same non-Markup string this can lead to incorrect cached output being retrieved. Fixes #429. This is patch t429-fix.2.patch from that ticket. It includes an additional unrelated test to check that the WhitespaceFilter actually removes ignorable whitespace.
hodgestar
parents: 938
diff changeset
678 continue
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
diff changeset
679 output = _get((kind, data))
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
680 if output is not None:
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
681 yield kind, output, pos
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
682
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
diff changeset
683 elif kind is START or kind is EMPTY:
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
684 tag, attrs = data
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
685
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
686 tagname = tag.localname
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
687 tagns = tag.namespace
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
688 if tagns:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
689 if tagns in namespaces:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
690 prefix = namespaces[tagns][-1]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
691 if prefix:
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
692 tagname = '%s:%s' % (prefix, tagname)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
693 else:
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
694 _push_ns_attr(('xmlns', tagns))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
695 _push_ns('', tagns)
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
696
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
697 new_attrs = []
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
698 for attr, value in attrs:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
699 attrname = attr.localname
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
700 attrns = attr.namespace
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
701 if attrns:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
702 if attrns not in namespaces:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
703 prefix = _gen_prefix()
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
704 _push_ns(prefix, attrns)
412
bd51adc20a67 Actually write xmlns declaratons for generated attribute namespace prefixes.
cmlenz
parents: 410
diff changeset
705 _push_ns_attr(('xmlns:%s' % prefix, attrns))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
706 else:
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
707 prefix = namespaces[attrns][-1]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
708 if prefix:
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
diff changeset
709 attrname = '%s:%s' % (prefix, attrname)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
710 new_attrs.append((attrname, value))
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
711
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
diff changeset
712 data = _emit(kind, data, (tagname, Attrs(ns_attrs + new_attrs)))
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
diff changeset
713 yield kind, data, pos
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
714 del ns_attrs[:]
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
diff changeset
715
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
716             elif kind is END:
717                 tagname = data.localname
718                 tagns = data.namespace
719                 if tagns:
720                     prefix = namespaces[tagns][-1]
721                     if prefix:
852
07f4339fecb0 Remove usage of unicode literals in a couple of places where they were not strictly necessary.
cmlenz
parents: 831
722                         tagname = '%s:%s' % (prefix, tagname)
938
8d0f693081b5 Refactor string cache creation to remove repeated code in preparation for fixing issue #429. This is patch t429-refactor-r1038.2.patch from ticket (from cboos).
hodgestar
parents: 932
723                 yield kind, _emit(kind, data, tagname), pos
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
724
725             elif kind is START_NS:
726                 prefix, uri = data
727                 if uri not in namespaces:
728                     prefix = prefixes.get(uri, [prefix])[-1]
437
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
729                     _push_ns_attr(_make_ns_attr(prefix, uri))
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
730                 _push_ns(prefix, uri)
731
732             elif kind is END_NS:
733                 if data in prefixes:
829
6e46513e1c5c Add caching in the serialization stage, which speeds up the serialization of markup that has a lot of repetitive elements.
cmlenz
parents: 822
734                     uri = _pop_ns(data)
437
821fc97d3c0a Fix for #107.
cmlenz
parents: 425
735                     if ns_attrs:
736                         attr = _make_ns_attr(data, uri)
737                         if attr in ns_attrs:
738                             ns_attrs.remove(attr)
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
739
740             else:
741                 yield kind, data, pos
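As far as this excerpt shows, the `END`, `START_NS` and `END_NS` branches keep two stacks in step. `namespaces` is looked up by URI and yields prefixes (hence `namespaces[tagns][-1]`). `prefixes` is checked by prefix on `END_NS` (hence `data in prefixes`). Together they let a nested redeclaration be undone when its `END_NS` arrives, and they let end tags be qualified with the innermost prefix. What follows is a minimal standalone sketch of that bookkeeping, assuming plain strings instead of Genshi's `QName` events. The class `PrefixStack` and its methods are hypothetical and simplified: they omit the `ns_attrs` handling and are not Genshi's own `_push_ns`/`_pop_ns` helpers.

```python
class PrefixStack(object):
    """Hypothetical, simplified model of the serializer's prefix bookkeeping.

    namespaces: namespace URI -> stack of prefixes bound to that URI
    prefixes:   prefix -> stack of URIs bound to that prefix
    """

    def __init__(self):
        self.namespaces = {}
        self.prefixes = {}

    def push(self, prefix, uri):
        # Like a START_NS event: record the new binding in both maps.
        self.namespaces.setdefault(uri, []).append(prefix)
        self.prefixes.setdefault(prefix, []).append(uri)

    def pop(self, prefix):
        # Like an END_NS event: undo the innermost binding of `prefix` and
        # return its URI. Assumes bindings are undone in reverse order, as
        # nested XML scopes are; unknown prefixes are ignored.
        if prefix not in self.prefixes:
            return None
        uri = self.prefixes[prefix].pop()
        if not self.prefixes[prefix]:
            del self.prefixes[prefix]
        self.namespaces[uri].pop()
        if not self.namespaces[uri]:
            del self.namespaces[uri]
        return uri

    def qualify(self, uri, localname):
        # Like the END branch: use the innermost prefix bound to `uri`; an
        # empty prefix (default namespace) leaves the local name as-is.
        if uri:
            prefix = self.namespaces[uri][-1]
            if prefix:
                return '%s:%s' % (prefix, localname)
        return localname
```

Pushing `('x', uri)` and then `('', uri)` makes `qualify(uri, 'item')` return `'item'`. After `pop('')` it returns `'x:item'` again.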
742
743
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
744 class WhitespaceFilter(object):
745     """A filter that removes extraneous ignorable white space from the
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
746     stream.
747     """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
748
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
749     def __init__(self, preserve=None, noescape=None):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
750         """Initialize the filter.
751
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
752         :param preserve: a set or sequence of tag names for which white-space
753                          should be preserved
754         :param noescape: a set or sequence of tag names for which text content
755                          should not be escaped
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
756
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
757         The `noescape` set is expected to refer to elements that cannot contain
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 412
758         further child elements (such as ``<style>`` or ``<script>`` in HTML
759         documents).
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
760         """
761         if preserve is None:
762             preserve = []
763         self.preserve = frozenset(preserve)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
764         if noescape is None:
765             noescape = []
766         self.noescape = frozenset(noescape)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
767
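The docstring's restriction on `noescape` is what lets `__call__` track it with a plain boolean. `preserve`, by contrast, is a depth counter: a `<pre>` may contain child elements, so only its own `END` should switch preservation off. The sketch below illustrates the difference on a simplified stream of `(kind, value)` pairs. The string kinds, the `EXAMPLE_EVENTS` list and the function name `classify_text` are hypothetical stand-ins, and the sketch leaves out the `xml:space="preserve"` check and CDATA events.

```python
def classify_text(events, preserve_elems, noescape_elems):
    """Return (text, preserved, noescaped) for every TEXT event.

    Hypothetical simplification of WhitespaceFilter's state tracking:
    events are ('START', tag), ('END', tag) or ('TEXT', text) pairs.
    """
    preserve = 0       # depth counter: preserve elements may have children
    noescape = False   # flag: noescape elements have no child elements
    result = []
    for kind, value in events:
        if kind == 'TEXT':
            result.append((value, bool(preserve), noescape))
        elif kind == 'START':
            # Inside a preserve element every descendant adds a level, so
            # the counter only drops back to 0 at the outermost END.
            if preserve or value in preserve_elems:
                preserve += 1
            if not noescape and value in noescape_elems:
                noescape = True
        elif kind == 'END':
            noescape = False
            if preserve:
                preserve -= 1
    return result


EXAMPLE_EVENTS = [
    ('START', 'pre'), ('TEXT', 'a'),
    ('START', 'b'), ('TEXT', 'b'), ('END', 'b'),
    ('TEXT', 'c'), ('END', 'pre'),
    ('TEXT', 'd'),
    ('START', 'script'), ('TEXT', 'e'), ('END', 'script'),
]
```

For `EXAMPLE_EVENTS`, the text `c` that follows the nested `</b>` is still reported as preserved. A boolean would already have been reset by that `END`.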
219
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
768     def __call__(self, stream, ctxt=None, space=XML_NAMESPACE['space'],
769                  trim_trailing_space=re.compile('[ \t]+(?=\n)').sub,
770                  collapse_lines=re.compile('\n{2,}').sub):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
771         mjoin = Markup('').join
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
772         preserve_elems = self.preserve
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
773         preserve = 0
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
774         noescape_elems = self.noescape
775         noescape = False
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
776
777         textbuf = []
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
778         push_text = textbuf.append
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
779         pop_text = textbuf.pop
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
780         for kind, data, pos in chain(stream, [(None, None, None)]):
410
d14d89995c29 Improve the handling of namespaces in serialization.
cmlenz
parents: 408
781
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
782             if kind is TEXT:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
783                 if noescape:
784                     data = Markup(data)
785                 push_text(data)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
786             else:
787                 if textbuf:
788                     if len(textbuf) > 1:
789                         text = mjoin(textbuf, escape_quotes=False)
790                         del textbuf[:]
791                     else:
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
792                         text = escape(pop_text(), quotes=False)
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
793                     if not preserve:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
794                         text = collapse_lines('\n', trim_trailing_space('', text))
795                     yield TEXT, Markup(text), pos
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
796
797                 if kind is START:
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
798                     tag, attrs = data
799                     if preserve or (tag in preserve_elems or
800                                     attrs.get(space) == 'preserve'):
801                         preserve += 1
219
ebceef564b79 Minor improvements to `WhitespaceFilter`.
cmlenz
parents: 213
802                     if not noescape and tag in noescape_elems:
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
803                         noescape = True
804
805                 elif kind is END:
346
96882a191686 Whitespace was not getting preserved in HTML `<pre>` elements that contained other HTML elements.
cmlenz
parents: 345
806                     noescape = False
807                     if preserve:
808                         preserve -= 1
141
520a5b7dd6d2 * No escaping of `<script>` or `<style>` tags in HTML output (see #24)
cmlenz
parents: 140
809
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
810                 elif kind is START_CDATA:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
811                     noescape = True
812
305
60111a041e7c Various performance-oriented tweaks.
cmlenz
parents: 280
813                 elif kind is END_CDATA:
143
3d4c214c979a CDATA sections in XML input now appear as CDATA sections in the output. This should address the problem with escaping the contents of `<style>` and `<script>` elements, which would only get interpreted correctly if the output was served as `application/xhtml+xml`. Closes #24.
cmlenz
parents: 141
814                     noescape = False
815
136
b86f496f6035 Minor performance improvements in serialization.
cmlenz
parents: 123
816                 if kind:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 109
817                     yield kind, data, pos
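Outside preserving elements, buffered text is normalized by the two regular expressions that `__call__` binds as default arguments. First, runs of spaces and tabs before a newline are dropped. Then two or more consecutive newlines are merged into one. What follows is a minimal sketch that applies the same patterns to plain strings, assuming no escaping or `Markup` handling. `normalize_whitespace` is a hypothetical helper, not part of Genshi.

```python
import re

# The same two patterns that WhitespaceFilter.__call__ binds as defaults.
trim_trailing_space = re.compile('[ \t]+(?=\n)').sub
collapse_lines = re.compile('\n{2,}').sub


def normalize_whitespace(text, preserve=False):
    """Hypothetical helper mirroring the filter's text normalization."""
    if preserve:
        # Inside a preserving element the text is left untouched.
        return text
    # Drop spaces/tabs that end a line first, so that lines holding only
    # whitespace become empty and are then merged by collapse_lines.
    return collapse_lines('\n', trim_trailing_space('', text))
```

For instance, `'a  \n\n\n  b'` becomes `'a\n  b'`. Leading indentation survives because only whitespace directly before a newline is trimmed.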
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
818
819
820 class DocTypeInserter(object):
821     """A filter that inserts the DOCTYPE declaration in the correct location,
822     after the XML declaration.
823     """
824     def __init__(self, doctype):
825         """Initialize the filter.
826
827         :param doctype: DOCTYPE as a string or DocType object.
828         """
829         if isinstance(doctype, basestring):
830             doctype = DocType.get(doctype)
831         self.doctype_event = (DOCTYPE, doctype, (None, -1, -1))
832
833     def __call__(self, stream):
834         doctype_inserted = False
835         for kind, data, pos in stream:
836             if not doctype_inserted:
672
571226acaeff XML_DECL must be the absolute first item, so don't bother buffering whitespace.
athomas
parents: 671
837                 doctype_inserted = True
838                 if kind is XML_DECL:
839                     yield (kind, data, pos)
840                     yield self.doctype_event
671
8a9a7a8e9478 Add a stream filter to insert the XML DOCTYPE in the correct location (ie.
athomas
parents: 669
841                     continue
842                 yield self.doctype_event
843
844             yield (kind, data, pos)
845
846         if not doctype_inserted:
847             yield self.doctype_event
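`DocTypeInserter.__call__` places the DOCTYPE in one of three ways: right after an `XML_DECL` if the stream starts with one, before the first event otherwise, and on its own if the stream is empty. What follows is a minimal re-statement of that generator on plain `(kind, data, pos)` tuples. It assumes string event kinds and an arbitrary placeholder doctype value in place of Genshi's constants and `DocType` objects. `insert_doctype` is a hypothetical name.

```python
def insert_doctype(stream, doctype):
    """Yield the events of `stream` with a DOCTYPE event inserted.

    Hypothetical re-statement of DocTypeInserter.__call__ using plain
    (kind, data, pos) tuples whose kinds are strings.
    """
    doctype_event = ('DOCTYPE', doctype, (None, -1, -1))
    doctype_inserted = False
    for kind, data, pos in stream:
        if not doctype_inserted:
            doctype_inserted = True
            if kind == 'XML_DECL':
                # The XML declaration must stay the very first event, so
                # it is emitted before the DOCTYPE.
                yield (kind, data, pos)
                yield doctype_event
                continue
            yield doctype_event
        yield (kind, data, pos)
    if not doctype_inserted:
        # Empty input: the DOCTYPE is still emitted on its own.
        yield doctype_event
```

Only the first event is inspected, which matches the revision 672 note that `XML_DECL` must be the absolute first item.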
Copyright (C) 2012-2017 Edgewall Software