annotate doc/streams.txt @ 880:3b16d762445b
Default XInclude-included template class to the class of the including template. Closes #302.
author | cmlenz |
---|---|
date | Thu, 15 Apr 2010 21:44:28 +0000 |
parents | 24733a5854d9 |
.. -*- mode: rst; encoding: utf-8 -*-

==============
Markup Streams
==============

A stream is the common representation of markup as a *stream of events*.


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

A stream can be obtained in a number of ways. It can be:

* the result of parsing XML or HTML text, or
* the result of selecting a subset of another stream using XPath, or
* programmatically generated.

For example, the functions ``XML()`` and ``HTML()`` can be used to convert
literal XML or HTML text to a markup stream:

.. code-block:: pycon

  >>> from genshi import XML
  >>> stream = XML('<p class="intro">Some text and '
  ...              '<a href="http://example.org/">a link</a>.'
  ...              '<br/></p>')
  >>> stream
  <genshi.core.Stream object at ...>

The stream is the result of parsing the text into events. Each event is a tuple
of the form ``(kind, data, pos)``, where:

* ``kind`` defines what kind of event it is (such as the start of an element,
  text, a comment, etc.).
* ``data`` is the actual data associated with the event. How this looks depends
  on the event kind (see `event kinds`_).
* ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
  event “comes from”.

.. code-block:: pycon

  >>> for kind, data, pos in stream:
  ...     print('%s %r %r' % (kind, data, pos))
  ...
  START (QName('p'), Attrs([(QName('class'), u'intro')])) (None, 1, 0)
  TEXT u'Some text and ' (None, 1, 17)
  START (QName('a'), Attrs([(QName('href'), u'http://example.org/')])) (None, 1, 31)
  TEXT u'a link' (None, 1, 61)
  END QName('a') (None, 1, 67)
  TEXT u'.' (None, 1, 71)
  START (QName('br'), Attrs()) (None, 1, 72)
  END QName('br') (None, 1, 77)
  END QName('p') (None, 1, 77)


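Because every event is an ordinary tuple, a stream can be inspected with plain
Python. The sketch below is illustrative only: it uses a hand-written list of
tuples, with plain strings standing in for Genshi's event kind constants and
``QName`` objects, and the helper names ``text_content`` and ``element_names``
are hypothetical.

```python
# Stand-in events: plain strings replace Genshi's kind constants,
# and tag names are plain strings rather than QName instances.
events = [
    ('START', ('p', [('class', 'intro')]), (None, 1, 0)),
    ('TEXT', 'Some text and ', (None, 1, 17)),
    ('START', ('a', [('href', 'http://example.org/')]), (None, 1, 31)),
    ('TEXT', 'a link', (None, 1, 61)),
    ('END', 'a', (None, 1, 67)),
    ('TEXT', '.', (None, 1, 71)),
    ('END', 'p', (None, 1, 77)),
]


def text_content(stream):
    """Concatenate the data of all TEXT events."""
    return ''.join(data for kind, data, pos in stream if kind == 'TEXT')


def element_names(stream):
    """List tag names in the order their START events occur."""
    return [data[0] for kind, data, pos in stream if kind == 'START']


plain_text = text_content(events)
names = element_names(events)
```

Unpacking each event as ``kind, data, pos`` is the same idiom used in the loop
above.
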
Filtering
=========

One important feature of markup streams is that you can apply *filters* to the
stream, either filters that come with Genshi, or your own custom filters.

A filter is simply a callable that accepts the stream as a parameter and
returns the filtered stream:

.. code-block:: python

  def noop(stream):
      """A filter that doesn't actually do anything with the stream."""
      for kind, data, pos in stream:
          yield kind, data, pos

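As a sketch of a filter that actually changes something, the generator below
upper-cases text events. It is written against plain tuples so that it runs
without Genshi installed; the kind ``'TEXT'`` is a plain-string stand-in for
the corresponding Genshi constant, and ``upper_text`` is a hypothetical name.

```python
def upper_text(stream):
    """A filter that upper-cases the data of every TEXT event."""
    for kind, data, pos in stream:
        if kind == 'TEXT':  # stand-in for Genshi's TEXT kind constant
            data = data.upper()
        yield kind, data, pos


# Hand-written events used only for this illustration.
events = [
    ('START', ('p', []), (None, 1, 0)),
    ('TEXT', 'hello', (None, 1, 3)),
    ('END', 'p', (None, 1, 8)),
]
filtered = list(upper_text(events))
```

Events the filter is not interested in are passed through unchanged, which is
what lets filters be combined freely.
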
Filters can be applied in a number of ways. The simplest is to just call the
filter directly:

.. code-block:: python

  stream = noop(stream)

The ``Stream`` class also provides a ``filter()`` method, which takes an
arbitrary number of filter callables and applies them all:

.. code-block:: python

  stream = stream.filter(noop)

Finally, filters can also be applied using the *bitwise or* operator (``|``),
which allows a syntax similar to pipes on Unix shells:

.. code-block:: python

  stream = stream | noop

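The three ways of applying filters are interchangeable. One way such an
interface could be built is sketched below; ``MiniStream`` is a hypothetical
stand-in written for illustration, not Genshi's ``Stream`` implementation.

```python
class MiniStream(object):
    """Hypothetical stand-in for a stream; not Genshi's Stream class."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def filter(self, *filters):
        """Apply each filter in turn, left to right."""
        events = self.events
        for func in filters:
            events = func(events)
        return MiniStream(events)

    def __or__(self, func):
        """``stream | func`` wraps the filtered events in a new stream."""
        return MiniStream(func(self))


def noop(stream):
    for event in stream:
        yield event


def drop_text(stream):
    """Illustrative filter that removes TEXT events."""
    for kind, data, pos in stream:
        if kind != 'TEXT':
            yield kind, data, pos


events = [('START', ('p', []), None), ('TEXT', 'hi', None), ('END', 'p', None)]
piped = list(MiniStream(events) | noop | drop_text)
chained = list(MiniStream(events).filter(noop, drop_text))
```

Both spellings run the filters in the same left-to-right order, so ``piped``
and ``chained`` hold the same events.
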
One example of a filter included with Genshi is the ``HTMLSanitizer`` in
``genshi.filters``. It processes a stream of HTML markup, and strips out any
potentially dangerous constructs, such as JavaScript event handlers.
``HTMLSanitizer`` is not a function, but rather a class that implements
``__call__``, which means instances of the class are callable:

.. code-block:: python

  from genshi.filters import HTMLSanitizer
  stream = stream | HTMLSanitizer()

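A class-based filter is useful when the filter needs configuration. The
following sketch shows the pattern with a hypothetical ``TagRemover`` class
operating on plain tuples; it is far simpler than ``HTMLSanitizer`` and is not
meant as a security measure.

```python
class TagRemover(object):
    """Hypothetical filter that drops elements with configured tag names."""

    def __init__(self, tags):
        self.tags = frozenset(tags)

    def __call__(self, stream):
        depth = 0  # greater than zero while inside an element being removed
        for kind, data, pos in stream:
            if kind == 'START':
                if depth or data[0] in self.tags:
                    depth += 1
                    continue
            elif kind == 'END':
                if depth:
                    depth -= 1
                    continue
            elif depth:
                continue  # skip content nested in a removed element
            yield kind, data, pos


# Plain-tuple events; strings stand in for Genshi's kinds and QNames.
events = [
    ('START', ('p', []), None),
    ('TEXT', 'keep', None),
    ('START', ('script', []), None),
    ('TEXT', 'alert(1)', None),
    ('END', 'script', None),
    ('END', 'p', None),
]
remove_scripts = TagRemover(['script'])  # configured once, reusable
cleaned = list(remove_scripts(events))
```

The instance carries its configuration, and calling it behaves like any other
filter function.
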
Both the ``filter()`` method and the pipe operator allow easy chaining of
filters:

.. code-block:: python

  from genshi.filters import HTMLSanitizer
  stream = stream.filter(noop, HTMLSanitizer())

That is equivalent to:

.. code-block:: python

  stream = stream | noop | HTMLSanitizer()

For more information about the built-in filters, see `Stream Filters`_.

.. _`Stream Filters`: filters.html


Serialization
=============

Serialization means producing some kind of textual output from a stream of
events, which you'll need when you want to transmit or store the results of
generating or otherwise processing markup.

The ``Stream`` class provides two methods for serialization: ``serialize()``
and ``render()``. The former is a generator that yields chunks of ``Markup``
objects (which are basically unicode strings that are considered safe for
output on the web). The latter returns a single string, by default UTF-8
encoded.

Here's the output from ``serialize()``:

.. code-block:: pycon

  >>> for output in stream.serialize():
  ...     print(repr(output))
  ...
  <Markup u'<p class="intro">'>
  <Markup u'Some text and '>
  <Markup u'<a href="http://example.org/">'>
  <Markup u'a link'>
  <Markup u'</a>'>
  <Markup u'.'>
  <Markup u'<br/>'>
  <Markup u'</p>'>

And here's the output from ``render()``:

.. code-block:: pycon

  >>> print(stream.render())
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>

Both methods can be passed a ``method`` parameter that determines how exactly
the events are serialized to text. This parameter can be either a string or a
custom serializer class:

.. code-block:: pycon

  >>> print(stream.render('html'))
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>

Note how the ``<br>`` element isn't closed, which is the right thing to do for
HTML. See `serialization methods`_ for more details.

In addition, the ``render()`` method takes an ``encoding`` parameter, which
defaults to “UTF-8”. If set to ``None``, the result will be a unicode string.

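The relationship between the two methods can be pictured roughly as follows.
This is a simplified model using plain strings, with hypothetical
``serialize`` and ``render`` functions; it is not how Genshi implements them.

```python
def serialize(chunks):
    """Stand-in generator: yields the output piece by piece."""
    for chunk in chunks:
        yield chunk


def render(chunks, encoding='utf-8'):
    """Join the chunks; encode the result unless ``encoding`` is None."""
    text = u''.join(serialize(chunks))
    if encoding is None:
        return text
    return text.encode(encoding)


chunks = [u'<p>', u'caf\xe9', u'</p>']
as_bytes = render(chunks)                 # encoded with the default, UTF-8
as_text = render(chunks, encoding=None)   # left as a unicode string
```

The generator form suits writing output incrementally, while the joined form
is convenient when the whole result is needed at once.
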
The different serializer classes in ``genshi.output`` can also be used
directly:

.. code-block:: pycon

  >>> from genshi.filters import HTMLSanitizer
  >>> from genshi.output import TextSerializer
  >>> print(''.join(TextSerializer()(HTMLSanitizer()(stream))))
  Some text and a link.

The pipe operator allows a nicer syntax:

.. code-block:: pycon

  >>> print(stream | HTMLSanitizer() | TextSerializer())
  Some text and a link.


.. _`serialization methods`:

Serialization Methods
---------------------

Genshi supports different serialization methods for creating a text
representation of a markup stream.

``xml``
  The ``XMLSerializer`` is the default serialization method and results in
  proper XML output including namespace support, the XML declaration, CDATA
  sections, and so on. It is generally not suitable for serving HTML or
  XHTML web pages (unless you want to use true XHTML 1.1), for which the
  ``xhtml`` and ``html`` serializers described below should be preferred.

``xhtml``
  The ``XHTMLSerializer`` is a specialization of the generic ``XMLSerializer``
  that understands the peculiarities of producing XML-compliant output that can
  also be parsed without problems by the HTML parsers found in modern web
  browsers. Thus, the output of this serializer should be usable whether sent
  as "text/html" or "application/xhtml+xml" (although there are a lot of
  subtle issues to pay attention to when switching between the two, in
  particular with respect to differences in the DOM and CSS).

  For example, instead of rendering a script tag as ``<script/>`` (which
  confuses the HTML parser in many browsers), it will produce
  ``<script></script>``. Also, it will normalize any boolean attribute values
  that are minimized in HTML, so that for example ``<hr noshade="1"/>``
  becomes ``<hr noshade="noshade" />``.

  This serializer supports the use of namespaces for compound documents, for
  example to use inline SVG inside an XHTML document.

``html``
  The ``HTMLSerializer`` produces proper HTML markup. The main differences
  compared to ``xhtml`` serialization are that boolean attributes are
  minimized, empty tags are not self-closing (so it's ``<br>`` instead of
  ``<br />``), and that the contents of ``<script>`` and ``<style>`` elements
  are not escaped.

``text``
  The ``TextSerializer`` produces plain text from markup streams. This is
  useful primarily for `text templates`_, but can also be used to produce
  plain text output from markup templates or other sources.

.. _`text templates`: text-templates.html


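The difference in how the methods treat elements without content can be
summarised in a small sketch. The function ``empty_element`` and the
``VOID_ELEMENTS`` set are hypothetical and cover only the cases mentioned
above; Genshi's serializers handle many more details.

```python
# Elements that HTML defines as always empty (a small illustrative subset).
VOID_ELEMENTS = frozenset(['br', 'hr', 'img', 'input'])


def empty_element(tag, method='xml'):
    """Return the markup a serializer might emit for a content-less element."""
    long_form = '<%s></%s>' % (tag, tag)
    if method == 'html':
        # HTML: void elements have no end tag at all.
        return '<%s>' % tag if tag in VOID_ELEMENTS else long_form
    if method == 'xhtml':
        # XHTML: only void elements are written in the short form.
        return '<%s />' % tag if tag in VOID_ELEMENTS else long_form
    # Generic XML: any empty element may use the short form.
    return '<%s/>' % tag


xml_br = empty_element('br')
xhtml_br = empty_element('br', 'xhtml')
html_br = empty_element('br', 'html')
xhtml_script = empty_element('script', 'xhtml')
```

The ``xhtml`` branch is the compromise: still well-formed XML, but shaped so
that HTML parsers do not misread it.
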
Serialization Options
---------------------

Both ``serialize()`` and ``render()`` support additional keyword arguments that
are passed through to the initializer of the serializer class. The following
options are supported by the built-in serializers:

``strip_whitespace``
  Whether the serializer should remove trailing spaces and empty lines.
  Defaults to ``True``.

  (This option is not available for serialization to plain text.)

``doctype``
  A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and
  system identifier of a ``DOCTYPE`` declaration to prepend to the generated
  output. If provided, this declaration will override any ``DOCTYPE``
  declaration in the stream.

  The parameter can also be specified as a string to refer to commonly used
  doctypes:

  +-----------------------------+-------------------------------------------+
  | Shorthand                   | DOCTYPE                                   |
  +=============================+===========================================+
  | ``html`` or                 | HTML 4.01 Strict                          |
  | ``html-strict``             |                                           |
  +-----------------------------+-------------------------------------------+
  | ``html-transitional``       | HTML 4.01 Transitional                    |
  +-----------------------------+-------------------------------------------+
  | ``html-frameset``           | HTML 4.01 Frameset                        |
  +-----------------------------+-------------------------------------------+
  | ``html5``                   | DOCTYPE proposed for the work-in-progress |
  |                             | HTML5 standard                            |
  +-----------------------------+-------------------------------------------+
  | ``xhtml`` or                | XHTML 1.0 Strict                          |
  | ``xhtml-strict``            |                                           |
  +-----------------------------+-------------------------------------------+
  | ``xhtml-transitional``      | XHTML 1.0 Transitional                    |
  +-----------------------------+-------------------------------------------+
  | ``xhtml-frameset``          | XHTML 1.0 Frameset                        |
  +-----------------------------+-------------------------------------------+
  | ``xhtml11``                 | XHTML 1.1                                 |
  +-----------------------------+-------------------------------------------+
  | ``svg`` or ``svg-full``     | SVG 1.1                                   |
  +-----------------------------+-------------------------------------------+
  | ``svg-basic``               | SVG 1.1 Basic                             |
  +-----------------------------+-------------------------------------------+
  | ``svg-tiny``                | SVG 1.1 Tiny                              |
  +-----------------------------+-------------------------------------------+

  (This option is not available for serialization to plain text.)

``namespace_prefixes``
  The namespace prefixes to use for namespaces that are not bound to a prefix
  in the stream itself.

  (This option is not available for serialization to HTML or plain text.)

``drop_xml_decl``
  Whether to remove the XML declaration (the ``<?xml ?>`` part at the
  beginning of a document) when serializing. This defaults to ``True`` as an
  XML declaration throws some older browsers into "Quirks" rendering mode.

  (This option is only available for serialization to XHTML.)

``strip_markup``
  Whether the text serializer should detect and remove any tags or entity
  encoded characters in the text.

  (This option is only available for serialization to plain text.)



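To make the ``doctype`` tuple more concrete, the sketch below shows how a
``(name, pubid, sysid)`` tuple maps onto the text of a ``DOCTYPE``
declaration. ``format_doctype`` is a hypothetical helper, not part of Genshi;
the identifiers used are those of HTML 4.01 Strict.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Build the text of a DOCTYPE declaration from its three parts."""
    parts = ['<!DOCTYPE %s' % name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>')
    return ''.join(parts)


html_strict = format_doctype(
    'html',
    '-//W3C//DTD HTML 4.01//EN',
    'http://www.w3.org/TR/html4/strict.dtd',
)
name_only = format_doctype('html')
```

A declaration with only a name, as in ``name_only``, is the short form used
by HTML5.
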
Using XPath
===========

XPath can be used to extract a specific subset of the stream via the
``select()`` method:

.. code-block:: pycon

  >>> substream = stream.select('a')
  >>> substream
  <genshi.core.Stream object at ...>
  >>> print(substream)
  <a href="http://example.org/">a link</a>

Often, streams cannot be reused: in the above example, the sub-stream is based
on a generator. Once it has been serialized, it will have been fully consumed,
and cannot be rendered again. To work around this, you can wrap such a stream
in a ``list``:

.. code-block:: pycon

  >>> from genshi import Stream
  >>> substream = Stream(list(stream.select('a')))
  >>> substream
  <genshi.core.Stream object at ...>
  >>> print(substream)
  <a href="http://example.org/">a link</a>
  >>> print(substream.select('@href'))
  http://example.org/
  >>> print(substream.select('text()'))
  a link

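The one-shot behaviour is ordinary Python generator semantics, as the
following sketch with a stand-in generator shows (no Genshi involved;
``events`` is a hypothetical function).

```python
def events():
    """Stand-in for a generator-backed stream."""
    yield ('TEXT', 'a link', None)


once = events()
first_pass = list(once)
second_pass = list(once)   # the generator is already exhausted

buffered = list(events())  # materialize the events up front
pass_a = list(buffered)
pass_b = list(buffered)    # a list can be iterated any number of times
```

Buffering trades memory for reusability, so it is best kept to streams that
really are consumed more than once.
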
See `Using XPath in Genshi`_ for more information about the XPath support in
Genshi.

.. _`Using XPath in Genshi`: xpath.html


.. _`event kinds`:

Event Kinds
===========

Every event in a stream is of one of several *kinds*, which also determines
what the ``data`` item of the event tuple looks like. The different kinds of
events are documented below.

.. note:: The ``data`` item is generally immutable. If the data is to be
   modified when processing a stream, it must be replaced by a new tuple.
   Effectively, this means the entire event tuple is immutable.

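In practice this means a filter builds a new ``data`` value and yields a new
event rather than changing the old one in place. A minimal sketch with plain
tuples, where ``rename_tag`` is a hypothetical filter and plain strings stand
in for ``QName`` instances and kind constants:

```python
def rename_tag(stream, old, new):
    """Yield events with elements named ``old`` renamed to ``new``."""
    for kind, data, pos in stream:
        if kind == 'START' and data[0] == old:
            # Build a new data tuple instead of mutating the existing one.
            data = (new, data[1])
        elif kind == 'END' and data == old:
            data = new
        yield kind, data, pos


events = [
    ('START', ('b', ()), (None, 1, 0)),
    ('TEXT', 'bold', (None, 1, 3)),
    ('END', 'b', (None, 1, 7)),
]
renamed = list(rename_tag(events, 'b', 'strong'))
```

The original ``events`` list is left untouched, so other consumers of the same
events are not affected.
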
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
370 START |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
371 ----- |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
372 The opening tag of an element. |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
373 |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
374 For this kind of event, the ``data`` item is a tuple of the form |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
375 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
376 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing |
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
377 the attribute names and values associated with the tag (excluding namespace |
508
cabd80e75dad
Enable syntax highlighting (with Pygments) on doc page.
cmlenz
parents:
438
diff
changeset
|
378 declarations): |
cabd80e75dad
Enable syntax highlighting (with Pygments) on doc page.
cmlenz
parents:
438
diff
changeset
|
379 |
cabd80e75dad
Enable syntax highlighting (with Pygments) on doc page.
cmlenz
parents:
438
diff
changeset
|
380 .. code-block:: python |
382
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
381 |
857
24733a5854d9
Avoid unicode literals in `repr`s of `QName` and `Namespace` when not necessary.
cmlenz
parents:
853
diff
changeset
|
382 START, (QName('p'), Attrs([(QName('class'), u'intro')])), pos |
382
d7da3fba7faf
* Added documentation for the various stream event kinds.
cmlenz
parents:
230
diff
changeset
|
383 |

END
---
The closing tag of an element.

The ``data`` item of end events consists of just a ``QName`` instance
describing the qualified name of the tag:

.. code-block:: python

  END, QName('p'), pos

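Because every ``START`` event is expected to be closed by an ``END`` event
carrying the same tag name, a stream can be checked for balance with a simple
stack. The following is a minimal sketch rather than part of Genshi: the event
kinds are plain strings standing in for the real constants, the tag and
attributes are plain Python values instead of ``QName`` and ``Attrs``
instances, and ``is_balanced`` is a hypothetical helper name.

.. code-block:: python

  # Stand-in event kinds: plain strings, so the sketch runs without Genshi.
  START, END, TEXT = 'START', 'END', 'TEXT'


  def is_balanced(events):
      """Return True if every START event is closed by a matching END event."""
      open_tags = []
      for kind, data, pos in events:
          if kind == START:
              tag, attrs = data  # START carries a (tag, attrs) pair
              open_tags.append(tag)
          elif kind == END:
              # END carries only the tag name; it must close the innermost tag
              if not open_tags or open_tags.pop() != data:
                  return False
      return not open_tags


  pos = None  # placeholder; real events carry position information here
  events = [
      (START, ('p', [('class', 'intro')]), pos),
      (TEXT, 'Some text', pos),
      (END, 'p', pos),
  ]
  print(is_balanced(events))      # complete element
  print(is_balanced(events[:2]))  # END event missing
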

TEXT
----
Character data outside of elements and comments.

For text events, the ``data`` item should be a unicode object:

.. code-block:: python

  TEXT, u'Hello, world!', pos

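Since only ``TEXT`` events carry character data, the plain text of a stream
can be recovered by joining their ``data`` items. This is an illustrative
sketch under the same assumptions as before: string stand-ins replace the real
event kinds, and ``plain_text`` is a hypothetical helper, not a Genshi
function.

.. code-block:: python

  # Stand-in event kinds: plain strings, so the sketch runs without Genshi.
  START, END, TEXT = 'START', 'END', 'TEXT'


  def plain_text(events):
      """Join the data of all TEXT events, ignoring every other kind."""
      return ''.join(data for kind, data, pos in events if kind == TEXT)


  pos = None  # placeholder position
  events = [
      (START, ('p', []), pos),
      (TEXT, 'Hello, ', pos),
      (START, ('em', []), pos),
      (TEXT, 'world', pos),
      (END, 'em', pos),
      (TEXT, '!', pos),
      (END, 'p', pos),
  ]
  print(plain_text(events))
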

START_NS
--------
The start of a namespace mapping, binding a namespace prefix to a URI.

The ``data`` item of this kind of event is a tuple of the form
``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the
full URI to which the prefix is bound. Both should be unicode objects. If the
namespace is not bound to any prefix, the ``prefix`` item is an empty string:

.. code-block:: python

  START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos


END_NS
------
The end of a namespace mapping.

The ``data`` item of such events consists of only the namespace prefix (a
unicode object):

.. code-block:: python

  END_NS, u'svg', pos

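Taken together, ``START_NS`` and ``END_NS`` events delimit the region of the
stream in which a prefix is bound. One way to follow the bindings while
iterating is sketched below. It is an assumption-laden illustration, not
Genshi code: the kinds are string stand-ins, ``namespace_snapshots`` is a
hypothetical helper, and a per-prefix stack is used so that a prefix rebound
in a nested scope falls back to its outer URI afterwards.

.. code-block:: python

  # Stand-in event kinds: plain strings, so the sketch runs without Genshi.
  START, END, START_NS, END_NS = 'START', 'END', 'START_NS', 'END_NS'


  def namespace_snapshots(events):
      """Return the prefix-to-URI mapping in effect after each event."""
      bindings = {}  # prefix -> stack of URIs, innermost binding last
      snapshots = []
      for kind, data, pos in events:
          if kind == START_NS:
              prefix, uri = data
              bindings.setdefault(prefix, []).append(uri)
          elif kind == END_NS:
              # END_NS carries only the prefix whose binding ends here
              bindings[data].pop()
              if not bindings[data]:
                  del bindings[data]
          snapshots.append({p: uris[-1] for p, uris in bindings.items()})
      return snapshots


  pos = None  # placeholder position
  SVG = 'http://www.w3.org/2000/svg'
  events = [
      (START_NS, ('svg', SVG), pos),
      (START, ('svg', []), pos),
      (END, 'svg', pos),
      (END_NS, 'svg', pos),
  ]
  for snapshot in namespace_snapshots(events):
      print(snapshot)
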

DOCTYPE
-------
A document type declaration.

For this type of event, the ``data`` item is a tuple of the form
``(name, pubid, sysid)``, where ``name`` is the name of the root element,
``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is
the system identifier of the DTD (or ``None``):

.. code-block:: python

  DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN',
            u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos

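To illustrate how the three items map back to markup, the sketch below turns
a ``(name, pubid, sysid)`` tuple into a declaration string. ``render_doctype``
is a hypothetical helper written for this example; it is not how Genshi's
serializers are implemented, and it performs no validation or quoting of the
identifiers.

.. code-block:: python

  def render_doctype(data):
      """Build a DOCTYPE declaration from a (name, pubid, sysid) tuple."""
      name, pubid, sysid = data
      parts = ['<!DOCTYPE', name]
      if pubid is not None:
          parts.append('PUBLIC "%s"' % pubid)
          if sysid is not None:
              parts.append('"%s"' % sysid)
      elif sysid is not None:
          # No public identifier: the DTD is referenced by system identifier
          parts.append('SYSTEM "%s"' % sysid)
      return ' '.join(parts) + '>'


  xhtml = ('html', '-//W3C//DTD XHTML 1.0 Transitional//EN',
           'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd')
  print(render_doctype(xhtml))
  print(render_doctype(('html', None, None)))
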

COMMENT
-------
A comment.

For such events, the ``data`` item is a unicode object containing all character
data between the comment delimiters:

.. code-block:: python

  COMMENT, u'Commented out', pos


PI
--
A processing instruction.

For processing instructions, the ``data`` item is a tuple of the form
``(target, data)``, where ``target`` is the target of the PI (used to identify
the application by which the instruction should be processed), and ``data`` is
the text following the target (excluding the terminating question mark):

.. code-block:: python

  PI, (u'php', u'echo "Yo" '), pos

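The ``data`` items of ``COMMENT`` and ``PI`` events hold only the content
between the delimiters, so writing them back out means adding the delimiters
again. The helpers below are hypothetical names used for illustration, not
Genshi functions, and they do not guard against content that itself contains
``--`` or ``?>``.

.. code-block:: python

  def render_comment(data):
      """Wrap comment text in the comment delimiters."""
      return '<!--%s-->' % data


  def render_pi(data):
      """Rebuild a processing instruction from a (target, data) tuple."""
      target, text = data
      if text:
          return '<?%s %s?>' % (target, text)
      return '<?%s?>' % target


  print(render_comment('Commented out'))
  print(render_pi(('php', 'echo "Yo" ')))
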

START_CDATA
-----------
Marks the beginning of a ``CDATA`` section.

The ``data`` item for such events is always ``None``:

.. code-block:: python

  START_CDATA, None, pos


END_CDATA
---------
Marks the end of a ``CDATA`` section.

The ``data`` item for such events is always ``None``:

.. code-block:: python

  END_CDATA, None, pos

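Because the two ``CDATA`` markers carry no data themselves, a consumer has to
remember whether it is currently inside a section. The sketch below assumes
that the section's content arrives as ordinary ``TEXT`` events between the two
markers; that assumption, the string stand-ins for the event kinds, and the
``render_text`` helper are all illustrative and not taken from Genshi.

.. code-block:: python

  from xml.sax.saxutils import escape

  # Stand-in event kinds: plain strings, so the sketch runs without Genshi.
  TEXT, START_CDATA, END_CDATA = 'TEXT', 'START_CDATA', 'END_CDATA'


  def render_text(events):
      """Serialize text, escaping it only outside of CDATA sections."""
      out = []
      in_cdata = False
      for kind, data, pos in events:
          if kind == START_CDATA:
              in_cdata = True
              out.append('<![CDATA[')
          elif kind == END_CDATA:
              in_cdata = False
              out.append(']]>')
          elif kind == TEXT:
              # Inside a CDATA section the text is written verbatim
              out.append(data if in_cdata else escape(data))
      return ''.join(out)


  pos = None  # placeholder position
  events = [
      (TEXT, 'a < b ', pos),
      (START_CDATA, None, pos),
      (TEXT, 'x < y', pos),
      (END_CDATA, None, pos),
  ]
  print(render_text(events))
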