comparison doc/streams.txt @ 511:1a29617a5d87 stable-0.4.x
Ported [611:614] to 0.4.x branch.
author | cmlenz |
---|---|
date | Wed, 06 Jun 2007 11:18:46 +0000 |
parents | 6fd7e4dc0318 |
children |
495:90eecd360b18 | 511:1a29617a5d87 |
---|---|
20 * the result of parsing XML or HTML text, or | 20 * the result of parsing XML or HTML text, or |
21 * the result of selecting a subset of another stream using XPath, or | 21 * the result of selecting a subset of another stream using XPath, or |
22 * programmatically generated. | 22 * programmatically generated. |
23 | 23 |
24 For example, the functions ``XML()`` and ``HTML()`` can be used to convert | 24 For example, the functions ``XML()`` and ``HTML()`` can be used to convert |
25 literal XML or HTML text to a markup stream:: | 25 literal XML or HTML text to a markup stream: |
26 | |
27 .. code-block:: pycon | |
26 | 28 |
27 >>> from genshi import XML | 29 >>> from genshi import XML |
28 >>> stream = XML('<p class="intro">Some text and ' | 30 >>> stream = XML('<p class="intro">Some text and ' |
29 ... '<a href="http://example.org/">a link</a>.' | 31 ... '<a href="http://example.org/">a link</a>.' |
30 ... '<br/></p>') | 32 ... '<br/></p>') |
39 * ``data`` is the actual data associated with the event. How this looks depends | 41 * ``data`` is the actual data associated with the event. How this looks depends |
40 on the event kind (see `event kinds`_) | 42 on the event kind (see `event kinds`_) |
41 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the | 43 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the |
42 event “comes from”. | 44 event “comes from”. |
43 | 45 |
44 :: | 46 .. code-block:: pycon |
45 | 47 |
46 >>> for kind, data, pos in stream: | 48 >>> for kind, data, pos in stream: |
47 ... print kind, `data`, pos | 49 ... print kind, `data`, pos |
48 ... | 50 ... |
49 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0) | 51 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0) |
62 | 64 |
63 One important feature of markup streams is that you can apply *filters* to the | 65 One important feature of markup streams is that you can apply *filters* to the |
64 stream, either filters that come with Genshi, or your own custom filters. | 66 stream, either filters that come with Genshi, or your own custom filters. |
65 | 67 |
66 A filter is simply a callable that accepts the stream as parameter, and returns | 68 A filter is simply a callable that accepts the stream as parameter, and returns |
67 the filtered stream:: | 69 the filtered stream: |
70 | |
71 .. code-block:: python | |
68 | 72 |
69 def noop(stream): | 73 def noop(stream): |
70 """A filter that doesn't actually do anything with the stream.""" | 74 """A filter that doesn't actually do anything with the stream.""" |
71 for kind, data, pos in stream: | 75 for kind, data, pos in stream: |
72 yield kind, data, pos | 76 yield kind, data, pos |
73 | 77 |
74 Filters can be applied in a number of ways. The simplest is to just call the | 78 Filters can be applied in a number of ways. The simplest is to just call the |
75 filter directly:: | 79 filter directly: |
80 | |
81 .. code-block:: python | |
76 | 82 |
77 stream = noop(stream) | 83 stream = noop(stream) |
78 | 84 |
79 The ``Stream`` class also provides a ``filter()`` method, which takes an | 85 The ``Stream`` class also provides a ``filter()`` method, which takes an |
80 arbitrary number of filter callables and applies them all:: | 86 arbitrary number of filter callables and applies them all: |
87 | |
88 .. code-block:: python | |
81 | 89 |
82 stream = stream.filter(noop) | 90 stream = stream.filter(noop) |
83 | 91 |
84 Finally, filters can also be applied using the *bitwise or* operator (``|``), | 92 Finally, filters can also be applied using the *bitwise or* operator (``|``), |
85 which allows a syntax similar to pipes on Unix shells:: | 93 which allows a syntax similar to pipes on Unix shells: |
94 | |
95 .. code-block:: python | |
86 | 96 |
87 stream = stream | noop | 97 stream = stream | noop |
88 | 98 |
89 One example of a filter included with Genshi is the ``HTMLSanitizer`` in | 99 One example of a filter included with Genshi is the ``HTMLSanitizer`` in |
90 ``genshi.filters``. It processes a stream of HTML markup, and strips out any | 100 ``genshi.filters``. It processes a stream of HTML markup, and strips out any |
91 potentially dangerous constructs, such as Javascript event handlers. | 101 potentially dangerous constructs, such as Javascript event handlers. |
92 ``HTMLSanitizer`` is not a function, but rather a class that implements | 102 ``HTMLSanitizer`` is not a function, but rather a class that implements |
93 ``__call__``, which means instances of the class are callable:: | 103 ``__call__``, which means instances of the class are callable: |
104 | |
105 .. code-block:: python | |
94 | 106 |
95 stream = stream | HTMLSanitizer() | 107 stream = stream | HTMLSanitizer() |
96 | 108 |
97 Both the ``filter()`` method and the pipe operator allow easy chaining of | 109 Both the ``filter()`` method and the pipe operator allow easy chaining of |
98 filters:: | 110 filters: |
111 | |
112 .. code-block:: python | |
99 | 113 |
100 from genshi.filters import HTMLSanitizer | 114 from genshi.filters import HTMLSanitizer |
101 stream = stream.filter(noop, HTMLSanitizer()) | 115 stream = stream.filter(noop, HTMLSanitizer()) |
102 | 116 |
103 That is equivalent to:: | 117 That is equivalent to: |
118 | |
119 .. code-block:: python | |
104 | 120 |
105 stream = stream | noop | HTMLSanitizer() | 121 stream = stream | noop | HTMLSanitizer() |
106 | 122 |
107 For more information about the built-in filters, see `Stream Filters`_. | 123 For more information about the built-in filters, see `Stream Filters`_. |
108 | 124 |
119 The ``Stream`` class provides two methods for serialization: ``serialize()`` and | 135 The ``Stream`` class provides two methods for serialization: ``serialize()`` and |
120 ``render()``. The former is a generator that yields chunks of ``Markup`` objects | 136 ``render()``. The former is a generator that yields chunks of ``Markup`` objects |
121 (which are basically unicode strings that are considered safe for output on the | 137 (which are basically unicode strings that are considered safe for output on the |
122 web). The latter returns a single string, by default UTF-8 encoded. | 138 web). The latter returns a single string, by default UTF-8 encoded. |
123 | 139 |
124 Here's the output from ``serialize()``:: | 140 Here's the output from ``serialize()``: |
141 | |
142 .. code-block:: pycon | |
125 | 143 |
126 >>> for output in stream.serialize(): | 144 >>> for output in stream.serialize(): |
127 ... print `output` | 145 ... print `output` |
128 ... | 146 ... |
129 <Markup u'<p class="intro">'> | 147 <Markup u'<p class="intro">'> |
133 <Markup u'</a>'> | 151 <Markup u'</a>'> |
134 <Markup u'.'> | 152 <Markup u'.'> |
135 <Markup u'<br/>'> | 153 <Markup u'<br/>'> |
136 <Markup u'</p>'> | 154 <Markup u'</p>'> |
137 | 155 |
138 And here's the output from ``render()``:: | 156 And here's the output from ``render()``: |
157 | |
158 .. code-block:: pycon | |
139 | 159 |
140 >>> print stream.render() | 160 >>> print stream.render() |
141 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> | 161 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> |
142 | 162 |
143 Both methods can be passed a ``method`` parameter that determines how exactly | 163 Both methods can be passed a ``method`` parameter that determines how exactly |
144 the events are serialized to text. This parameter can be either “xml” (the | 164 the events are serialized to text. This parameter can be either “xml” (the |
145 default), “xhtml”, “html”, “text”, or a custom serializer class:: | 165 default), “xhtml”, “html”, “text”, or a custom serializer class: |
166 | |
167 .. code-block:: pycon | |
146 | 168 |
147 >>> print stream.render('html') | 169 >>> print stream.render('html') |
148 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> | 170 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> |
149 | 171 |
150 Note how the `<br>` element isn't closed, which is the right thing to do for | 172 Note how the `<br>` element isn't closed, which is the right thing to do for |
152 | 174 |
153 In addition, the ``render()`` method takes an ``encoding`` parameter, which | 175 In addition, the ``render()`` method takes an ``encoding`` parameter, which |
154 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. | 176 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. |
155 | 177 |
156 The different serializer classes in ``genshi.output`` can also be used | 178 The different serializer classes in ``genshi.output`` can also be used |
157 directly:: | 179 directly: |
180 | |
181 .. code-block:: pycon | |
158 | 182 |
159 >>> from genshi.filters import HTMLSanitizer | 183 >>> from genshi.filters import HTMLSanitizer |
160 >>> from genshi.output import TextSerializer | 184 >>> from genshi.output import TextSerializer |
161 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream))) | 185 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream))) |
162 Some text and a link. | 186 Some text and a link. |
163 | 187 |
164 The pipe operator allows a nicer syntax:: | 188 The pipe operator allows a nicer syntax: |
189 | |
190 .. code-block:: pycon | |
165 | 191 |
166 >>> print stream | HTMLSanitizer() | TextSerializer() | 192 >>> print stream | HTMLSanitizer() | TextSerializer() |
167 Some text and a link. | 193 Some text and a link. |
168 | 194 |
169 | 195 |
198 | 224 |
199 Using XPath | 225 Using XPath |
200 =========== | 226 =========== |
201 | 227 |
202 XPath can be used to extract a specific subset of the stream via the | 228 XPath can be used to extract a specific subset of the stream via the |
203 ``select()`` method:: | 229 ``select()`` method: |
230 | |
231 .. code-block:: pycon | |
204 | 232 |
205 >>> substream = stream.select('a') | 233 >>> substream = stream.select('a') |
206 >>> substream | 234 >>> substream |
207 <genshi.core.Stream object at ...> | 235 <genshi.core.Stream object at ...> |
208 >>> print substream | 236 >>> print substream |
209 <a href="http://example.org/">a link</a> | 237 <a href="http://example.org/">a link</a> |
210 | 238 |
211 Often, streams cannot be reused: in the above example, the sub-stream is based | 239 Often, streams cannot be reused: in the above example, the sub-stream is based |
212 on a generator. Once it has been serialized, it will have been fully consumed, | 240 on a generator. Once it has been serialized, it will have been fully consumed, |
213 and cannot be rendered again. To work around this, you can wrap such a stream | 241 and cannot be rendered again. To work around this, you can wrap such a stream |
214 in a ``list``:: | 242 in a ``list``: |
243 | |
244 .. code-block:: pycon | |
215 | 245 |
216 >>> from genshi import Stream | 246 >>> from genshi import Stream |
217 >>> substream = Stream(list(stream.select('a'))) | 247 >>> substream = Stream(list(stream.select('a'))) |
218 >>> substream | 248 >>> substream |
219 <genshi.core.Stream object at ...> | 249 <genshi.core.Stream object at ...> |
249 | 279 |
250 For this kind of event, the ``data`` item is a tuple of the form | 280 For this kind of event, the ``data`` item is a tuple of the form |
251 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the | 281 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the |
252 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing | 282 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing |
253 the attribute names and values associated with the tag (excluding namespace | 283 the attribute names and values associated with the tag (excluding namespace |
254 declarations):: | 284 declarations): |
285 | |
286 .. code-block:: python | |
255 | 287 |
256 START, (QName(u'p'), Attrs([(u'class', u'intro')])), pos | 288 START, (QName(u'p'), Attrs([(u'class', u'intro')])), pos |
257 | 289 |
258 END | 290 END |
259 --- | 291 --- |
260 The closing tag of an element. | 292 The closing tag of an element. |
261 | 293 |
262 The ``data`` item of end events consists of just a ``QName`` instance | 294 The ``data`` item of end events consists of just a ``QName`` instance |
263 describing the qualified name of the tag:: | 295 describing the qualified name of the tag: |
296 | |
297 .. code-block:: python | |
264 | 298 |
265 END, QName(u'p'), pos | 299 END, QName(u'p'), pos |
266 | 300 |
267 TEXT | 301 TEXT |
268 ---- | 302 ---- |
269 Character data outside of elements and comments. | 303 Character data outside of elements and comments. |
270 | 304 |
271 For text events, the ``data`` item should be a unicode object:: | 305 For text events, the ``data`` item should be a unicode object: |
306 | |
307 .. code-block:: python | |
272 | 308 |
273 TEXT, u'Hello, world!', pos | 309 TEXT, u'Hello, world!', pos |
274 | 310 |
275 START_NS | 311 START_NS |
276 -------- | 312 -------- |
277 The start of a namespace mapping, binding a namespace prefix to a URI. | 313 The start of a namespace mapping, binding a namespace prefix to a URI. |
278 | 314 |
279 The ``data`` item of this kind of event is a tuple of the form | 315 The ``data`` item of this kind of event is a tuple of the form |
280 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the | 316 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the |
281 full URI to which the prefix is bound. Both should be unicode objects. If the | 317 full URI to which the prefix is bound. Both should be unicode objects. If the |
282 namespace is not bound to any prefix, the ``prefix`` item is an empty string:: | 318 namespace is not bound to any prefix, the ``prefix`` item is an empty string: |
319 | |
320 .. code-block:: python | |
283 | 321 |
284 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos | 322 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos |
285 | 323 |
286 END_NS | 324 END_NS |
287 ------ | 325 ------ |
288 The end of a namespace mapping. | 326 The end of a namespace mapping. |
289 | 327 |
290 The ``data`` item of such events consists of only the namespace prefix (a | 328 The ``data`` item of such events consists of only the namespace prefix (a |
291 unicode object):: | 329 unicode object): |
330 | |
331 .. code-block:: python | |
292 | 332 |
293 END_NS, u'svg', pos | 333 END_NS, u'svg', pos |
294 | 334 |
295 DOCTYPE | 335 DOCTYPE |
296 ------- | 336 ------- |
297 A document type declaration. | 337 A document type declaration. |
298 | 338 |
299 For this type of event, the ``data`` item is a tuple of the form | 339 For this type of event, the ``data`` item is a tuple of the form |
300 ``(name, pubid, sysid)``, where ``name`` is the name of the root element, | 340 ``(name, pubid, sysid)``, where ``name`` is the name of the root element, |
301 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is | 341 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is |
302 the system identifier of the DTD (or ``None``):: | 342 the system identifier of the DTD (or ``None``): |
343 | |
344 .. code-block:: python | |
303 | 345 |
304 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \ | 346 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \ |
305 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos | 347 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos |
306 | 348 |
307 COMMENT | 349 COMMENT |
308 ------- | 350 ------- |
309 A comment. | 351 A comment. |
310 | 352 |
311 For such events, the ``data`` item is a unicode object containing all character | 353 For such events, the ``data`` item is a unicode object containing all character |
312 data between the comment delimiters:: | 354 data between the comment delimiters: |
355 | |
356 .. code-block:: python | |
313 | 357 |
314 COMMENT, u'Commented out', pos | 358 COMMENT, u'Commented out', pos |
315 | 359 |
316 PI | 360 PI |
317 -- | 361 -- |
318 A processing instruction. | 362 A processing instruction. |
319 | 363 |
320 The ``data`` item is a tuple of the form ``(target, data)`` for processing | 364 The ``data`` item is a tuple of the form ``(target, data)`` for processing |
321 instructions, where ``target`` is the target of the PI (used to identify the | 365 instructions, where ``target`` is the target of the PI (used to identify the |
322 application by which the instruction should be processed), and ``data`` is text | 366 application by which the instruction should be processed), and ``data`` is text |
323 following the target (excluding the terminating question mark):: | 367 following the target (excluding the terminating question mark): |
368 | |
369 .. code-block:: python | |
324 | 370 |
325 PI, (u'php', u'echo "Yo" '), pos | 371 PI, (u'php', u'echo "Yo" '), pos |
326 | 372 |
327 START_CDATA | 373 START_CDATA |
328 ----------- | 374 ----------- |
329 Marks the beginning of a ``CDATA`` section. | 375 Marks the beginning of a ``CDATA`` section. |
330 | 376 |
331 The ``data`` item for such events is always ``None``:: | 377 The ``data`` item for such events is always ``None``: |
378 | |
379 .. code-block:: python | |
332 | 380 |
333 START_CDATA, None, pos | 381 START_CDATA, None, pos |
334 | 382 |
335 END_CDATA | 383 END_CDATA |
336 --------- | 384 --------- |
337 Marks the end of a ``CDATA`` section. | 385 Marks the end of a ``CDATA`` section. |
338 | 386 |
339 The ``data`` item for such events is always ``None``:: | 387 The ``data`` item for such events is always ``None``: |
388 | |
389 .. code-block:: python | |
340 | 390 |
341 END_CDATA, None, pos | 391 END_CDATA, None, pos |
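
The filter passages in the document compared above make three points: a filter is a callable that takes a stream and returns the filtered stream, ``filter()`` and ``|`` chain such callables, and a generator-based stream can only be consumed once. The sketch below illustrates these points without requiring Genshi. It is only a rough model: ``ToyStream``, ``upper_text`` and the plain-string event kinds are hypothetical stand-ins introduced here, not Genshi's ``Stream`` class or its event constants.

```python
# Toy stand-in for illustration only; this is NOT Genshi's Stream class.
START = 'START'  # plain strings standing in for Genshi's event kinds
TEXT = 'TEXT'
END = 'END'


class ToyStream(object):
    """Wraps an iterable of (kind, data, pos) event tuples."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def filter(self, *filters):
        # Apply each filter in order; a filter takes and returns an iterable.
        events = self.events
        for func in filters:
            events = func(events)
        return ToyStream(events)

    def __or__(self, func):
        # In this sketch, ``stream | f`` is shorthand for ``stream.filter(f)``.
        return self.filter(func)


def noop(stream):
    """A filter that passes every event through unchanged."""
    for kind, data, pos in stream:
        yield kind, data, pos


def upper_text(stream):
    """A hypothetical filter that upper-cases the data of TEXT events."""
    for kind, data, pos in stream:
        if kind == TEXT:
            data = data.upper()
        yield kind, data, pos


events = [
    (START, ('p', []), (None, 1, 0)),
    (TEXT, 'Some text', (None, 1, 3)),
    (END, 'p', (None, 1, 12)),
]

filtered = ToyStream(events) | noop | upper_text
result = list(filtered)    # consumes the underlying generator
leftover = list(filtered)  # the generator is exhausted, so this is empty
```

Because ``filtered`` wraps a generator, a second pass over it yields nothing; wrapping the events in a ``list`` first, as the document suggests for ``select()`` results, is what makes a stream reusable.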