comparison doc/streams.txt @ 511:1a29617a5d87 stable-0.4.x

Ported [611:614] to 0.4.x branch.
author cmlenz
date Wed, 06 Jun 2007 11:18:46 +0000
parents 6fd7e4dc0318
495:90eecd360b18 511:1a29617a5d87
20 * the result of parsing XML or HTML text, or 20 * the result of parsing XML or HTML text, or
21 * the result of selecting a subset of another stream using XPath, or 21 * the result of selecting a subset of another stream using XPath, or
22 * programmatically generated. 22 * programmatically generated.
23 23
24 For example, the functions ``XML()`` and ``HTML()`` can be used to convert 24 For example, the functions ``XML()`` and ``HTML()`` can be used to convert
25 literal XML or HTML text to a markup stream:: 25 literal XML or HTML text to a markup stream:
26
27 .. code-block:: pycon
26 28
27 >>> from genshi import XML 29 >>> from genshi import XML
28 >>> stream = XML('<p class="intro">Some text and ' 30 >>> stream = XML('<p class="intro">Some text and '
29 ... '<a href="http://example.org/">a link</a>.' 31 ... '<a href="http://example.org/">a link</a>.'
30 ... '<br/></p>') 32 ... '<br/></p>')
39 * ``data`` is the actual data associated with the event. How this looks depends 41 * ``data`` is the actual data associated with the event. How this looks depends
40 on the event kind (see `event kinds`_) 42 on the event kind (see `event kinds`_)
41 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the 43 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
42 event “comes from”. 44 event “comes from”.
43 45
44 :: 46 .. code-block:: pycon
45 47
46 >>> for kind, data, pos in stream: 48 >>> for kind, data, pos in stream:
47 ... print kind, `data`, pos 49 ... print kind, `data`, pos
48 ... 50 ...
49 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0) 51 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0)
62 64
63 One important feature of markup streams is that you can apply *filters* to the 65 One important feature of markup streams is that you can apply *filters* to the
64 stream, either filters that come with Genshi, or your own custom filters. 66 stream, either filters that come with Genshi, or your own custom filters.
65 67
66 A filter is simply a callable that accepts the stream as parameter, and returns 68 A filter is simply a callable that accepts the stream as parameter, and returns
67 the filtered stream:: 69 the filtered stream:
70
71 .. code-block:: python
68 72
69 def noop(stream): 73 def noop(stream):
70 """A filter that doesn't actually do anything with the stream.""" 74 """A filter that doesn't actually do anything with the stream."""
71 for kind, data, pos in stream: 75 for kind, data, pos in stream:
72 yield kind, data, pos 76 yield kind, data, pos
73 77
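As a rough illustration of a filter that does change the stream, the sketch below upper-cases text events and passes everything else through. It is not taken from Genshi: the event kinds are plain strings standing in for Genshi's constants, the events are hand-written tuples rather than parser output, and ``upper_text`` is a hypothetical name.

```python
# Plain strings stand in for Genshi's event kinds so the sketch has no dependencies.
START, END, TEXT = 'START', 'END', 'TEXT'


def upper_text(stream):
    """Hypothetical filter: upper-case TEXT events, pass other events through."""
    for kind, data, pos in stream:
        if kind == TEXT:
            data = data.upper()
        yield kind, data, pos


# Hand-written (kind, data, pos) tuples; a real stream would come from a parser.
events = [
    (START, ('p', []), (None, 1, 0)),
    (TEXT, 'Some text', (None, 1, 3)),
    (END, 'p', (None, 1, 12)),
]
filtered = list(upper_text(events))
```

Because the filter is a generator, nothing is processed until the result is iterated, which is why the example wraps the call in ``list()``.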
74 Filters can be applied in a number of ways. The simplest is to just call the 78 Filters can be applied in a number of ways. The simplest is to just call the
75 filter directly:: 79 filter directly:
80
81 .. code-block:: python
76 82
77 stream = noop(stream) 83 stream = noop(stream)
78 84
79 The ``Stream`` class also provides a ``filter()`` method, which takes an 85 The ``Stream`` class also provides a ``filter()`` method, which takes an
80 arbitrary number of filter callables and applies them all:: 86 arbitrary number of filter callables and applies them all:
87
88 .. code-block:: python
81 89
82 stream = stream.filter(noop) 90 stream = stream.filter(noop)
83 91
84 Finally, filters can also be applied using the *bitwise or* operator (``|``), 92 Finally, filters can also be applied using the *bitwise or* operator (``|``),
85 which allows a syntax similar to pipes on Unix shells:: 93 which allows a syntax similar to pipes on Unix shells:
94
95 .. code-block:: python
86 96
87 stream = stream | noop 97 stream = stream | noop
88 98
89 One example of a filter included with Genshi is the ``HTMLSanitizer`` in 99 One example of a filter included with Genshi is the ``HTMLSanitizer`` in
90 ``genshi.filters``. It processes a stream of HTML markup, and strips out any 100 ``genshi.filters``. It processes a stream of HTML markup, and strips out any
91 potentially dangerous constructs, such as Javascript event handlers. 101 potentially dangerous constructs, such as Javascript event handlers.
92 ``HTMLSanitizer`` is not a function, but rather a class that implements 102 ``HTMLSanitizer`` is not a function, but rather a class that implements
93 ``__call__``, which means instances of the class are callable:: 103 ``__call__``, which means instances of the class are callable:
104
105 .. code-block:: python
94 106
95 stream = stream | HTMLSanitizer() 107 stream = stream | HTMLSanitizer()
96 108
97 Both the ``filter()`` method and the pipe operator allow easy chaining of 109 Both the ``filter()`` method and the pipe operator allow easy chaining of
98 filters:: 110 filters:
111
112 .. code-block:: python
99 113
100 from genshi.filters import HTMLSanitizer 114 from genshi.filters import HTMLSanitizer
101 stream = stream.filter(noop, HTMLSanitizer()) 115 stream = stream.filter(noop, HTMLSanitizer())
102 116
103 That is equivalent to:: 117 That is equivalent to:
118
119 .. code-block:: python
104 120
105 stream = stream | noop | HTMLSanitizer() 121 stream = stream | noop | HTMLSanitizer()
106 122
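The equivalence between ``filter()`` and the pipe operator can be sketched with a small stand-in class. ``MiniStream``, ``drop_text`` and the sample events are hypothetical and exist only for this illustration; the actual ``Stream`` class may be implemented differently.

```python
class MiniStream:
    """Hypothetical stand-in for a stream class; not Genshi's ``Stream``."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, function):
        # ``stream | f`` calls ``f(stream)`` and wraps the result again,
        # so further ``|`` operations can be chained.
        return MiniStream(function(self))

    def filter(self, *filters):
        # Apply the filters left to right, exactly like a chain of pipes.
        stream = self
        for function in filters:
            stream = stream | function
        return stream


def noop(stream):
    for event in stream:
        yield event


def drop_text(stream):
    # Hypothetical filter that removes TEXT events.
    for kind, data, pos in stream:
        if kind != 'TEXT':
            yield kind, data, pos


events = [('START', 'p', None), ('TEXT', 'hi', None), ('END', 'p', None)]
piped = list(MiniStream(events) | noop | drop_text)
chained = list(MiniStream(events).filter(noop, drop_text))
```

Both spellings apply ``noop`` first and ``drop_text`` second, so ``piped`` and ``chained`` hold the same events.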
107 For more information about the built-in filters, see `Stream Filters`_. 123 For more information about the built-in filters, see `Stream Filters`_.
108 124
119 The ``Stream`` class provides two methods for serialization: ``serialize()`` and 135 The ``Stream`` class provides two methods for serialization: ``serialize()`` and
120 ``render()``. The former is a generator that yields chunks of ``Markup`` objects 136 ``render()``. The former is a generator that yields chunks of ``Markup`` objects
121 (which are basically unicode strings that are considered safe for output on the 137 (which are basically unicode strings that are considered safe for output on the
122 web). The latter returns a single string, by default UTF-8 encoded. 138 web). The latter returns a single string, by default UTF-8 encoded.
123 139
124 Here's the output from ``serialize()``:: 140 Here's the output from ``serialize()``:
141
142 .. code-block:: pycon
125 143
126 >>> for output in stream.serialize(): 144 >>> for output in stream.serialize():
127 ... print `output` 145 ... print `output`
128 ... 146 ...
129 <Markup u'<p class="intro">'> 147 <Markup u'<p class="intro">'>
133 <Markup u'</a>'> 151 <Markup u'</a>'>
134 <Markup u'.'> 152 <Markup u'.'>
135 <Markup u'<br/>'> 153 <Markup u'<br/>'>
136 <Markup u'</p>'> 154 <Markup u'</p>'>
137 155
138 And here's the output from ``render()``:: 156 And here's the output from ``render()``:
157
158 .. code-block:: pycon
139 159
140 >>> print stream.render() 160 >>> print stream.render()
141 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> 161 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
142 162
143 Both methods can be passed a ``method`` parameter that determines how exactly 163 Both methods can be passed a ``method`` parameter that determines how exactly
144 the events are serialized to text. This parameter can be either “xml” (the 164 the events are serialized to text. This parameter can be either “xml” (the
145 default), “xhtml”, “html”, “text”, or a custom serializer class:: 165 default), “xhtml”, “html”, “text”, or a custom serializer class:
166
167 .. code-block:: pycon
146 168
147 >>> print stream.render('html') 169 >>> print stream.render('html')
148 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> 170 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
149 171
150 Note how the ``<br>`` element isn't closed, which is the right thing to do for 172 Note how the ``<br>`` element isn't closed, which is the right thing to do for
152 174
153 In addition, the ``render()`` method takes an ``encoding`` parameter, which 175 In addition, the ``render()`` method takes an ``encoding`` parameter, which
154 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. 176 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string.
155 177
156 The different serializer classes in ``genshi.output`` can also be used 178 The different serializer classes in ``genshi.output`` can also be used
157 directly:: 179 directly:
180
181 .. code-block:: pycon
158 182
159 >>> from genshi.filters import HTMLSanitizer 183 >>> from genshi.filters import HTMLSanitizer
160 >>> from genshi.output import TextSerializer 184 >>> from genshi.output import TextSerializer
161 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream))) 185 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream)))
162 Some text and a link. 186 Some text and a link.
163 187
164 The pipe operator allows a nicer syntax:: 188 The pipe operator allows a nicer syntax:
189
190 .. code-block:: pycon
165 191
166 >>> print stream | HTMLSanitizer() | TextSerializer() 192 >>> print stream | HTMLSanitizer() | TextSerializer()
167 Some text and a link. 193 Some text and a link.
168 194
169 195
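The split between a chunk-yielding serializer and a ``render()`` that joins and encodes can be sketched in plain Python. The functions below are toy stand-ins written for this illustration (they are not the classes in ``genshi.output``), and the events are hand-written tuples.

```python
def serialize_text(events):
    """Toy serializer: yield the data of TEXT events, one chunk per event."""
    for kind, data, pos in events:
        if kind == 'TEXT':
            yield data


def render(events, encoding='utf-8'):
    """Join the serialized chunks; encode unless ``encoding`` is None."""
    output = ''.join(serialize_text(events))
    if encoding is not None:
        return output.encode(encoding)
    return output


# Hand-written events mirroring the paragraph used in the examples above.
events = [
    ('START', ('p', []), None),
    ('TEXT', 'Some text and ', None),
    ('START', ('a', []), None),
    ('TEXT', 'a link', None),
    ('END', 'a', None),
    ('TEXT', '.', None),
    ('END', 'p', None),
]
chunks = list(serialize_text(events))
as_bytes = render(events)
as_text = render(events, encoding=None)
```

Here ``chunks`` is a list of three strings, ``as_bytes`` is an encoded byte string, and ``as_text`` is the unencoded text, which mirrors the role of the ``encoding`` parameter described above.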
198 224
199 Using XPath 225 Using XPath
200 =========== 226 ===========
201 227
202 XPath can be used to extract a specific subset of the stream via the 228 XPath can be used to extract a specific subset of the stream via the
203 ``select()`` method:: 229 ``select()`` method:
230
231 .. code-block:: pycon
204 232
205 >>> substream = stream.select('a') 233 >>> substream = stream.select('a')
206 >>> substream 234 >>> substream
207 <genshi.core.Stream object at ...> 235 <genshi.core.Stream object at ...>
208 >>> print substream 236 >>> print substream
209 <a href="http://example.org/">a link</a> 237 <a href="http://example.org/">a link</a>
210 238
211 Often, streams cannot be reused: in the above example, the sub-stream is based 239 Often, streams cannot be reused: in the above example, the sub-stream is based
212 on a generator. Once it has been serialized, it will have been fully consumed, 240 on a generator. Once it has been serialized, it will have been fully consumed,
213 and cannot be rendered again. To work around this, you can wrap such a stream 241 and cannot be rendered again. To work around this, you can wrap such a stream
214 in a ``list``:: 242 in a ``list``:
243
244 .. code-block:: pycon
215 245
216 >>> from genshi import Stream 246 >>> from genshi import Stream
217 >>> substream = Stream(list(stream.select('a'))) 247 >>> substream = Stream(list(stream.select('a')))
218 >>> substream 248 >>> substream
219 <genshi.core.Stream object at ...> 249 <genshi.core.Stream object at ...>
249 279
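The reuse caveat for generator-based streams is ordinary Python generator behaviour and can be reproduced without Genshi. In this sketch ``select_links`` is a hypothetical generator standing in for a selection, and the events are simplified tuples.

```python
def select_links(events):
    # Hypothetical generator-based selection: yields only events for tag 'a'.
    for event in events:
        if event[1] == 'a':
            yield event


events = [
    ('START', 'p', None),
    ('START', 'a', None),
    ('END', 'a', None),
    ('END', 'p', None),
]

lazy = select_links(events)
first_pass = list(lazy)
second_pass = list(lazy)      # the generator is already exhausted

buffered = list(select_links(events))
reuse_one = list(buffered)
reuse_two = list(buffered)    # a list can be iterated any number of times
```

The second pass over ``lazy`` yields nothing, while the buffered list produces the same events on every pass. The cost is that all selected events are held in memory at once.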
250 For this kind of event, the ``data`` item is a tuple of the form 280 For this kind of event, the ``data`` item is a tuple of the form
251 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the 281 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the
252 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing 282 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing
253 the attribute names and values associated with the tag (excluding namespace 283 the attribute names and values associated with the tag (excluding namespace
254 declarations):: 284 declarations):
285
286 .. code-block:: python
255 287
256 START, (QName(u'p'), Attrs([(u'class', u'intro')])), pos 288 START, (QName(u'p'), Attrs([(u'class', u'intro')])), pos
257 289
258 END 290 END
259 --- 291 ---
260 The closing tag of an element. 292 The closing tag of an element.
261 293
262 The ``data`` item of end events consists of just a ``QName`` instance 294 The ``data`` item of end events consists of just a ``QName`` instance
263 describing the qualified name of the tag:: 295 describing the qualified name of the tag:
296
297 .. code-block:: python
264 298
265 END, QName(u'p'), pos 299 END, QName(u'p'), pos
266 300
267 TEXT 301 TEXT
268 ---- 302 ----
269 Character data outside of elements and comments. 303 Character data outside of elements and comments.
270 304
271 For text events, the ``data`` item should be a unicode object:: 305 For text events, the ``data`` item should be a unicode object:
306
307 .. code-block:: python
272 308
273 TEXT, u'Hello, world!', pos 309 TEXT, u'Hello, world!', pos
274 310
275 START_NS 311 START_NS
276 -------- 312 --------
277 The start of a namespace mapping, binding a namespace prefix to a URI. 313 The start of a namespace mapping, binding a namespace prefix to a URI.
278 314
279 The ``data`` item of this kind of event is a tuple of the form 315 The ``data`` item of this kind of event is a tuple of the form
280 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the 316 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the
281 full URI to which the prefix is bound. Both should be unicode objects. If the 317 full URI to which the prefix is bound. Both should be unicode objects. If the
282 namespace is not bound to any prefix, the ``prefix`` item is an empty string:: 318 namespace is not bound to any prefix, the ``prefix`` item is an empty string:
319
320 .. code-block:: python
283 321
284 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos 322 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos
285 323
286 END_NS 324 END_NS
287 ------ 325 ------
288 The end of a namespace mapping. 326 The end of a namespace mapping.
289 327
290 The ``data`` item of such events consists of only the namespace prefix (a 328 The ``data`` item of such events consists of only the namespace prefix (a
291 unicode object):: 329 unicode object):
330
331 .. code-block:: python
292 332
293 END_NS, u'svg', pos 333 END_NS, u'svg', pos
294 334
295 DOCTYPE 335 DOCTYPE
296 ------- 336 -------
297 A document type declaration. 337 A document type declaration.
298 338
299 For this type of event, the ``data`` item is a tuple of the form 339 For this type of event, the ``data`` item is a tuple of the form
300 ``(name, pubid, sysid)``, where ``name`` is the name of the root element, 340 ``(name, pubid, sysid)``, where ``name`` is the name of the root element,
301 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is 341 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is
302 the system identifier of the DTD (or ``None``):: 342 the system identifier of the DTD (or ``None``):
343
344 .. code-block:: python
303 345
304 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \ 346 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \
305 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos 347 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos
306 348
307 COMMENT 349 COMMENT
308 ------- 350 -------
309 A comment. 351 A comment.
310 352
311 For such events, the ``data`` item is a unicode object containing all character 353 For such events, the ``data`` item is a unicode object containing all character
312 data between the comment delimiters:: 354 data between the comment delimiters:
355
356 .. code-block:: python
313 357
314 COMMENT, u'Commented out', pos 358 COMMENT, u'Commented out', pos
315 359
316 PI 360 PI
317 -- 361 --
318 A processing instruction. 362 A processing instruction.
319 363
320 The ``data`` item is a tuple of the form ``(target, data)`` for processing 364 The ``data`` item is a tuple of the form ``(target, data)`` for processing
321 instructions, where ``target`` is the target of the PI (used to identify the 365 instructions, where ``target`` is the target of the PI (used to identify the
322 application by which the instruction should be processed), and ``data`` is text 366 application by which the instruction should be processed), and ``data`` is text
323 following the target (excluding the terminating question mark):: 367 following the target (excluding the terminating question mark):
368
369 .. code-block:: python
324 370
325 PI, (u'php', u'echo "Yo" '), pos 371 PI, (u'php', u'echo "Yo" '), pos
326 372
327 START_CDATA 373 START_CDATA
328 ----------- 374 -----------
329 Marks the beginning of a ``CDATA`` section. 375 Marks the beginning of a ``CDATA`` section.
330 376
331 The ``data`` item for such events is always ``None``:: 377 The ``data`` item for such events is always ``None``:
378
379 .. code-block:: python
332 380
333 START_CDATA, None, pos 381 START_CDATA, None, pos
334 382
335 END_CDATA 383 END_CDATA
336 --------- 384 ---------
337 Marks the end of a ``CDATA`` section. 385 Marks the end of a ``CDATA`` section.
338 386
339 The ``data`` item for such events is always ``None``:: 387 The ``data`` item for such events is always ``None``:
388
389 .. code-block:: python
340 390
341 END_CDATA, None, pos 391 END_CDATA, None, pos
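To show how code consuming a stream might branch on the event kind and unpack ``data`` accordingly, here is a minimal sketch. It uses plain strings for the kinds and plain tuples for the data (no ``QName`` or ``Attrs`` objects), and ``describe`` is a hypothetical helper, so treat it as an illustration of the data shapes rather than as Genshi code.

```python
def describe(event):
    """Return a short label for an event, based on the shape of its data."""
    kind, data, pos = event
    if kind == 'START':
        tagname, attrs = data          # (tagname, attrs)
        return 'open {}'.format(tagname)
    if kind == 'END':
        return 'close {}'.format(data)  # just the tag name
    if kind == 'TEXT':
        return 'text {!r}'.format(data)
    if kind == 'START_NS':
        prefix, uri = data             # (prefix, uri)
        return 'bind {} -> {}'.format(prefix, uri)
    if kind == 'PI':
        target, pi_data = data         # (target, data)
        return 'pi {}'.format(target)
    # END_NS, DOCTYPE, COMMENT, START_CDATA and END_CDATA are not unpacked here.
    return kind.lower()


events = [
    ('START_NS', ('svg', 'http://www.w3.org/2000/svg'), None),
    ('START', ('p', [('class', 'intro')]), None),
    ('TEXT', 'Hello, world!', None),
    ('END', 'p', None),
    ('PI', ('php', 'echo "Yo" '), None),
    ('START_CDATA', None, None),
]
labels = [describe(event) for event in events]
```
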
Copyright (C) 2012-2017 Edgewall Software