comparison doc/streams.txt @ 382:d7da3fba7faf
* Added documentation for the various stream event kinds.
* Move generation of HTML documentation into a custom distutils command, run by `setup.py build_doc`
* Added verification of doctest snippets in documentation, which can be run by `setup.py test_doc`
* Fixed `repr` of `Markup` instances.
author | cmlenz |
---|---|
date | Fri, 01 Dec 2006 23:43:59 +0000 |
parents | 24757b771651 |
children | ebc7c1a3bc4d |
381:a6c2a9cd2e92 | 382:d7da3fba7faf |
---|---|
6 | 6 |
7 A stream is the common representation of markup as a *stream of events*. | 7 A stream is the common representation of markup as a *stream of events*. |
8 | 8 |
9 | 9 |
10 .. contents:: Contents | 10 .. contents:: Contents |
11 :depth: 2 | 11 :depth: 1 |
12 .. sectnum:: | 12 .. sectnum:: |
13 | 13 |
14 | 14 |
15 Basics | 15 Basics |
16 ====== | 16 ====== |
28 >>> from genshi import XML | 28 >>> from genshi import XML |
29 >>> stream = XML('<p class="intro">Some text and ' | 29 >>> stream = XML('<p class="intro">Some text and ' |
30 ... '<a href="http://example.org/">a link</a>.' | 30 ... '<a href="http://example.org/">a link</a>.' |
31 ... '<br/></p>') | 31 ... '<br/></p>') |
32 >>> stream | 32 >>> stream |
33 <genshi.core.Stream object at 0x6bef0> | 33 <genshi.core.Stream object at ...> |
34 | 34 |
35 The stream is the result of parsing the text into events. Each event is a tuple | 35 The stream is the result of parsing the text into events. Each event is a tuple |
36 of the form ``(kind, data, pos)``, where: | 36 of the form ``(kind, data, pos)``, where: |
37 | 37 |
38 * ``kind`` defines what kind of event it is (such as the start of an element, | 38 * ``kind`` defines what kind of event it is (such as the start of an element, |
39 text, a comment, etc). | 39 text, a comment, etc). |
40 * ``data`` is the actual data associated with the event. How this looks depends | 40 * ``data`` is the actual data associated with the event. How this looks depends |
41 on the event kind. | 41 on the event kind (see `event kinds`_). |
42 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the | 42 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the |
43 event “comes from”. | 43 event “comes from”. |
44 | 44 |
45 :: | 45 :: |
46 | 46 |
47 >>> for kind, data, pos in stream: | 47 >>> for kind, data, pos in stream: |
48 ... print kind, `data`, pos | 48 ... print kind, `data`, pos |
49 ... | 49 ... |
50 START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0) | 50 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0) |
51 TEXT u'Some text and ' ('<string>', 1, 31) | 51 TEXT u'Some text and ' (None, 1, 17) |
52 START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31) | 52 START (QName(u'a'), Attrs([(QName(u'href'), u'http://example.org/')])) (None, 1, 31) |
53 TEXT u'a link' ('<string>', 1, 67) | 53 TEXT u'a link' (None, 1, 61) |
54 END u'a' ('<string>', 1, 67) | 54 END QName(u'a') (None, 1, 67) |
55 TEXT u'.' ('<string>', 1, 72) | 55 TEXT u'.' (None, 1, 71) |
56 START (u'br', []) ('<string>', 1, 72) | 56 START (QName(u'br'), Attrs()) (None, 1, 72) |
57 END u'br' ('<string>', 1, 77) | 57 END QName(u'br') (None, 1, 77) |
58 END u'p' ('<string>', 1, 77) | 58 END QName(u'p') (None, 1, 77) |
59 | 59 |
60 | 60 |
61 Filtering | 61 Filtering |
62 ========= | 62 ========= |
63 | 63 |
148 The different serializer classes in ``genshi.output`` can also be used | 148 The different serializer classes in ``genshi.output`` can also be used |
149 directly:: | 149 directly:: |
150 | 150 |
151 >>> from genshi.filters import HTMLSanitizer | 151 >>> from genshi.filters import HTMLSanitizer |
152 >>> from genshi.output import TextSerializer | 152 >>> from genshi.output import TextSerializer |
153 >>> print TextSerializer()(HTMLSanitizer()(stream)) | 153 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream))) |
154 Some text and a link. | 154 Some text and a link. |
155 | 155 |
156 The pipe operator allows a nicer syntax:: | 156 The pipe operator allows a nicer syntax:: |
157 | 157 |
158 >>> print stream | HTMLSanitizer() | TextSerializer() | 158 >>> print stream | HTMLSanitizer() | TextSerializer() |
159 Some text and a link. | 159 Some text and a link. |
160 | 160 |
161 | |
161 Using XPath | 162 Using XPath |
162 =========== | 163 =========== |
163 | 164 |
164 XPath can be used to extract a specific subset of the stream via the | 165 XPath can be used to extract a specific subset of the stream via the |
165 ``select()`` method:: | 166 ``select()`` method:: |
166 | 167 |
167 >>> substream = stream.select('a') | 168 >>> substream = stream.select('a') |
168 >>> substream | 169 >>> substream |
169 <genshi.core.Stream object at 0x7118b0> | 170 <genshi.core.Stream object at ...> |
170 >>> print substream | 171 >>> print substream |
171 <a href="http://example.org/">a link</a> | 172 <a href="http://example.org/">a link</a> |
172 | 173 |
173 Often, streams cannot be reused: in the above example, the sub-stream is based | 174 Often, streams cannot be reused: in the above example, the sub-stream is based |
174 on a generator. Once it has been serialized, it will have been fully consumed, | 175 on a generator. Once it has been serialized, it will have been fully consumed, |
176 in a ``list``:: | 177 in a ``list``:: |
177 | 178 |
178 >>> from genshi import Stream | 179 >>> from genshi import Stream |
179 >>> substream = Stream(list(stream.select('a'))) | 180 >>> substream = Stream(list(stream.select('a'))) |
180 >>> substream | 181 >>> substream |
181 <genshi.core.Stream object at 0x7118b0> | 182 <genshi.core.Stream object at ...> |
182 >>> print substream | 183 >>> print substream |
183 <a href="http://example.org/">a link</a> | 184 <a href="http://example.org/">a link</a> |
184 >>> print substream.select('@href') | 185 >>> print substream.select('@href') |
185 http://example.org/ | 186 http://example.org/ |
186 >>> print substream.select('text()') | 187 >>> print substream.select('text()') |
187 a link | 188 a link |
189 | |
190 See `Using XPath in Genshi`_ for more information about the XPath support in | |
191 Genshi. | |
192 | |
193 .. _`Using XPath in Genshi`: xpath.html | |
194 | |
195 | |
196 .. _`event kinds`: | |
197 | |
198 Event Kinds | |
199 =========== | |
200 | |
201 Every event in a stream is of one of several *kinds*, which also determines | |
202 what the ``data`` item of the event tuple looks like. The different kinds of | |
203 events are documented below. | |
204 | |
205 .. note:: The ``data`` item is generally immutable. If the data is to be | |
206 modified when processing a stream, it must be replaced by a new tuple. | |
207 Effectively, this means the entire event tuple is immutable. | |
208 | |
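The note above can be illustrated with a minimal sketch of a filter that replaces an event instead of modifying it. To stay self-contained it uses plain string constants and a plain list of tuples as stand-ins; real code would presumably import the kind constants from ``genshi.core`` and iterate over a ``Stream``. The name ``upper_text`` is hypothetical.

```python
# Stand-in constants (assumption): real code would use the kinds from genshi.core.
START, END, TEXT = 'START', 'END', 'TEXT'

def upper_text(stream):
    """Pass events through, rebuilding TEXT events with upper-cased data."""
    for kind, data, pos in stream:
        if kind == TEXT:
            # Never mutate the event: yield a new tuple carrying the new data.
            yield kind, data.upper(), pos
        else:
            yield kind, data, pos

# A hand-written event list standing in for a parsed stream.
events = [
    (START, ('p', [('class', 'intro')]), (None, 1, 0)),
    (TEXT, 'Hello, world!', (None, 1, 17)),
    (END, 'p', (None, 1, 30)),
]
result = list(upper_text(events))
```

The original ``events`` list is left untouched, so it can still be fed to other filters.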
209 START | |
210 ----- | |
211 The opening tag of an element. | |
212 | |
213 For this kind of event, the ``data`` item is a tuple of the form | |
214 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the | |
215 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing | |
216 the attribute names and values associated with the tag (excluding namespace | |
217 declarations):: | |
218 | |
219 START, (QName(u'p'), Attrs([(QName(u'class'), u'intro')])), pos | |
220 | |
221 END | |
222 --- | |
223 The closing tag of an element. | |
224 | |
225 The ``data`` item of end events consists of just a ``QName`` instance | |
226 describing the qualified name of the tag:: | |
227 | |
228 END, QName(u'p'), pos | |
229 | |
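Because a ``START`` event carries a ``(tagname, attrs)`` tuple while the matching ``END`` event carries only the tag name, a consumer can pair them with a stack. The following is a rough sketch under the same assumptions as before: string constants and plain strings stand in for the Genshi kinds and ``QName`` instances, and ``is_balanced`` is a hypothetical helper, not part of Genshi.

```python
# Stand-in constants (assumption): real code would use the kinds from genshi.core.
START, END, TEXT = 'START', 'END', 'TEXT'

def is_balanced(stream):
    """Return True if every START event is closed by a matching END event."""
    open_tags = []
    for kind, data, pos in stream:
        if kind == START:
            tagname, attrs = data      # START data is (tagname, attrs)
            open_tags.append(tagname)
        elif kind == END:
            # END data is just the tag name; it must match the innermost open tag.
            if not open_tags or open_tags.pop() != data:
                return False
    return not open_tags

pos = (None, 1, 0)  # placeholder position, reused for brevity
good = [
    (START, ('p', [('class', 'intro')]), pos),
    (TEXT, 'Some text and ', pos),
    (START, ('a', [('href', 'http://example.org/')]), pos),
    (TEXT, 'a link', pos),
    (END, 'a', pos),
    (START, ('br', []), pos),
    (END, 'br', pos),
    (END, 'p', pos),
]
bad = [(START, ('p', []), pos), (END, 'a', pos)]
```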
230 TEXT | |
231 ---- | |
232 Character data outside of elements and other nodes. | |
233 | |
234 For text events, the ``data`` item should be a unicode object:: | |
235 | |
236 TEXT, u'Hello, world!', pos | |
237 | |
238 START_NS | |
239 -------- | |
240 The start of a namespace mapping, binding a namespace prefix to a URI. | |
241 | |
242 The ``data`` item of this kind of event is a tuple of the form | |
243 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the | |
244 full URI to which the prefix is bound. Both should be unicode objects. If the | |
245 namespace is not bound to any prefix, the ``prefix`` item is an empty string:: | |
246 | |
247 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos | |
248 | |
249 END_NS | |
250 ------ | |
251 The end of a namespace mapping. | |
252 | |
253 The ``data`` item of such events consists of only the namespace prefix (a | |
254 unicode object):: | |
255 | |
256 END_NS, u'svg', pos | |
257 | |
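``START_NS`` and ``END_NS`` events bracket the region in which a prefix is bound, so the bindings in effect at any point can be tracked with one stack per prefix. This is only a sketch of that idea: the constants are string stand-ins for the Genshi kinds, and ``bindings_in_scope`` is a hypothetical name.

```python
# Stand-in constants (assumption): real code would use the kinds from genshi.core.
START_NS, END_NS = 'START_NS', 'END_NS'

def bindings_in_scope(stream):
    """Yield (event, mapping) pairs; mapping is prefix -> URI after the event."""
    stacks = {}  # prefix -> list of URIs, innermost binding last
    for kind, data, pos in stream:
        if kind == START_NS:
            prefix, uri = data         # START_NS data is (prefix, uri)
            stacks.setdefault(prefix, []).append(uri)
        elif kind == END_NS:
            stacks[data].pop()         # END_NS data is just the prefix
        mapping = dict((p, uris[-1]) for p, uris in stacks.items() if uris)
        yield (kind, data, pos), mapping

pos = (None, 1, 0)  # placeholder position
events = [
    (START_NS, ('svg', 'http://www.w3.org/2000/svg'), pos),
    (START_NS, ('', 'http://www.w3.org/1999/xhtml'), pos),  # default namespace
    (END_NS, '', pos),
    (END_NS, 'svg', pos),
]
scopes = [mapping for _event, mapping in bindings_in_scope(events)]
```

Keeping a list per prefix lets an inner element rebind a prefix and have the outer binding restored when the inner mapping ends.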
258 DOCTYPE | |
259 ------- | |
260 A document type declaration. | |
261 | |
262 For this type of event, the ``data`` item is a tuple of the form | |
263 ``(name, pubid, sysid)``, where ``name`` is the name of the root element, | |
264 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is | |
265 the system identifier of the DTD (or ``None``):: | |
266 | |
267 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \ | |
268 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos | |
269 | |
270 COMMENT | |
271 ------- | |
272 A comment. | |
273 | |
274 For such events, the ``data`` item is a unicode object containing all character | |
275 data between the comment delimiters:: | |
276 | |
277 COMMENT, u'Commented out', pos | |
278 | |
279 PI | |
280 -- | |
281 A processing instruction. | |
282 | |
283 The ``data`` item is a tuple of the form ``(target, data)`` for processing | |
284 instructions, where ``target`` is the target of the PI (used to identify the | |
285 application by which the instruction should be processed), and ``data`` is text | |
286 following the target (excluding the terminating question mark):: | |
287 | |
288 PI, (u'php', u'echo "Yo" '), pos | |
289 | |
290 START_CDATA | |
291 ----------- | |
292 Marks the beginning of a ``CDATA`` section. | |
293 | |
294 The ``data`` item for such events is always ``None``:: | |
295 | |
296 START_CDATA, None, pos | |
297 | |
298 END_CDATA | |
299 --------- | |
300 Marks the end of a ``CDATA`` section. | |
301 | |
302 The ``data`` item for such events is always ``None``:: | |
303 | |
304 END_CDATA, None, pos |
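Since ``START_CDATA`` and ``END_CDATA`` carry no data of their own, they act purely as markers around the ``TEXT`` events between them. A consumer might use them as shown in the sketch below, which escapes ordinary text but writes text inside a ``CDATA`` section verbatim. This is an illustration only, with string stand-ins for the kind constants and a hypothetical ``render_text`` function; it is not a description of how the serializers in ``genshi.output`` behave.

```python
# Stand-in constants (assumption): real code would use the kinds from genshi.core.
START_CDATA, END_CDATA, TEXT = 'START_CDATA', 'END_CDATA', 'TEXT'

def render_text(stream):
    """Render TEXT events, escaping them only outside of CDATA sections."""
    in_cdata = False
    out = []
    for kind, data, pos in stream:
        if kind == START_CDATA:
            in_cdata = True
            out.append('<![CDATA[')
        elif kind == END_CDATA:
            in_cdata = False
            out.append(']]>')
        elif kind == TEXT:
            if in_cdata:
                out.append(data)       # verbatim inside the section
            else:
                out.append(data.replace('&', '&amp;').replace('<', '&lt;'))
    return ''.join(out)

pos = (None, 1, 0)  # placeholder position
events = [
    (TEXT, 'a < b ', pos),
    (START_CDATA, None, pos),
    (TEXT, 'x < y', pos),
    (END_CDATA, None, pos),
]
rendered = render_text(events)
```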