comparison doc/streams.txt @ 382:2682dabbcd04 trunk

* Added documentation for the various stream event kinds.
* Move generation of HTML documentation into a custom distutils command, run by `setup.py build_doc`
* Added verification of doctest snippets in documentation, which can be run by `setup.py test_doc`
* Fixed `repr` of `Markup` instances.
author cmlenz
date Fri, 01 Dec 2006 23:43:59 +0000
parents 84168828b074
children cab6b0256019
381:b9fc7a1f76ca 382:2682dabbcd04
6 6
7 A stream is the common representation of markup as a *stream of events*. 7 A stream is the common representation of markup as a *stream of events*.
8 8
9 9
10 .. contents:: Contents 10 .. contents:: Contents
11 :depth: 2 11 :depth: 1
12 .. sectnum:: 12 .. sectnum::
13 13
14 14
15 Basics 15 Basics
16 ====== 16 ======
28 >>> from genshi import XML 28 >>> from genshi import XML
29 >>> stream = XML('<p class="intro">Some text and ' 29 >>> stream = XML('<p class="intro">Some text and '
30 ... '<a href="http://example.org/">a link</a>.' 30 ... '<a href="http://example.org/">a link</a>.'
31 ... '<br/></p>') 31 ... '<br/></p>')
32 >>> stream 32 >>> stream
33 <genshi.core.Stream object at 0x6bef0> 33 <genshi.core.Stream object at ...>
34 34
35 The stream is the result of parsing the text into events. Each event is a tuple 35 The stream is the result of parsing the text into events. Each event is a tuple
36 of the form ``(kind, data, pos)``, where: 36 of the form ``(kind, data, pos)``, where:
37 37
38 * ``kind`` defines what kind of event it is (such as the start of an element, 38 * ``kind`` defines what kind of event it is (such as the start of an element,
39 text, a comment, etc). 39 text, a comment, etc).
40 * ``data`` is the actual data associated with the event. How this looks depends 40 * ``data`` is the actual data associated with the event. How this looks depends
41 on the event kind. 41 on the event kind (see `event kinds`_).
42 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the 42 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
43 event “comes from”. 43 event “comes from”.
44 44
45 :: 45 ::
46 46
47 >>> for kind, data, pos in stream: 47 >>> for kind, data, pos in stream:
48 ... print kind, `data`, pos 48 ... print kind, `data`, pos
49 ... 49 ...
50 START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0) 50 START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0)
51 TEXT u'Some text and ' ('<string>', 1, 31) 51 TEXT u'Some text and ' (None, 1, 17)
52 START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31) 52 START (QName(u'a'), Attrs([(QName(u'href'), u'http://example.org/')])) (None, 1, 31)
53 TEXT u'a link' ('<string>', 1, 67) 53 TEXT u'a link' (None, 1, 61)
54 END u'a' ('<string>', 1, 67) 54 END QName(u'a') (None, 1, 67)
55 TEXT u'.' ('<string>', 1, 72) 55 TEXT u'.' (None, 1, 71)
56 START (u'br', []) ('<string>', 1, 72) 56 START (QName(u'br'), Attrs()) (None, 1, 72)
57 END u'br' ('<string>', 1, 77) 57 END QName(u'br') (None, 1, 77)
58 END u'p' ('<string>', 1, 77) 58 END QName(u'p') (None, 1, 77)
59 59
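The ``(kind, data, pos)`` shape described above can be illustrated without Genshi itself. The following is a minimal sketch, assuming plain strings as stand-ins for the kind constants and hand-written tuples in place of a parsed stream; ``describe`` is a hypothetical helper, not part of the library.

```python
# Stand-in kind constants (assumption: plain strings, used here only so the
# example runs without Genshi installed).
START, END, TEXT = "START", "END", "TEXT"

# Hand-written events mirroring the (kind, data, pos) shape shown above.
events = [
    (START, ("p", [("class", "intro")]), (None, 1, 0)),
    (TEXT, "Some text and ", (None, 1, 17)),
    (TEXT, "a link", (None, 1, 61)),
    (END, "p", (None, 1, 77)),
]


def describe(event):
    """Hypothetical helper: unpack one event, including its position tuple."""
    kind, data, pos = event
    filename, lineno, column = pos
    return "%s at line %d, column %d" % (kind, lineno, column)


# Unpacking in the loop header works because every event has three items.
descriptions = [describe(event) for event in events]

# Collect only the character data by filtering on the event kind.
text = "".join(data for kind, data, pos in events if kind == TEXT)
```

Because every event has the same three-item shape, consumers can unpack it directly in a ``for`` statement and branch on ``kind`` alone.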
60 60
61 Filtering 61 Filtering
62 ========= 62 =========
63 63
148 The different serializer classes in ``genshi.output`` can also be used 148 The different serializer classes in ``genshi.output`` can also be used
149 directly:: 149 directly::
150 150
151 >>> from genshi.filters import HTMLSanitizer 151 >>> from genshi.filters import HTMLSanitizer
152 >>> from genshi.output import TextSerializer 152 >>> from genshi.output import TextSerializer
153 >>> print TextSerializer()(HTMLSanitizer()(stream)) 153 >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream)))
154 Some text and a link. 154 Some text and a link.
155 155
156 The pipe operator allows a nicer syntax:: 156 The pipe operator allows a nicer syntax::
157 157
158 >>> print stream | HTMLSanitizer() | TextSerializer() 158 >>> print stream | HTMLSanitizer() | TextSerializer()
159 Some text and a link. 159 Some text and a link.
160 160
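To show why the pipe syntax is equivalent to nested calls, here is a minimal sketch of one way such an operator could be built with ``__or__``. ``Pipeline``, ``drop_empty`` and ``upper`` are hypothetical names for illustration, and the items are plain strings rather than Genshi events; this is not Genshi's own implementation.

```python
class Pipeline(object):
    """Hypothetical stand-in for a stream that supports ``|`` chaining."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)

    def __or__(self, function):
        # ``pipeline | f`` is treated as ``f(pipeline)``; the result is
        # wrapped again so that further ``|`` steps can follow.
        return Pipeline(function(self))


def drop_empty(stream):
    # A filter is any callable that takes an iterable and returns one.
    for item in stream:
        if item:
            yield item


def upper(stream):
    for item in stream:
        yield item.upper()


piped = list(Pipeline(["a", "", "b"]) | drop_empty | upper)
nested = list(upper(drop_empty(Pipeline(["a", "", "b"]))))
```

Both spellings produce the same items; the piped form simply reads in the order in which the steps are applied.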
161
161 Using XPath 162 Using XPath
162 =========== 163 ===========
163 164
164 XPath can be used to extract a specific subset of the stream via the 165 XPath can be used to extract a specific subset of the stream via the
165 ``select()`` method:: 166 ``select()`` method::
166 167
167 >>> substream = stream.select('a') 168 >>> substream = stream.select('a')
168 >>> substream 169 >>> substream
169 <genshi.core.Stream object at 0x7118b0> 170 <genshi.core.Stream object at ...>
170 >>> print substream 171 >>> print substream
171 <a href="http://example.org/">a link</a> 172 <a href="http://example.org/">a link</a>
172 173
173 Often, streams cannot be reused: in the above example, the sub-stream is based 174 Often, streams cannot be reused: in the above example, the sub-stream is based
174 on a generator. Once it has been serialized, it will have been fully consumed, 175 on a generator. Once it has been serialized, it will have been fully consumed,
176 in a ``list``:: 177 in a ``list``::
177 178
178 >>> from genshi import Stream 179 >>> from genshi import Stream
179 >>> substream = Stream(list(stream.select('a'))) 180 >>> substream = Stream(list(stream.select('a')))
180 >>> substream 181 >>> substream
181 <genshi.core.Stream object at 0x7118b0> 182 <genshi.core.Stream object at ...>
182 >>> print substream 183 >>> print substream
183 <a href="http://example.org/">a link</a> 184 <a href="http://example.org/">a link</a>
184 >>> print substream.select('@href') 185 >>> print substream.select('@href')
185 http://example.org/ 186 http://example.org/
186 >>> print substream.select('text()') 187 >>> print substream.select('text()')
187 a link 188 a link
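The single-use behaviour comes from Python generators in general, not from anything specific to Genshi. A minimal sketch, assuming plain strings in place of events and a hypothetical ``select_links`` generator standing in for ``select()``:

```python
def select_links(events):
    """Hypothetical stand-in for a lazy selection: yields matching items."""
    for event in events:
        if event.startswith("a:"):
            yield event


events = ["p:start", "a:start", "a:end", "p:end"]

# A generator can only be walked once.
lazy = select_links(events)
first_pass = list(lazy)
second_pass = list(lazy)  # the generator is already exhausted here

# Buffering the items in a list makes the result reusable.
buffered = list(select_links(events))
reuse_one = list(buffered)
reuse_two = list(buffered)
```

The trade-off is memory: the buffered list holds every selected event at once, whereas the generator produces them one at a time.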
189
190 See `Using XPath in Genshi`_ for more information about the XPath support in
191 Genshi.
192
193 .. _`Using XPath in Genshi`: xpath.html
194
195
196 .. _`event kinds`:
197
198 Event Kinds
199 ===========
200
201 Every event in a stream is of one of several *kinds*, which also determines
202 what the ``data`` item of the event tuple looks like. The different kinds of
203 events are documented below.
204
205 .. note:: The ``data`` item is generally immutable. If the data is to be
206 modified when processing a stream, it must be replaced by a new tuple.
207 Effectively, this means the entire event tuple is immutable.
208
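In practice, the note above means a filter yields a fresh tuple for every event it wants to change and passes the others through untouched. A minimal sketch, assuming a plain string as a stand-in for the ``TEXT`` kind constant and hand-written events; ``uppercase_text`` is a hypothetical filter, not one shipped with Genshi.

```python
TEXT = "TEXT"  # stand-in for the real kind constant (assumption)


def uppercase_text(stream):
    """Hypothetical filter: replace TEXT events instead of mutating them."""
    for kind, data, pos in stream:
        if kind == TEXT:
            # Build a new event tuple; the original one is left as it was.
            yield kind, data.upper(), pos
        else:
            yield kind, data, pos


events = [
    ("START", ("p", []), (None, 1, 0)),
    (TEXT, "hello", (None, 1, 3)),
    ("END", "p", (None, 1, 8)),
]
filtered = list(uppercase_text(events))
```

After the filter has run, ``events`` still holds the original text event, so other consumers of the same list are unaffected.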
209 START
210 -----
211 The opening tag of an element.
212
213 For this kind of event, the ``data`` item is a tuple of the form
214 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the
215 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing
216 the attribute names and values associated with the tag (excluding namespace
217 declarations)::
218
219 START, (QName(u'p'), Attrs([(QName(u'class'), u'intro')])), pos
220
221 END
222 ---
223 The closing tag of an element.
224
225 The ``data`` item of end events consists of just a ``QName`` instance
226 describing the qualified name of the tag::
227
228 END, QName(u'p'), pos
229
230 TEXT
231 ----
232 Character data outside of elements and other nodes.
233
234 For text events, the ``data`` item should be a unicode object::
235
236 TEXT, u'Hello, world!', pos
237
238 START_NS
239 --------
240 The start of a namespace mapping, binding a namespace prefix to a URI.
241
242 The ``data`` item of this kind of event is a tuple of the form
243 ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the
244 full URI to which the prefix is bound. Both should be unicode objects. If the
245 namespace is not bound to any prefix, the ``prefix`` item is an empty string::
246
247 START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos
248
249 END_NS
250 ------
251 The end of a namespace mapping.
252
253 The ``data`` item of such events consists of only the namespace prefix (a
254 unicode object)::
255
256 END_NS, u'svg', pos
257
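Since ``START_NS`` and ``END_NS`` bracket the region in which a prefix is bound, a consumer can track the bindings in scope with a small per-prefix stack. The following is a minimal sketch under stated assumptions: kinds are plain strings, tag names are plain strings rather than ``QName`` instances, and ``prefixes_in_scope`` is a hypothetical helper.

```python
START_NS, END_NS, START, END = "START_NS", "END_NS", "START", "END"


def prefixes_in_scope(stream):
    """Hypothetical helper: yield (tagname, bindings) for each START event.

    ``bindings`` is a snapshot dict mapping each prefix currently in scope
    to the URI it is bound to.
    """
    scopes = {}  # prefix -> list of URIs, innermost binding last
    for kind, data, pos in stream:
        if kind == START_NS:
            prefix, uri = data
            scopes.setdefault(prefix, []).append(uri)
        elif kind == END_NS:
            # For END_NS the data item is just the prefix.
            scopes[data].pop()
            if not scopes[data]:
                del scopes[data]
        elif kind == START:
            tagname, attrs = data
            yield tagname, dict((p, uris[-1]) for p, uris in scopes.items())


SVG = "http://www.w3.org/2000/svg"
events = [
    (START_NS, ("svg", SVG), (None, 1, 0)),
    (START, ("svg", []), (None, 1, 0)),
    (END, "svg", (None, 1, 40)),
    (END_NS, "svg", (None, 1, 40)),
    (START, ("p", []), (None, 2, 0)),
]
seen = list(prefixes_in_scope(events))
```

A stack per prefix is used so that a prefix rebound in a nested element falls back to the outer binding when the inner mapping ends.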
258 DOCTYPE
259 -------
260 A document type declaration.
261
262 For this type of event, the ``data`` item is a tuple of the form
263 ``(name, pubid, sysid)``, where ``name`` is the name of the root element,
264 ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is
265 the system identifier of the DTD (or ``None``)::
266
267 DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', \
268 u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos
269
270 COMMENT
271 -------
272 A comment.
273
274 For such events, the ``data`` item is a unicode object containing all character
275 data between the comment delimiters::
276
277 COMMENT, u'Commented out', pos
278
279 PI
280 --
281 A processing instruction.
282
283 For processing instructions, the ``data`` item is a tuple of the form
284 ``(target, data)``, where ``target`` is the target of the PI (used to identify
285 the application by which the instruction should be processed), and ``data`` is
286 the text following the target (excluding the terminating question mark)::
287
288 PI, (u'php', u'echo "Yo" '), pos
289
290 START_CDATA
291 -----------
292 Marks the beginning of a ``CDATA`` section.
293
294 The ``data`` item for such events is always ``None``::
295
296 START_CDATA, None, pos
297
298 END_CDATA
299 ---------
300 Marks the end of a ``CDATA`` section.
301
302 The ``data`` item for such events is always ``None``::
303
304 END_CDATA, None, pos
Copyright (C) 2012-2017 Edgewall Software