doc/streams.txt @ 226:4d8a9e03b23d (trunk)
Add reStructuredText documentation files.
author: cmlenz
date: Fri, 08 Sep 2006 08:44:31 +0000
children: 84168828b074

.. -*- mode: rst; encoding: utf-8 -*-

==============
Markup Streams
==============

A stream is the common representation of markup as a *stream of events*.


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

A stream can be obtained in a number of ways. It can be:

* the result of parsing XML or HTML text, or
* programmatically generated, or
* the result of selecting a subset of another stream filtered by an XPath
  expression.

For example, the functions ``XML()`` and ``HTML()`` can be used to convert
literal XML or HTML text to a markup stream::

  >>> from markup import XML
  >>> stream = XML('<p class="intro">Some text and '
  ...              '<a href="http://example.org/">a link</a>.'
  ...              '<br/></p>')
  >>> stream
  <markup.core.Stream object at 0x6bef0>
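
Conceptually, parsing turns markup text into a flat sequence of events. The
following is a minimal sketch of that idea using only the standard library's
``xml.parsers.expat`` module. It is an illustration, not the code behind
``XML()``: the ``parse_events`` name and the plain-string event kinds
``'START'``, ``'END'`` and ``'TEXT'`` are assumptions made for this example,
and positions come from expat's ``CurrentLineNumber`` and
``CurrentColumnNumber``.

```python
import xml.parsers.expat

# Plain strings standing in for the library's event kinds (an assumption).
START, END, TEXT = 'START', 'END', 'TEXT'


def parse_events(text, filename='<string>'):
    """Hypothetical helper: parse XML text into (kind, data, pos) tuples."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True  # attributes arrive as [name, value, ...]

    def pos():
        return (filename, parser.CurrentLineNumber, parser.CurrentColumnNumber)

    def start(tag, attrs):
        pairs = list(zip(attrs[::2], attrs[1::2]))
        events.append((START, (tag, pairs), pos()))

    def end(tag):
        events.append((END, tag, pos()))

    def chars(data):
        # expat may deliver one run of text in several callbacks; merge them.
        if events and events[-1][0] == TEXT:
            kind, previous, first_pos = events[-1]
            events[-1] = (TEXT, previous + data, first_pos)
        else:
            events.append((TEXT, data, pos()))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return events


for event in parse_events('<p class="intro">Some text and '
                          '<a href="http://example.org/">a link</a>.'
                          '<br/></p>'):
    print(event)
```

For this input the event kinds come out in the same order as the events the
library reports for the same snippet; exact column numbers depend on how the
parser reports positions.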

The stream is the result of parsing the text into events. Each event is a tuple
of the form ``(kind, data, pos)``, where:

* ``kind`` defines what kind of event it is (such as the start of an element,
  text, a comment, etc.).
* ``data`` is the actual data associated with the event. Its structure depends
  on the event kind.
* ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
  event “comes from”.

::

  >>> for kind, data, pos in stream:
  ...     print kind, repr(data), pos
  ...
  START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0)
  TEXT u'Some text and ' ('<string>', 1, 31)
  START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31)
  TEXT u'a link' ('<string>', 1, 67)
  END u'a' ('<string>', 1, 67)
  TEXT u'.' ('<string>', 1, 72)
  START (u'br', []) ('<string>', 1, 72)
  END u'br' ('<string>', 1, 77)
  END u'p' ('<string>', 1, 77)
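
Although the events form a flat sequence, ``START`` and ``END`` events nest, so
a simple stack recovers the element tree. The sketch below assumes plain tuples
shaped like the listing above, with strings standing in for the library's event
kinds; ``text_with_paths`` is a hypothetical helper, not part of Markup.

```python
# Events copied from the listing above, with plain str instead of unicode.
EVENTS = [
    ('START', ('p', [('class', 'intro')]), ('<string>', 1, 0)),
    ('TEXT', 'Some text and ', ('<string>', 1, 31)),
    ('START', ('a', [('href', 'http://example.org/')]), ('<string>', 1, 31)),
    ('TEXT', 'a link', ('<string>', 1, 67)),
    ('END', 'a', ('<string>', 1, 67)),
    ('TEXT', '.', ('<string>', 1, 72)),
    ('START', ('br', []), ('<string>', 1, 72)),
    ('END', 'br', ('<string>', 1, 77)),
    ('END', 'p', ('<string>', 1, 77)),
]


def text_with_paths(events):
    """Pair each TEXT event with the '/'-separated path of open elements."""
    open_tags = []
    result = []
    for kind, data, pos in events:
        if kind == 'START':
            tag, attrs = data
            open_tags.append(tag)
        elif kind == 'END':
            # An END event must close the most recently opened element.
            if not open_tags or open_tags[-1] != data:
                raise ValueError('unbalanced END %r at %r' % (data, pos))
            open_tags.pop()
        elif kind == 'TEXT':
            result.append(('/'.join(open_tags), data))
    return result


print(text_with_paths(EVENTS))
# [('p', 'Some text and '), ('p/a', 'a link'), ('p', '.')]
```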


Filtering
=========

One important feature of markup streams is that you can apply *filters* to the
stream, using either the filters that come with Markup or your own custom
filters.

A filter is simply a callable that accepts a stream as its parameter and
returns the filtered stream::

  def noop(stream):
      """A filter that doesn't actually do anything with the stream."""
      for kind, data, pos in stream:
          yield kind, data, pos
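
A filter that actually changes something follows the same pattern: it passes
most events through untouched and rewrites the ones it cares about. In the
sketch below, ``add_nofollow`` is a hypothetical filter that adds
``rel="nofollow"`` to every ``<a>`` start tag. It assumes the ``(tag, attrs)``
shape of ``START`` data shown earlier and uses plain strings as event kinds.

```python
def add_nofollow(stream):
    """Hypothetical filter: add rel="nofollow" to every <a> start tag."""
    for kind, data, pos in stream:
        if kind == 'START' and data[0] == 'a':
            tag, attrs = data
            # Replace any existing rel attribute rather than duplicating it.
            attrs = [(name, value) for name, value in attrs if name != 'rel']
            data = (tag, attrs + [('rel', 'nofollow')])
        yield kind, data, pos


# Sample events for the link from the example above.
events = [
    ('START', ('a', [('href', 'http://example.org/')]), ('<string>', 1, 31)),
    ('TEXT', 'a link', ('<string>', 1, 67)),
    ('END', 'a', ('<string>', 1, 67)),
]
for event in add_nofollow(events):
    print(event)
```

Because ``add_nofollow`` is a generator, it processes one event at a time and
never needs the whole stream in memory.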

Filters can be applied in a number of ways. The simplest is to call the
filter directly::

  stream = noop(stream)

The ``Stream`` class also provides a ``filter()`` method, which takes an
arbitrary number of filter callables and applies them all::

  stream = stream.filter(noop)

Finally, filters can also be applied using the *bitwise or* operator (``|``),
which allows a syntax similar to pipes in Unix shells::

  stream = stream | noop

One example of a filter included with Markup is the ``HTMLSanitizer`` in
``markup.filters``. It processes a stream of HTML markup and strips out any
potentially dangerous constructs, such as JavaScript event handlers.
``HTMLSanitizer`` is not a function, but rather a class that implements
``__call__``, which means instances of the class are callable.
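
Making a filter a class lets each instance carry its own configuration. The toy
below, ``StripAttributes``, is hypothetical and far less thorough than a real
sanitizer: it only drops attributes whose names start with a given prefix
(``'on'`` by default, which covers inline event handlers such as
``onclick``). Positions in the sample events are left as ``None`` for brevity.

```python
class StripAttributes(object):
    """Hypothetical class-based filter that drops attributes by name prefix."""

    def __init__(self, prefix='on'):
        self.prefix = prefix

    def __call__(self, stream):
        # Instances are callable, so they can be used anywhere a filter
        # function is expected.
        for kind, data, pos in stream:
            if kind == 'START':
                tag, attrs = data
                attrs = [(name, value) for name, value in attrs
                         if not name.lower().startswith(self.prefix)]
                data = (tag, attrs)
            yield kind, data, pos


events = [
    ('START', ('a', [('href', '#'), ('onclick', 'steal()')]), None),
    ('END', 'a', None),
]
print(list(StripAttributes()(events)))
```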

Both the ``filter()`` method and the pipe operator allow easy chaining of
filters::

  from markup.filters import HTMLSanitizer
  stream = stream.filter(noop, HTMLSanitizer())

That is equivalent to::

  stream = stream | noop | HTMLSanitizer()
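
Both chaining styles can be supported by a small wrapper class: ``filter()``
applies its arguments in order, and ``__or__`` (the method Python calls for
``|``) applies a single filter and wraps the result so that further ``|``
operations keep working. ``MiniStream`` below is a hypothetical sketch of that
idea, not the actual ``Stream`` implementation.

```python
class MiniStream(object):
    """Hypothetical minimal stream wrapper supporting filter() and |."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, function):
        # stream | f  ->  a new MiniStream over f(stream)
        return MiniStream(function(self))

    def filter(self, *filters):
        # Apply the filters left to right, exactly like chaining |.
        stream = self
        for function in filters:
            stream = stream | function
        return stream


def noop(stream):
    for kind, data, pos in stream:
        yield kind, data, pos


def upper_text(stream):
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = data.upper()
        yield kind, data, pos


events = [('TEXT', 'a link', None)]
via_method = list(MiniStream(events).filter(noop, upper_text))
via_pipe = list(MiniStream(events) | noop | upper_text)
print(via_method == via_pipe)  # True
```

Since ``|`` is left-associative, ``s | f | g`` evaluates as ``(s | f) | g``,
which is the same order in which ``filter(f, g)`` applies its arguments.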


Serialization
=============

The ``Stream`` class provides two methods for serializing this stream of events:
``serialize()`` and ``render()``. The former is a generator that yields the
output in chunks, as ``Markup`` objects (essentially unicode strings). The
latter returns a single string, UTF-8 encoded by default.

Here's the output from ``serialize()``::

  >>> for output in stream.serialize():
  ...     print repr(output)
  ...
  <Markup u'<p class="intro">'>
  <Markup u'Some text and '>
  <Markup u'<a href="http://example.org/">'>
  <Markup u'a link'>
  <Markup u'</a>'>
  <Markup u'.'>
  <Markup u'<br/>'>
  <Markup u'</p>'>

And here's the output from ``render()``::

  >>> print stream.render()
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
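
The split between the two methods can be sketched with a toy XML serializer:
``serialize`` is a generator that yields one chunk per event, and ``render``
joins the chunks and encodes the result. The names mirror the methods above,
but this is a hypothetical standalone sketch over plain event tuples. It yields
ordinary strings rather than ``Markup`` objects, handles only ``START``,
``END`` and ``TEXT`` events, and writes an element with no content as
``<tag/>``. Escaping uses ``escape`` and ``quoteattr`` from
``xml.sax.saxutils``.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize(events):
    """Yield XML text chunks for a stream of (kind, data, pos) events."""
    pending = None  # start tag held back until we know if the element is empty
    for kind, data, pos in events:
        if pending is not None:
            if kind == 'END':
                # START immediately followed by END: emit <tag/> instead.
                yield pending[:-1] + '/>'
                pending = None
                continue
            yield pending
            pending = None
        if kind == 'START':
            tag, attrs = data
            attr_text = ''.join(' %s=%s' % (name, quoteattr(value))
                                for name, value in attrs)
            pending = '<%s%s>' % (tag, attr_text)
        elif kind == 'END':
            yield '</%s>' % data
        elif kind == 'TEXT':
            yield escape(data)
    if pending is not None:
        yield pending


def render(events, encoding='utf-8'):
    """Join the serialized chunks; return bytes, or str if encoding is None."""
    text = ''.join(serialize(events))
    if encoding is None:
        return text
    return text.encode(encoding)


EVENTS = [
    ('START', ('p', [('class', 'intro')]), None),
    ('TEXT', 'Some text and ', None),
    ('START', ('a', [('href', 'http://example.org/')]), None),
    ('TEXT', 'a link', None),
    ('END', 'a', None),
    ('TEXT', '.', None),
    ('START', ('br', []), None),
    ('END', 'br', None),
    ('END', 'p', None),
]

for chunk in serialize(EVENTS):
    print(repr(chunk))
print(render(EVENTS, encoding=None))
# <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
```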

Both methods can be passed a ``method`` parameter that determines how exactly
the events are serialized to text. This parameter can be one of “xml” (the
default), “xhtml”, “html”, or “text”, or a custom serializer class::

  >>> print stream.render('html')
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>

Note how the ``<br>`` element isn't closed, which is the right thing to do for
HTML.
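
The ``<br>`` difference comes down to how each output method writes an element
that has no content. The hypothetical helper below sketches just that decision
for ``'xml'`` and ``'html'``; the set of void elements is a partial list
written for this example. In HTML, only void elements such as ``br`` omit the
end tag, and any other empty element still needs an explicit one.

```python
from xml.sax.saxutils import quoteattr

# Partial list of HTML void elements, for illustration only.
HTML_VOID_ELEMENTS = frozenset(['area', 'base', 'br', 'col', 'hr', 'img',
                                'input', 'link', 'meta', 'param'])


def empty_element(tag, attrs, method='xml'):
    """Hypothetical helper: format an element with no content."""
    attr_text = ''.join(' %s=%s' % (name, quoteattr(value))
                        for name, value in attrs)
    if method == 'xml':
        return '<%s%s/>' % (tag, attr_text)
    if method == 'html':
        if tag in HTML_VOID_ELEMENTS:
            return '<%s%s>' % (tag, attr_text)        # no end tag at all
        return '<%s%s></%s>' % (tag, attr_text, tag)  # e.g. an empty <p>
    raise ValueError('unsupported method: %r' % (method,))


print(empty_element('br', [], 'xml'))   # <br/>
print(empty_element('br', [], 'html'))  # <br>
print(empty_element('p', [], 'html'))   # <p></p>
```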

In addition, the ``render()`` method takes an ``encoding`` parameter, which
defaults to “UTF-8”. If it is set to ``None``, the result is a unicode string.

The different serializer classes in ``markup.output`` can also be used
directly. A serializer yields its output in chunks, so join them to get a
single string::

  >>> from markup.filters import HTMLSanitizer
  >>> from markup.output import TextSerializer
  >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream)))
  Some text and a link.

The pipe operator allows a nicer syntax::

  >>> print stream | HTMLSanitizer() | TextSerializer()
  Some text and a link.
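
A text serializer is itself just a callable that turns events into strings,
which is why it can sit at the end of a chain of filters.
``PlainTextSerializer`` below is a hypothetical stand-in, not the real
``TextSerializer``: it keeps only the data of ``TEXT`` events and yields it
chunk by chunk, so the caller joins the chunks into one string.

```python
class PlainTextSerializer(object):
    """Hypothetical text serializer: yields the data of TEXT events only."""

    def __call__(self, stream):
        for kind, data, pos in stream:
            if kind == 'TEXT':
                yield data


EVENTS = [
    ('START', ('p', [('class', 'intro')]), None),
    ('TEXT', 'Some text and ', None),
    ('START', ('a', [('href', 'http://example.org/')]), None),
    ('TEXT', 'a link', None),
    ('END', 'a', None),
    ('TEXT', '.', None),
    ('END', 'p', None),
]

print(''.join(PlainTextSerializer()(EVENTS)))  # Some text and a link.
```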


Using XPath
===========

XPath can be used to extract a specific subset of the stream via the
``select()`` method::

  >>> substream = stream.select('a')
  >>> substream
  <markup.core.Stream object at 0x7118b0>
  >>> print substream
  <a href="http://example.org/">a link</a>
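
Full XPath support involves much more, but the core of selecting elements by
name from a flat event stream is depth tracking: once a matching ``START``
event is seen, every event is passed through until the corresponding ``END``.
``select_tag`` below is a hypothetical sketch that handles only a bare
tag-name test and matches elements at any depth (closer to ``//a`` than to
``a``). Like the sub-stream above, it is a generator.

```python
def select_tag(events, name):
    """Yield the events of every element called `name`, including its content."""
    depth = 0  # > 0 while inside a matching element
    for kind, data, pos in events:
        if depth:
            yield kind, data, pos
            if kind == 'START':
                depth += 1
            elif kind == 'END':
                depth -= 1
        elif kind == 'START' and data[0] == name:
            depth = 1
            yield kind, data, pos


EVENTS = [
    ('START', ('p', [('class', 'intro')]), None),
    ('TEXT', 'Some text and ', None),
    ('START', ('a', [('href', 'http://example.org/')]), None),
    ('TEXT', 'a link', None),
    ('END', 'a', None),
    ('TEXT', '.', None),
    ('START', ('br', []), None),
    ('END', 'br', None),
    ('END', 'p', None),
]

print(list(select_tag(EVENTS, 'a')))
```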

Streams often cannot be reused: in the above example, the sub-stream is based
on a generator. Once it has been serialized, it has been fully consumed and
cannot be rendered again. To work around this, you can collect its events in a
``list`` and wrap that in a new ``Stream``::

  >>> from markup import Stream
  >>> substream = Stream(list(stream.select('a')))
  >>> substream
  <markup.core.Stream object at 0x7118b0>
  >>> print substream
  <a href="http://example.org/">a link</a>
  >>> print substream.select('@href')
  http://example.org/
  >>> print substream.select('text()')
  a link
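
The reuse problem is ordinary Python generator behaviour and can be reproduced
without the library. In this sketch, ``text_chunks`` is a hypothetical
generator over plain event tuples: iterating it a second time yields nothing,
while a ``list`` built from it once can be iterated as often as needed.

```python
def text_chunks(events):
    """Hypothetical generator yielding the data of TEXT events."""
    for kind, data, pos in events:
        if kind == 'TEXT':
            yield data


EVENTS = [
    ('START', ('a', [('href', 'http://example.org/')]), None),
    ('TEXT', 'a link', None),
    ('END', 'a', None),
]

chunks = text_chunks(EVENTS)
print(list(chunks))  # ['a link']
print(list(chunks))  # [] (the generator is already exhausted)

cached = list(text_chunks(EVENTS))  # materialize once...
print(list(cached))  # ['a link']
print(list(cached))  # ['a link'] (...and reuse as often as needed)
```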