annotate doc/streams.txt @ 226:4d8a9e03b23d trunk

Add reStructuredText documentation files.
author cmlenz
date Fri, 08 Sep 2006 08:44:31 +0000
parents
children 84168828b074
.. -*- mode: rst; encoding: utf-8 -*-

==============
Markup Streams
==============

A stream is the common representation of markup as a *stream of events*.


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

A stream can be obtained in a number of ways. It can be:

* the result of parsing XML or HTML text,
* generated programmatically, or
* the result of selecting a subset of another stream with an XPath
  expression.

For example, the functions ``XML()`` and ``HTML()`` can be used to convert
literal XML or HTML text to a markup stream::

  >>> from markup import XML
  >>> stream = XML('<p class="intro">Some text and '
  ...              '<a href="http://example.org/">a link</a>.'
  ...              '<br/></p>')
  >>> stream
  <markup.core.Stream object at 0x6bef0>

The stream is the result of parsing the text into events. Each event is a tuple
of the form ``(kind, data, pos)``, where:

* ``kind`` defines what kind of event it is (such as the start of an element,
  text, a comment, etc.).
* ``data`` is the actual data associated with the event. Its structure depends
  on the event kind.
* ``pos`` is a ``(filename, lineno, column)`` tuple that describes where in the
  source the event originated.

::

  >>> for kind, data, pos in stream:
  ...     print kind, `data`, pos
  ...
  START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0)
  TEXT u'Some text and ' ('<string>', 1, 31)
  START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31)
  TEXT u'a link' ('<string>', 1, 67)
  END u'a' ('<string>', 1, 67)
  TEXT u'.' ('<string>', 1, 72)
  START (u'br', []) ('<string>', 1, 72)
  END u'br' ('<string>', 1, 77)
  END u'p' ('<string>', 1, 77)

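The events above come from Markup's own parser. As a rough illustration of how
text can be turned into such tuples, here is a sketch built on the standard
library's ``xml.parsers.expat`` module. The ``parse_events`` name and the plain
string kinds ``'START'``, ``'TEXT'`` and ``'END'`` are assumptions made for
this example; Markup defines its own event kind constants, and the sketch
ignores comments, processing instructions, namespaces and doctypes::

  import xml.parsers.expat


  def parse_events(text, filename='<string>'):
      """Parse XML text into a list of (kind, data, pos) tuples (sketch only).

      The string kinds 'START', 'TEXT' and 'END' are hypothetical stand-ins
      for Markup's real event kind constants.
      """
      events = []
      parser = xml.parsers.expat.ParserCreate()
      parser.ordered_attributes = True  # attributes as [name, value, name, value, ...]
      parser.buffer_text = True         # merge adjacent character data callbacks

      def pos():
          # Position of the parser at the time the handler is called.
          return (filename, parser.CurrentLineNumber, parser.CurrentColumnNumber)

      def start(name, attrs):
          pairs = list(zip(attrs[::2], attrs[1::2]))
          events.append(('START', (name, pairs), pos()))

      def end(name):
          events.append(('END', name, pos()))

      def text_data(data):
          events.append(('TEXT', data, pos()))

      parser.StartElementHandler = start
      parser.EndElementHandler = end
      parser.CharacterDataHandler = text_data
      parser.Parse(text, True)
      return events

For the paragraph used above, this produces the same nine START/TEXT/END events
in the same order, though the positions are whatever expat reports and may not
match Markup's exactly.
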

Filtering
=========

One important feature of markup streams is that you can apply *filters* to
them, either the filters that come with Markup or your own custom filters.

A filter is simply a callable that accepts a stream as its argument and returns
the filtered stream::

  def noop(stream):
      """A filter that doesn't actually do anything with the stream."""
      for kind, data, pos in stream:
          yield kind, data, pos

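A filter that actually changes something follows the same pattern: it looks at
each event and yields either the original event or a modified one. The
following minimal sketch upper-cases all text. ``uppercase_text`` is a
hypothetical name, and the hand-built events use plain string kinds as
stand-ins for Markup's event kind constants::

  def uppercase_text(stream):
      """Upper-case the data of TEXT events; pass all other events through."""
      for kind, data, pos in stream:
          if kind == 'TEXT':  # hypothetical string kind, see above
              data = data.upper()
          yield kind, data, pos


  # A tiny hand-built stream equivalent to '<p>Some text</p>'.
  events = [
      ('START', ('p', []), ('<string>', 1, 0)),
      ('TEXT', 'Some text', ('<string>', 1, 3)),
      ('END', 'p', ('<string>', 1, 12)),
  ]
  result = list(uppercase_text(events))
  # result[1] is now ('TEXT', 'SOME TEXT', ('<string>', 1, 3))

Because the filter is a generator, no work happens until something iterates
over its output; here ``list()`` does that.
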
Filters can be applied in a number of ways. The simplest is to call the filter
directly::

  stream = noop(stream)

Calling a filter directly gives you its return value as-is: for a generator
function such as ``noop`` that is a plain generator, not a ``Stream`` object,
so it needs to be wrapped in a ``Stream`` again before ``Stream`` methods can
be used on it.

The ``Stream`` class also provides a ``filter()`` method, which takes an
arbitrary number of filter callables and applies them all::

  stream = stream.filter(noop)

Finally, filters can also be applied using the *bitwise or* operator (``|``),
which allows a syntax similar to pipes in Unix shells::

  stream = stream | noop

One example of a filter included with Markup is the ``HTMLSanitizer`` in
``markup.filters``. It processes a stream of HTML markup and strips out any
potentially dangerous constructs, such as JavaScript event handlers.
``HTMLSanitizer`` is not a function but a class that implements ``__call__``,
which means instances of the class are callable.

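To make the callable-class pattern concrete, here is a toy filter class in the
same spirit. ``StripEventHandlers`` is a hypothetical name, the events again
use plain string kinds as stand-ins, and dropping ``on*`` attributes is only a
small part of what a real sanitizer has to do, so this sketch must not be used
to clean untrusted input::

  class StripEventHandlers(object):
      """Toy filter: drop attributes whose names start with a given prefix.

      Illustrates a filter implemented as a class with __call__; it is NOT a
      replacement for a real HTML sanitizer.
      """

      def __init__(self, prefix='on'):
          # Configuration is stored on the instance at construction time...
          self.prefix = prefix

      def __call__(self, stream):
          # ...and the instance is then called with the stream, like a function.
          for kind, data, pos in stream:
              if kind == 'START':
                  tag, attrs = data
                  attrs = [(name, value) for name, value in attrs
                           if not name.lower().startswith(self.prefix)]
                  data = (tag, attrs)
              yield kind, data, pos


  events = [
      ('START', ('a', [('href', '#'), ('onclick', 'evil()')]), ('<string>', 1, 0)),
      ('TEXT', 'click', ('<string>', 1, 29)),
      ('END', 'a', ('<string>', 1, 34)),
  ]
  cleaned = list(StripEventHandlers()(events))
  # cleaned[0] is ('START', ('a', [('href', '#')]), ('<string>', 1, 0))

The advantage of a class over a plain function is that options (here the
prefix) can be set once when the instance is created.
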
Both the ``filter()`` method and the pipe operator allow easy chaining of
filters::

  from markup.filters import HTMLSanitizer
  stream = stream.filter(noop, HTMLSanitizer())

That is equivalent to::

  stream = stream | noop | HTMLSanitizer()

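One way such chaining can work is for every step to wrap the filter's output in
a new stream object, so that the next filter (or ``|``) can be applied to the
result. The hypothetical ``ToyStream`` class below exists only to illustrate
that idea; it is not Markup's ``Stream`` implementation::

  class ToyStream(object):
      """Minimal stream wrapper showing how filter() and | can chain."""

      def __init__(self, events):
          self.events = events

      def __iter__(self):
          return iter(self.events)

      def filter(self, *filters):
          # Apply the filters left to right; each gets the previous result.
          stream = self
          for func in filters:
              stream = ToyStream(func(stream))
          return stream

      def __or__(self, func):
          # ``stream | func`` is shorthand for ``stream.filter(func)``.
          return self.filter(func)


  def noop(stream):
      for kind, data, pos in stream:
          yield kind, data, pos


  def drop_text(stream):
      # Hypothetical second filter: remove all TEXT events.
      for kind, data, pos in stream:
          if kind != 'TEXT':
              yield kind, data, pos


  events = [
      ('START', ('p', []), ('<string>', 1, 0)),
      ('TEXT', 'hi', ('<string>', 1, 3)),
      ('END', 'p', ('<string>', 1, 5)),
  ]
  via_filter = list(ToyStream(events).filter(noop, drop_text))
  via_pipe = list(ToyStream(events) | noop | drop_text)
  # Both contain only the START and END events.

Since ``|`` is left-associative in Python, ``s | a | b`` is evaluated as
``(s | a) | b``, which is why each step has to return a stream object again.
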

Serialization
=============

The ``Stream`` class provides two methods for serializing a stream of events:
``serialize()`` and ``render()``. The former is a generator that yields the
output in chunks, as ``Markup`` objects (which are essentially unicode
strings). The latter returns a single string, by default UTF-8 encoded.

Here's the output from ``serialize()``::

  >>> for output in stream.serialize():
  ...     print `output`
  ...
  <Markup u'<p class="intro">'>
  <Markup u'Some text and '>
  <Markup u'<a href="http://example.org/">'>
  <Markup u'a link'>
  <Markup u'</a>'>
  <Markup u'.'>
  <Markup u'<br/>'>
  <Markup u'</p>'>

And here's the output from ``render()``::

  >>> print stream.render()
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>

Both methods can be passed a ``method`` parameter that determines how exactly
the events are serialized to text. This parameter can be “xml” (the default),
“xhtml”, “html”, “text”, or a custom serializer class::

  >>> print stream.render('html')
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>

Note how the ``<br>`` element isn't closed, which is the right thing to do for
HTML.

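The difference comes down to how elements without content are written. A
minimal sketch of that decision follows, assuming a small hard-coded set of
HTML void elements; ``empty_element`` and ``VOID_ELEMENTS`` are hypothetical
names, not part of Markup::

  # Assumed subset of HTML's void elements (not an exhaustive list).
  VOID_ELEMENTS = frozenset(['area', 'base', 'br', 'col', 'hr', 'img',
                             'input', 'link', 'meta', 'param'])


  def empty_element(tag, method='xml'):
      """Return the markup for an element that has no content."""
      if method == 'html':
          if tag in VOID_ELEMENTS:
              return '<%s>' % tag          # void element: no slash, no end tag
          return '<%s></%s>' % (tag, tag)  # other elements keep their end tag
      return '<%s/>' % tag                 # XML: the self-closing form is fine

  # empty_element('br')         -> '<br/>'
  # empty_element('br', 'html') -> '<br>'
  # empty_element('p', 'html')  -> '<p></p>'

Non-void elements still need an explicit end tag in HTML, because an HTML
parser does not treat ``<p/>`` as an empty element.
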
In addition, the ``render()`` method takes an ``encoding`` parameter, which
defaults to “UTF-8”. If it is set to ``None``, the result is a unicode string
instead of an encoded byte string.

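The division of labour between the two methods can be sketched in plain
Python: a generator produces text chunks, and a rendering function joins them
and encodes the result unless ``encoding`` is ``None``. ``serialize`` and
``render`` below are hypothetical stand-alone functions working on hand-built
tuples with plain string kinds; they are not Markup's methods and skip details
such as empty elements and the ``Markup`` string type::

  from xml.sax.saxutils import escape


  def serialize(events):
      """Yield the XML output for a list of events as a series of chunks."""
      for kind, data, pos in events:
          if kind == 'START':
              tag, attrs = data
              attr_text = ''.join(
                  ' %s="%s"' % (name, escape(value, {'"': '&quot;'}))
                  for name, value in attrs)
              yield '<%s%s>' % (tag, attr_text)
          elif kind == 'END':
              yield '</%s>' % data
          elif kind == 'TEXT':
              yield escape(data)  # escape &, < and >


  def render(events, encoding='utf-8'):
      """Join all chunks into one string, encoding it unless encoding is None."""
      output = ''.join(serialize(events))
      if encoding is None:
          return output
      return output.encode(encoding)


  events = [
      ('START', ('p', [('class', 'intro')]), ('<string>', 1, 0)),
      ('TEXT', 'a < b', ('<string>', 1, 17)),
      ('END', 'p', ('<string>', 1, 25)),
  ]
  chunks = list(serialize(events))      # three separate chunks
  encoded = render(events)              # UTF-8 encoded bytes
  text = render(events, encoding=None)  # a text (unicode) string
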
The different serializer classes in ``markup.output`` can also be used
directly::

  >>> from markup.filters import HTMLSanitizer
  >>> from markup.output import TextSerializer
  >>> print TextSerializer()(HTMLSanitizer()(stream))
  Some text and a link.

The pipe operator allows a nicer syntax::

  >>> print stream | HTMLSanitizer() | TextSerializer()
  Some text and a link.

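A serializer used this way is simply another callable that takes a stream, but
it yields strings instead of events, which is why it has to come last in a
pipeline. A hypothetical sketch of a text-only serializer (the class name and
the plain string kinds are assumptions, not Markup's ``TextSerializer``)::

  class PlainTextSerializer(object):
      """Yield only the text content of a stream, dropping all tags."""

      def __call__(self, stream):
          for kind, data, pos in stream:
              if kind == 'TEXT':
                  yield data


  events = [
      ('START', ('p', []), ('<string>', 1, 0)),
      ('TEXT', 'Some text and ', ('<string>', 1, 3)),
      ('START', ('a', [('href', 'http://example.org/')]), ('<string>', 1, 17)),
      ('TEXT', 'a link', ('<string>', 1, 47)),
      ('END', 'a', ('<string>', 1, 53)),
      ('TEXT', '.', ('<string>', 1, 57)),
      ('END', 'p', ('<string>', 1, 58)),
  ]
  text = ''.join(PlainTextSerializer()(events))  # 'Some text and a link.'
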

Using XPath
===========

XPath can be used to extract a specific subset of the stream via the
``select()`` method::

  >>> substream = stream.select('a')
  >>> substream
  <markup.core.Stream object at 0x7118b0>
  >>> print substream
  <a href="http://example.org/">a link</a>

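Real XPath evaluation is considerably more involved, but the core idea of
selecting an element together with everything inside it can be sketched by
tracking nesting depth. ``select_elements`` is a hypothetical helper that
matches on element name only, at any depth, and the events use plain string
kinds as stand-ins for Markup's constants::

  def select_elements(events, tag):
      """Yield every element named ``tag``, including all of its descendants."""
      depth = 0  # greater than zero while inside a matched element
      for kind, data, pos in events:
          if depth:
              yield kind, data, pos
              if kind == 'START':
                  depth += 1
              elif kind == 'END':
                  depth -= 1
          elif kind == 'START' and data[0] == tag:
              depth = 1
              yield kind, data, pos


  events = [
      ('START', ('p', [('class', 'intro')]), ('<string>', 1, 0)),
      ('TEXT', 'Some text and ', ('<string>', 1, 17)),
      ('START', ('a', [('href', 'http://example.org/')]), ('<string>', 1, 31)),
      ('TEXT', 'a link', ('<string>', 1, 61)),
      ('END', 'a', ('<string>', 1, 67)),
      ('TEXT', '.', ('<string>', 1, 71)),
      ('START', ('br', []), ('<string>', 1, 72)),
      ('END', 'br', ('<string>', 1, 72)),
      ('END', 'p', ('<string>', 1, 77)),
  ]
  selected = list(select_elements(events, 'a'))
  # selected holds the START, TEXT and END events of the <a> element

Note that ``select_elements()`` itself returns a generator, so its result can
only be iterated over once; ``list()`` is used here to keep the events.
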
Streams often cannot be reused: in the ``select()`` example above, the
sub-stream is based on a generator. Once it has been serialized, it has been
fully consumed and cannot be rendered again. To work around this, you can
materialize the events in a ``list`` and wrap that in a new ``Stream``::

  >>> from markup import Stream
  >>> substream = Stream(list(stream.select('a')))
  >>> substream
  <markup.core.Stream object at 0x7118b0>
  >>> print substream
  <a href="http://example.org/">a link</a>
  >>> print substream.select('@href')
  http://example.org/
  >>> print substream.select('text()')
  a link
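
The underlying issue is a property of Python generators in general, not of
Markup. A minimal plain-Python sketch, where the hypothetical ``make_events``
generator stands in for a filtered sub-stream::

  def make_events():
      # A generator function: each call returns an iterator usable only once.
      yield ('TEXT', 'a link', ('<string>', 1, 0))


  gen = make_events()
  first = list(gen)    # consumes the generator
  second = list(gen)   # already exhausted, so this is an empty list

  cached = list(make_events())  # materialize the events once...
  again = list(cached)          # ...then iterate over them as often as needed

This is the same trade-off as wrapping ``stream.select('a')`` in a ``list``:
the events can be replayed, at the cost of holding all of them in memory.
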
Copyright (C) 2012-2017 Edgewall Software