..
   Source: doc/streams.txt @ 828:eb8aa8690480 (experimental-inline branch)
   Changeset summary: inline branch: template object can be compiled, and
   remembers the generated module.
   Author: cmlenz
   Date: Fri, 13 Mar 2009 16:06:42 +0000
   Parents: 1837f39efd6f; children: 09cc3627654c

.. -*- mode: rst; encoding: utf-8 -*-

==============
Markup Streams
==============

A stream is the common representation of markup as a *stream of events*.


.. contents:: Contents
   :depth: 2
.. sectnum::


Basics
======

A stream can be obtained in a number of ways. It can be:

* the result of parsing XML or HTML text, or
* the result of selecting a subset of another stream using XPath, or
* programmatically generated.

For example, the functions ``XML()`` and ``HTML()`` can be used to convert
literal XML or HTML text to a markup stream:

.. code-block:: pycon

  >>> from genshi import XML
  >>> stream = XML('<p class="intro">Some text and '
  ...              '<a href="http://example.org/">a link</a>.'
  ...              '<br/></p>')
  >>> stream
  <genshi.core.Stream object at ...>

The stream is the result of parsing the text into events. Each event is a tuple
of the form ``(kind, data, pos)``, where:

* ``kind`` defines what kind of event it is (such as the start of an element,
  text, a comment, etc.).
* ``data`` is the actual data associated with the event. How this looks depends
  on the event kind (see `event kinds`_).
* ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
  event “comes from”.

.. code-block:: pycon

  >>> for kind, data, pos in stream:
  ...     print kind, `data`, pos
  ...
  START (QName(u'p'), Attrs([(QName(u'class'), u'intro')])) (None, 1, 0)
  TEXT u'Some text and ' (None, 1, 17)
  START (QName(u'a'), Attrs([(QName(u'href'), u'http://example.org/')])) (None, 1, 31)
  TEXT u'a link' (None, 1, 61)
  END QName(u'a') (None, 1, 67)
  TEXT u'.' (None, 1, 71)
  START (QName(u'br'), Attrs()) (None, 1, 72)
  END QName(u'br') (None, 1, 77)
  END QName(u'p') (None, 1, 77)


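
Because events are plain tuples, ordinary Python tools work on them. The
following is a minimal sketch rather than Genshi code: it uses a hand-written
list of tuples and plain strings as stand-ins for the event kind constants (an
assumption made so that the example runs without Genshi installed), and the
helper ``collect_text`` is a hypothetical name.

.. code-block:: python

  # Stand-ins for the event kinds (assumption: plain strings, not Genshi's
  # own constants).
  START, END, TEXT = 'START', 'END', 'TEXT'

  # A hand-written event list shaped like the output shown above.
  # Each pos is a (filename, lineno, column) tuple.
  events = [
      (START, ('p', [('class', 'intro')]), (None, 1, 0)),
      (TEXT, 'Some text and ', (None, 1, 17)),
      (START, ('a', [('href', 'http://example.org/')]), (None, 1, 31)),
      (TEXT, 'a link', (None, 1, 61)),
      (END, 'a', (None, 1, 67)),
      (TEXT, '.', (None, 1, 71)),
      (END, 'p', (None, 1, 77)),
  ]

  def collect_text(stream):
      """Join the data of all TEXT events, ignoring the markup events."""
      return ''.join(data for kind, data, pos in stream if kind == TEXT)

  text = collect_text(events)
  print(text)

Unpacking each event as ``kind, data, pos`` is the same pattern used by the
loop above, so the helper should carry over to a real stream unchanged apart
from the constants.
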
Filtering
=========

One important feature of markup streams is that you can apply *filters* to the
stream, either filters that come with Genshi, or your own custom filters.

A filter is simply a callable that accepts the stream as a parameter and
returns the filtered stream:

.. code-block:: python

  def noop(stream):
      """A filter that doesn't actually do anything with the stream."""
      for kind, data, pos in stream:
          yield kind, data, pos

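
A filter becomes useful once it changes or drops events. As a hedged sketch
(the filter name ``upper_text`` and the string stand-in for the ``TEXT`` kind
are assumptions for illustration, so that the code runs without Genshi), a
filter that upper-cases all text could look like this:

.. code-block:: python

  TEXT = 'TEXT'  # stand-in for the TEXT event kind (assumption)

  def upper_text(stream):
      """Upper-case the data of TEXT events; pass other events through."""
      for kind, data, pos in stream:
          if kind == TEXT:
              data = data.upper()
          yield kind, data, pos

  events = [
      ('START', ('p', []), (None, 1, 0)),
      (TEXT, 'hello', (None, 1, 3)),
      ('END', 'p', (None, 1, 8)),
  ]
  filtered = list(upper_text(events))

Like ``noop``, the filter is a generator, so events are processed one at a
time and nothing is buffered.
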
Filters can be applied in a number of ways. The simplest is to just call the
filter directly:

.. code-block:: python

  stream = noop(stream)

The ``Stream`` class also provides a ``filter()`` method, which takes an
arbitrary number of filter callables and applies them all:

.. code-block:: python

  stream = stream.filter(noop)

Finally, filters can also be applied using the *bitwise or* operator (``|``),
which allows a syntax similar to pipes on Unix shells:

.. code-block:: python

  stream = stream | noop

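
To illustrate how a pipe syntax of this kind can be built on Python's
``__or__`` hook, here is a minimal sketch. ``MiniStream`` and ``drop_text``
are hypothetical names, and the class is not Genshi's implementation; it only
mirrors the behaviour described above.

.. code-block:: python

  class MiniStream(object):
      """Hypothetical, minimal stand-in for a stream class."""

      def __init__(self, events):
          self.events = events

      def __iter__(self):
          return iter(self.events)

      def filter(self, *filters):
          """Apply each filter in turn and wrap the result in a new stream."""
          events = self.events
          for func in filters:
              events = func(events)
          return MiniStream(events)

      def __or__(self, func):
          """Make ``stream | func`` behave like ``stream.filter(func)``."""
          return self.filter(func)


  def noop(stream):
      """Pass every event through unchanged."""
      for event in stream:
          yield event


  def drop_text(stream):
      """Example filter: remove all TEXT events."""
      for kind, data, pos in stream:
          if kind != 'TEXT':
              yield kind, data, pos


  events = [
      ('START', ('p', []), (None, 1, 0)),
      ('TEXT', 'hi', (None, 1, 3)),
      ('END', 'p', (None, 1, 5)),
  ]
  piped = list(MiniStream(events) | noop | drop_text)
  chained = list(MiniStream(events).filter(noop, drop_text))

Both spellings produce the same events in this sketch, because ``|`` simply
delegates to ``filter()``.
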
One example of a filter included with Genshi is the ``HTMLSanitizer`` in
``genshi.filters``. It processes a stream of HTML markup, and strips out any
potentially dangerous constructs, such as JavaScript event handlers.
``HTMLSanitizer`` is not a function, but rather a class that implements
``__call__``, which means instances of the class are callable:

.. code-block:: python

  from genshi.filters import HTMLSanitizer
  stream = stream | HTMLSanitizer()

Both the ``filter()`` method and the pipe operator allow easy chaining of
filters:

.. code-block:: python

  from genshi.filters import HTMLSanitizer
  stream = stream.filter(noop, HTMLSanitizer())

That is equivalent to:

.. code-block:: python

  stream = stream | noop | HTMLSanitizer()

For more information about the built-in filters, see `Stream Filters`_.

.. _`Stream Filters`: filters.html


Serialization
=============

Serialization means producing some kind of textual output from a stream of
events, which you'll need when you want to transmit or store the results of
generating or otherwise processing markup.

The ``Stream`` class provides two methods for serialization: ``serialize()``
and ``render()``. The former is a generator that yields chunks of ``Markup``
objects (which are basically unicode strings that are considered safe for
output on the web). The latter returns a single string, by default UTF-8
encoded.

Here's the output from ``serialize()``:

.. code-block:: pycon

  >>> for output in stream.serialize():
  ...     print `output`
  ...
  <Markup u'<p class="intro">'>
  <Markup u'Some text and '>
  <Markup u'<a href="http://example.org/">'>
  <Markup u'a link'>
  <Markup u'</a>'>
  <Markup u'.'>
  <Markup u'<br/>'>
  <Markup u'</p>'>

And here's the output from ``render()``:

.. code-block:: pycon

  >>> print stream.render()
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>

Both methods can be passed a ``method`` parameter that determines how exactly
the events are serialized to text. This parameter can be either a string or a
custom serializer class:

.. code-block:: pycon

  >>> print stream.render('html')
  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>

Note how the ``<br>`` element isn't closed, which is the right thing to do for
HTML. See `serialization methods`_ for more details.

In addition, the ``render()`` method takes an ``encoding`` parameter, which
defaults to “UTF-8”. If set to ``None``, the result will be a unicode string.

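
The relationship between the two methods can be sketched in a few lines of
plain Python. This is an illustration under simplifying assumptions, not
Genshi's serializer: the functions below are free-standing stand-ins, event
kinds are plain strings, and escaping, empty elements and namespaces are
ignored.

.. code-block:: python

  def serialize(events):
      """Yield one chunk of text per event (greatly simplified XML output)."""
      for kind, data, pos in events:
          if kind == 'START':
              tag, attrs = data
              attr_text = ''.join(' %s="%s"' % (name, value)
                                  for name, value in attrs)
              yield '<%s%s>' % (tag, attr_text)
          elif kind == 'END':
              yield '</%s>' % data
          elif kind == 'TEXT':
              yield data

  def render(events, encoding='utf-8'):
      """Join the serialized chunks; encode unless ``encoding`` is None."""
      text = ''.join(serialize(events))
      if encoding is None:
          return text
      return text.encode(encoding)

  events = [
      ('START', ('p', [('class', 'intro')]), (None, 1, 0)),
      ('TEXT', 'Hi', (None, 1, 17)),
      ('END', 'p', (None, 1, 19)),
  ]
  chunks = list(serialize(events))
  encoded = render(events)
  decoded = render(events, encoding=None)

The generator form suits incremental output, for example writing chunks to a
response as they are produced, while the joined form is convenient when the
whole result is needed at once.
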
The different serializer classes in ``genshi.output`` can also be used
directly:

.. code-block:: pycon

  >>> from genshi.filters import HTMLSanitizer
  >>> from genshi.output import TextSerializer
  >>> print ''.join(TextSerializer()(HTMLSanitizer()(stream)))
  Some text and a link.

The pipe operator allows a nicer syntax:

.. code-block:: pycon

  >>> print stream | HTMLSanitizer() | TextSerializer()
  Some text and a link.


.. _`serialization methods`:

Serialization Methods
---------------------

Genshi supports different serialization methods for creating a text
representation of a markup stream.

``xml``
  The ``XMLSerializer`` is the default serialization method and results in
  proper XML output including namespace support, the XML declaration, CDATA
  sections, and so on. It is generally not suitable for serving HTML or
  XHTML web pages (unless you want to use true XHTML 1.1), for which the
  ``xhtml`` and ``html`` serializers described below should be preferred.

``xhtml``
  The ``XHTMLSerializer`` is a specialization of the generic ``XMLSerializer``
  that understands the peculiarities of producing XML-compliant output that
  can also be parsed without problems by the HTML parsers found in modern web
  browsers. Thus, the output by this serializer should be usable whether sent
  as "text/html" or "application/xhtml+xml" (although there are a lot of
  subtle issues to pay attention to when switching between the two, in
  particular with respect to differences in the DOM and CSS).

  For example, instead of rendering a script tag as ``<script/>`` (which
  confuses the HTML parser in many browsers), it will produce
  ``<script></script>``. Also, it will normalize any boolean attribute values
  that are minimized in HTML, so that for example ``<hr noshade="1"/>``
  becomes ``<hr noshade="noshade" />``.

  This serializer supports the use of namespaces for compound documents, for
  example to use inline SVG inside an XHTML document.

``html``
  The ``HTMLSerializer`` produces proper HTML markup. The main differences
  compared to ``xhtml`` serialization are that boolean attributes are
  minimized, empty tags are not self-closing (so it's ``<br>`` instead of
  ``<br />``), and that the contents of ``<script>`` and ``<style>`` elements
  are not escaped.

``text``
  The ``TextSerializer`` produces plain text from markup streams. This is
  useful primarily for `text templates`_, but can also be used to produce
  plain text output from markup templates or other sources.

.. _`text templates`: text-templates.html


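
To make the difference between the ``xhtml`` and ``html`` methods concrete,
the following sketch renders a single empty element both ways. It is not
taken from Genshi: the function ``render_empty_element`` and the short
``BOOLEAN_ATTRS`` set are assumptions chosen to mirror the behaviour
described above.

.. code-block:: python

  # Illustrative subset of HTML boolean attributes (assumption, not the list
  # used by Genshi).
  BOOLEAN_ATTRS = frozenset(['checked', 'disabled', 'noshade', 'selected'])

  def render_empty_element(tag, attrs, method):
      """Render an empty element such as <hr> for 'xhtml' or 'html'."""
      parts = []
      for name, value in attrs:
          if name in BOOLEAN_ATTRS:
              if method == 'html':
                  parts.append(' %s' % name)               # minimized form
              else:
                  parts.append(' %s="%s"' % (name, name))  # normalized value
          else:
              parts.append(' %s="%s"' % (name, value))
      if method == 'html':
          return '<%s%s>' % (tag, ''.join(parts))          # no closing slash
      return '<%s%s />' % (tag, ''.join(parts))

  xhtml_hr = render_empty_element('hr', [('noshade', '1')], 'xhtml')
  html_hr = render_empty_element('hr', [('noshade', '1')], 'html')

In this sketch ``xhtml_hr`` is ``<hr noshade="noshade" />`` and ``html_hr`` is
``<hr noshade>``.
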
Serialization Options
---------------------

Both ``serialize()`` and ``render()`` support additional keyword arguments that
are passed through to the initializer of the serializer class. The following
options are supported by the built-in serializers:

``strip_whitespace``
  Whether the serializer should remove trailing spaces and empty lines.
  Defaults to ``True``.

  (This option is not available for serialization to plain text.)

``doctype``
  A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and
  system identifier of a ``DOCTYPE`` declaration to prepend to the generated
  output. If provided, this declaration will override any ``DOCTYPE``
  declaration in the stream.

  The parameter can also be specified as a string to refer to commonly used
  doctypes:

  +-----------------------------+-------------------------------------------+
  | Shorthand                   | DOCTYPE                                   |
  +=============================+===========================================+
  | ``html`` or                 | HTML 4.01 Strict                          |
  | ``html-strict``             |                                           |
  +-----------------------------+-------------------------------------------+
  | ``html-transitional``       | HTML 4.01 Transitional                    |
  +-----------------------------+-------------------------------------------+
  | ``html-frameset``           | HTML 4.01 Frameset                        |
  +-----------------------------+-------------------------------------------+
  | ``html5``                   | DOCTYPE proposed for the work-in-progress |
  |                             | HTML5 standard                            |
  +-----------------------------+-------------------------------------------+
  | ``xhtml`` or                | XHTML 1.0 Strict                          |
  | ``xhtml-strict``            |                                           |
  +-----------------------------+-------------------------------------------+
  | ``xhtml-transitional``      | XHTML 1.0 Transitional                    |
  +-----------------------------+-------------------------------------------+
  | ``xhtml-frameset``          | XHTML 1.0 Frameset                        |
  +-----------------------------+-------------------------------------------+
  | ``xhtml11``                 | XHTML 1.1                                 |
  +-----------------------------+-------------------------------------------+
  | ``svg`` or ``svg-full``     | SVG 1.1                                   |
  +-----------------------------+-------------------------------------------+
  | ``svg-basic``               | SVG 1.1 Basic                             |
  +-----------------------------+-------------------------------------------+
  | ``svg-tiny``                | SVG 1.1 Tiny                              |
  +-----------------------------+-------------------------------------------+

  (This option is not available for serialization to plain text.)

``namespace_prefixes``
  The namespace prefixes to use for namespaces that are not bound to a prefix
  in the stream itself.

  (This option is not available for serialization to HTML or plain text.)

``drop_xml_decl``
  Whether to remove the XML declaration (the ``<?xml ?>`` part at the
  beginning of a document) when serializing. This defaults to ``True`` as an
  XML declaration throws some older browsers into "Quirks" rendering mode.

  (This option is only available for serialization to XHTML.)

``strip_markup``
  Whether the text serializer should detect and remove any tags or
  entity-encoded characters in the text.

  (This option is only available for serialization to plain text.)



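
As an illustration of the ``(name, pubid, sysid)`` form of the ``doctype``
option, the sketch below turns such a tuple into the declaration text. The
function ``format_doctype`` is a hypothetical helper, not part of Genshi; the
identifiers in ``HTML_STRICT`` are the standard W3C ones for HTML 4.01 Strict.

.. code-block:: python

  def format_doctype(name, pubid=None, sysid=None):
      """Build a DOCTYPE declaration from its three parts."""
      parts = ['<!DOCTYPE %s' % name]
      if pubid:
          parts.append(' PUBLIC "%s"' % pubid)
      elif sysid:
          # A system identifier without a public one needs the SYSTEM keyword.
          parts.append(' SYSTEM')
      if sysid:
          parts.append(' "%s"' % sysid)
      parts.append('>')
      return ''.join(parts)

  HTML_STRICT = ('html', '-//W3C//DTD HTML 4.01//EN',
                 'http://www.w3.org/TR/html4/strict.dtd')

  strict_decl = format_doctype(*HTML_STRICT)
  short_decl = format_doctype('html')

Since keyword arguments are passed through to the serializer, the same tuple
(or one of the shorthand strings from the table) would be given to
``render()`` or ``serialize()`` as the ``doctype`` argument.
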
Using XPath
===========

XPath can be used to extract a specific subset of the stream via the
``select()`` method:

.. code-block:: pycon

  >>> substream = stream.select('a')
  >>> substream
  <genshi.core.Stream object at ...>
  >>> print substream
  <a href="http://example.org/">a link</a>

Often, streams cannot be reused: in the above example, the sub-stream is based
on a generator. Once it has been serialized, it will have been fully consumed,
and cannot be rendered again. To work around this, you can wrap such a stream
in a ``list``:

.. code-block:: pycon

  >>> from genshi import Stream
  >>> substream = Stream(list(stream.select('a')))
  >>> substream
  <genshi.core.Stream object at ...>
  >>> print substream
  <a href="http://example.org/">a link</a>
  >>> print substream.select('@href')
  http://example.org/
  >>> print substream.select('text()')
  a link

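
The underlying behaviour is that of any Python generator, which the following
sketch shows without Genshi. ``select_text`` is a hypothetical stand-in for a
lazy selection, and the events are hand-written tuples.

.. code-block:: python

  def select_text(events):
      """Hypothetical lazy selection: yields only the TEXT events."""
      for event in events:
          if event[0] == 'TEXT':
              yield event

  events = [
      ('START', ('a', ()), (None, 1, 0)),
      ('TEXT', 'a link', (None, 1, 3)),
      ('END', 'a', (None, 1, 9)),
  ]

  lazy = select_text(events)
  first_pass = list(lazy)    # consumes the generator
  second_pass = list(lazy)   # nothing is left to yield

  # Buffering the events in a list makes them reusable.
  buffered = list(select_text(events))
  again = [list(buffered), list(buffered)]

The trade-off is memory: a list holds every selected event at once, whereas
the generator produces them on demand.
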
See `Using XPath in Genshi`_ for more information about the XPath support in
Genshi.

.. _`Using XPath in Genshi`: xpath.html


.. _`event kinds`:

Event Kinds
===========

Every event in a stream is of one of several *kinds*, which also determines
what the ``data`` item of the event tuple looks like. The different kinds of
events are documented below.

.. note:: The ``data`` item is generally immutable. If the data is to be
          modified when processing a stream, it must be replaced by a new tuple.
          Effectively, this means the entire event tuple is immutable.

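
In practice, a filter that wants to change an event builds a new ``data``
tuple and yields a new event. A minimal sketch, assuming plain tuples and
strings in place of Genshi's own types (``add_class`` is a hypothetical
filter name):

.. code-block:: python

  def add_class(stream, tagname, value):
      """Yield events, adding a class attribute to matching START events."""
      for kind, data, pos in stream:
          if kind == 'START' and data[0] == tagname:
              tag, attrs = data
              # Tuples cannot be changed in place, so build new ones.
              data = (tag, attrs + (('class', value),))
          yield kind, data, pos

  original = ('START', ('p', ()), (None, 1, 0))
  changed = list(add_class([original], 'p', 'intro'))[0]

The original event is left untouched, which keeps it safe to share the same
events between several consumers.
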
370 START | |
371 ----- | |
372 The opening tag of an element. | |
373 | |
374 For this kind of event, the ``data`` item is a tuple of the form | |
375 ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the | |
376 qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing | |
377 the attribute names and values associated with the tag (excluding namespace | |
820
1837f39efd6f
Sync (old) experimental inline branch with trunk@1027.
cmlenz
parents:
500
diff
changeset
|
378 declarations): |
395 | 379 |
820
1837f39efd6f
Sync (old) experimental inline branch with trunk@1027.
cmlenz
parents:
500
diff
changeset
|
380 .. code-block:: python |
1837f39efd6f
Sync (old) experimental inline branch with trunk@1027.
cmlenz
parents:
500
diff
changeset
|
381 |
1837f39efd6f
Sync (old) experimental inline branch with trunk@1027.
cmlenz
parents:
500
diff
changeset
|
382 START, (QName(u'p'), Attrs([(QName(u'class'), u'intro')])), pos |
395 | 383 |
END
---
The closing tag of an element.

The ``data`` item of end events consists of just a ``QName`` instance
describing the qualified name of the tag:

.. code-block:: python

  END, QName(u'p'), pos

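Because every ``START`` event is eventually matched by an ``END`` event, a
consumer can track element nesting with a simple counter. The sketch below is
one way to do that; ``max_depth`` is a hypothetical helper, the event data is
simplified, and the kind constants are again plain-string stand-ins for the
ones in ``genshi.core`` so that the snippet runs on its own:

.. code-block:: python

  # Stand-ins for genshi.core.START, END and TEXT (assumption, see above).
  START, END, TEXT = 'START', 'END', 'TEXT'

  def max_depth(stream):
      """Return the deepest element nesting level seen in the stream."""
      depth = deepest = 0
      for kind, data, pos in stream:
          if kind == START:
              depth += 1
              deepest = max(deepest, depth)
          elif kind == END:
              depth -= 1
      return deepest

  pos = (None, 1, 0)  # placeholder position
  events = [
      (START, (u'p', ()), pos),
      (TEXT, u'Some text and ', pos),
      (START, (u'a', ()), pos),
      (TEXT, u'a link', pos),
      (END, u'a', pos),
      (END, u'p', pos),
  ]
  depth = max_depth(events)  # 2: the <a> element is nested inside <p>
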
TEXT
----
Character data outside of tags and comments.

For text events, the ``data`` item should be a unicode object:

.. code-block:: python

  TEXT, u'Hello, world!', pos

START_NS
--------
The start of a namespace mapping, binding a namespace prefix to a URI.

The ``data`` item of this kind of event is a tuple of the form
``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the
full URI to which the prefix is bound. Both should be unicode objects. If the
namespace is not bound to any prefix, the ``prefix`` item is an empty string:

.. code-block:: python

  START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos

END_NS
------
The end of a namespace mapping.

The ``data`` item of such events consists of only the namespace prefix (a
unicode object):

.. code-block:: python

  END_NS, u'svg', pos

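Taken together, ``START_NS`` and ``END_NS`` events let a consumer work out
which prefixes are bound at any point in the stream. The following is only a
sketch under stated assumptions: ``prefixes_in_scope`` is a hypothetical
helper, not part of Genshi, and the kind constants are plain-string stand-ins
for the ones in ``genshi.core``:

.. code-block:: python

  # Stand-ins for genshi.core.START_NS and END_NS (assumption, see above).
  START_NS, END_NS = 'START_NS', 'END_NS'

  def prefixes_in_scope(stream):
      """Yield (event, mapping) pairs; mapping is prefix -> URI after the event."""
      stacks = {}  # prefix -> list of URIs, innermost binding last
      for event in stream:
          kind, data, pos = event
          if kind == START_NS:
              prefix, uri = data
              stacks.setdefault(prefix, []).append(uri)
          elif kind == END_NS:
              # For END_NS the data item is just the prefix.
              stacks[data].pop()
              if not stacks[data]:
                  del stacks[data]
          yield event, dict((p, uris[-1]) for p, uris in stacks.items())

  pos = (None, 1, 0)  # placeholder position
  events = [
      (START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos),
      (END_NS, u'svg', pos),
  ]
  scopes = [scope for event, scope in prefixes_in_scope(events)]

Keeping a stack per prefix means that a prefix rebound by a nested mapping
reverts to its outer URI once the inner mapping ends.
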
DOCTYPE
-------
A document type declaration.

For this type of event, the ``data`` item is a tuple of the form
``(name, pubid, sysid)``, where ``name`` is the name of the root element,
``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is
the system identifier of the DTD (or ``None``):

.. code-block:: python

  DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN',
            u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos

COMMENT
-------
A comment.

For such events, the ``data`` item is a unicode object containing all character
data between the comment delimiters:

.. code-block:: python

  COMMENT, u'Commented out', pos

PI
--
A processing instruction.

The ``data`` item is a tuple of the form ``(target, data)`` for processing
instructions, where ``target`` is the target of the PI (used to identify the
application by which the instruction should be processed), and ``data`` is the
text following the target (excluding the terminating question mark):

.. code-block:: python

  PI, (u'php', u'echo "Yo" '), pos

START_CDATA
-----------
Marks the beginning of a ``CDATA`` section.

The ``data`` item for such events is always ``None``:

.. code-block:: python

  START_CDATA, None, pos

END_CDATA
---------
Marks the end of a ``CDATA`` section.

The ``data`` item for such events is always ``None``:

.. code-block:: python

  END_CDATA, None, pos
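
Since neither marker carries any data, the two events are only useful as a
pair that brackets other events. The sketch below collects the text found
inside ``CDATA`` sections, on the assumption that this content is delivered as
ordinary ``TEXT`` events between the markers. ``cdata_text`` is a hypothetical
helper, and the kind constants are plain-string stand-ins for the ones in
``genshi.core``:

.. code-block:: python

  # Stand-ins for genshi.core.START_CDATA, END_CDATA and TEXT (assumption).
  START_CDATA, END_CDATA, TEXT = 'START_CDATA', 'END_CDATA', 'TEXT'

  def cdata_text(stream):
      """Return the text that appears between START_CDATA and END_CDATA."""
      inside = False
      chunks = []
      for kind, data, pos in stream:
          if kind == START_CDATA:
              inside = True
          elif kind == END_CDATA:
              inside = False
          elif kind == TEXT and inside:
              chunks.append(data)
      return u''.join(chunks)

  pos = (None, 1, 0)  # placeholder position
  events = [
      (TEXT, u'before ', pos),
      (START_CDATA, None, pos),
      (TEXT, u'a < b', pos),
      (END_CDATA, None, pos),
      (TEXT, u' after', pos),
  ]
  collected = cdata_text(events)  # u'a < b'
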