comparison doc/streams.txt @ 745:74b5c5476ddb trunk
Preparing for [milestone:0.5] release.
author | cmlenz |
---|---|
date | Mon, 09 Jun 2008 09:50:03 +0000 |
parents | be0b4a7b2fd4 |
children | f459f22f7ad2 |
744:cd6624cf2f7c | 745:74b5c5476ddb |
---|---|
6 | 6 |
7 A stream is the common representation of markup as a *stream of events*. | 7 A stream is the common representation of markup as a *stream of events*. |
8 | 8 |
9 | 9 |
10 .. contents:: Contents | 10 .. contents:: Contents |
11 :depth: 1 | 11 :depth: 2 |
12 .. sectnum:: | 12 .. sectnum:: |
13 | 13 |
14 | 14 |
15 Basics | 15 Basics |
16 ====== | 16 ====== |
130 | 130 |
131 Serialization means producing some kind of textual output from a stream of | 131 Serialization means producing some kind of textual output from a stream of |
132 events, which you'll need when you want to transmit or store the results of | 132 events, which you'll need when you want to transmit or store the results of |
133 generating or otherwise processing markup. | 133 generating or otherwise processing markup. |
134 | 134 |
135 The ``Stream`` class provides two methods for serialization: ``serialize()`` and | 135 The ``Stream`` class provides two methods for serialization: ``serialize()`` |
136 ``render()``. The former is a generator that yields chunks of ``Markup`` objects | 136 and ``render()``. The former is a generator that yields chunks of ``Markup`` |
137 (which are basically unicode strings that are considered safe for output on the | 137 objects (which are basically unicode strings that are considered safe for |
138 web). The latter returns a single string, by default UTF-8 encoded. | 138 output on the web). The latter returns a single string, by default UTF-8 |
139 encoded. | |
139 | 140 |
140 Here's the output from ``serialize()``: | 141 Here's the output from ``serialize()``: |
141 | 142 |
142 .. code-block:: pycon | 143 .. code-block:: pycon |
143 | 144 |
159 | 160 |
160 >>> print stream.render() | 161 >>> print stream.render() |
161 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> | 162 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> |
162 | 163 |
163 Both methods can be passed a ``method`` parameter that determines how exactly | 164 Both methods can be passed a ``method`` parameter that determines how exactly |
164 the events are serialzed to text. This parameter can be either “xml” (the | 165 the events are serialized to text. This parameter can be either a string or a |
165 default), “xhtml”, “html”, “text”, or a custom serializer class: | 166 custom serializer class: |
166 | 167 |
167 .. code-block:: pycon | 168 .. code-block:: pycon |
168 | 169 |
169 >>> print stream.render('html') | 170 >>> print stream.render('html') |
170 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> | 171 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> |
171 | 172 |
172 Note how the `<br>` element isn't closed, which is the right thing to do for | 173 Note how the `<br>` element isn't closed, which is the right thing to do for |
173 HTML. | 174 HTML. See `serialization methods`_ for more details. |
174 | 175 |
175 In addition, the ``render()`` method takes an ``encoding`` parameter, which | 176 In addition, the ``render()`` method takes an ``encoding`` parameter, which |
176 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. | 177 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. |
177 | 178 |
178 The different serializer classes in ``genshi.output`` can also be used | 179 The different serializer classes in ``genshi.output`` can also be used |
191 | 192 |
192 >>> print stream | HTMLSanitizer() | TextSerializer() | 193 >>> print stream | HTMLSanitizer() | TextSerializer() |
193 Some text and a link. | 194 Some text and a link. |
194 | 195 |
195 | 196 |
197 .. _`serialization methods`: | |
198 | |
199 Serialization Methods | |
200 --------------------- | |
201 | |
202 Genshi supports several serialization methods for creating a text | |
203 representation of a markup stream. | |
204 | |
205 ``xml`` | |
206 The ``XMLSerializer`` is the default serialization method and results in | |
207 proper XML output including namespace support, the XML declaration, CDATA | |
208 sections, and so on. It is generally not suitable for serving HTML or | |
209 XHTML web pages (unless you want to use true XHTML 1.1), for which the | |
210 ``xhtml`` and ``html`` serializers described below should be preferred. | |
211 | |
212 ``xhtml`` | |
213 The ``XHTMLSerializer`` is a specialization of the generic ``XMLSerializer`` | |
214 that understands the peculiarities of producing XML-compliant output that can | |
215 also be parsed without problems by the HTML parsers found in modern web | |
216 browsers. Thus, the output by this serializer should be usable whether sent | |
217 as "text/html" or "application/xhtml+xml" (although there are a lot of | |
218 subtle issues to pay attention to when switching between the two, in | |
219 particular with respect to differences in the DOM and CSS). | |
220 | |
221 For example, instead of rendering a script tag as ``<script/>`` (which | |
222 confuses the HTML parser in many browsers), it will produce | |
223 ``<script></script>``. Also, it will normalize any boolean attribute values | |
224 that are minimized in HTML, so that for example ``<hr noshade="1"/>`` | |
225 becomes ``<hr noshade="noshade" />``. | |
226 | |
227 This serializer supports the use of namespaces for compound documents, for | |
228 example to use inline SVG inside an XHTML document. | |
229 | |
230 ``html`` | |
231 The ``HTMLSerializer`` produces proper HTML markup. The main differences | |
232 compared to ``xhtml`` serialization are that boolean attributes are | |
233 minimized, empty tags are not self-closing (so it's ``<br>`` instead of | |
234 ``<br />``), and that the contents of ``<script>`` and ``<style>`` elements | |
235 are not escaped. | |
236 | |
237 ``text`` | |
238 The ``TextSerializer`` produces plain text from markup streams. This is | |
239 useful primarily for `text templates`_, but can also be used to produce | |
240 plain text output from markup templates or other sources. | |
241 | |
242 .. _`text templates`: text-templates.html | |
243 | |
244 | |
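The differences between the four methods are easiest to see on an element with no content. The following is a toy sketch in plain Python, not Genshi's serializer code: the ``serialize_empty`` helper and the ``VOID_ELEMENTS`` set are hypothetical names introduced only for illustration, and the behaviour is limited to what the descriptions above state.

```python
# Toy illustration only: this is NOT how Genshi implements its serializers.
# VOID_ELEMENTS is an assumed, incomplete list of HTML elements that never
# have content.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def serialize_empty(tag, method="xml"):
    """Return the text for an element without content under each method."""
    if method == "xml":
        # Generic XML: every empty element is collapsed.
        return "<%s/>" % tag
    if method == "xhtml":
        # XHTML: only void elements are collapsed, with a space before the
        # slash; everything else gets an explicit end tag.
        if tag in VOID_ELEMENTS:
            return "<%s />" % tag
        return "<%s></%s>" % (tag, tag)
    if method == "html":
        # HTML: void elements have no end tag and no trailing slash.
        if tag in VOID_ELEMENTS:
            return "<%s>" % tag
        return "<%s></%s>" % (tag, tag)
    if method == "text":
        # Plain text: markup is dropped entirely.
        return ""
    raise ValueError("unknown serialization method: %r" % (method,))


for method in ("xml", "xhtml", "html", "text"):
    print(method, repr(serialize_empty("br", method)),
          repr(serialize_empty("script", method)))
```

In real code the same choice is made by passing the method name to ``render()`` or ``serialize()``, as shown earlier in this section.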
196 Serialization Options | 245 Serialization Options |
197 --------------------- | 246 --------------------- |
198 | 247 |
199 Both ``serialize()`` and ``render()`` support additional keyword arguments that | 248 Both ``serialize()`` and ``render()`` support additional keyword arguments that |
200 are passed through to the initializer of the serializer class. The following | 249 are passed through to the initializer of the serializer class. The following |
201 options are supported by the built-in serializers: | 250 options are supported by the built-in serializers: |
202 | 251 |
203 ``strip_whitespace`` | 252 ``strip_whitespace`` |
204 Whether the serializer should remove trailing spaces and empty lines. Defaults | 253 Whether the serializer should remove trailing spaces and empty lines. |
205 to ``True``. | 254 Defaults to ``True``. |
206 | 255 |
207 (This option is not available for serialization to plain text.) | 256 (This option is not available for serialization to plain text.) |
208 | 257 |
209 ``doctype`` | 258 ``doctype`` |
210 A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and | 259 A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and |
211 system identifier of a ``DOCTYPE`` declaration to prepend to the generated | 260 system identifier of a ``DOCTYPE`` declaration to prepend to the generated |
212 output. If provided, this declaration will override any ``DOCTYPE`` | 261 output. If provided, this declaration will override any ``DOCTYPE`` |
213 declaration in the stream. | 262 declaration in the stream. |
214 | 263 |
264 The parameter can also be specified as a string to refer to commonly used | |
265 doctypes: | |
266 | |
267 +-----------------------------+-------------------------------------------+ | |
268 | Shorthand | DOCTYPE | | |
269 +=============================+===========================================+ | |
270 | ``html`` or | HTML 4.01 Strict | | |
271 | ``html-strict`` | | | |
272 +-----------------------------+-------------------------------------------+ | |
273 | ``html-transitional`` | HTML 4.01 Transitional | | |
274 +-----------------------------+-------------------------------------------+ | |
275 | ``html-frameset`` | HTML 4.01 Frameset | | |
276 +-----------------------------+-------------------------------------------+ | |
277 | ``html5`` | DOCTYPE proposed for the work-in-progress | | |
278 | | HTML5 standard | | |
279 +-----------------------------+-------------------------------------------+ | |
280 | ``xhtml`` or | XHTML 1.0 Strict | | |
281 | ``xhtml-strict`` | | | |
282 +-----------------------------+-------------------------------------------+ | |
283 | ``xhtml-transitional`` | XHTML 1.0 Transitional | | |
284 +-----------------------------+-------------------------------------------+ | |
285 | ``xhtml-frameset`` | XHTML 1.0 Frameset | | |
286 +-----------------------------+-------------------------------------------+ | |
287 | ``xhtml11`` | XHTML 1.1 | | |
288 +-----------------------------+-------------------------------------------+ | |
289 | ``svg`` or ``svg-full`` | SVG 1.1 | | |
290 +-----------------------------+-------------------------------------------+ | |
291 | ``svg-basic`` | SVG 1.1 Basic | | |
292 +-----------------------------+-------------------------------------------+ | |
293 | ``svg-tiny`` | SVG 1.1 Tiny | | |
294 +-----------------------------+-------------------------------------------+ | |
295 | |
215 (This option is not available for serialization to plain text.) | 296 (This option is not available for serialization to plain text.) |
216 | 297 |
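To make the relationship between a shorthand string and the ``(name, pubid, sysid)`` tuple concrete, here is a minimal sketch in plain Python. It is not Genshi's implementation: ``DOCTYPES``, ``ALIASES``, ``resolve_doctype`` and ``format_doctype`` are hypothetical names, and only three of the shorthands from the table are included. The identifiers are the standard W3C ones.

```python
# Hypothetical lookup table for illustration; not taken from Genshi.
DOCTYPES = {
    "html-strict": ("html", "-//W3C//DTD HTML 4.01//EN",
                    "http://www.w3.org/TR/html4/strict.dtd"),
    "xhtml-strict": ("html", "-//W3C//DTD XHTML 1.0 Strict//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "html5": ("html", None, None),
}
# Short names that refer to the strict variants, as in the table above.
ALIASES = {"html": "html-strict", "xhtml": "xhtml-strict"}


def resolve_doctype(doctype):
    """Accept either a shorthand string or a (name, pubid, sysid) tuple."""
    if isinstance(doctype, str):
        return DOCTYPES[ALIASES.get(doctype, doctype)]
    name, pubid, sysid = doctype
    return (name, pubid, sysid)


def format_doctype(doctype):
    """Build the DOCTYPE declaration text for the resolved tuple."""
    name, pubid, sysid = resolve_doctype(doctype)
    parts = ["<!DOCTYPE %s" % name]
    if pubid:
        parts.append('PUBLIC "%s"' % pubid)
        if sysid:
            parts.append('"%s"' % sysid)
    elif sysid:
        parts.append('SYSTEM "%s"' % sysid)
    return " ".join(parts) + ">"


print(format_doctype("xhtml"))
print(format_doctype("html5"))
```

With Genshi itself, the shorthand or the tuple is passed as the ``doctype`` keyword argument to ``render()`` or ``serialize()``.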
217 ``namespace_prefixes`` | 298 ``namespace_prefixes`` |
218 The namespace prefixes to use for namespaces that are not bound to a prefix | 299 The namespace prefixes to use for namespaces that are not bound to a prefix |
219 in the stream itself. | 300 in the stream itself. |
224 Whether to remove the XML declaration (the ``<?xml ?>`` part at the | 305 Whether to remove the XML declaration (the ``<?xml ?>`` part at the |
225 beginning of a document) when serializing. This defaults to ``True`` as an | 306 beginning of a document) when serializing. This defaults to ``True`` as an |
226 XML declaration throws some older browsers into "Quirks" rendering mode. | 307 XML declaration throws some older browsers into "Quirks" rendering mode. |
227 | 308 |
228 (This option is only available for serialization to XHTML.) | 309 (This option is only available for serialization to XHTML.) |
310 | |
311 ``strip_markup`` | |
312 Whether the text serializer should detect and remove any tags or entity | |
313 encoded characters in the text. | |
314 | |
315 (This option is only available for serialization to plain text.) | |
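As a rough idea of what such stripping involves, the sketch below removes tag-like substrings from a string and then decodes entity references, using only the Python standard library. It is an approximation under stated assumptions, not the text serializer's actual code: the ``strip_markup`` function here is a hypothetical stand-in, and it assumes that "removing" entity encoded characters means replacing them with the characters they stand for.

```python
import html
import re

# Anything that looks like a tag: "<", then any run of non-">" characters, ">".
_TAG = re.compile(r"<[^>]*>")


def strip_markup(text):
    """Drop tag-like substrings, then decode entity references.

    Tags are removed first so that an escaped tag such as "&lt;b&gt;"
    survives as literal text instead of being stripped.
    """
    return html.unescape(_TAG.sub("", text))


print(strip_markup("<b>Fish &amp; chips</b>"))
```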
229 | 316 |
230 | 317 |
231 | 318 |
232 Using XPath | 319 Using XPath |
233 =========== | 320 =========== |