comparison doc/filters.txt @ 500:0742f421caba experimental-inline
Merged revisions 487-603 via svnmerge from
http://svn.edgewall.org/repos/genshi/trunk
author | cmlenz |
---|---|
date | Fri, 01 Jun 2007 17:21:47 +0000 |
parents | |
children | a332cb9c70d5 1a29617a5d87 1837f39efd6f |
499:869b7885a516 | 500:0742f421caba |
---|---|
1 .. -*- mode: rst; encoding: utf-8 -*- | |
2 | |
3 ============== | |
4 Stream Filters | |
5 ============== | |
6 | |
7 `Markup Streams`_ showed how to write filters and how they are applied to | |
8 markup streams. This page describes the features of the various filters that | |
9 come with Genshi itself. | |
10 | |
11 .. _`Markup Streams`: streams.html | |
12 | |
13 .. contents:: Contents | |
14 :depth: 1 | |
15 .. sectnum:: | |
16 | |
17 | |
18 HTML Form Filler | |
19 ================ | |
20 | |
21 The filter ``genshi.filters.HTMLFormFiller`` can automatically populate an HTML | |
22 form from values provided as a simple dictionary. When using this filter, you can | |
23 basically omit any ``value``, ``selected``, or ``checked`` attributes from form | |
24 controls in your templates, and let the filter do all that work for you. | |
25 | |
26 ``HTMLFormFiller`` takes a dictionary of data to populate the form with, where | |
27 the keys should match the names of form elements, and the values determine the | |
28 values of those controls. For example:: | |
29 | |
30 >>> from genshi.filters import HTMLFormFiller | |
31 >>> from genshi.template import MarkupTemplate | |
32 >>> template = MarkupTemplate("""<form> | |
33 ... <p> | |
34 ... <label>User name: | |
35 ... <input type="text" name="username" /> | |
36 ... </label><br /> | |
37 ... <label>Password: | |
38 ... <input type="password" name="password" /> | |
39 ... </label><br /> | |
40 ... <label> | |
41 ... <input type="checkbox" name="remember" /> Remember me | |
42 ... </label> | |
43 ... </p> | |
44 ... </form>""") | |
45 >>> filler = HTMLFormFiller(data=dict(username='john', remember=True)) | |
46 >>> print template.generate() | filler | |
47 <form> | |
48 <p> | |
49 <label>User name: | |
50 <input type="text" name="username" value="john"/> | |
51 </label><br/> | |
52 <label>Password: | |
53 <input type="password" name="password"/> | |
54 </label><br/> | |
55 <label> | |
56 <input type="checkbox" name="remember" checked="checked"/> Remember me | |
57 </label> | |
58 </p> | |
59 </form> | |
60 | |
61 .. note:: This processing is done without reparsing the template output in | |
62 any way. Like any stream filter, it operates after the template output is | |
63 generated but *before* that output is actually serialized. | |
64 | |
65 The filter will of course also handle radio buttons as well as ``<select>`` and | |
66 ``<textarea>`` elements. For radio buttons to be marked as checked, the value in | |
67 the data dictionary needs to match the ``value`` attribute of the ``<input>`` | |
68 element, or evaluate to a truth value if the element has no such attribute. For | |
69 options in a ``<select>`` box to be marked as selected, the value in the data | |
70 dictionary needs to match the ``value`` attribute of the ``<option>`` element, | |
71 or the text content of the option if it has no ``value`` attribute. Password and | |
72 file input fields are not populated, as most browsers would ignore that anyway | |
73 for security reasons. | |
74 | |
75 You'll want to make sure that the values in the data dictionary have already | |
76 been converted to strings. While the filter may be able to deal with non-string | |
77 data in some cases (such as check boxes), in most cases it will either not | |
78 attempt any conversion or not produce the desired results. | |
79 | |
80 You can restrict the form filler to operate only on a specific ``<form>`` by | |
81 passing either the ``id`` or the ``name`` keyword argument to the initializer. | |
82 If either of those is specified, the filter will only apply to form tags with | |
83 an attribute matching the specified value. | |
84 | |
85 | |
86 HTML Sanitizer | |
87 ============== | |
88 | |
89 The ``genshi.filters.HTMLSanitizer`` filter can be used to clean up | |
90 user-submitted HTML markup, removing potentially dangerous constructs that could | |
91 be used for various kinds of abuse, such as cross-site scripting (XSS) attacks:: | |
92 | |
93 >>> from genshi.filters import HTMLSanitizer | |
94 >>> from genshi.input import HTML | |
95 >>> html = HTML("""<div> | |
96 ... <p>Innocent looking text.</p> | |
97 ... <script>alert("Danger: " + document.cookie)</script> | |
98 ... </div>""") | |
99 >>> sanitize = HTMLSanitizer() | |
100 >>> print html | sanitize | |
101 <div> | |
102 <p>Innocent looking text.</p> | |
103 </div> | |
104 | |
105 In this example, the ``<script>`` tag was removed from the output. | |
106 | |
107 You can determine which tags and attributes should be allowed by initializing | |
108 the filter with corresponding sets. See the API documentation for more | |
109 information. | |
110 | |
111 Inline ``style`` attributes are forbidden by default. If you allow them, the | |
112 filter will still perform sanitization on the contents of any encountered inline | |
113 styles: the proprietary ``expression()`` function (supported only by Internet | |
114 Explorer) is removed, and any property using a ``url()`` with a potentially | |
115 dangerous URL scheme (such as ``javascript:``) is also stripped out:: | |
116 | |
117 >>> from genshi.filters import HTMLSanitizer | |
118 >>> from genshi.input import HTML | |
119 >>> html = HTML("""<div> | |
120 ... <br style="background: url(javascript:alert(document.cookie); color: #000" /> | |
121 ... </div>""") | |
122 >>> sanitize = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style'])) | |
123 >>> print html | sanitize | |
124 <div> | |
125 <br style="color: #000"/> | |
126 </div> | |
127 | |
128 .. warning:: You should probably not rely on the ``style`` filtering, as | |
129 sanitizing mixed HTML, CSS, and JavaScript is very complicated and | |
130 subject to various browser bugs. If you can somehow get away with | |
131 not allowing inline styles in user-submitted content, that would | |
132 definitely be the safer route to follow. |
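
To make the two inline-style rules described above more concrete, here is a simplified, stand-alone sketch written against the Python standard library only. It is not Genshi's implementation: the function name ``sanitize_style``, the ``UNSAFE_SCHEMES`` tuple, and the choice to drop a whole declaration when it fails a check are all assumptions made for illustration. In line with the warning above, it should not be treated as a real sanitizer.

```python
import re

# Hypothetical list of schemes treated as dangerous (an assumption for this
# sketch; a real filter would more likely use an allow-list of safe schemes).
UNSAFE_SCHEMES = ("javascript",)

# Matches the scheme at the start of a url(...) value, e.g. url(javascript:...)
URL_SCHEME = re.compile(r"url\(\s*['\"]?\s*([a-z][a-z0-9+.-]*):")


def sanitize_style(style):
    """Return the declarations of an inline style that pass both checks.

    Illustration only: declarations are split naively on ';', so values
    that themselves contain a semicolon are not handled correctly.
    """
    kept = []
    for decl in style.split(";"):
        decl = decl.strip()
        if not decl:
            continue
        lowered = decl.lower()
        # Rule 1: drop anything using the proprietary expression() function.
        if "expression(" in lowered:
            continue
        # Rule 2: drop url() values with a potentially dangerous scheme.
        match = URL_SCHEME.search(lowered)
        if match and match.group(1) in UNSAFE_SCHEMES:
            continue
        kept.append(decl)
    return "; ".join(kept)


if __name__ == "__main__":
    print(sanitize_style(
        "background: url(javascript:alert(document.cookie); color: #000"
    ))
```

Run on the ``style`` value from the example above, the sketch keeps only the ``color`` declaration. The many ways such a check can be bypassed (escapes, comments, browser quirks) are the reason that forbidding inline styles altogether remains the safer default.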