annotate genshi/filters.py @ 431:ad01564e87f2 trunk

* Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
* In case `style` attributes are explicitly allowed, also handle unicode escapes correctly.
author cmlenz
date Thu, 22 Mar 2007 18:13:02 +0000
parents 073640758a42
children 9f11c745fac9
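The changeset description mentions handling unicode escapes in `style` attributes. The sketch below illustrates why that matters: CSS allows a character to be written as a backslash followed by up to six hex digits, so a check for `javascript:` can be bypassed unless the escapes are decoded first. This is not Genshi's implementation. The names `decode_css_escapes`, `is_suspicious_declaration` and `filter_style` are hypothetical, the splitting on `;` is deliberately naive, and the code assumes Python 3 (the annotated file itself targets Python 2).

```python
import re

# Hypothetical helpers for illustration only; not taken from genshi/filters.py.

# A CSS escape: backslash, 1-6 hex digits, optionally one trailing whitespace.
_CSS_ESCAPE = re.compile(r'\\([0-9a-fA-F]{1,6})\s?')


def decode_css_escapes(text):
    """Replace CSS escapes such as ``\\75`` with the character they encode."""
    def _repl(match):
        codepoint = int(match.group(1), 16)
        # Values beyond the Unicode range are dropped instead of raising.
        return chr(codepoint) if codepoint <= 0x10FFFF else ''
    return _CSS_ESCAPE.sub(_repl, text)


def is_suspicious_declaration(decl):
    """Return True if one CSS declaration looks dangerous after decoding."""
    decoded = decode_css_escapes(decl).lower()
    return 'javascript:' in decoded or 'expression' in decoded


def filter_style(style):
    """Keep only the declarations of a style value that pass the check.

    Splitting on ';' is a simplification: it ignores semicolons that
    appear inside quoted strings or url(...) values.
    """
    kept = [d.strip() for d in style.split(';')
            if d.strip() and not is_suspicious_declaration(d)]
    return '; '.join(kept)
```

Decoding before matching is the key design choice here: the check is applied to the text a browser would effectively see, not to the raw attribute value.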
rev   line source
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
2 #
408
4675d5cf6c67 Update copyright year for files modified this year.
cmlenz
parents: 403
diff changeset
3 # Copyright (C) 2006-2007 Edgewall Software
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
4 # All rights reserved.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
5 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
diff changeset
8 # are also available at http://genshi.edgewall.org/wiki/License.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
9 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
diff changeset
12 # history and logs, available at http://genshi.edgewall.org/log/.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
13
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
14 """Implementation of a number of stream filters."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
15
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
16 try:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
17 frozenset
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
18 except NameError:
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
19 from sets import ImmutableSet as frozenset
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
20 import re
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
21
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
22 from genshi.core import Attrs, QName, stripentities
363
37e4b4bb0b53 Parse template includes at parse time to avoid some runtime overhead.
cmlenz
parents: 345
diff changeset
23 from genshi.core import END, START, TEXT
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
24
363
37e4b4bb0b53 Parse template includes at parse time to avoid some runtime overhead.
cmlenz
parents: 345
diff changeset
25 __all__ = ['HTMLFormFiller', 'HTMLSanitizer']
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
26 __docformat__ = 'restructuredtext en'
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
27
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
28
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
29 class HTMLFormFiller(object):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
30 """A stream filter that can populate HTML forms from a dictionary of values.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
31
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
32 >>> from genshi.input import HTML
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
33 >>> html = HTML('''<form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
34 ... <p><input type="text" name="foo" /></p>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
35 ... </form>''')
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
36 >>> filler = HTMLFormFiller(data={'foo': 'bar'})
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
37 >>> print html | filler
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
38 <form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
39 <p><input type="text" name="foo" value="bar"/></p>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
40 </form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
41 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
42 # TODO: only select the first radio button, and the first select option
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
43 # (if not in a multiple-select)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
44 # TODO: only apply to elements in the XHTML namespace (or no namespace)?
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
45
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
46 def __init__(self, name=None, id=None, data=None):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
47 """Create the filter.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
48
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
49 :param name: The name of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
50 parameter is given, only forms where the ``name`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
51 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
52 :param id: The ID of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
53 parameter is given, only forms where the ``id`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
54 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
55 :param data: The dictionary of form values, where the keys are the names
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
56 of the form fields, and the values are the values to fill
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
57 in.
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
58 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
59 self.name = name
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
60 self.id = id
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
61 if data is None:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
62 data = {}
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
63 self.data = data
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
64
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
65 def __call__(self, stream, ctxt=None):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
66 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
67
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
68 :param stream: the markup event stream to filter
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
69 :param ctxt: the template context (unused)
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
70 """
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
71 in_form = in_select = in_option = in_textarea = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
72 select_value = option_value = textarea_value = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
73 option_start = option_text = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
74
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
75 for kind, data, pos in stream:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
76
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
77 if kind is START:
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
78 tag, attrs = data
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
79 tagname = tag.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
80
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
81 if tagname == 'form' and (
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
82 self.name and attrs.get('name') == self.name or
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
83 self.id and attrs.get('id') == self.id or
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
84 not (self.id or self.name)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
85 in_form = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
86
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
87 elif in_form:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
88 if tagname == 'input':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
89 type = attrs.get('type')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
90 if type in ('checkbox', 'radio'):
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
91 name = attrs.get('name')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
92 if name:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
93 value = self.data.get(name)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
94 declval = attrs.get('value')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
95 checked = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
96 if isinstance(value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
97 if declval:
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
98 checked = declval in [str(v) for v
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
99 in value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
100 else:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
101 checked = bool(filter(None, value))
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
102 else:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
103 if declval:
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
104 checked = declval == str(value)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
105 elif type == 'checkbox':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
106 checked = bool(value)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
107 if checked:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
108 attrs |= [(QName('checked'), 'checked')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
109 elif 'checked' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
110 attrs -= 'checked'
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
111 elif type in (None, 'hidden', 'text'):
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
112 name = attrs.get('name')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
113 if name:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
114 value = self.data.get(name)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
115 if isinstance(value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
116 value = value[0]
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
117 if value is not None:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
118 attrs |= [(QName('value'), unicode(value))]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
119 elif tagname == 'select':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
120 name = attrs.get('name')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
121 select_value = self.data.get(name)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
122 in_select = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
123 elif tagname == 'textarea':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
124 name = attrs.get('name')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
125 textarea_value = self.data.get(name)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
126 if isinstance(textarea_value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
127 textarea_value = textarea_value[0]
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
128 in_textarea = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
129 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
130 option_start = kind, data, pos
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
131 option_value = attrs.get('value')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
132 in_option = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
133 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
134 yield kind, (tag, attrs), pos
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
135
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
136
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
137 elif in_form and kind is TEXT:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
138 if in_select and in_option:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
139 if option_value is None:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
140 option_value = data
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
141 option_text = kind, data, pos
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
142 continue
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
143 elif in_textarea:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
144 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
145 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
146
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
147 elif in_form and kind is END:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
148 tagname = data.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
149 if tagname == 'form':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
150 in_form = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
151 elif tagname == 'select':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
152 in_select = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
153 select_value = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
154 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
155 if isinstance(select_value, (tuple, list)):
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
156 selected = option_value in [str(v) for v
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
157 in select_value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
158 else:
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
159 selected = option_value == str(select_value)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
160 okind, (tag, attrs), opos = option_start
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
161 if selected:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
162 attrs |= [(QName('selected'), 'selected')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
163 elif 'selected' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
164 attrs -= 'selected'
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
165 yield okind, (tag, attrs), opos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
166 if option_text:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
167 yield option_text
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
168 in_option = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
169 option_start = option_text = option_value = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
170 elif tagname == 'textarea':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
171 if textarea_value:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
172 yield TEXT, unicode(textarea_value), pos
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
173 in_textarea = False
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
174 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
175
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
176 else:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
177 yield kind, data, pos
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
178
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
179
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
180 class HTMLSanitizer(object):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
181 """A filter that removes potentially dangerous HTML tags and attributes
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
182 from the stream.
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
183
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
184 >>> from genshi import HTML
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
185 >>> html = HTML('<div><script>alert(document.cookie)</script></div>')
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
186 >>> print html | HTMLSanitizer()
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
187 <div/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
188
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
189 The default set of safe tags and attributes can be modified when the filter
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
190 is instantiated. For example, to allow inline ``style`` attributes, the
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
191 following instantiation would work:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
192
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
193 >>> html = HTML('<div style="background: #000"></div>')
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
194 >>> sanitizer = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
195 >>> print html | sanitizer
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
196 <div style="background: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
197
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
198 Note that even in this case, the filter *does* attempt to remove dangerous
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
199 constructs from style attributes:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
200
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
201 >>> html = HTML('<div style="background: url(javascript:void); color: #000"></div>')
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
202 >>> print html | sanitizer
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
203 <div style="color: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
204
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
205 This handles HTML entities, Unicode escapes in CSS and JavaScript text, as
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
206 well as a lot of other things. However, the ``style`` attribute is still excluded by
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
207 default because it is very hard for such sanitizing to be completely safe,
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
208 especially considering how much error recovery current web browsers perform.
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
209 """
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
210
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
211 SAFE_TAGS = frozenset(['a', 'abbr', 'acronym', 'address', 'area', 'b',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
212 'big', 'blockquote', 'br', 'button', 'caption', 'center', 'cite',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
213 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
214 'em', 'fieldset', 'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
215 'hr', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'map',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
216 'menu', 'ol', 'optgroup', 'option', 'p', 'pre', 'q', 's', 'samp',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
217 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'table',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
218 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'tt', 'u',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
219 'ul', 'var'])
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
220
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
221 SAFE_ATTRS = frozenset(['abbr', 'accept', 'accept-charset', 'accesskey',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
222 'action', 'align', 'alt', 'axis', 'bgcolor', 'border', 'cellpadding',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
223 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
224 'clear', 'cols', 'colspan', 'color', 'compact', 'coords', 'datetime',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
225 'dir', 'disabled', 'enctype', 'for', 'frame', 'headers', 'height',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
226 'href', 'hreflang', 'hspace', 'id', 'ismap', 'label', 'lang',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
227 'longdesc', 'maxlength', 'media', 'method', 'multiple', 'name',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
228 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
229 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
230 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title',
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
231 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
232
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
233 SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
234
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
235 URI_ATTRS = frozenset(['action', 'background', 'dynsrc', 'href', 'lowsrc',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
236 'src'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
237
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
238 def __init__(self, safe_tags=SAFE_TAGS, safe_attrs=SAFE_ATTRS,
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
239 safe_schemes=SAFE_SCHEMES, uri_attrs=URI_ATTRS):
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
240 """Create the sanitizer.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
241
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
242 The exact set of allowed elements and attributes can be configured.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
243
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
244 :param safe_tags: a set of tag names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
245 :param safe_attrs: a set of attribute names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
246 :param safe_schemes: a set of URI schemes that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
247 :param uri_attrs: a set of names of attributes that contain URIs
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
248 """
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
249 self.safe_tags = safe_tags
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
250 self.safe_attrs = safe_attrs
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
251 self.uri_attrs = uri_attrs
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
252 self.safe_schemes = safe_schemes
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
253
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
254 def __call__(self, stream, ctxt=None):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
255 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
256
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
257 :param stream: the markup event stream to filter
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
258 :param ctxt: the template context (unused)
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
259 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
260 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
261
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
262 def _get_scheme(href):
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
263 if ':' not in href:
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
264 return None
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
265 chars = [char for char in href.split(':', 1)[0] if char.isalnum()]
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
266 return ''.join(chars).lower()
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
267
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
268 for kind, data, pos in stream:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
269 if kind is START:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
270 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
271 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
272 tag, attrs = data
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
273 if tag not in self.safe_tags:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
274 waiting_for = tag
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
275 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
276
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
277 new_attrs = []
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
278 for attr, value in attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
279 value = stripentities(value)
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
280 if attr not in self.safe_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
281 continue
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
282 elif attr in self.uri_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
283 # Don't allow URI schemes such as "javascript:"
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
284 if _get_scheme(value) not in self.safe_schemes:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
285 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
286 elif attr == 'style':
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
287 # Remove dangerous CSS declarations from inline styles
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
288 decls = []
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
289 value = self._replace_unicode_escapes(value)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
290 for decl in filter(None, value.split(';')):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
291 is_evil = False
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
292 if 'expression' in decl:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
293 is_evil = True
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
294 for m in re.finditer(r'url\s*\(([^)]+)', decl):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
295 if _get_scheme(m.group(1)) not in self.safe_schemes:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
296 is_evil = True
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
297 break
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
298 if not is_evil:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
299 decls.append(decl.strip())
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
300 if not decls:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
301 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
302 value = '; '.join(decls)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
303 new_attrs.append((attr, value))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
304
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
305 yield kind, (tag, Attrs(new_attrs)), pos
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
306
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
307 elif kind is END:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
308 tag = data
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
309 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
310 if waiting_for == tag:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
311 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
312 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
313 yield kind, data, pos
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
314
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
315 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
316 if not waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
317 yield kind, data, pos
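The scheme check and the inline-style filtering in `__call__` above can be tried in isolation. What follows is a minimal standalone sketch in Python 3, not the Genshi implementation: the names `get_scheme` and `filter_style` are hypothetical, and the logic is copied from the listing only to show which declarations survive. Unlike the listing, the sketch does not decode entities or Unicode escapes first, and it returns an empty string where the filter would drop the attribute.

```python
import re

# Same default as HTMLSanitizer.SAFE_SCHEMES in the listing; None stands for
# "no scheme", i.e. a relative reference.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])


def get_scheme(href):
    """Return the normalized scheme of href, or None if it has no colon."""
    if ':' not in href:
        return None
    # Keep only alphanumeric characters so that whitespace or control
    # characters hidden inside the scheme ("java\tscript:") are ignored.
    chars = [char for char in href.split(':', 1)[0] if char.isalnum()]
    return ''.join(chars).lower()


def filter_style(value, safe_schemes=SAFE_SCHEMES):
    """Return value without declarations that look dangerous.

    A declaration is dropped if it contains the substring "expression" or a
    url(...) whose scheme is not in safe_schemes.
    """
    decls = []
    for decl in filter(None, value.split(';')):
        is_evil = 'expression' in decl
        for m in re.finditer(r'url\s*\(([^)]+)', decl):
            if get_scheme(m.group(1)) not in safe_schemes:
                is_evil = True
                break
        if not is_evil:
            decls.append(decl.strip())
    return '; '.join(decls)


print(filter_style('background: url(javascript:void); color: #000'))
```

The `expression` test is a plain, case-sensitive substring match, so this sketch (like the listing it mirrors) is a blacklist with known limits. That is consistent with the docstring's warning that such sanitizing is hard to make completely safe.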
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
318
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
319 _NORMALIZE_NEWLINES = re.compile(r'\r\n').sub
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
320 _UNICODE_ESCAPE = re.compile(r'\\([0-9a-fA-F]{1,6})\s?').sub
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
321
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
322 def _replace_unicode_escapes(self, text):
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
323 def _repl(match):
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
324 return unichr(int(match.group(1), 16))
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
325 return self._UNICODE_ESCAPE(_repl, self._NORMALIZE_NEWLINES('\n', text))
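To see what `_replace_unicode_escapes` does to a style value, here is a minimal standalone sketch in Python 3. It assumes the two regular expressions shown in the listing and substitutes `chr` for the Python 2 `unichr`; the function name `replace_unicode_escapes` is hypothetical and module-level purely for illustration.

```python
import re

# Same patterns as in the listing: CRLF is normalized to LF, and a CSS escape
# is a backslash, one to six hex digits, and one optional whitespace character.
_normalize_newlines = re.compile(r'\r\n').sub
_unicode_escape = re.compile(r'\\([0-9a-fA-F]{1,6})\s?').sub


def replace_unicode_escapes(text):
    """Decode CSS-style hexadecimal escapes in text."""
    def _repl(match):
        # chr() is the Python 3 counterpart of the unichr() used above.
        return chr(int(match.group(1), 16))
    return _unicode_escape(_repl, _normalize_newlines('\n', text))


# An obfuscated "url(" and "expression" become visible to the later checks.
print(replace_unicode_escapes(r'\75 rl(javascript:void)'))
print(replace_unicode_escapes(r'ex\70ression(alert(1))'))
```

Decoding before the value is split into declarations is what lets the `expression` and `url(...)` checks catch escaped spellings. Note that in this sketch an escape above U+10FFFF makes `chr` raise `ValueError`; a caller would need to decide how to handle that case.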