annotate genshi/filters/html.py @ 950:981f3fc8c3ed trunk

Revert accidental small docstring change from r1174.
author hodgestar
date Fri, 02 Sep 2011 22:08:05 +0000
parents 8bc6f32fdd45
children
rev   line source
1
5479aae32f5a Initial import.
cmlenz
parents:
1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import.
cmlenz
parents:
2 #
854
4d9bef447df9 More work on reducing the size of the diff produced by 2to3.
cmlenz
parents: 853
3 # Copyright (C) 2006-2009 Edgewall Software
1
5479aae32f5a Initial import.
cmlenz
parents:
4 # All rights reserved.
5479aae32f5a Initial import.
cmlenz
parents:
5 #
5479aae32f5a Initial import.
cmlenz
parents:
6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import.
cmlenz
parents:
7 # you should have received as part of this distribution. The terms
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
8 # are also available at http://genshi.edgewall.org/wiki/License.
1
5479aae32f5a Initial import.
cmlenz
parents:
9 #
5479aae32f5a Initial import.
cmlenz
parents:
10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import.
cmlenz
parents:
11 # individuals. For the exact contribution history, see the revision
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
12 # history and logs, available at http://genshi.edgewall.org/log/.
1
5479aae32f5a Initial import.
cmlenz
parents:
13
5479aae32f5a Initial import.
cmlenz
parents:
14 """Implementation of a number of stream filters."""
5479aae32f5a Initial import.
cmlenz
parents:
15
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
16 try:
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
17 any
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
18 except NameError:
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
19 from genshi.util import any
1
5479aae32f5a Initial import.
cmlenz
parents:
20 import re
5479aae32f5a Initial import.
cmlenz
parents:
21
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
22 from genshi.core import Attrs, QName, stripentities
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
23 from genshi.core import END, START, TEXT, COMMENT
1
5479aae32f5a Initial import.
cmlenz
parents:
24
363
37e4b4bb0b53 Parse template includes at parse time to avoid some runtime overhead.
cmlenz
parents: 345
25 __all__ = ['HTMLFormFiller', 'HTMLSanitizer']
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
26 __docformat__ = 'restructuredtext en'
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
27
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
28
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
29 class HTMLFormFiller(object):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
30 """A stream filter that can populate HTML forms from a dictionary of values.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
31
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
32 >>> from genshi.input import HTML
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
33 >>> html = HTML('''<form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
34 ... <p><input type="text" name="foo" /></p>
933
1e8c33345e52 Merge r1141 from py3k:
hodgestar
parents: 909
35 ... </form>''', encoding='utf-8')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
36 >>> filler = HTMLFormFiller(data={'foo': 'bar'})
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
37 >>> print(html | filler)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
38 <form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
39 <p><input type="text" name="foo" value="bar"/></p>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
40 </form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
41 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
42 # TODO: only select the first radio button, and the first select option
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
43 # (if not in a multiple-select)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
44 # TODO: only apply to elements in the XHTML namespace (or no namespace)?
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
45
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
46 def __init__(self, name=None, id=None, data=None, passwords=False):
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
47 """Create the filter.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
48
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
49 :param name: The name of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
50 parameter is given, only forms where the ``name`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
51 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
52 :param id: The ID of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
53 parameter is given, only forms where the ``id`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
54 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
55 :param data: The dictionary of form values, where the keys are the names
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
56 of the form fields, and the values are the values to fill
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
57 in.
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
58 :param passwords: Whether password input fields should be populated.
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
59 This is off by default for security reasons (for
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
60 example, a password may end up in the browser cache)
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
61 :note: Changed in 0.5.2: added the `passwords` option
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
62 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
63 self.name = name
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
64 self.id = id
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
65 if data is None:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
66 data = {}
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
67 self.data = data
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
68 self.passwords = passwords
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
69
439
9f11c745fac9 Add support for adding custom template filters by passing a custom callback function to the `TemplateLoader`. Closes #89 (see added unit test).
cmlenz
parents: 431
70 def __call__(self, stream):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
71 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
72
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
73 :param stream: the markup event stream to filter
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
74 """
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
75 in_form = in_select = in_option = in_textarea = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
76 select_value = option_value = textarea_value = None
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
77 option_start = None
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
78 option_text = []
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
79 no_option_value = False
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
80
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
81 for kind, data, pos in stream:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
82
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
83 if kind is START:
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
84 tag, attrs = data
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
85 tagname = tag.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
86
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
87 if tagname == 'form' and (
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
88 self.name and attrs.get('name') == self.name or
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
89 self.id and attrs.get('id') == self.id or
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
90 not (self.id or self.name)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
91 in_form = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
92
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
93 elif in_form:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
94 if tagname == 'input':
844
1ae18bca8de4 Fix two instances of using None, which would cause an AttributeError.
jruigrok
parents: 841
95 type = attrs.get('type', '').lower()
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
96 if type in ('checkbox', 'radio'):
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
97 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
98 if name and name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
99 value = self.data[name]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
100 declval = attrs.get('value')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
101 checked = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
102 if isinstance(value, (list, tuple)):
909
19750e09fab1 Fix handling of checkboxes and radio buttons with an empty value attribute in `HTMLFormFiller`. Thanks to Benoit Hirbec for pointing out the problem and providing a patch.
cmlenz
parents: 908
103 if declval is not None:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
104 checked = declval in [unicode(v) for v
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
105 in value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
106 else:
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
107 checked = any(value)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
108 else:
909
19750e09fab1 Fix handling of checkboxes and radio buttons with an empty value attribute in `HTMLFormFiller`. Thanks to Benoit Hirbec for pointing out the problem and providing a patch.
cmlenz
parents: 908
109 if declval is not None:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
110 checked = declval == unicode(value)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
111 elif type == 'checkbox':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
112 checked = bool(value)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
113 if checked:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
114 attrs |= [(QName('checked'), 'checked')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
115 elif 'checked' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
116 attrs -= 'checked'
844
1ae18bca8de4 Fix two instances of using None, which would cause an AttributeError.
jruigrok
parents: 841
117 elif type in ('', 'hidden', 'text') \
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
118 or type == 'password' and self.passwords:
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
119 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
120 if name and name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
121 value = self.data[name]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
122 if isinstance(value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
123 value = value[0]
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
124 if value is not None:
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
125 attrs |= [
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
126 (QName('value'), unicode(value))
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
127 ]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
128 elif tagname == 'select':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
129 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
130 if name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
131 select_value = self.data[name]
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
132 in_select = True
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
133 elif tagname == 'textarea':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
134 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
135 if name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
136 textarea_value = self.data.get(name)
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
137 if isinstance(textarea_value, (list, tuple)):
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
138 textarea_value = textarea_value[0]
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
139 in_textarea = True
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
140 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
141 option_start = kind, data, pos
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
142 option_value = attrs.get('value')
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
143 if option_value is None:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
144 no_option_value = True
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
145 option_value = ''
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
146 in_option = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
147 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
148 yield kind, (tag, attrs), pos
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
149
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
150 elif in_form and kind is TEXT:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
151 if in_select and in_option:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
152 if no_option_value:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
153 option_value += data
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
154 option_text.append((kind, data, pos))
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
155 continue
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
156 elif in_textarea:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
157 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
158 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
159
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
160 elif in_form and kind is END:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
161 tagname = data.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
162 if tagname == 'form':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
163 in_form = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
164 elif tagname == 'select':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
165 in_select = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
166 select_value = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
167 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
168 if isinstance(select_value, (tuple, list)):
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
169 selected = option_value in [unicode(v) for v
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
170 in select_value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
171 else:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
172 selected = option_value == unicode(select_value)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
173 okind, (tag, attrs), opos = option_start
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
174 if selected:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
175 attrs |= [(QName('selected'), 'selected')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
176 elif 'selected' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
177 attrs -= 'selected'
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
178 yield okind, (tag, attrs), opos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
179 if option_text:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
180 for event in option_text:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
181 yield event
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
182 in_option = False
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
183 no_option_value = False
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
184 option_start = option_value = None
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
185 option_text = []
908
07b2f61ed0c1 Fix for bug with the `HTMLFormFiller` in the handling of textareas. Thanks to Trevor Morgan for pointing this out on the mailing list.
cmlenz
parents: 856
186 elif in_textarea and tagname == 'textarea':
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
187 if textarea_value:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
188 yield TEXT, unicode(textarea_value), pos
908
07b2f61ed0c1 Fix for bug with the `HTMLFormFiller` in the handling of textareas. Thanks to Trevor Morgan for pointing this out on the mailing list.
cmlenz
parents: 856
189 textarea_value = None
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
190 in_textarea = False
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
191 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
192
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
193 else:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
194 yield kind, data, pos
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
195
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
196
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
197 class HTMLSanitizer(object):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
198 """A filter that removes potentially dangerous HTML tags and attributes
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
199 from the stream.
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
200
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
201 >>> from genshi import HTML
933
1e8c33345e52 Merge r1141 from py3k:
hodgestar
parents: 909
diff changeset
202 >>> html = HTML('<div><script>alert(document.cookie)</script></div>', encoding='utf-8')
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
203 >>> print(html | HTMLSanitizer())
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
204 <div/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
205
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
206 The default set of safe tags and attributes can be modified when the filter
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
207 is instantiated. For example, to allow inline ``style`` attributes, the
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
208 following instantiation would work:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
209
933
1e8c33345e52 Merge r1141 from py3k:
hodgestar
parents: 909
diff changeset
210 >>> html = HTML('<div style="background: #000"></div>', encoding='utf-8')
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
211 >>> sanitizer = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
212 >>> print(html | sanitizer)
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
213 <div style="background: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
214
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
215 Note that even in this case, the filter *does* attempt to remove dangerous
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
216 constructs from style attributes:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
217
933
1e8c33345e52 Merge r1141 from py3k:
hodgestar
parents: 909
diff changeset
218 >>> html = HTML('<div style="background: url(javascript:void); color: #000"></div>', encoding='utf-8')
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
219 >>> print(html | sanitizer)
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
220 <div style="color: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
221
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
222 This handles HTML entities, unicode escapes in CSS and JavaScript text, as
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
223 well as a lot of other things. However, the style attribute is still excluded by
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
224 default because it is very hard for such sanitizing to be completely safe,
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
225 especially considering how much error recovery current web browsers perform.
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
226
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
227 It also does some basic filtering of CSS properties that may be used for
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
228 typical phishing attacks. For more sophisticated filtering, this class
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
229 provides a couple of hooks that can be overridden in sub-classes.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
230
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
231 :warn: Note that this special processing of CSS is currently only applied to
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
232 style attributes, **not** style elements.
950
981f3fc8c3ed Revert accidental small docstring change from r1174.
hodgestar
parents: 949
diff changeset
233 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
234
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
235 SAFE_TAGS = frozenset(['a', 'abbr', 'acronym', 'address', 'area', 'b',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
236 'big', 'blockquote', 'br', 'button', 'caption', 'center', 'cite',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
237 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
238 'em', 'fieldset', 'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
239 'hr', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'map',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
240 'menu', 'ol', 'optgroup', 'option', 'p', 'pre', 'q', 's', 'samp',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
241 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'table',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
242 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'tt', 'u',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
243 'ul', 'var'])
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
244
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
245 SAFE_ATTRS = frozenset(['abbr', 'accept', 'accept-charset', 'accesskey',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
246 'action', 'align', 'alt', 'axis', 'bgcolor', 'border', 'cellpadding',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
247 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
248 'clear', 'cols', 'colspan', 'color', 'compact', 'coords', 'datetime',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
249 'dir', 'disabled', 'enctype', 'for', 'frame', 'headers', 'height',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
250 'href', 'hreflang', 'hspace', 'id', 'ismap', 'label', 'lang',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
251 'longdesc', 'maxlength', 'media', 'method', 'multiple', 'name',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
252 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
253 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
254 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title',
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
255 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
256
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
257 SAFE_CSS = frozenset([
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
258 # CSS 3 properties <http://www.w3.org/TR/CSS/#properties>
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
259 'background', 'background-attachment', 'background-color',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
260 'background-image', 'background-position', 'background-repeat',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
261 'border', 'border-bottom', 'border-bottom-color',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
262 'border-bottom-style', 'border-bottom-width', 'border-collapse',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
263 'border-color', 'border-left', 'border-left-color',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
264 'border-left-style', 'border-left-width', 'border-right',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
265 'border-right-color', 'border-right-style', 'border-right-width',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
266 'border-spacing', 'border-style', 'border-top', 'border-top-color',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
267 'border-top-style', 'border-top-width', 'border-width', 'bottom',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
268 'caption-side', 'clear', 'clip', 'color', 'content',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
269 'counter-increment', 'counter-reset', 'cursor', 'direction', 'display',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
270 'empty-cells', 'float', 'font', 'font-family', 'font-size',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
271 'font-style', 'font-variant', 'font-weight', 'height', 'left',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
272 'letter-spacing', 'line-height', 'list-style', 'list-style-image',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
273 'list-style-position', 'list-style-type', 'margin', 'margin-bottom',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
274 'margin-left', 'margin-right', 'margin-top', 'max-height', 'max-width',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
275 'min-height', 'min-width', 'opacity', 'orphans', 'outline',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
276 'outline-color', 'outline-style', 'outline-width', 'overflow',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
277 'padding', 'padding-bottom', 'padding-left', 'padding-right',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
278 'padding-top', 'page-break-after', 'page-break-before',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
279 'page-break-inside', 'quotes', 'right', 'table-layout',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
280 'text-align', 'text-decoration', 'text-indent', 'text-transform',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
281 'top', 'unicode-bidi', 'vertical-align', 'visibility', 'white-space',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
282 'widows', 'width', 'word-spacing', 'z-index',
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
283 ])
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
284
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
285 SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
286
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
287 URI_ATTRS = frozenset(['action', 'background', 'dynsrc', 'href', 'lowsrc',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
288 'src'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
289
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
290 def __init__(self, safe_tags=SAFE_TAGS, safe_attrs=SAFE_ATTRS,
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
291 safe_schemes=SAFE_SCHEMES, uri_attrs=URI_ATTRS,
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
292 safe_css=SAFE_CSS):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
293 """Create the sanitizer.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
294
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
295 The exact set of allowed elements and attributes can be configured.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
296
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
297 :param safe_tags: a set of tag names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
298 :param safe_attrs: a set of attribute names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
299 :param safe_schemes: a set of URI schemes that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
300 :param uri_attrs: a set of names of attributes that contain URIs
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
301 """
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
302 self.safe_tags = safe_tags
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
303 # The set of tag names that are considered safe.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
304 self.safe_attrs = safe_attrs
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
305 # The set of attribute names that are considered safe.
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
306 self.safe_css = safe_css
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
307 # The set of CSS properties that are considered safe.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
308 self.uri_attrs = uri_attrs
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
309 # The set of names of attributes that may contain URIs.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
310 self.safe_schemes = safe_schemes
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
311 # The set of URI schemes that are considered safe.
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
312
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
313 # IE6 <http://heideri.ch/jso/#80>
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
314 _EXPRESSION_SEARCH = re.compile(u"""
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
315 [eE
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
316 \uFF25 # FULLWIDTH LATIN CAPITAL LETTER E
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
317 \uFF45 # FULLWIDTH LATIN SMALL LETTER E
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
318 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
319 [xX
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
320 \uFF38 # FULLWIDTH LATIN CAPITAL LETTER X
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
321 \uFF58 # FULLWIDTH LATIN SMALL LETTER X
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
322 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
323 [pP
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
324 \uFF30 # FULLWIDTH LATIN CAPITAL LETTER P
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
325 \uFF50 # FULLWIDTH LATIN SMALL LETTER P
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
326 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
327 [rR
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
328 \u0280 # LATIN LETTER SMALL CAPITAL R
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
329 \uFF32 # FULLWIDTH LATIN CAPITAL LETTER R
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
330 \uFF52 # FULLWIDTH LATIN SMALL LETTER R
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
331 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
332 [eE
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
333 \uFF25 # FULLWIDTH LATIN CAPITAL LETTER E
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
334 \uFF45 # FULLWIDTH LATIN SMALL LETTER E
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
335 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
336 [sS
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
337 \uFF33 # FULLWIDTH LATIN CAPITAL LETTER S
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
338 \uFF53 # FULLWIDTH LATIN SMALL LETTER S
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
339 ]{2}
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
340 [iI
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
341 \u026A # LATIN LETTER SMALL CAPITAL I
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
342 \uFF29 # FULLWIDTH LATIN CAPITAL LETTER I
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
343 \uFF49 # FULLWIDTH LATIN SMALL LETTER I
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
344 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
345 [oO
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
346 \uFF2F # FULLWIDTH LATIN CAPITAL LETTER O
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
347 \uFF4F # FULLWIDTH LATIN SMALL LETTER O
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
348 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
349 [nN
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
350 \u0274 # LATIN LETTER SMALL CAPITAL N
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
351 \uFF2E # FULLWIDTH LATIN CAPITAL LETTER N
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
352 \uFF4E # FULLWIDTH LATIN SMALL LETTER N
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
353 ]
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
354 """, re.VERBOSE).search
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
355
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
356 # IE6 <http://openmya.hacker.jp/hasegawa/security/expression.txt>
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
357 # 7) Particular bit of Unicode characters
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
358 _URL_FINDITER = re.compile(
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
359 u'[Uu][Rr\u0280][Ll\u029F]\s*\(([^)]+)').finditer
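As a rough illustration of what the `_URL_FINDITER` pattern above captures, the sketch below rebuilds the same expression outside the class. The helper name `css_urls` and the sample strings are assumptions made for this example and are not part of Genshi. Note that the captured group stops at the first closing parenthesis.

```python
import re

# Same pattern as in the listing, rebuilt standalone for illustration.
# \u0280 and \u029F are the small-capital letters R and L.
url_finditer = re.compile(u'[Uu][Rr\u0280][Ll\u029F]\\s*\\(([^)]+)').finditer


def css_urls(text):
    """Return the raw argument of every url(...) occurrence in `text`.

    Hypothetical helper, not part of Genshi.
    """
    return [match.group(1) for match in url_finditer(text)]
```

Each captured argument could then be passed to a scheme check such as `is_safe_uri`.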
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
360
439
9f11c745fac9 Add support for adding custom template filters by passing a custom callback function to the `TemplateLoader`. Closes #89 (see added unit test).
cmlenz
parents: 431
diff changeset
361 def __call__(self, stream):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
362 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
363
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
364 :param stream: the markup event stream to filter
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
365 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
366 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
367
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
368 for kind, data, pos in stream:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
369 if kind is START:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
370 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
371 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
372 tag, attrs = data
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
373 if not self.is_safe_elem(tag, attrs):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
374 waiting_for = tag
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
375 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
376
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
377 new_attrs = []
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
378 for attr, value in attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
379 value = stripentities(value)
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
380 if attr not in self.safe_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
381 continue
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
382 elif attr in self.uri_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
383 # Don't allow URI schemes such as "javascript:"
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
384 if not self.is_safe_uri(value):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
385 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
386 elif attr == 'style':
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
387 # Remove dangerous CSS declarations from inline styles
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
388 decls = self.sanitize_css(value)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
389 if not decls:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
390 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
391 value = '; '.join(decls)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
392 new_attrs.append((attr, value))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
393
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
394 yield kind, (tag, Attrs(new_attrs)), pos
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
395
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
396 elif kind is END:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
397 tag = data
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
398 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
399 if waiting_for == tag:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
400 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
401 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
402 yield kind, data, pos
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
403
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
404 elif kind is not COMMENT:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
405 if not waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
406 yield kind, data, pos
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
407
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
408 def is_safe_css(self, propname, value):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
409 """Determine whether the given CSS property declaration is to be
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
410 considered safe for inclusion in the output.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
411
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
412 :param propname: the CSS property name
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
413 :param value: the value of the property
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
414 :return: whether the property value should be considered safe
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
415 :rtype: bool
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
416 :since: version 0.6
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
417 """
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
418 if propname not in self.safe_css:
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
419 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
420 if propname.startswith('margin') and '-' in value:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
421 # Negative margins can be used for phishing
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
422 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
423 return True
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
424
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
425 def is_safe_elem(self, tag, attrs):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
426 """Determine whether the given element should be considered safe for
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
427 inclusion in the output.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
428
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
429 :param tag: the tag name of the element
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
430 :type tag: QName
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
431 :param attrs: the element attributes
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
432 :type attrs: Attrs
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
433 :return: whether the element should be considered safe
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
434 :rtype: bool
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
435 :since: version 0.6
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
436 """
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
437 if tag not in self.safe_tags:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
438 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
439 if tag.localname == 'input':
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
440 input_type = attrs.get('type', '').lower()
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
441 if input_type == 'password':
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
442 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
443 return True
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
444
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
445 def is_safe_uri(self, uri):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
446 """Determine whether the given URI is to be considered safe for
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
447 inclusion in the output.
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
448
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
449 The default implementation checks whether the scheme of the URI is in
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
450 the set of allowed schemes (`safe_schemes`).
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
451
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
452 >>> sanitizer = HTMLSanitizer()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
453 >>> sanitizer.is_safe_uri('http://example.org/')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
454 True
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
455 >>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
456 False
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
457
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
458 :param uri: the URI to check
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
459 :return: `True` if the URI can be considered safe, `False` otherwise
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
460 :rtype: `bool`
576
b00765a115a5 Improve docs on `Stream.select()` for #135.
cmlenz
parents: 571
diff changeset
461 :since: version 0.4.3
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
462 """
837
adfacd5ed02c Fix for #274.
cmlenz
parents: 750
diff changeset
463 if '#' in uri:
adfacd5ed02c Fix for #274.
cmlenz
parents: 750
diff changeset
464 uri = uri.split('#', 1)[0] # Strip out the fragment identifier
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
465 if ':' not in uri:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
466 return True # This is a relative URI
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
467 chars = [char for char in uri.split(':', 1)[0] if char.isalnum()]
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
468 return ''.join(chars).lower() in self.safe_schemes
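The scheme check in `is_safe_uri` can be tried in isolation with a minimal standalone sketch. `SAFE_SCHEMES` below is an assumed example set chosen for illustration, not necessarily the sanitizer's actual default, and the function is a free-standing copy of the logic rather than the method itself.

```python
# Assumed example set of allowed schemes; the real default lives on the class.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto'])


def is_safe_uri(uri, safe_schemes=SAFE_SCHEMES):
    """Standalone copy of the scheme check, for experimentation only."""
    if '#' in uri:
        uri = uri.split('#', 1)[0]  # strip the fragment identifier
    if ':' not in uri:
        return True  # no scheme present, so this is a relative URI
    # Keep only alphanumeric characters of the scheme so that embedded
    # whitespace or control characters cannot disguise it.
    scheme = ''.join(c for c in uri.split(':', 1)[0] if c.isalnum())
    return scheme.lower() in safe_schemes
```

Filtering the scheme down to alphanumeric characters before the lookup means a value such as `java\tscript:...` is compared as `javascript` and rejected.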
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
469
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
470 def sanitize_css(self, text):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
471 """Remove potentially dangerous property declarations from CSS code.
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
472
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
473 In particular, properties using the CSS ``url()`` function with a scheme
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
474 that is not considered safe are removed:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
475
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
476 >>> sanitizer = HTMLSanitizer()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
477 >>> sanitizer.sanitize_css(u'''
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
478 ... background: url(javascript:alert("foo"));
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
479 ... color: #000;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
480 ... ''')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
481 [u'color: #000']
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
482
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
483 Also, the proprietary Internet Explorer function ``expression()`` is
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
484 always stripped:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
485
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
486 >>> sanitizer.sanitize_css(u'''
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
487 ... background: #fff;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
488 ... color: #000;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
489 ... width: e/**/xpression(alert("foo"));
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
490 ... ''')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
491 [u'background: #fff', u'color: #000']
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
492
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
493 :param text: the CSS text; this is expected to be `unicode` and to not
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
494 contain any character or numeric references
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
495 :return: a list of declarations that are considered safe
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
496 :rtype: `list`
576
b00765a115a5 Improve docs on `Stream.select()` for #135.
cmlenz
parents: 571
diff changeset
497 :since: version 0.4.3
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
498 """
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
499 decls = []
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
500 text = self._strip_css_comments(self._replace_unicode_escapes(text))
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
501 for decl in text.split(';'):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
502 decl = decl.strip()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
503 if not decl:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
504 continue
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
505 try:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
506 propname, value = decl.split(':', 1)
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
507 except ValueError:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
508 continue
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
509 if not self.is_safe_css(propname.strip().lower(), value.strip()):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
510 continue
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
511 is_evil = False
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
512 if self._EXPRESSION_SEARCH(value):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
513 is_evil = True
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
514 for match in self._URL_FINDITER(value):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
515 if not self.is_safe_uri(match.group(1)):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
516 is_evil = True
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
517 break
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
518 if not is_evil:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
519 decls.append(decl.strip())
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
520 return decls
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
521
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
522 _NORMALIZE_NEWLINES = re.compile(r'\r\n').sub
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
523 _UNICODE_ESCAPE = re.compile(
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
524 r"""\\([0-9a-fA-F]{1,6})\s?|\\([^\r\n\f0-9a-fA-F'"{};:()#*])""",
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
525 re.UNICODE).sub
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
526
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
527 def _replace_unicode_escapes(self, text):
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
528 def _repl(match):
949
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
529 t = match.group(1)
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
530 if t:
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
531 return unichr(int(t, 16))
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
532 t = match.group(2)
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
533 if t == '\\':
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
534 return r'\\'
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
535 else:
8bc6f32fdd45 Improve sanitizing of CSS in style attributes (note that the Genshi documentation already warns users that enabling the style attribute is dangerous -- now it is slightly less dangerous). Fixes #455. Patch taken from jomae's Trac commit trac:r10788 and modified for Genshi -- thanks!
hodgestar
parents: 933
diff changeset
536 return t
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
537 return self._UNICODE_ESCAPE(_repl, self._NORMALIZE_NEWLINES('\n', text))
556
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
538
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
539 _CSS_COMMENTS = re.compile(r'/\*.*?\*/').sub
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
540
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
541 def _strip_css_comments(self, text):
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
542 return self._CSS_COMMENTS('', text)
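The listing above normalizes CSS text in a fixed order (decode backslash escapes, strip comments, split into declarations) before checking each value for ``expression()`` and unsafe ``url()`` schemes. Below is a minimal standalone sketch of that pipeline in Python 3, not the Genshi implementation itself. The module-level names, the contents of ``SAFE_SCHEMES``, and the ``_URL`` and ``_EXPRESSION`` patterns are assumptions made for illustration; the real class uses its own ``safe_schemes``, ``_URL_FINDITER`` and ``_EXPRESSION_SEARCH``, whose definitions are not part of this excerpt, and it also calls ``is_safe_css``, which is omitted here.

```python
import re

# Assumed allow-list for illustration only; the real class keeps its own
# `safe_schemes` attribute, which is not shown in this excerpt.
SAFE_SCHEMES = frozenset(['http', 'https', 'ftp', 'mailto'])

# Simplified stand-ins for the patterns used by the listing above.
_CSS_COMMENTS = re.compile(r'/\*.*?\*/')
_HEX_ESCAPE = re.compile(r'\\([0-9a-fA-F]{1,6})\s?')
_URL = re.compile(r'url\(\s*["\']?([^"\')]*)', re.IGNORECASE)
_EXPRESSION = re.compile(r'expression\s*\(', re.IGNORECASE)


def is_safe_uri(uri):
    """Return True for relative URIs and for URIs with an allowed scheme."""
    uri = uri.split('#', 1)[0]          # drop the fragment identifier
    if ':' not in uri:
        return True                     # no scheme, so a relative URI
    # Keep only alphanumerics so "java\tscript:" cannot hide the scheme.
    scheme = ''.join(ch for ch in uri.split(':', 1)[0] if ch.isalnum())
    return scheme.lower() in SAFE_SCHEMES


def decode_hex_escapes(text):
    """Replace CSS hex escapes such as "\\65 " with the character they name."""
    def _decode(match):
        code = int(match.group(1), 16)
        # Six hex digits can exceed the Unicode range; substitute U+FFFD.
        return chr(code) if code <= 0x10FFFF else '\ufffd'
    return _HEX_ESCAPE.sub(_decode, text)


def sanitize_css(text):
    """Return the declarations in `text` that pass the checks above."""
    # Decode escapes first, then strip comments, as in the listing.
    text = _CSS_COMMENTS.sub('', decode_hex_escapes(text))
    safe = []
    for decl in text.split(';'):
        decl = decl.strip()
        if ':' not in decl:
            continue                    # empty or malformed declaration
        value = decl.split(':', 1)[1]
        if _EXPRESSION.search(value):
            continue                    # proprietary IE expression()
        if any(not is_safe_uri(m.group(1)) for m in _URL.finditer(value)):
            continue                    # url() with a disallowed scheme
        safe.append(decl)
    return safe
```

The order of the normalization steps matters: ``e/**/xpression(...)`` and ``\65 xpression(...)`` only become recognizable as ``expression(...)`` once comments are removed and escapes are decoded, so the checks have to run on the normalized text rather than on the raw input.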