annotate genshi/filters/html.py @ 951:40415173f513 stable-0.6.x

Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
author hodgestar
date Fri, 02 Sep 2011 22:10:58 +0000
parents 21308bd343b8
children
rev   line source
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
2 #
854
4d9bef447df9 More work on reducing the size of the diff produced by 2to3.
cmlenz
parents: 853
diff changeset
3 # Copyright (C) 2006-2009 Edgewall Software
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
4 # All rights reserved.
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
5 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
diff changeset
8 # are also available at http://genshi.edgewall.org/wiki/License.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
9 #
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
230
84168828b074 Renamed Markup to Genshi in repository.
cmlenz
parents: 182
diff changeset
12 # history and logs, available at http://genshi.edgewall.org/log/.
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
13
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
14 """Implementation of a number of stream filters."""
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
15
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
16 try:
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
17 any
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
18 except NameError:
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
19 from genshi.util import any
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
20 import re
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
21
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
22 from genshi.core import Attrs, QName, stripentities
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
23 from genshi.core import END, START, TEXT, COMMENT
1
5479aae32f5a Initial import.
cmlenz
parents:
diff changeset
24
363
37e4b4bb0b53 Parse template includes at parse time to avoid some runtime overhead.
cmlenz
parents: 345
diff changeset
25 __all__ = ['HTMLFormFiller', 'HTMLSanitizer']
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
26 __docformat__ = 'restructuredtext en'
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
27
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
28
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
29 class HTMLFormFiller(object):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
30 """A stream filter that can populate HTML forms from a dictionary of values.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
31
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
32 >>> from genshi.input import HTML
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
33 >>> html = HTML('''<form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
34 ... <p><input type="text" name="foo" /></p>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
35 ... </form>''')
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
36 >>> filler = HTMLFormFiller(data={'foo': 'bar'})
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
37 >>> print(html | filler)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
38 <form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
39 <p><input type="text" name="foo" value="bar"/></p>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
40 </form>
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
41 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
42 # TODO: only select the first radio button, and the first select option
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
43 # (if not in a multiple-select)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
44 # TODO: only apply to elements in the XHTML namespace (or no namespace)?
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
45
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
46 def __init__(self, name=None, id=None, data=None, passwords=False):
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
47 """Create the filter.
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
48
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
49 :param name: The name of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
50 parameter is given, only forms where the ``name`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
51 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
52 :param id: The ID of the form that should be populated. If this
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
53 parameter is given, only forms where the ``id`` attribute
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
54 value matches the parameter are processed.
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
55 :param data: The dictionary of form values, where the keys are the names
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
56 of the form fields, and the values are the values to fill
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
57 in.
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
58 :param passwords: Whether password input fields should be populated.
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
59 This is off by default for security reasons (for
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
60 example, a password may end up in the browser cache).
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
61 :note: Changed in 0.5.2: added the `passwords` option
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
62 """
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
63 self.name = name
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
64 self.id = id
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
65 if data is None:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
66 data = {}
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
67 self.data = data
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
68 self.passwords = passwords
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
69
439
9f11c745fac9 Add support for adding custom template filters by passing a custom callback function to the `TemplateLoader`. Closes #89 (see added unit test).
cmlenz
parents: 431
diff changeset
70 def __call__(self, stream):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
71 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
72
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
73 :param stream: the markup event stream to filter
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
74 """
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
75 in_form = in_select = in_option = in_textarea = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
76 select_value = option_value = textarea_value = None
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
77 option_start = None
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
78 option_text = []
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
79 no_option_value = False
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
80
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
81 for kind, data, pos in stream:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
82
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
83 if kind is START:
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
84 tag, attrs = data
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
85 tagname = tag.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
86
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
87 if tagname == 'form' and (
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
88 self.name and attrs.get('name') == self.name or
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
89 self.id and attrs.get('id') == self.id or
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
90 not (self.id or self.name)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
91 in_form = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
92
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
93 elif in_form:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
94 if tagname == 'input':
844
1ae18bca8de4 Fix two instances of using None, which would cause an AttributeError.
jruigrok
parents: 841
diff changeset
95 type = attrs.get('type', '').lower()
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
96 if type in ('checkbox', 'radio'):
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
97 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
98 if name and name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
99 value = self.data[name]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
100 declval = attrs.get('value')
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
101 checked = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
102 if isinstance(value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
103 if declval:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
104 checked = declval in [unicode(v) for v
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
105 in value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
106 else:
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
107 checked = any(value)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
108 else:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
109 if declval:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
110 checked = declval == unicode(value)
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
111 elif type == 'checkbox':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
112 checked = bool(value)
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
113 if checked:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
114 attrs |= [(QName('checked'), 'checked')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
115 elif 'checked' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
116 attrs -= 'checked'
844
1ae18bca8de4 Fix two instances of using None, which would cause an AttributeError.
jruigrok
parents: 841
diff changeset
117 elif type in ('', 'hidden', 'text') \
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
118 or type == 'password' and self.passwords:
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
119 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
120 if name and name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
121 value = self.data[name]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
122 if isinstance(value, (list, tuple)):
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
123 value = value[0]
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
124 if value is not None:
841
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
125 attrs |= [
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
126 (QName('value'), unicode(value))
86b5cee4eb6c Added an option to the `HTMLFiller` to also populate password fields.
cmlenz
parents: 840
diff changeset
127 ]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
128 elif tagname == 'select':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
129 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
130 if name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
131 select_value = self.data[name]
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
132 in_select = True
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
133 elif tagname == 'textarea':
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
134 name = attrs.get('name')
471
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
135 if name in self.data:
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
136 textarea_value = self.data.get(name)
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
137 if isinstance(textarea_value, (list, tuple)):
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
138 textarea_value = textarea_value[0]
76a0ec32835d The `HTMLFormFiller` stream filter no longer alters form elements for which the data element contains no corresponding item.
cmlenz
parents: 446
diff changeset
139 in_textarea = True
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
140 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
141 option_start = kind, data, pos
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
142 option_value = attrs.get('value')
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
143 if option_value is None:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
144 no_option_value = True
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
145 option_value = ''
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
146 in_option = True
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
147 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
148 yield kind, (tag, attrs), pos
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
149
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
150 elif in_form and kind is TEXT:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
151 if in_select and in_option:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
152 if no_option_value:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
153 option_value += data
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
154 option_text.append((kind, data, pos))
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
155 continue
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
156 elif in_textarea:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
157 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
158 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
159
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
160 elif in_form and kind is END:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
161 tagname = data.localname
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
162 if tagname == 'form':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
163 in_form = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
164 elif tagname == 'select':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
165 in_select = False
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
166 select_value = None
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
167 elif in_select and tagname == 'option':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
168 if isinstance(select_value, (tuple, list)):
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
169 selected = option_value in [unicode(v) for v
415
b9f9a22484f0 `HTMLFormFiller` now correctly deals with non-string values in the data dictionary for select/checkbox/radio controls.
cmlenz
parents: 408
diff changeset
170 in select_value]
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
171 else:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
172 selected = option_value == unicode(select_value)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
173 okind, (tag, attrs), opos = option_start
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
174 if selected:
403
228907abb726 Remove some magic/overhead from `Attrs` creation and manipulation by not automatically wrapping attribute names in `QName`.
cmlenz
parents: 363
diff changeset
175 attrs |= [(QName('selected'), 'selected')]
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
176 elif 'selected' in attrs:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
177 attrs -= 'selected'
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
178 yield okind, (tag, attrs), opos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
179 if option_text:
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
180 for event in option_text:
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
181 yield event
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
182 in_option = False
584
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
183 no_option_value = False
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
184 option_start = option_value = None
94f719af686d Fixed a few cases where HTMLFormFiller didn't work well with option elements:
jonas
parents: 576
diff changeset
185 option_text = []
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
186 elif tagname == 'textarea':
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
187 if textarea_value:
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
188 yield TEXT, unicode(textarea_value), pos
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
189 in_textarea = False
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
190 yield kind, data, pos
275
d91cbdeb75e9 Integrated `HTMLFormFiller` filter initially presented as a [wiki:FormFilling#Usingatemplatefilter recipe].
cmlenz
parents: 230
diff changeset
191
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
192 else:
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
193 yield kind, data, pos
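The checkbox/radio branch of `HTMLFormFiller.__call__` above (source lines 101-112) packs four cases into nested conditionals. The sketch below restates that decision as a standalone function so the cases can be tried in isolation. It is an illustration only: `is_checked` and its parameter names are hypothetical and not part of Genshi, and it uses `str()` where the listing uses the Python 2 `unicode()` builtin.

```python
def is_checked(input_type, declared_value, value):
    """Decide whether a checkbox or radio input should be checked.

    Hypothetical helper mirroring the branch in the listing above.
    input_type     -- lower-cased ``type`` attribute ('checkbox' or 'radio')
    declared_value -- the input's ``value`` attribute, or None if absent
    value          -- the entry from the data dictionary for this field name
    """
    checked = False
    if isinstance(value, (list, tuple)):
        if declared_value:
            # A sequence checks the control when any item, converted to
            # text, equals the declared value attribute.
            checked = declared_value in [str(v) for v in value]
        else:
            # No declared value: checked if any item is truthy.
            checked = any(value)
    else:
        if declared_value:
            # Scalar data: compare its text form with the declared value.
            checked = declared_value == str(value)
        elif input_type == 'checkbox':
            # A checkbox without a value attribute follows the truthiness
            # of the data; a radio button without one is never checked.
            checked = bool(value)
    return checked


if __name__ == '__main__':
    print(is_checked('checkbox', '2', [1, 2]))
    print(is_checked('radio', None, 1))
```

Note the asymmetry in the last branch: a value-less radio button is left unchecked for scalar data, whereas a value-less checkbox is treated as a plain on/off flag.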
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
194
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
195
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
196 class HTMLSanitizer(object):
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
197 """A filter that removes potentially dangerous HTML tags and attributes
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
198 from the stream.
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
199
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
200 >>> from genshi import HTML
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
201 >>> html = HTML('<div><script>alert(document.cookie)</script></div>')
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
202 >>> print(html | HTMLSanitizer())
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
203 <div/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
204
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
205 The default set of safe tags and attributes can be modified when the filter
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
206 is instantiated. For example, to allow inline ``style`` attributes, the
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
207 following instantiation would work:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
208
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
209 >>> html = HTML('<div style="background: #000"></div>')
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
210 >>> sanitizer = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
211 >>> print(html | sanitizer)
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
212 <div style="background: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
213
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
214 Note that even in this case, the filter *does* attempt to remove dangerous
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
215 constructs from style attributes:
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
216
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
217 >>> html = HTML('<div style="background: url(javascript:void); color: #000"></div>')
853
f33ecf3c319e Convert a bunch of print statements to py3k compatible syntax.
cmlenz
parents: 844
diff changeset
218 >>> print(html | sanitizer)
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
219 <div style="color: #000"/>
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
220
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
221 This handles HTML entities, Unicode escapes in CSS and JavaScript text, as
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
222 well as a lot of other things. However, the style tag is still excluded by
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
223 default because it is very hard for such sanitizing to be completely safe,
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
224 especially considering how much error recovery current web browsers perform.
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
225
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
226 It also does some basic filtering of CSS properties that may be used for
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
227 typical phishing attacks. For more sophisticated filtering, this class
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
228 provides a couple of hooks that can be overridden in sub-classes.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
229
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
230 :warn: Note that this special processing of CSS is currently only applied to
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
231 style attributes, **not** style elements.
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
232 """
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
233
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
234 SAFE_TAGS = frozenset(['a', 'abbr', 'acronym', 'address', 'area', 'b',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
235 'big', 'blockquote', 'br', 'button', 'caption', 'center', 'cite',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
236 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
237 'em', 'fieldset', 'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
238 'hr', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'map',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
239 'menu', 'ol', 'optgroup', 'option', 'p', 'pre', 'q', 's', 'samp',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
240 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'table',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
241 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'tt', 'u',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
242 'ul', 'var'])
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
243
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
244 SAFE_ATTRS = frozenset(['abbr', 'accept', 'accept-charset', 'accesskey',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
245 'action', 'align', 'alt', 'axis', 'bgcolor', 'border', 'cellpadding',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
246 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
247 'clear', 'cols', 'colspan', 'color', 'compact', 'coords', 'datetime',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
248 'dir', 'disabled', 'enctype', 'for', 'frame', 'headers', 'height',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
249 'href', 'hreflang', 'hspace', 'id', 'ismap', 'label', 'lang',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
250 'longdesc', 'maxlength', 'media', 'method', 'multiple', 'name',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
251 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev',
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
252 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
253 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title',
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
254 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
255
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
256 SAFE_CSS = frozenset([
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
257 # CSS 3 properties <http://www.w3.org/TR/CSS/#properties>
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
258 'background', 'background-attachment', 'background-color',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
259 'background-image', 'background-position', 'background-repeat',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
260 'border', 'border-bottom', 'border-bottom-color',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
261 'border-bottom-style', 'border-bottom-width', 'border-collapse',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
262 'border-color', 'border-left', 'border-left-color',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
263 'border-left-style', 'border-left-width', 'border-right',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
264 'border-right-color', 'border-right-style', 'border-right-width',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
265 'border-spacing', 'border-style', 'border-top', 'border-top-color',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
266 'border-top-style', 'border-top-width', 'border-width', 'bottom',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
267 'caption-side', 'clear', 'clip', 'color', 'content',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
268 'counter-increment', 'counter-reset', 'cursor', 'direction', 'display',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
269 'empty-cells', 'float', 'font', 'font-family', 'font-size',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
270 'font-style', 'font-variant', 'font-weight', 'height', 'left',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
271 'letter-spacing', 'line-height', 'list-style', 'list-style-image',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
272 'list-style-position', 'list-style-type', 'margin', 'margin-bottom',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
273 'margin-left', 'margin-right', 'margin-top', 'max-height', 'max-width',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
274 'min-height', 'min-width', 'opacity', 'orphans', 'outline',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
275 'outline-color', 'outline-style', 'outline-width', 'overflow',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
276 'padding', 'padding-bottom', 'padding-left', 'padding-right',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
277 'padding-top', 'page-break-after', 'page-break-before',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
278 'page-break-inside', 'quotes', 'right', 'table-layout',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
279 'text-align', 'text-decoration', 'text-indent', 'text-transform',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
280 'top', 'unicode-bidi', 'vertical-align', 'visibility', 'white-space',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
281 'widows', 'width', 'word-spacing', 'z-index',
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
282 ])
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
283
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
284 SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
285
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
286 URI_ATTRS = frozenset(['action', 'background', 'dynsrc', 'href', 'lowsrc',
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
287 'src'])
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
288
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
289 def __init__(self, safe_tags=SAFE_TAGS, safe_attrs=SAFE_ATTRS,
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
290 safe_schemes=SAFE_SCHEMES, uri_attrs=URI_ATTRS,
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
291 safe_css=SAFE_CSS):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
292 """Create the sanitizer.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
293
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
294 The exact set of allowed elements and attributes can be configured.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
295
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
296 :param safe_tags: a set of tag names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
297 :param safe_attrs: a set of attribute names that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
298 :param safe_schemes: a set of URI schemes that are considered safe
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
299 :param uri_attrs: a set of names of attributes that contain URIs
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
300 """
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
301 self.safe_tags = safe_tags
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
302 # The set of tag names that are considered safe.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
303 self.safe_attrs = safe_attrs
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
304 # The set of attribute names that are considered safe.
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
305 self.safe_css = safe_css
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
306 # The set of CSS properties that are considered safe.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
307 self.uri_attrs = uri_attrs
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
308 # The set of names of attributes that may contain URIs.
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
309 self.safe_schemes = safe_schemes
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
310 # The set of URI schemes that are considered safe.
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
311
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
312 # IE6 <http://heideri.ch/jso/#80>
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
313 _EXPRESSION_SEARCH = re.compile(u"""
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
314 [eE
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
315 \uFF25 # FULLWIDTH LATIN CAPITAL LETTER E
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
316 \uFF45 # FULLWIDTH LATIN SMALL LETTER E
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
317 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
318 [xX
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
319 \uFF38 # FULLWIDTH LATIN CAPITAL LETTER X
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
320 \uFF58 # FULLWIDTH LATIN SMALL LETTER X
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
321 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
322 [pP
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
323 \uFF30 # FULLWIDTH LATIN CAPITAL LETTER P
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
324 \uFF50 # FULLWIDTH LATIN SMALL LETTER P
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
325 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
326 [rR
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
327 \u0280 # LATIN LETTER SMALL CAPITAL R
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
328 \uFF32 # FULLWIDTH LATIN CAPITAL LETTER R
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
329 \uFF52 # FULLWIDTH LATIN SMALL LETTER R
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
330 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
331 [eE
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
332 \uFF25 # FULLWIDTH LATIN CAPITAL LETTER E
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
333 \uFF45 # FULLWIDTH LATIN SMALL LETTER E
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
334 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
335 [sS
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
336 \uFF33 # FULLWIDTH LATIN CAPITAL LETTER S
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
337 \uFF53 # FULLWIDTH LATIN SMALL LETTER S
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
338 ]{2}
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
339 [iI
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
340 \u026A # LATIN LETTER SMALL CAPITAL I
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
341 \uFF29 # FULLWIDTH LATIN CAPITAL LETTER I
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
342 \uFF49 # FULLWIDTH LATIN SMALL LETTER I
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
343 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
344 [oO
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
345 \uFF2F # FULLWIDTH LATIN CAPITAL LETTER O
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
346 \uFF4F # FULLWIDTH LATIN SMALL LETTER O
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
347 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
348 [nN
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
349 \u0274 # LATIN LETTER SMALL CAPITAL N
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
350 \uFF2E # FULLWIDTH LATIN CAPITAL LETTER N
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
351 \uFF4E # FULLWIDTH LATIN SMALL LETTER N
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
352 ]
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
353 """, re.VERBOSE).search
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
354
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
355 # IE6 <http://openmya.hacker.jp/hasegawa/security/expression.txt>
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
356 # 7) Particular bit of Unicode characters
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
357 _URL_FINDITER = re.compile(
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
358 u'[Uu][Rr\u0280][Ll\u029F]\s*\(([^)]+)').finditer
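The `_URL_FINDITER` pattern above can be tried on its own. The following is a minimal sketch, assuming Python 3 and a raw-string rewrite of the same pattern; the names `URL_FINDITER` and `css_urls` are hypothetical helpers for illustration and are not part of Genshi.

```python
import re

# Same pattern as _URL_FINDITER in the listing, written as a raw string so it
# is valid under Python 3 (the re module itself interprets \uXXXX and \s).
# U+0280 and U+029F are small-capital R and L, which the source comments say
# IE6 treated like the plain letters.
URL_FINDITER = re.compile(r'[Uu][Rr\u0280][Ll\u029F]\s*\(([^)]+)').finditer


def css_urls(css_text):
    """Return the raw argument of every url(...) occurrence in css_text.

    Hypothetical helper: it only extracts the text between "url(" and the
    next ")"; deciding whether that text is safe is a separate step.
    """
    return [match.group(1) for match in URL_FINDITER(css_text)]
```

A caller could pass each extracted argument to a scheme check and drop the whole declaration if any of them fails. How the sanitizer itself consumes these matches is not shown in this part of the listing.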
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
359
439
9f11c745fac9 Add support for adding custom template filters by passing a custom callback function to the `TemplateLoader`. Closes #89 (see added unit test).
cmlenz
parents: 431
diff changeset
360 def __call__(self, stream):
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
361 """Apply the filter to the given stream.
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
362
425
073640758a42 Try to use proper reStructuredText for docstrings throughout.
cmlenz
parents: 415
diff changeset
363 :param stream: the markup event stream to filter
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
364 """
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
365 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
366
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
367 for kind, data, pos in stream:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
368 if kind is START:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
369 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
370 continue
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
371 tag, attrs = data
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
372 if not self.is_safe_elem(tag, attrs):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
373 waiting_for = tag
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
374 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
375
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
376 new_attrs = []
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
377 for attr, value in attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
378 value = stripentities(value)
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
379 if attr not in self.safe_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
380 continue
277
7e30bfa966ab The `HTMLSanitizer` now lets you override the default set of tag and attribute names that are considered safe.
cmlenz
parents: 275
diff changeset
381 elif attr in self.uri_attrs:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
382 # Don't allow URI schemes such as "javascript:"
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
383 if not self.is_safe_uri(value):
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
384 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
385 elif attr == 'style':
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
386 # Remove dangerous CSS declarations from inline styles
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
387 decls = self.sanitize_css(value)
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
388 if not decls:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
389 continue
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
390 value = '; '.join(decls)
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
391 new_attrs.append((attr, value))
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
392
345
2aa7ca37ae6a Make `Attrs` instances immutable.
cmlenz
parents: 305
diff changeset
393 yield kind, (tag, Attrs(new_attrs)), pos
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
394
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
395 elif kind is END:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
396 tag = data
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
397 if waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
398 if waiting_for == tag:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
399 waiting_for = None
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
400 else:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
401 yield kind, data, pos
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
402
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
403 elif kind is not COMMENT:
123
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
404 if not waiting_for:
10279d2eeec9 Fix for #18: whitespace in space-sensitive elements such as `<pre>` and `<textarea>` is now preserved.
cmlenz
parents: 113
diff changeset
405 yield kind, data, pos
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
406
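A minimal sketch of the skip-until-matching-END logic in `__call__` above, using plain tuples instead of Genshi events. The `START`/`END`/`TEXT` constants, the `SAFE_TAGS` set, and the `drop_unsafe` name are assumptions made for illustration and are not Genshi's API; attribute filtering is left out.

```python
START, END, TEXT = 'START', 'END', 'TEXT'  # stand-ins for Genshi's event kinds
SAFE_TAGS = frozenset(['p', 'b', 'em'])    # hypothetical whitelist


def drop_unsafe(stream):
    """Yield events, skipping unsafe elements together with their content."""
    waiting_for = None  # tag whose END event switches output back on
    for kind, data in stream:
        if kind == START:
            if waiting_for:
                continue  # still inside a dropped element
            if data not in SAFE_TAGS:
                waiting_for = data
                continue
            yield kind, data
        elif kind == END:
            if waiting_for:
                if waiting_for == data:
                    waiting_for = None  # dropped element is closed
            else:
                yield kind, data
        elif not waiting_for:
            yield kind, data


events = [
    (START, 'p'), (TEXT, 'hi'),
    (START, 'script'), (TEXT, 'alert(1)'), (END, 'script'),
    (END, 'p'),
]
filtered = list(drop_unsafe(events))
```

Here `filtered` keeps only the `p` element and its text; the `script` element and everything between its START and END events are dropped.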
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
407 def is_safe_css(self, propname, value):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
408 """Determine whether the given CSS property declaration is to be
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
409 considered safe for inclusion in the output.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
410
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
411 :param propname: the CSS property name
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
412 :param value: the value of the property
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
413 :return: whether the property value should be considered safe
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
414 :rtype: bool
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
415 :since: version 0.6
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
416 """
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
417 if propname not in self.safe_css:
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
418 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
419 if propname.startswith('margin') and '-' in value:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
420 # Negative margins can be used for phishing
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
421 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
422 return True
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
423
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
424 def is_safe_elem(self, tag, attrs):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
425 """Determine whether the given element should be considered safe for
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
426 inclusion in the output.
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
427
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
428 :param tag: the tag name of the element
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
429 :type tag: QName
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
430 :param attrs: the element attributes
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
431 :type attrs: Attrs
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
432 :return: whether the element should be considered safe
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
433 :rtype: bool
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
434 :since: version 0.6
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
435 """
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
436 if tag not in self.safe_tags:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
437 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
438 if tag.localname == 'input':
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
439 input_type = attrs.get('type', '').lower()
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
440 if input_type == 'password':
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
441 return False
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
442 return True
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
443
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
444 def is_safe_uri(self, uri):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
445 """Determine whether the given URI is to be considered safe for
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
446 inclusion in the output.
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
447
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
448 The default implementation checks whether the scheme of the URI is in
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
449 the set of allowed URI schemes (`safe_schemes`).
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
450
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
451 >>> sanitizer = HTMLSanitizer()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
452 >>> sanitizer.is_safe_uri('http://example.org/')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
453 True
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
454 >>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
455 False
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
456
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
457 :param uri: the URI to check
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
458 :return: `True` if the URI can be considered safe, `False` otherwise
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
459 :rtype: `bool`
576
b00765a115a5 Improve docs on `Stream.select()` for #135.
cmlenz
parents: 571
diff changeset
460 :since: version 0.4.3
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
461 """
837
adfacd5ed02c Fix for #274.
cmlenz
parents: 750
diff changeset
462 if '#' in uri:
adfacd5ed02c Fix for #274.
cmlenz
parents: 750
diff changeset
463 uri = uri.split('#', 1)[0] # Strip out the fragment identifier
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
464 if ':' not in uri:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
465 return True # This is a relative URI
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
466 chars = [char for char in uri.split(':', 1)[0] if char.isalnum()]
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
467 return ''.join(chars).lower() in self.safe_schemes
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
468
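The scheme check in `is_safe_uri` can be tried outside the class with a sketch like the following. The contents of `SAFE_SCHEMES` are an assumption for illustration (the real default is defined on `HTMLSanitizer`, outside this excerpt), and the function is a standalone stand-in for the method.

```python
# Assumed whitelist for illustration; not necessarily HTMLSanitizer's default.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto'])


def is_safe_uri(uri, safe_schemes=SAFE_SCHEMES):
    """Return True if the URI is relative or uses a whitelisted scheme."""
    if '#' in uri:
        uri = uri.split('#', 1)[0]  # strip the fragment identifier
    if ':' not in uri:
        return True  # no scheme, so this is a relative URI
    # Keep only alphanumeric characters so that whitespace or control
    # characters inside the scheme cannot disguise it.
    scheme = ''.join(c for c in uri.split(':', 1)[0] if c.isalnum())
    return scheme.lower() in safe_schemes
```

Because non-alphanumeric characters are removed before the lookup, a value such as `'java\tscript:alert(1)'` is reduced to the scheme `javascript` and rejected, while `'page.html#a:b'` is treated as relative once the fragment is stripped.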
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
469 def sanitize_css(self, text):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
470 """Remove potentially dangerous property declarations from CSS code.
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
471
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
472 In particular, properties using the CSS ``url()`` function with a scheme
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
473 that is not considered safe are removed:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
474
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
475 >>> sanitizer = HTMLSanitizer()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
476 >>> sanitizer.sanitize_css(u'''
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
477 ... background: url(javascript:alert("foo"));
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
478 ... color: #000;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
479 ... ''')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
480 [u'color: #000']
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
481
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
482 Also, the proprietary Internet Explorer function ``expression()`` is
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
483 always stripped:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
484
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
485 >>> sanitizer.sanitize_css(u'''
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
486 ... background: #fff;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
487 ... color: #000;
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
488 ... width: e/**/xpression(alert("foo"));
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
489 ... ''')
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
490 [u'background: #fff', u'color: #000']
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
491
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
492 :param text: the CSS text; this is expected to be `unicode` and to not
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
493 contain any character or numeric references
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
494 :return: a list of declarations that are considered safe
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
495 :rtype: `list`
576
b00765a115a5 Improve docs on `Stream.select()` for #135.
cmlenz
parents: 571
diff changeset
496 :since: version 0.4.3
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
497 """
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
498 decls = []
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
499 text = self._strip_css_comments(self._replace_unicode_escapes(text))
856
21308bd343b8 Add a couple of fallback imports for Python 3.0.
cmlenz
parents: 854
diff changeset
500 for decl in text.split(';'):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
501 decl = decl.strip()
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
502 if not decl:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
503 continue
840
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
504 try:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
505 propname, value = decl.split(':', 1)
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
506 except ValueError:
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
507 continue
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
508 if not self.is_safe_css(propname.strip().lower(), value.strip()):
9eb84c75e5ac Ported some of the HTML sanitization improvements from Trac (see [T7658]).
cmlenz
parents: 837
diff changeset
509 continue
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
510 is_evil = False
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
511 if self._EXPRESSION_SEARCH(value):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
512 is_evil = True
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
513 for match in self._URL_FINDITER(value):
571
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
514 if not self.is_safe_uri(match.group(1)):
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
515 is_evil = True
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
516 break
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
517 if not is_evil:
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
518 decls.append(decl.strip())
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
519 return decls
f0461dc3939a * Cleaned up the implementation of the `HTMLSanitizer`.
cmlenz
parents: 556
diff changeset
520
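A reduced sketch of the declaration loop in `sanitize_css`, covering only the splitting and the `is_safe_css` whitelist step. The `expression()` and `url()` checks are omitted, and `SAFE_CSS` and `split_declarations` are hypothetical names chosen for this example.

```python
# Hypothetical property whitelist, for illustration only.
SAFE_CSS = frozenset(['color', 'background', 'margin', 'margin-left', 'width'])


def is_safe_css(propname, value):
    """Reject unknown properties and negative margins."""
    if propname not in SAFE_CSS:
        return False
    if propname.startswith('margin') and '-' in value:
        return False  # negative margins can be used for phishing
    return True


def split_declarations(text):
    """Return the declarations in `text` that pass `is_safe_css`."""
    decls = []
    for decl in text.split(';'):
        decl = decl.strip()
        if not decl:
            continue
        try:
            propname, value = decl.split(':', 1)
        except ValueError:
            continue  # no colon, so not a "property: value" declaration
        if is_safe_css(propname.strip().lower(), value.strip()):
            decls.append(decl)
    return decls


result = split_declarations(
    'color: #000; margin-left: -100px; position: fixed; bogus; WIDTH: 10px')
```

Only `color: #000` and `WIDTH: 10px` survive: the property name is lower-cased for the lookup, but the declaration itself is returned as written.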
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
521 _NORMALIZE_NEWLINES = re.compile(r'\r\n').sub
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
522 _UNICODE_ESCAPE = re.compile(
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
523 r"""\\([0-9a-fA-F]{1,6})\s?|\\([^\r\n\f0-9a-fA-F'"{};:()#*])""",
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
524 re.UNICODE).sub
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
525
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
526 def _replace_unicode_escapes(self, text):
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
527 def _repl(match):
951
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
528 t = match.group(1)
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
529 if t:
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
530 return unichr(int(t, 16))
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
531 t = match.group(2)
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
532 if t == '\\':
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
533 return r'\\'
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
534 else:
40415173f513 Merge r1174 and r1175 from trunk (improve sanitizing of CSS in style attributes -- see #455).
hodgestar
parents: 856
diff changeset
535 return t
431
ad01564e87f2 * Don't allow `style` attributes by default in the `HTMLSanitizer`. Closes #97.
cmlenz
parents: 425
diff changeset
536 return self._UNICODE_ESCAPE(_repl, self._NORMALIZE_NEWLINES('\n', text))
556
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
537
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
538 _CSS_COMMENTS = re.compile(r'/\*.*?\*/').sub
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
539
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
540 def _strip_css_comments(self, text):
0d98569eaced The HTML sanitizer now strips any CSS comments in style attributes, which could previously be used to hide malicious property values.
cmlenz
parents: 471
diff changeset
541 return self._CSS_COMMENTS('', text)
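The two normalization helpers above can be exercised on their own with a sketch like this one. It is written for Python 3 (`chr` instead of `unichr`) and uses module-level functions with assumed names rather than the private methods. Like the listing, it does not guard against escapes beyond U+10FFFF, for which `chr` raises `ValueError`.

```python
import re

# Same patterns as in the listing above.
_UNICODE_ESCAPE = re.compile(
    r"""\\([0-9a-fA-F]{1,6})\s?|\\([^\r\n\f0-9a-fA-F'"{};:()#*])""")
_CSS_COMMENTS = re.compile(r'/\*.*?\*/')


def replace_unicode_escapes(text):
    """Decode CSS backslash escapes such as ``\\65 `` into plain characters."""
    def _repl(match):
        hexdigits = match.group(1)
        if hexdigits:
            return chr(int(hexdigits, 16))  # numeric escape
        char = match.group(2)
        return r'\\' if char == '\\' else char  # escaped literal character
    return _UNICODE_ESCAPE.sub(_repl, text.replace('\r\n', '\n'))


def normalize_css(text):
    """Decode escapes first, then strip comments, as sanitize_css does."""
    return _CSS_COMMENTS.sub('', replace_unicode_escapes(text))
```

After normalization, obfuscated spellings such as `e/**/xpression(...)`, `\65 xpression(...)`, or `\75\72\6c(...)` read as plain `expression(...)` and `url(...)`, which is what lets the later checks in `sanitize_css` see them.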
Copyright (C) 2012-2017 Edgewall Software