# HG changeset patch
# User cmlenz
# Date 1156550316 0
# Node ID 51d4101f49cadaea234c782f9a5d9783c6add26f
# Parent  48fab34e5e4d9b9221089f4ae434e5006136cf00
* Implement reverse add/mul operators for `Markup` class, so that the result
  is also a `Markup` instance.
* Override the bitwise or (`|`) operator on the `Stream` class, which allows
  a syntax similar to Unix shell pipes for chaining stream filters.

diff --git a/markup/core.py b/markup/core.py
--- a/markup/core.py
+++ b/markup/core.py
@@ -14,6 +14,7 @@
 """Core classes for markup processing."""
 
 import htmlentitydefs
+import operator
 import re
 
 __all__ = ['Stream', 'Markup', 'escape', 'unescape', 'Namespace', 'QName']
@@ -65,17 +66,64 @@
     def __iter__(self):
         return iter(self.events)
 
+    def __or__(self, function):
+        """Override the "bitwise or" operator to apply filters or serializers
+        to the stream, providing a syntax similar to pipes on Unix shells.
+
+        Assume the following stream produced by the `HTML` function:
+
+        >>> from markup.input import HTML
+        >>> html = HTML('''

Hello, world!

''')
+        >>> print html
+

Hello, world!

+
+        A filter such as the HTML sanitizer can be applied to that stream using
+        the pipe notation as follows:
+
+        >>> from markup.filters import HTMLSanitizer
+        >>> sanitizer = HTMLSanitizer()
+        >>> print html | sanitizer
+

Hello, world!

+
+        Filters can be any function that accepts and produces a stream (where
+        a stream is anything that iterators over events):
+
+        >>> def uppercase(stream):
+        ...     for kind, data, pos in stream:
+        ...         if kind is TEXT:
+        ...             data = data.upper()
+        ...         yield kind, data, pos
+        >>> print html | sanitizer | uppercase
+

HELLO, WORLD!

+
+        Serializers can also be used with this notation:
+
+        >>> from markup.output import TextSerializer
+        >>> output = TextSerializer()
+        >>> print html | sanitizer | uppercase | output
+        HELLO, WORLD!
+
+        Commonly, serializers should be used at the end of the "pipeline";
+        using them somewhere in the middle may produce unexpected results.
+        """
+        return Stream(_ensure(function(self)))
+
     def filter(self, *filters):
         """Apply filters to the stream.
 
         This method returns a new stream with the given filters applied. The
         filters must be callables that accept the stream object as parameter,
         and return the filtered stream.
+
+        The call:
+
+            stream.filter(filter1, filter2)
+
+        is equivalent to:
+
+            stream | filter1 | filter2
         """
-        stream = self
-        for filter_ in filters:
-            stream = filter_(iter(stream))
-        return Stream(stream)
+        return reduce(operator.or_, (self,) + filters)
 
     def render(self, method='xml', encoding='utf-8', **kwargs):
         """Return a string representation of the stream.
@@ -129,8 +177,7 @@
                    'xhtml': output.XHTMLSerializer,
                    'html': output.HTMLSerializer,
                    'text': output.TextSerializer}[method]
-        serialize = cls(**kwargs)
-        return serialize(_ensure(self))
+        return cls(**kwargs)(_ensure(self))
 
     def __str__(self):
         return self.render()
@@ -335,7 +382,10 @@
         return unicode.__new__(cls, text)
 
     def __add__(self, other):
-        return Markup(unicode(self) + escape(other))
+        return Markup(unicode(self) + unicode(escape(other)))
+
+    def __radd__(self, other):
+        return Markup(unicode(escape(other)) + unicode(self))
 
     def __mod__(self, args):
         if not isinstance(args, (list, tuple)):
@@ -345,6 +395,9 @@
 
     def __mul__(self, num):
         return Markup(unicode(self) * num)
 
+    def __rmul__(self, num):
+        return Markup(num * unicode(self))
+
     def __repr__(self):
         return '<%s "%s">' % (self.__class__.__name__, self)
diff --git a/markup/tests/core.py b/markup/tests/core.py
--- a/markup/tests/core.py
+++ b/markup/tests/core.py
@@ -66,9 +66,9 @@
         self.assertEquals('foo
', markup)
 
     def test_add_reverse(self):
-        markup = 'foo' + Markup('bar')
-        assert isinstance(markup, unicode)
-        self.assertEquals('foobar', markup)
+        markup = '
' + Markup('bar')
+        assert isinstance(markup, Markup)
+        self.assertEquals('<br/>bar', markup)
 
     def test_mod(self):
         markup = Markup('%s') % '&'
@@ -85,6 +85,11 @@
         assert isinstance(markup, Markup)
         self.assertEquals('foofoo', markup)
 
+    def test_mul_reverse(self):
+        markup = 2 * Markup('foo')
+        assert isinstance(markup, Markup)
+        self.assertEquals('foofoo', markup)
+
     def test_join(self):
         markup = Markup('
').join(['foo', '', Markup('')])
         assert isinstance(markup, Markup)
diff --git a/markup/tests/filters.py b/markup/tests/filters.py
--- a/markup/tests/filters.py
+++ b/markup/tests/filters.py
@@ -24,96 +24,96 @@
     def test_sanitize_unchanged(self):
         html = HTML('fo
o
')
         self.assertEquals(u'fo
o
',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_escape_text(self):
         html = HTML('fo&')
         self.assertEquals(u'fo&',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
         html = HTML('<foo>')
         self.assertEquals(u'<foo>',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_entityref_text(self):
         html = HTML('foö')
         self.assertEquals(u'foƶ',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_escape_attr(self):
         html = HTML('
')
         self.assertEquals(u'
',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_close_empty_tag(self):
         html = HTML('fo
o
')
         self.assertEquals(u'fo
o
',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_invalid_entity(self):
         html = HTML('&junk;')
-        self.assertEquals('&junk;', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals('&junk;', unicode(html | HTMLSanitizer()))
 
     def test_sanitize_remove_script_elem(self):
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         self.assertRaises(ParseError, HTML, 'alert("foo")')
         self.assertRaises(ParseError, HTML, '')
 
     def test_sanitize_remove_onclick_attr(self):
         html = HTML('
')
-        self.assertEquals(u'
', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'
', unicode(html | HTMLSanitizer()))
 
     def test_sanitize_remove_style_scripts(self):
         # Inline style with url() using javascript: scheme
         html = HTML('
')
-        self.assertEquals(u'
', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'
', unicode(html | HTMLSanitizer()))
         # Inline style with url() using javascript: scheme, using control char
         html = HTML('
')
-        self.assertEquals(u'
', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'
', unicode(html | HTMLSanitizer()))
         # Inline style with url() using javascript: scheme, in quotes
         html = HTML('
')
-        self.assertEquals(u'
', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'
', unicode(html | HTMLSanitizer()))
         # IE expressions in CSS not allowed
         html = HTML('
')
-        self.assertEquals(u'
', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'
', unicode(html | HTMLSanitizer()))
         html = HTML('
')
         self.assertEquals(u'
',
-                          unicode(html.filter(HTMLSanitizer())))
+                          unicode(html | HTMLSanitizer()))
 
     def test_sanitize_remove_src_javascript(self):
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Case-insensitive protocol matching
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Grave accents (not parsed)
         self.assertRaises(ParseError, HTML, '')
         # Protocol encoded using UTF-8 numeric entities
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Protocol encoded using UTF-8 numeric entities without a semicolon
         # (which is allowed because the max number of digits is used)
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Protocol encoded using UTF-8 numeric hex entities without a semicolon
         # (which is allowed because the max number of digits is used)
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Embedded tab character in protocol
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
         # Embedded tab character in protocol, but encoded this time
         html = HTML('')
-        self.assertEquals(u'', unicode(html.filter(HTMLSanitizer())))
+        self.assertEquals(u'', unicode(html | HTMLSanitizer()))
 
 
 def suite():