# HG changeset patch
# User cmlenz
# Date 1156504444 0
# Node ID 50eab0469148991a30c5c839bba2e294e5196d01
# Parent 58284b6b000955a4a5de24d5946f46eee7cf5df0
Add serialization to plain text, based on cboos' patch. Closes #41.

diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -13,6 +13,7 @@
    to multiple names, and semicolons inside string literals are treated as
    expected.
  * Generator expressions can now be used in template expressions (ticket #16).
+ * Added serialization to plain text (ticket #41).
 
 
 Version 0.2
diff --git a/markup/core.py b/markup/core.py
--- a/markup/core.py
+++ b/markup/core.py
@@ -81,8 +81,8 @@
         """Return a string representation of the stream.
         
         @param method: determines how the stream is serialized; can be either
-                       "xml", "xhtml", or "html", or a custom `Serializer`
-                       subclass
+                       "xml", "xhtml", "html", "text", or a custom serializer
+                       class
         @param encoding: how the output string should be encoded; if set to
                          `None`, this method returns a `unicode` object
 
@@ -92,7 +92,10 @@
         generator = self.serialize(method=method, **kwargs)
         output = u''.join(list(generator))
         if encoding is not None:
-            return output.encode(encoding, 'xmlcharrefreplace')
+            errors = 'replace'
+            if method != 'text':
+                errors = 'xmlcharrefreplace'
+            return output.encode(encoding, errors)
         return output
 
     def select(self, path):
@@ -113,7 +116,8 @@
         string.
         
         @param method: determines how the stream is serialized; can be either
-                       "xml", "xhtml", or "html", or a custom serializer class
+                       "xml", "xhtml", "html", "text", or a custom serializer
+                       class
         
         Any additional keyword arguments are passed to the serializer, and thus
         depend on the `method` parameter value.
@@ -123,7 +127,8 @@
         if isinstance(method, basestring):
             cls = {'xml': output.XMLSerializer,
                    'xhtml': output.XHTMLSerializer,
-                   'html': output.HTMLSerializer}[method]
+                   'html': output.HTMLSerializer,
+                   'text': output.TextSerializer}[method]
         serialize = cls(**kwargs)
         return serialize(_ensure(self))
 
@@ -300,8 +305,7 @@
             return unichr(ref)
         else: # character entity
             ref = match.group(2)
-            if keepxmlentities and ref in ('amp', 'apos', 'gt', 'lt',
-                                           'quot'):
+            if keepxmlentities and ref in ('amp', 'apos', 'gt', 'lt', 'quot'):
                 return '&%s;' % ref
             try:
                 codepoint = htmlentitydefs.name2codepoint[ref]
diff --git a/markup/output.py b/markup/output.py
--- a/markup/output.py
+++ b/markup/output.py
@@ -26,7 +26,8 @@
 from markup.core import DOCTYPE, START, END, START_NS, TEXT, START_CDATA, \
                         END_CDATA, PI, COMMENT, XML_NAMESPACE
 
-__all__ = ['Serializer', 'XMLSerializer', 'HTMLSerializer']
+__all__ = ['DocType', 'XMLSerializer', 'XHTMLSerializer', 'HTMLSerializer',
+           'TextSerializer']
 
 
 class DocType(object):
@@ -398,6 +399,37 @@
                 yield Markup('' % data)
 
 
+class TextSerializer(object):
+    """Produces plain text from an event stream.
+    
+    Only text events are included in the output. Unlike the other serializer,
+    special XML characters are not escaped:
+    
+    >>> from markup.builder import tag
+    >>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
+    >>> print elem
+    <div><a href="foo">&lt;Hello!&gt;</a><br/></div>
+    >>> print ''.join(TextSerializer()(elem.generate()))
+    <Hello!>
+    
+    If text events contain literal markup (instances of the `Markup` class),
+    tags or entities are stripped from the output:
+    
+    >>> elem = tag.div(Markup('<a href="foo">Hello!</a><br/>'))
+    >>> print elem
+    <div><a href="foo">Hello!</a><br/></div>
+    >>> print ''.join(TextSerializer()(elem.generate()))
+    Hello!
+    """
+
+    def __call__(self, stream):
+        for kind, data, pos in stream:
+            if kind is TEXT:
+                if type(data) is Markup:
+                    data = data.striptags().stripentities()
+                yield data
+
+
 class WhitespaceFilter(object):
     """A filter that removes extraneous ignorable white space from the
     stream."""