# -*- coding: utf-8 -*-
#
# Copyright (C) 2007 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://genshi.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://genshi.edgewall.org/log/.

"""A filter for functional-style transformations of markup streams.

The `Transformer` filter provides a variety of transformations that can be
applied to parts of streams that match given XPath expressions. These
transformations can be chained to achieve results that would be comparatively
tedious to achieve by writing stream filters by hand. The approach of chaining
node selection and transformation has been inspired by the `jQuery`_ JavaScript
library.

 .. _`jQuery`: http://jquery.com/

For example, the following transformation upper-cases the text of the ``<em>``
element in the ``<body>`` of the input document and replaces the element
itself with ``<u>``:

>>> from genshi.builder import tag
>>> html = HTML('''<html>
...   <head><title>Some Title</title></head>
...   <body>
...     Some <em>body</em> text.
...   </body>
... </html>''')
>>> print html | Transformer('body/em').map(unicode.upper, TEXT) \\
...                                    .unwrap().wrap(tag.u)
<html>
  <head><title>Some Title</title></head>
  <body>
    Some <u>BODY</u> text.
  </body>
</html>

The ``Transformer`` supports a large number of useful transformations out of
the box, but custom transformations can be added easily.

:since: version 0.5
"""

import re
import sys

from genshi.builder import Element
from genshi.core import Stream, Attrs, QName, TEXT, START, END, _ensure
from genshi.path import Path

__all__ = ['Transformer', 'StreamBuffer', 'InjectorTransformation', 'ENTER',
           'EXIT', 'INSIDE', 'OUTSIDE']


class TransformMark(str):
    """A mark on a transformation stream."""
    __slots__ = []
    _instances = {}

    def __new__(cls, val):
        return cls._instances.setdefault(val, str.__new__(cls, val))


ENTER = TransformMark('ENTER')
"""Stream augmentation mark indicating that a selected element is being
entered."""

INSIDE = TransformMark('INSIDE')
"""Stream augmentation mark indicating that processing is currently inside a
selected element."""

OUTSIDE = TransformMark('OUTSIDE')
"""Stream augmentation mark indicating that a match occurred outside a selected
element."""

ATTR = TransformMark('ATTR')
"""Stream augmentation mark indicating a selected element attribute."""

EXIT = TransformMark('EXIT')
"""Stream augmentation mark indicating that a selected element is being
exited."""


class Transformer(object):
    """Stream filter that can apply a variety of different transformations to
    a stream.
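
As a rough illustration of how such a filter processes a stream, the sketch
below marks every event, threads the marked stream through a list of
transforms, and strips the marks again. It is not Genshi code: plain strings
stand in for the `TransformMark` instances, events are bare tuples, and the
names ``mark_all``, ``unmark``, ``run_chain`` and ``upper_text`` are
hypothetical.

```python
# Sketch only: strings replace TransformMark objects, tuples replace events.
OUTSIDE = 'OUTSIDE'


def mark_all(events):
    # Every incoming event starts out with the OUTSIDE mark.
    for event in events:
        yield OUTSIDE, event


def unmark(marked):
    # Drop the marks again; pseudo-events whose kind is None are skipped.
    for mark, event in marked:
        if event[0] is not None:
            yield event


def run_chain(events, transforms):
    # Each transform takes a marked stream and returns a marked stream.
    marked = mark_all(events)
    for transform in transforms:
        marked = transform(marked)
    return list(unmark(marked))


def upper_text(marked):
    # Hypothetical transform: upper-case the data of marked TEXT events.
    for mark, (kind, data, pos) in marked:
        if mark and kind == 'TEXT':
            data = data.upper()
        yield mark, (kind, data, pos)


events = [('START', 'em', None), ('TEXT', 'body', None), ('END', 'em', None)]
result = run_chain(events, [upper_text])
```

Because every transform has the same shape (marked stream in, marked stream
out), chaining amounts to simple function composition.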

    This is achieved by selecting the events to be transformed using XPath,
    then applying the transformations to the events matched by the path
    expression. Each marked event is in the form (mark, (kind, data, pos)),
    where mark can be any of `ENTER`, `INSIDE`, `EXIT`, `OUTSIDE`, or `None`.

    The first three marks match `START` and `END` events, and any events
    contained `INSIDE` any selected XML/HTML element. A non-element match
    outside a `START`/`END` container (e.g. ``text()``) will yield an `OUTSIDE`
    mark.

    >>> html = HTML('<html><head><title>Some Title</title></head>'
    ...             '<body>Some <em>body</em> text.</body></html>')

    Transformations act on selected stream events matching an XPath expression.
    Here's an example of removing some markup (the title, in this case)
    selected by an expression:

    >>> print html | Transformer('head/title').remove()
    <html><head/><body>Some <em>body</em> text.</body></html>

    Inserted content can be passed in the form of a string, or a markup event
    stream, which includes streams generated programmatically via the
    `builder` module:

    >>> from genshi.builder import tag
    >>> print html | Transformer('body').prepend(tag.h1('Document Title'))
    <html><head><title>Some Title</title></head><body><h1>Document
    Title</h1>Some <em>body</em> text.</body></html>

    Each XPath expression determines the set of tags that will be acted upon by
    subsequent transformations. In this example we select the ``<title>`` text,
    copy it into a buffer, then select the ``<body>`` element and paste the
    copied text into the body as ``<h1>`` enclosed text:

    >>> buffer = StreamBuffer()
    >>> print html | Transformer('head/title/text()').copy(buffer) \\
    ...     .end().select('body').prepend(tag.h1(buffer))
    <html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some
    <em>body</em> text.</body></html>

    Transformations can also be assigned and reused, although care must be
    taken when using buffers, to ensure that buffers are cleared between
    transforms:

    >>> emphasis = Transformer('body//em').attr('class', 'emphasis')
    >>> print html | emphasis
    <html><head><title>Some Title</title></head><body>Some <em
    class="emphasis">body</em> text.</body></html>
    """

    __slots__ = ['transforms']

    def __init__(self, path='.'):
        """Construct a new transformation filter.

        :param path: an XPath expression (as string) or a `Path` instance
        """
        self.transforms = [SelectTransformation(path)]

    def __call__(self, stream):
        """Apply the transform filter to the stream.

        :param stream: the event stream to filter
        :return: the transformed stream
        :rtype: `Stream`
        """
        transforms = self._mark(stream)
        for link in self.transforms:
            transforms = link(transforms)
        return Stream(self._unmark(transforms),
                      serializer=getattr(stream, 'serializer', None))

    def apply(self, function):
        """Apply a transformation to the stream.

        Transformations can be chained, similar to stream filters. Any callable
        accepting a marked stream can be used as a transform.

        As an example, here is a simple `TEXT` event upper-casing transform:

        >>> def upper(stream):
        ...     for mark, (kind, data, pos) in stream:
        ...         if mark and kind is TEXT:
        ...             yield mark, (kind, data.upper(), pos)
        ...         else:
        ...             yield mark, (kind, data, pos)
        >>> short_stream = HTML('<body>Some <em>test</em> text</body>')
        >>> print short_stream | Transformer('.//em/text()').apply(upper)
        <body>Some <em>TEST</em> text</body>
        """
        transformer = Transformer()
        transformer.transforms = self.transforms[:]
        if isinstance(function, Transformer):
            transformer.transforms.extend(function.transforms)
        else:
            transformer.transforms.append(function)
        return transformer

    #{ Selection operations

    def select(self, path):
        """Mark events matching the given XPath expression, within the current
        selection.

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer().select('.//em').trace()
        (None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        (None, ('TEXT', u'Some ', (None, 1, 6)))
        ('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('INSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('EXIT', ('END', QName(u'em'), (None, 1, 19)))
        (None, ('TEXT', u' text', (None, 1, 24)))
        (None, ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :param path: an XPath expression (as string) or a `Path` instance
        :return: the stream augmented by transformation marks
        :rtype: `Transformer`
        """
        return self.apply(SelectTransformation(path))

    def invert(self):
        """Invert selection so that marked events become unmarked, and vice
        versa.

        Specifically, all marks are converted to null marks, and all null marks
        are converted to OUTSIDE marks.
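
The rule can be sketched on a simplified marked stream. The snippet is an
illustration only: plain strings stand in for the mark objects and for the
events, and the function name ``invert_marks`` is hypothetical.

```python
# Sketch only: marks and events are plain strings here.
OUTSIDE = 'OUTSIDE'


def invert_marks(marked):
    # Any truthy mark becomes None; a null mark becomes OUTSIDE.
    for mark, event in marked:
        if mark:
            yield None, event
        else:
            yield OUTSIDE, event


marked = [
    (None, '<body>'),
    ('ENTER', '<em>'),
    ('INSIDE', 'test'),
    ('EXIT', '</em>'),
    (None, '</body>'),
]
inverted = list(invert_marks(marked))
marks_after = [mark for mark, _ in inverted]
```

Note that inverting twice does not restore ``ENTER``, ``INSIDE`` and
``EXIT``: the originally selected events come back marked as ``OUTSIDE``.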

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('//em').invert().trace()
        ('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        ('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
        (None, ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        (None, ('TEXT', u'test', (None, 1, 15)))
        (None, ('END', QName(u'em'), (None, 1, 19)))
        ('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
        ('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :rtype: `Transformer`
        """
        return self.apply(InvertTransformation())

    def end(self):
        """End current selection, allowing all events to be selected.

        Example:

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('//em').end().trace()
        ('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        ('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
        ('OUTSIDE', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('OUTSIDE', ('END', QName(u'em'), (None, 1, 19)))
        ('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
        ('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :return: the stream augmented by transformation marks
        :rtype: `Transformer`
        """
        return self.apply(EndTransformation())

    #{ Deletion operations

    def empty(self):
        """Empty selected elements of all content.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').empty()
        <html><head><title>Some Title</title></head><body>Some <em/>
        text.</body></html>

        :rtype: `Transformer`
        """
        return self.apply(EmptyTransformation())

    def remove(self):
        """Remove selection from the stream.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').remove()
        <html><head><title>Some Title</title></head><body>Some
        text.</body></html>

        :rtype: `Transformer`
        """
        return self.apply(RemoveTransformation())

    #{ Direct element operations

    def unwrap(self):
        """Remove outermost enclosing elements from selection.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').unwrap()
        <html><head><title>Some Title</title></head><body>Some body
        text.</body></html>

        :rtype: `Transformer`
        """
        return self.apply(UnwrapTransformation())

    def wrap(self, element):
        """Wrap selection in an element.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').wrap('strong')
        <html><head><title>Some Title</title></head><body>Some
        <strong><em>body</em></strong> text.</body></html>

        :param element: either a tag name (as string) or an `Element` object
        :rtype: `Transformer`
        """
        return self.apply(WrapTransformation(element))

    #{ Content insertion operations

    def replace(self, content):
        """Replace selection with content.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//title/text()').replace('New Title')
        <html><head><title>New Title</title></head><body>Some <em>body</em>
        text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(ReplaceTransformation(content))

    def before(self, content):
        """Insert content before selection.

        In this example we insert the word 'emphasised' before the <em> opening
        tag:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').before('emphasised ')
        <html><head><title>Some Title</title></head><body>Some emphasised
        <em>body</em> text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(BeforeTransformation(content))

    def after(self, content):
        """Insert content after selection.

        Here, we insert some text after the </em> closing tag:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').after(' rock')
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        rock text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(AfterTransformation(content))

    def prepend(self, content):
        """Insert content after the ENTER event of the selection.

        Inserting some new text at the start of the <body>:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//body').prepend('Some new body text. ')
        <html><head><title>Some Title</title></head><body>Some new body text.
        Some <em>body</em> text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(PrependTransformation(content))

    def append(self, content):
        """Insert content before the END event of the selection.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//body').append(' Some new body text.')
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        text. Some new body text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(AppendTransformation(content))

    #{ Attribute manipulation

    def attr(self, name, value):
        """Add, replace or delete an attribute on selected elements.

        If `value` evaluates to `None` the attribute will be deleted from the
        element:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
        ...             '</html>')
        >>> print html | Transformer('body/em').attr('class', None)
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        <em>text</em>.</body></html>

        Otherwise the attribute will be set to `value`:

        >>> print html | Transformer('body/em').attr('class', 'emphasis')
        <html><head><title>Some Title</title></head><body>Some <em
        class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>

        If `value` is a callable it will be called with the attribute name and
        the `START` event for the matching element. Its return value will then
        be used to set the attribute:

        >>> def print_attr(name, event):
        ...     attrs = event[1][1]
        ...     print attrs
        ...     return attrs.get(name)
        >>> print html | Transformer('body/em').attr('class', print_attr)
        Attrs([(QName(u'class'), u'before')])
        Attrs()
        <html><head><title>Some Title</title></head><body>Some <em
        class="before">body</em> <em>text</em>.</body></html>

        :param name: the name of the attribute
        :param value: the value that should be set for the attribute.
        :rtype: `Transformer`
        """
        return self.apply(AttrTransformation(name, value))

    #{ Buffer operations

    def copy(self, buffer):
        """Copy selection into buffer.

        >>> from genshi.builder import tag
        >>> buffer = StreamBuffer()
        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('title/text()').copy(buffer) \\
        ...     .end().select('body').prepend(tag.h1(buffer))
        <html><head><title>Some Title</title></head><body><h1>Some
        Title</h1>Some <em>body</em> text.</body></html>

        To ensure that a transformation can be reused deterministically, the
        contents of ``buffer`` are replaced by the ``copy()`` operation:

        >>> print buffer
        Some Title
        >>> print html | Transformer('head/title/text()').copy(buffer) \\
        ...     .end().select('body/em').copy(buffer).end().select('body') \\
        ...     .prepend(tag.h1(buffer))
        <html><head><title>Some
        Title</title></head><body><h1><em>body</em></h1>Some <em>body</em>
        text.</body></html>
        >>> print buffer
        <em>body</em>

        Element attributes can also be copied for later use:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body><em>Some</em> <em class="before">body</em>'
        ...             '<em>text</em>.</body></html>')
        >>> buffer = StreamBuffer()
        >>> def apply_attr(name, entry):
        ...     return list(buffer)[0][1][1].get('class')
        >>> print html | Transformer('body/em[@class]/@class').copy(buffer) \\
        ...     .end().select('body/em[not(@class)]').attr('class', apply_attr)
        <html><head><title>Some Title</title></head><body><em
        class="before">Some</em> <em class="before">body</em><em
        class="before">text</em>.</body></html>


        :param buffer: the `StreamBuffer` in which the selection should be
                       stored
        :rtype: `Transformer`
        :note: this transformation will buffer the entire input stream
        """
        return self.apply(CopyTransformation(buffer))

    def cut(self, buffer):
        """Copy selection into buffer and remove the selection from the stream.

        >>> from genshi.builder import tag
        >>> buffer = StreamBuffer()
        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em/text()').cut(buffer) \\
        ...     .end().select('.//em').after(tag.h1(buffer))
        <html><head><title>Some Title</title></head><body>Some
        <em/><h1>body</h1> text.</body></html>

        :param buffer: the `StreamBuffer` in which the selection should be
                       stored
        :rtype: `Transformer`
        :note: this transformation will buffer the entire input stream
        """
        return self.apply(CutTransformation(buffer))

    #{ Miscellaneous operations

    def filter(self, filter):
        """Apply a normal stream filter to the selection. The filter is called
        once for each contiguous block of marked events.

        >>> from genshi.filters.html import HTMLSanitizer
        >>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
        ...             '</script> and some more text</body></html>')
        >>> print html | Transformer('body/*').filter(HTMLSanitizer())
        <html><body>Some text and some more text</body></html>

        :param filter: The stream filter to apply.
        :rtype: `Transformer`
        """
        return self.apply(FilterTransformation(filter))

    def map(self, function, kind):
        """Applies a function to the ``data`` element of events of ``kind`` in
        the selection.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('head/title').map(unicode.upper, TEXT)
        <html><head><title>SOME TITLE</title></head><body>Some <em>body</em>
        text.</body></html>

        :param function: the function to apply
        :param kind: the kind of event the function should be applied to
        :rtype: `Transformer`
        """
        return self.apply(MapTransformation(function, kind))

    def substitute(self, pattern, replace, count=1):
        """Replace text matching a regular expression.

        Refer to the documentation for ``re.sub()`` for details.

        >>> html = HTML('<html><body>Some text, some more text and '
        ...             '<b>some bold text</b></body></html>')
        >>> print html | Transformer('body').substitute('(?i)some', 'SOME')
        <html><body>SOME text, some more text and <b>SOME bold text</b></body></html>

        :param pattern: A regular expression object or string.
        :param replace: Replacement pattern.
        :param count: Number of replacements to make in each text fragment.
        :rtype: `Transformer`
        """
        return self.apply(SubstituteTransformation(pattern, replace, count))

    def rename(self, name):
        """Rename matching elements.

        >>> html = HTML('<html><body>Some text, some more text and '
        ...             '<b>some bold text</b></body></html>')
        >>> print html | Transformer('body/b').rename('strong')
        <html><body>Some text, some more text and <strong>some bold text</strong></body></html>
        """
        return self.apply(RenameTransformation(name))

    def trace(self, prefix='', fileobj=None):
        """Print events as they pass through the transform.

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('em').trace()
        (None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        (None, ('TEXT', u'Some ', (None, 1, 6)))
        ('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('INSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('EXIT', ('END', QName(u'em'), (None, 1, 19)))
        (None, ('TEXT', u' text', (None, 1, 24)))
        (None, ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :param prefix: a string to prefix each event with in the output
        :param fileobj: the writable file-like object to write to; defaults to
                        the standard output stream
        :rtype: `Transformer`
        """
        return self.apply(TraceTransformation(prefix, fileobj=fileobj))

    # Internal methods

    def _mark(self, stream):
        for event in stream:
            yield OUTSIDE, event

    def _unmark(self, stream):
        for mark, event in stream:
            if event[0] is not None:
                yield event


class SelectTransformation(object):
    """Select and mark events that match an XPath expression."""

    def __init__(self, path):
        """Create selection.

        :param path: an XPath expression (as string) or a `Path` object
        """
        if not isinstance(path, Path):
            path = Path(path)
        self.path = path

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.
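
The element-marking part of this step can be sketched without Genshi. The
snippet below is an illustration under simplifying assumptions: matching on a
tag name stands in for the XPath test, events are ``(kind, data)`` tuples,
marks are plain strings, and the function name ``select_tag`` is hypothetical.
Attribute and text matches are left out.

```python
# Sketch only: a tag-name comparison replaces the XPath test.
ENTER, INSIDE, EXIT, OUTSIDE = 'ENTER', 'INSIDE', 'EXIT', 'OUTSIDE'


def select_tag(marked, tag):
    stream = iter(marked)
    for mark, event in stream:
        if mark is None:
            # Events outside the current selection pass through untouched.
            yield mark, event
            continue
        kind, data = event
        if kind == 'START' and data == tag:
            yield ENTER, event
            depth = 1
            # Consume the whole element, tracking nesting depth so that the
            # matching END event receives the EXIT mark.
            for _, subevent in stream:
                if subevent[0] == 'START':
                    depth += 1
                elif subevent[0] == 'END':
                    depth -= 1
                if depth == 0:
                    yield EXIT, subevent
                    break
                yield INSIDE, subevent
        else:
            yield None, event


events = [
    ('START', 'body'), ('TEXT', 'Some '), ('START', 'em'), ('TEXT', 'test'),
    ('END', 'em'), ('TEXT', ' text'), ('END', 'body'),
]
marked = [(OUTSIDE, event) for event in events]
marks = [mark for mark, _ in select_tag(marked, 'em')]
```

The resulting marks follow the same pattern as the ``trace()`` output shown
for ``select()``: ``None`` around the element, then ``ENTER``, ``INSIDE``
and ``EXIT``.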

        :param stream: the marked event stream to filter
        """
        namespaces = {}
        variables = {}
        test = self.path.test()
        stream = iter(stream)
        for mark, event in stream:
            if mark is None:
                yield mark, event
                continue
            result = test(event, {}, {})
            # XXX This is effectively genshi.core._ensure() for transform
            # streams.
            if result is True:
                if event[0] is START:
                    yield ENTER, event
                    depth = 1
                    while depth > 0:
                        mark, subevent = stream.next()
                        if subevent[0] is START:
                            depth += 1
                        elif subevent[0] is END:
                            depth -= 1
                        if depth == 0:
                            yield EXIT, subevent
                        else:
                            yield INSIDE, subevent
                        test(subevent, {}, {}, updateonly=True)
                else:
                    yield OUTSIDE, event
            elif isinstance(result, Attrs):
                # XXX Selected *attributes* are given a "kind" of None to
                # indicate they are not really part of the stream.
                yield ATTR, (None, (QName(event[1][0] + '@*'), result), event[2])
                yield None, event
            elif result:
                yield None, (TEXT, unicode(result), (None, -1, -1))
            else:
                yield None, event


class InvertTransformation(object):
    """Invert selection so that marked events become unmarked, and vice versa.

    Specifically, all input marks are converted to null marks, and all input
    null marks are converted to OUTSIDE marks.
    """

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark:
                yield None, event
            else:
                yield OUTSIDE, event


class EndTransformation(object):
    """End the current selection."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            yield OUTSIDE, event


class EmptyTransformation(object):
    """Empty selected elements of all content."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark not in (INSIDE, OUTSIDE):
                yield mark, event


class RemoveTransformation(object):
    """Remove selection from the stream."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark is None:
                yield mark, event


class UnwrapTransformation(object):
    """Remove outermost enclosing elements from selection."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.
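
The empty, remove and unwrap transformations differ only in which marks they
drop. The comparison below is a sketch, not Genshi code: marks are plain
strings, events are reduced to their serialized text, and the helper names
(``empty``, ``remove``, ``unwrap``, ``render``) are hypothetical.

```python
# Sketch only: marks are strings and events are already-serialized text.
ENTER, INSIDE, EXIT, OUTSIDE = 'ENTER', 'INSIDE', 'EXIT', 'OUTSIDE'


def empty(marked):
    # Keep the element's own tags, drop whatever is INSIDE (or OUTSIDE).
    return [(m, e) for m, e in marked if m not in (INSIDE, OUTSIDE)]


def remove(marked):
    # Keep only events that carry no mark at all.
    return [(m, e) for m, e in marked if m is None]


def unwrap(marked):
    # Drop the enclosing tags, keep the content.
    return [(m, e) for m, e in marked if m not in (ENTER, EXIT)]


def render(marked):
    return ''.join(event for _, event in marked)


marked = [
    (None, 'Some '),
    (ENTER, '<em>'),
    (INSIDE, 'body'),
    (EXIT, '</em>'),
    (None, ' text.'),
]
emptied = render(empty(marked))
removed = render(remove(marked))
unwrapped = render(unwrap(marked))
```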

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark not in (ENTER, EXIT):
                yield mark, event


class WrapTransformation(object):
    """Wrap selection in an element."""

    def __init__(self, element):
        if isinstance(element, Element):
            self.element = element
        else:
            self.element = Element(element)

    def __call__(self, stream):
        for mark, event in stream:
            if mark:
                element = list(self.element.generate())
                for prefix in element[:-1]:
                    yield None, prefix
                yield mark, event
                while True:
                    try:
                        mark, event = stream.next()
                    except StopIteration:
                        # The stream ended inside the selection: close the
                        # wrapper and stop, otherwise the stale mark would
                        # keep this loop running forever.
                        yield None, element[-1]
                        return
                    if not mark:
                        break
                    yield mark, event
                yield None, element[-1]
                yield mark, event
            else:
                yield mark, event


class TraceTransformation(object):
    """Print events as they pass through the transform."""

    def __init__(self, prefix='', fileobj=None):
        """Trace constructor.

        :param prefix: text to prefix each traced line with.
        :param fileobj: the writable file-like object to write to
        """
        self.prefix = prefix
        self.fileobj = fileobj or sys.stdout

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for event in stream:
            print>>self.fileobj, self.prefix + str(event)
            yield event


class FilterTransformation(object):
    """Apply a normal stream filter to the selection. The filter is called once
    for each contiguous block of marked events."""

    def __init__(self, filter):
        """Create the transform.

        :param filter: The stream filter to apply.
        """
        self.filter = filter

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: The marked event stream to filter
        """
        def flush(queue):
            if queue:
                for event in self.filter(queue):
                    yield OUTSIDE, event
                del queue[:]

        queue = []
        for mark, event in stream:
            if mark:
                queue.append(event)
            else:
                for queue_event in flush(queue):
                    yield queue_event
                yield None, event
        for event in flush(queue):
            yield event


class MapTransformation(object):
    """Apply a function to the `data` element of events of ``kind`` in the
    selection.
    """

    def __init__(self, function, kind):
        """Create the transform.
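
The selection rule this transform applies can be sketched as follows. The
snippet is an illustration only: marks and kinds are plain strings, and the
function name ``map_data`` is hypothetical. The function is applied to the
data of marked events whose kind matches; a ``kind`` of ``None`` matches
every kind.

```python
# Sketch only: marks and event kinds are plain strings here.
def map_data(marked, function, kind):
    for mark, (event_kind, data, pos) in marked:
        if mark and kind in (None, event_kind):
            yield mark, (event_kind, function(data), pos)
        else:
            # Unmarked events, and events of another kind, pass through.
            yield mark, (event_kind, data, pos)


marked = [
    (None, ('TEXT', 'Some ', None)),
    ('INSIDE', ('TEXT', 'body', None)),
    (None, ('TEXT', ' text.', None)),
]
mapped = [data for _, (_, data, _) in map_data(marked, str.upper, 'TEXT')]
```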

        :param function: the function to apply; the function must take one
                         argument, the `data` element of each selected event
        :param kind: the stream event ``kind`` to apply the `function` to
        """
        self.function = function
        self.kind = kind

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: The marked event stream to filter
        """
        for mark, (kind, data, pos) in stream:
            if mark and self.kind in (None, kind):
                yield mark, (kind, self.function(data), pos)
            else:
                yield mark, (kind, data, pos)


class SubstituteTransformation(object):
    """Replace text matching a regular expression.

    Refer to the documentation for ``re.sub()`` for details.
    """
    def __init__(self, pattern, replace, count=1):
        """Create the transform.

        :param pattern: A regular expression object, or string.
        :param replace: Replacement pattern.
        :param count: Number of replacements to make in each text fragment.
        """
        if isinstance(pattern, basestring):
            self.pattern = re.compile(pattern)
        else:
            self.pattern = pattern
        self.count = count
        self.replace = replace

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.
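
Because ``count`` is applied to each text fragment separately, the default
of 1 replaces the first match in every `TEXT` event rather than the first
match in the whole document. The sketch below illustrates this with plain
tuples in place of Genshi events; the function name ``substitute_text`` is
hypothetical.

```python
import re


# Sketch only: events are (kind, data) tuples and marks are ignored,
# mirroring the fact that every TEXT event is processed.
def substitute_text(marked, pattern, replace, count=1):
    regex = re.compile(pattern)
    for mark, (kind, data) in marked:
        if kind == 'TEXT':
            data = regex.sub(replace, data, count=count)
        yield mark, (kind, data)


marked = [
    ('INSIDE', ('TEXT', 'Some text, some more text and ')),
    ('INSIDE', ('START', 'b')),
    ('INSIDE', ('TEXT', 'some bold text')),
    ('INSIDE', ('END', 'b')),
]
texts = [data for _, (kind, data) in substitute_text(marked, '(?i)some', 'SOME')
         if kind == 'TEXT']
```

Here the second "some" in the first fragment is left alone, while the "some"
inside the ``<b>`` element is replaced because it lives in a separate
fragment.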
athomas@533:
athomas@533:         :param stream: The marked event stream to filter
athomas@533:         """
athomas@533:         for mark, (kind, data, pos) in stream:
athomas@533:             if kind is TEXT:
athomas@533:                 data = self.pattern.sub(self.replace, data, self.count)
athomas@533:             yield mark, (kind, data, pos)
athomas@533:
athomas@533:
athomas@578: class RenameTransformation(object):
athomas@578:     """Rename matching elements."""
athomas@578:     def __init__(self, name):
athomas@578:         """Create the transform.
athomas@578:
athomas@578:         :param name: New element name.
athomas@578:         """
athomas@578:         self.name = QName(name)
athomas@578:
athomas@578:     def __call__(self, stream):
athomas@578:         """Apply the transform filter to the marked stream.
athomas@578:
athomas@578:         :param stream: The marked event stream to filter
athomas@578:         """
athomas@578:         for mark, (kind, data, pos) in stream:
athomas@578:             if mark is ENTER:
athomas@578:                 data = self.name, data[1]
athomas@578:             elif mark is EXIT:
athomas@578:                 data = self.name
athomas@578:             yield mark, (kind, data, pos)
athomas@578:
athomas@578:
cmlenz@504: class InjectorTransformation(object):
cmlenz@501:     """Abstract base class for transformations that inject content into a
cmlenz@501:     stream.
cmlenz@501:
cmlenz@504:     >>> class Top(InjectorTransformation):
cmlenz@501:     ...     def __call__(self, stream):
cmlenz@501:     ...         for event in self._inject():
cmlenz@501:     ...             yield event
cmlenz@501:     ...         for event in stream:
cmlenz@501:     ...             yield event
cmlenz@501:     >>> html = HTML('Some test text')
athomas@533:     >>> print html | Transformer('.//em').apply(Top('Prefix '))
cmlenz@501:     Prefix Some test text
cmlenz@501:     """
cmlenz@501:     def __init__(self, content):
cmlenz@501:         """Create a new injector.
cmlenz@501:
cmlenz@501:         :param content: An iterable of Genshi stream events, or a string to be
cmlenz@501:                         injected.
cmlenz@501:         """
cmlenz@501:         self.content = content
cmlenz@501:
cmlenz@501:     def _inject(self):
cmlenz@504:         for event in _ensure(self.content):
cmlenz@504:             yield None, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class ReplaceTransformation(InjectorTransformation):
cmlenz@501:     """Replace selection with content."""
cmlenz@503:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
cmlenz@501:         for mark, event in stream:
cmlenz@501:             if mark is not None:
cmlenz@501:                 for subevent in self._inject():
cmlenz@501:                     yield subevent
cmlenz@501:                 while True:
cmlenz@501:                     mark, event = stream.next()
cmlenz@501:                     if mark is None:
cmlenz@501:                         yield mark, event
cmlenz@501:                         break
cmlenz@501:             else:
cmlenz@501:                 yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class BeforeTransformation(InjectorTransformation):
cmlenz@501:     """Insert content before selection."""
cmlenz@503:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
cmlenz@501:         for mark, event in stream:
athomas@575:             if mark is not None:
cmlenz@501:                 for subevent in self._inject():
cmlenz@501:                     yield subevent
athomas@575:                 yield mark, event
athomas@575:                 while True:
athomas@575:                     mark, event = stream.next()
athomas@575:                     if not mark:
athomas@575:                         break
athomas@575:                     yield mark, event
cmlenz@501:             yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class AfterTransformation(InjectorTransformation):
cmlenz@501:     """Insert content after selection."""
cmlenz@503:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
cmlenz@501:         for mark, event in stream:
cmlenz@501:             yield mark, event
cmlenz@501:             if mark:
cmlenz@501:                 while True:
athomas@575:                     try:
athomas@575:                         mark, event = stream.next()
athomas@575:                     except StopIteration:
athomas@575:                         break
cmlenz@501:                     if not mark:
cmlenz@501:                         break
cmlenz@501:                     yield mark, event
cmlenz@501:                 for subevent in self._inject():
cmlenz@501:                     yield subevent
cmlenz@501:                 yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class PrependTransformation(InjectorTransformation):
cmlenz@501:     """Prepend content to the inside of selected elements."""
cmlenz@503:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
cmlenz@501:         for mark, event in stream:
cmlenz@501:             yield mark, event
cmlenz@502:             if mark in (ENTER, OUTSIDE):
cmlenz@501:                 for subevent in self._inject():
cmlenz@501:                     yield subevent
cmlenz@501:
cmlenz@501:
cmlenz@504: class AppendTransformation(InjectorTransformation):
cmlenz@501:     """Append content after the content of selected elements."""
cmlenz@503:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
cmlenz@501:         for mark, event in stream:
cmlenz@501:             yield mark, event
cmlenz@502:             if mark is ENTER:
cmlenz@501:                 while True:
cmlenz@501:                     mark, event = stream.next()
cmlenz@502:                     if mark is EXIT:
cmlenz@501:                         break
cmlenz@501:                     yield mark, event
cmlenz@501:                 for subevent in self._inject():
cmlenz@501:                     yield subevent
cmlenz@501:                 yield mark, event
cmlenz@501:
cmlenz@501:
athomas@517: class AttrTransformation(object):
cmlenz@501:     """Set an attribute on selected elements."""
cmlenz@503:
cmlenz@503:     def __init__(self, name, value):
cmlenz@501:         """Construct transform.
cmlenz@501:
cmlenz@503:         :param name: name of the attribute that should be set
cmlenz@503:         :param value: the value to set
cmlenz@501:         """
cmlenz@503:         self.name = name
cmlenz@501:         self.value = value
cmlenz@501:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501:         :param stream: The marked event stream to filter
cmlenz@501:         """
athomas@517:         callable_value = callable(self.value)
cmlenz@501:         for mark, (kind, data, pos) in stream:
cmlenz@502:             if mark is ENTER:
athomas@517:                 if callable_value:
athomas@517:                     value = self.value(self.name, (kind, data, pos))
athomas@517:                 else:
athomas@517:                     value = self.value
athomas@517:                 if value is None:
athomas@517:                     attrs = data[1] - [QName(self.name)]
athomas@517:                 else:
athomas@517:                     attrs = data[1] | [(QName(self.name), value)]
athomas@517:                 data = (data[0], attrs)
cmlenz@501:             yield mark, (kind, data, pos)
cmlenz@501:
cmlenz@501:
cmlenz@501:
cmlenz@506: class StreamBuffer(Stream):
cmlenz@506:     """Stream event buffer used for cut and copy transformations."""
cmlenz@506:
cmlenz@506:     def __init__(self):
cmlenz@506:         """Create the buffer."""
cmlenz@506:         Stream.__init__(self, [])
cmlenz@506:
cmlenz@506:     def append(self, event):
cmlenz@506:         """Add an event to the buffer.
athomas@517:
cmlenz@506:         :param event: the markup event to add
cmlenz@506:         """
cmlenz@506:         self.events.append(event)
cmlenz@506:
cmlenz@506:     def reset(self):
cmlenz@506:         """Reset the buffer so that it's empty."""
cmlenz@506:         del self.events[:]
cmlenz@506:
cmlenz@506:
cmlenz@504: class CopyTransformation(object):
cmlenz@501:     """Copy selected events into a buffer for later insertion."""
cmlenz@503:
cmlenz@501:     def __init__(self, buffer):
cmlenz@506:         """Create the copy transformation.
cmlenz@501:
cmlenz@506:         :param buffer: the `StreamBuffer` in which the selection should be
cmlenz@506:                        stored
cmlenz@501:         """
cmlenz@501:         self.buffer = buffer
cmlenz@501:
cmlenz@501:     def __call__(self, stream):
cmlenz@506:         """Apply the transformation to the marked stream.
cmlenz@501:
cmlenz@503:         :param stream: the marked event stream to filter
cmlenz@501:         """
cmlenz@506:         self.buffer.reset()
cmlenz@501:         stream = list(stream)
cmlenz@501:         for mark, event in stream:
cmlenz@501:             if mark:
cmlenz@501:                 self.buffer.append(event)
cmlenz@501:         return stream
cmlenz@501:
cmlenz@501:
athomas@519: class CutTransformation(object):
cmlenz@501:     """Cut selected events into a buffer for later insertion and remove the
cmlenz@503:     selection.
cmlenz@503:     """
cmlenz@503:
athomas@519:     def __init__(self, buffer):
athomas@519:         """Create the cut transformation.
athomas@519:
athomas@519:         :param buffer: the `StreamBuffer` in which the selection should be
athomas@519:                        stored
athomas@519:         """
athomas@519:         self.buffer = buffer
athomas@519:
cmlenz@501:     def __call__(self, stream):
cmlenz@501:         """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@503:         :param stream: the marked event stream to filter
cmlenz@501:         """
athomas@519:         out_stream = []
athomas@519:         attributes = None
athomas@519:         for mark, (kind, data, pos) in stream:
athomas@519:             if attributes:
athomas@519:                 assert kind is START
athomas@519:                 data = (data[0], data[1] - attributes)
athomas@519:                 attributes = None
athomas@519:             if mark:
athomas@519:                 # There is some magic here. ATTR marked events are pushed into
athomas@519:                 # the stream *before* the START event they originated from.
athomas@519:                 # This allows cut() to strip out the attributes from START
athomas@519:                 # event as would be expected.
athomas@519:                 if mark is ATTR:
athomas@519:                     self.buffer.append((kind, data, pos))
athomas@519:                     attributes = [name for name, _ in data[1]]
athomas@519:                 else:
athomas@519:                     self.buffer.append((kind, data, pos))
athomas@519:             else:
athomas@519:                 out_stream.append((mark, (kind, data, pos)))
athomas@519:         return out_stream
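
The transformation classes in this listing all follow one protocol: they consume an iterable of ``(mark, event)`` pairs, where ``mark`` is ``None`` for unselected events, and yield pairs of the same shape. The following is a minimal stand-alone sketch of that protocol, not Genshi code. It assumes plain strings in place of the ``TransformMark`` and event-kind constants, plain tuples in place of ``Attrs`` and ``QName``, and Python 3 iteration rather than ``stream.next()``. The function names ``rename`` and ``after`` are hypothetical stand-ins modelled on ``RenameTransformation`` and ``AfterTransformation``. Unlike the listing, ``after`` does not re-emit the last event when the stream ends inside a selection.

```python
# Simplified stand-ins for Genshi's marks and event kinds (assumption: plain
# strings, so the sketch runs without Genshi installed).
ENTER, INSIDE, EXIT = 'ENTER', 'INSIDE', 'EXIT'
START, TEXT, END = 'START', 'TEXT', 'END'


def rename(stream, name):
    """Rename selected elements, in the manner of RenameTransformation."""
    for mark, (kind, data, pos) in stream:
        if mark is ENTER:
            data = (name, data[1])  # START data is (tag, attrs)
        elif mark is EXIT:
            data = name             # END data is just the tag
        yield mark, (kind, data, pos)


def after(stream, content):
    """Emit ``content`` after each contiguous block of marked events."""
    stream = iter(stream)
    for mark, event in stream:
        yield mark, event
        if mark:
            trailing = None
            # Pass the rest of the marked block through unchanged.
            for mark, event in stream:
                if not mark:
                    trailing = (mark, event)
                    break
                yield mark, event
            # Injected events are unmarked, as in InjectorTransformation._inject.
            for injected in content:
                yield None, injected
            # Emit the first unmarked event that ended the block, if any.
            if trailing is not None:
                yield trailing


# A hand-marked stream in which only the <em> element is selected.
marked = [
    (None, (START, ('body', ()), None)),
    (None, (TEXT, 'Some ', None)),
    (ENTER, (START, ('em', ()), None)),
    (INSIDE, (TEXT, 'test', None)),
    (EXIT, (END, 'em', None)),
    (None, (TEXT, ' text', None)),
    (None, (END, 'body', None)),
]

# Transformations chain by wrapping one generator in another.
result = list(after(rename(marked, 'strong'), [(TEXT, '!', None)]))
for mark, event in result:
    print(mark, event)
```

Because every stage accepts and returns the same ``(mark, event)`` shape, stages can be composed in any order, which is the property the chained ``Transformer`` methods rely on.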