# -*- coding: utf-8 -*-
#
# Copyright (C) 2007 Edgewall Software
# All rights reserved.
#
# This software is licensed as described in the file COPYING, which
# you should have received as part of this distribution. The terms
# are also available at http://genshi.edgewall.org/wiki/License.
#
# This software consists of voluntary contributions made by many
# individuals. For the exact contribution history, see the revision
# history and logs, available at http://genshi.edgewall.org/log/.

"""A filter for functional-style transformations of markup streams.

The `Transformer` filter provides a variety of transformations that can be
applied to parts of streams that match given XPath expressions. These
transformations can be chained to achieve results that would be comparatively
tedious to achieve by writing stream filters by hand. The approach of chaining
node selection and transformation has been inspired by the `jQuery`_ JavaScript
library.

.. _`jQuery`: http://jquery.com/

For example, the following transformation upper-cases the text of the ``<em>``
element in the ``<body>`` of the input document, removes the ``<em>`` tags and
wraps the resulting text in a ``<u>`` element:

>>> from genshi.builder import tag
>>> html = HTML('''<html>
...  <head><title>Some Title</title></head>
...  <body>
...    Some <em>body</em> text.
...  </body>
... </html>''')
>>> print html | Transformer('body/em').map(unicode.upper, TEXT) \\
...                                    .unwrap().wrap(tag.u)
<html>
  <head><title>Some Title</title></head>
  <body>
    Some <u>BODY</u> text.
  </body>
</html>

The ``Transformer`` supports a large number of useful transformations out of
the box, but custom transformations can be added easily.
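
A custom transformation is simply a callable that accepts a marked stream of
``(mark, (kind, data, pos))`` tuples and yields tuples of the same shape. The
sketch below is only an illustration. Plain strings stand in for Genshi's event
kinds and marks so that it runs without Genshi, and ``strip_text`` is a
hypothetical name that is not part of this module. With Genshi, such a
function would be plugged in with ``Transformer('body').apply(strip_text)``.

```python
# Hypothetical custom transform: strip surrounding whitespace from selected
# TEXT events. 'TEXT' and 'INSIDE' are plain strings standing in for Genshi's
# TEXT event kind and INSIDE mark.
TEXT = 'TEXT'


def strip_text(stream):
    for mark, (kind, data, pos) in stream:
        # A non-None mark means the event is part of the current selection
        if mark and kind == TEXT:
            data = data.strip()
        yield mark, (kind, data, pos)


marked = [
    (None, ('TEXT', '  outside  ', None)),
    ('INSIDE', ('TEXT', '  inside  ', None)),
]
result = list(strip_text(marked))
```

Events with a ``None`` mark pass through unchanged, so only the current
selection is affected.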

:since: version 0.5
"""

import re
import sys

from genshi.builder import Element
from genshi.core import Stream, Attrs, QName, TEXT, START, END, _ensure
from genshi.path import Path

__all__ = ['Transformer', 'StreamBuffer', 'InjectorTransformation', 'ENTER',
           'EXIT', 'INSIDE', 'OUTSIDE']


class TransformMark(str):
    """A mark on a transformation stream.

    Instances are interned per value, so marks can be compared with ``is``.
    """
    __slots__ = []
    _instances = {}

    def __new__(cls, val):
        return cls._instances.setdefault(val, str.__new__(cls, val))


ENTER = TransformMark('ENTER')
"""Stream augmentation mark indicating that a selected element is being
entered."""

INSIDE = TransformMark('INSIDE')
"""Stream augmentation mark indicating that processing is currently inside a
selected element."""

OUTSIDE = TransformMark('OUTSIDE')
"""Stream augmentation mark indicating that a match occurred outside a selected
element."""

ATTR = TransformMark('ATTR')
"""Stream augmentation mark indicating a selected element attribute."""

EXIT = TransformMark('EXIT')
"""Stream augmentation mark indicating that a selected element is being
exited."""


class Transformer(object):
    """Stream filter that can apply a variety of different transformations to
    a stream.

    This is achieved by selecting the events to be transformed using XPath,
    then applying the transformations to the events matched by the path
    expression. Each marked event is in the form (mark, (kind, data, pos)),
    where mark can be any of `ENTER`, `INSIDE`, `EXIT`, `OUTSIDE`, `ATTR`, or
    `None`.

    The first three marks apply to selected XML/HTML elements: `ENTER` marks
    the element's `START` event, `EXIT` marks its `END` event, and `INSIDE`
    marks every event contained within it. A non-element match outside a
    `START`/`END` container (e.g. ``text()``) yields an `OUTSIDE` mark, and a
    selected attribute (e.g. ``@class``) is reported as a separate pseudo-event
    with an `ATTR` mark. Unselected events carry a `None` mark.
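
    To make the marking scheme concrete, here is a simplified sketch of how a
    selection could assign `ENTER`, `INSIDE` and `EXIT` marks by tracking
    element depth, in the same spirit as `SelectTransformation`. It is an
    illustration under assumptions, not Genshi code. Events are plain
    ``(kind, name)`` pairs, ``matches`` is a hypothetical predicate standing in
    for an XPath test, and unmatched events simply get ``None``.

```python
START, END, TEXT = 'START', 'END', 'TEXT'


def mark_elements(events, matches):
    # Yield (mark, event) pairs; a matching START event opens a selection
    # that lasts until its corresponding END event.
    it = iter(events)
    for kind, name in it:
        if kind == START and matches(name):
            yield 'ENTER', (kind, name)
            depth = 1
            while depth:
                kind, name = next(it)
                if kind == START:
                    depth += 1  # nested element inside the selection
                elif kind == END:
                    depth -= 1
                yield ('EXIT' if depth == 0 else 'INSIDE'), (kind, name)
        else:
            yield None, (kind, name)


events = [
    (START, 'body'), (TEXT, 'Some '), (START, 'em'), (TEXT, 'test'),
    (END, 'em'), (TEXT, ' text'), (END, 'body'),
]
marks = [mark for mark, _ in mark_elements(events, lambda name: name == 'em')]
# marks == [None, None, 'ENTER', 'INSIDE', 'EXIT', None, None]
```

    The depth counter keeps a nested element of the same name from ending the
    selection early.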

    >>> html = HTML('<html><head><title>Some Title</title></head>'
    ...             '<body>Some <em>body</em> text.</body></html>')

    Transformations act on selected stream events matching an XPath expression.
    Here's an example of removing some markup (the title, in this case)
    selected by an expression:

    >>> print html | Transformer('head/title').remove()
    <html><head/><body>Some <em>body</em> text.</body></html>

    Inserted content can be passed in the form of a string, or a markup event
    stream, which includes streams generated programmatically via the
    `builder` module:

    >>> from genshi.builder import tag
    >>> print html | Transformer('body').prepend(tag.h1('Document Title'))
    <html><head><title>Some Title</title></head><body><h1>Document
    Title</h1>Some <em>body</em> text.</body></html>

    Each XPath expression determines the set of tags that will be acted upon by
    subsequent transformations. In this example we select the ``<title>`` text,
    copy it into a buffer, then select the ``<body>`` element and paste the
    copied text into the body as ``<h1>`` enclosed text:

    >>> buffer = StreamBuffer()
    >>> print html | Transformer('head/title/text()').copy(buffer) \\
    ...     .end().select('body').prepend(tag.h1(buffer))
    <html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some
    <em>body</em> text.</body></html>

    Transformations can also be assigned and reused, although care must be
    taken when using buffers to ensure that they are cleared between
    transforms:

    >>> emphasis = Transformer('body//em').attr('class', 'emphasis')
    >>> print html | emphasis
    <html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> text.</body></html>
    """

    __slots__ = ['transforms']

    def __init__(self, path='.'):
        """Construct a new transformation filter.

        :param path: an XPath expression (as string) or a `Path` instance
        """
        self.transforms = [SelectTransformation(path)]

    def __call__(self, stream):
        """Apply the transform filter to the given markup stream.

        :param stream: the markup event stream to filter; it is marked
                       internally before the transforms are applied
        :return: the transformed stream
        :rtype: `Stream`
        """
        transforms = self._mark(stream)
        for link in self.transforms:
            transforms = link(transforms)
        return Stream(self._unmark(transforms),
                      serializer=getattr(stream, 'serializer', None))

    def apply(self, function):
        """Apply a transformation to the stream.

        Transformations can be chained, similar to stream filters. Any callable
        that accepts a marked stream and returns (or yields) a marked stream can
        be used as a transform. `apply()` returns a new `Transformer` and leaves
        the original one unchanged; if `function` is itself a `Transformer`,
        its transforms are appended to the chain.

        As an example, here is a simple `TEXT` event upper-casing transform:

        >>> def upper(stream):
        ...     for mark, (kind, data, pos) in stream:
        ...         if mark and kind is TEXT:
        ...             yield mark, (kind, data.upper(), pos)
        ...         else:
        ...             yield mark, (kind, data, pos)
        >>> short_stream = HTML('<body>Some <em>test</em> text</body>')
        >>> print short_stream | Transformer('.//em/text()').apply(upper)
        <body>Some <em>TEST</em> text</body>
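
        Chaining works because each transform consumes one marked stream and
        produces another. The following is a simplified model of what
        ``__call__`` does with the list of transforms that ``apply()`` builds
        up. ``run_chain`` and ``unmark_all`` are hypothetical helpers, and plain
        strings stand in for Genshi's marks and event kinds:

```python
def run_chain(events, transforms):
    # Every event starts out marked 'OUTSIDE' (compare Transformer._mark)
    marked = (('OUTSIDE', event) for event in events)
    # Each transform wraps the previous one, in the order they were applied
    for transform in transforms:
        marked = transform(marked)
    # Strip the marks, dropping pseudo-events whose kind is None
    return [event for mark, event in marked if event[0] is not None]


def upper(stream):
    for mark, (kind, data, pos) in stream:
        if mark and kind == 'TEXT':
            data = data.upper()
        yield mark, (kind, data, pos)


def unmark_all(stream):
    # Clear the selection: every event gets a None mark
    for mark, event in stream:
        yield None, event


events = [('TEXT', 'hello', None), ('TEXT', 'world', None)]
upper_first = run_chain(events, [upper, unmark_all])
unmark_first = run_chain(events, [unmark_all, upper])
```

        The order matters. ``upper_first`` contains upper-cased text, whereas
        in ``unmark_first`` the selection is cleared before ``upper`` runs, so
        the text is left unchanged.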

        :param function: the transform callable, or another `Transformer`
                         whose transforms should be appended
        :rtype: `Transformer`
        """
        transformer = Transformer()
        transformer.transforms = self.transforms[:]
        if isinstance(function, Transformer):
            transformer.transforms.extend(function.transforms)
        else:
            transformer.transforms.append(function)
        return transformer

    #{ Selection operations

    def select(self, path):
        """Mark events matching the given XPath expression, within the current
        selection.

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer().select('.//em').trace()
        (None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        (None, ('TEXT', u'Some ', (None, 1, 6)))
        ('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('INSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('EXIT', ('END', QName(u'em'), (None, 1, 19)))
        (None, ('TEXT', u' text', (None, 1, 24)))
        (None, ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :param path: an XPath expression (as string) or a `Path` instance
        :return: a new transformer that marks the events matching `path`
        :rtype: `Transformer`
        """
        return self.apply(SelectTransformation(path))

    def invert(self):
        """Invert selection so that marked events become unmarked, and vice
        versa.

        Specifically, all marks are converted to null marks, and all null marks
        are converted to `OUTSIDE` marks.

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('//em').invert().trace()
        ('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        ('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
        (None, ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        (None, ('TEXT', u'test', (None, 1, 15)))
        (None, ('END', QName(u'em'), (None, 1, 19)))
        ('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
        ('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>
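
        Because every mark collapses to either ``None`` or `OUTSIDE`, inverting
        twice does not restore the original marks. Here is a minimal sketch on
        ``(mark, event)`` pairs, using plain strings for the marks and a
        hypothetical ``invert`` function that mirrors `InvertTransformation`:

```python
def invert(stream):
    # Any mark becomes None; an unmarked (None) event becomes 'OUTSIDE'
    for mark, event in stream:
        yield (None if mark else 'OUTSIDE'), event


marked = [(None, 'body'), ('ENTER', 'em'), ('INSIDE', 'test'), ('EXIT', 'em')]
once = [mark for mark, _ in invert(marked)]
twice = [mark for mark, _ in invert(invert(marked))]
# once  == ['OUTSIDE', None, None, None]
# twice == [None, 'OUTSIDE', 'OUTSIDE', 'OUTSIDE']
```

        After a double inversion the same events are selected again, but the
        `ENTER`/`EXIT` distinction needed by element operations is lost.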

        :rtype: `Transformer`
        """
        return self.apply(InvertTransformation())

    def end(self):
        """End the current selection by marking every event as `OUTSIDE`, so
        that a subsequent `select()` is matched against the whole stream.

        Example:

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('//em').end().trace()
        ('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        ('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
        ('OUTSIDE', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('OUTSIDE', ('END', QName(u'em'), (None, 1, 19)))
        ('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
        ('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :return: a new transformer in which every event is marked `OUTSIDE`
        :rtype: `Transformer`
        """
        return self.apply(EndTransformation())

    #{ Deletion operations

    def empty(self):
        """Empty selected elements of all content.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').empty()
        <html><head><title>Some Title</title></head><body>Some <em/>
        text.</body></html>

        :rtype: `Transformer`
        """
        return self.apply(EmptyTransformation())

    def remove(self):
        """Remove selection from the stream.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').remove()
        <html><head><title>Some Title</title></head><body>Some
        text.</body></html>

        :rtype: `Transformer`
        """
        return self.apply(RemoveTransformation())

    #{ Direct element operations

    def unwrap(self):
        """Remove outermost enclosing elements from selection.

        Example:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').unwrap()
        <html><head><title>Some Title</title></head><body>Some body
        text.</body></html>
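
        `empty()`, `remove()` and `unwrap()` differ only in which marked events
        they discard, as a comparison of `EmptyTransformation`,
        `RemoveTransformation` and `UnwrapTransformation` shows. The sketch
        below reproduces the three filters on plain strings that stand in for
        events. The function names are hypothetical:

```python
def unwrap(stream):
    # Drop only the selected element's own START/END events
    return [(m, e) for m, e in stream if m not in ('ENTER', 'EXIT')]


def remove(stream):
    # Keep only unselected events
    return [(m, e) for m, e in stream if m is None]


def empty(stream):
    # Keep the element's START/END but drop its content (and OUTSIDE matches)
    return [(m, e) for m, e in stream if m not in ('INSIDE', 'OUTSIDE')]


def render(stream):
    return ''.join(e for _, e in stream)


marked = [(None, 'Some '), ('ENTER', '<em>'), ('INSIDE', 'body'),
          ('EXIT', '</em>'), (None, ' text.')]
```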

        :rtype: `Transformer`
        """
        return self.apply(UnwrapTransformation())

    def wrap(self, element):
        """Wrap selection in an element.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').wrap('strong')
        <html><head><title>Some Title</title></head><body>Some
        <strong><em>body</em></strong> text.</body></html>
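
        Each contiguous run of marked events receives one wrapper element. The
        following is a rough sketch of that bookkeeping, not Genshi's
        implementation. The ``wrap`` function and the string events are
        hypothetical stand-ins:

```python
def wrap(stream, open_tag, close_tag):
    inside = False
    for mark, event in stream:
        if mark and not inside:
            # First event of a selected run: open the wrapper
            yield None, open_tag
            inside = True
        elif not mark and inside:
            # The run has ended: close the wrapper before this event
            yield None, close_tag
            inside = False
        yield mark, event
    if inside:
        # The selection extended to the end of the stream
        yield None, close_tag


def render(stream):
    return ''.join(event for _, event in stream)


marked = [(None, 'Some '), ('ENTER', '<em>'), ('INSIDE', 'body'),
          ('EXIT', '</em>'), (None, ' text.')]
wrapped = render(wrap(marked, '<strong>', '</strong>'))
# wrapped == 'Some <strong><em>body</em></strong> text.'
```

        The final ``if inside`` check matters when the selection runs up to the
        last event of the stream. Without it the wrapper would never be closed.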

        :param element: either a tag name (as string) or an `Element` object
        :rtype: `Transformer`
        """
        return self.apply(WrapTransformation(element))

    #{ Content insertion operations

    def replace(self, content):
        """Replace selection with content.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//title/text()').replace('New Title')
        <html><head><title>New Title</title></head><body>Some <em>body</em>
        text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(ReplaceTransformation(content))

    def before(self, content):
        """Insert content before selection.

        In this example we insert the word 'emphasised' before the opening
        ``<em>`` tag:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').before('emphasised ')
        <html><head><title>Some Title</title></head><body>Some emphasised
        <em>body</em> text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(BeforeTransformation(content))

    def after(self, content):
        """Insert content after selection.

        Here, we insert some text after the closing ``</em>`` tag:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em').after(' rock')
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        rock text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(AfterTransformation(content))

    def prepend(self, content):
        """Insert content after the `ENTER` event of the selection.

        Inserting some new text at the start of the ``<body>``:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//body').prepend('Some new body text. ')
        <html><head><title>Some Title</title></head><body>Some new body text.
        Some <em>body</em> text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(PrependTransformation(content))

    def append(self, content):
        """Insert content before the `EXIT` event of the selection.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//body').append(' Some new body text.')
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        text. Some new body text.</body></html>

        :param content: Either an iterable of events or a string to insert.
        :rtype: `Transformer`
        """
        return self.apply(AppendTransformation(content))

    #{ Attribute manipulation

    def attr(self, name, value):
        """Add, replace or delete an attribute on selected elements.

        If `value` is `None` the attribute will be deleted from the element:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
        ...             '</html>')
        >>> print html | Transformer('body/em').attr('class', None)
        <html><head><title>Some Title</title></head><body>Some <em>body</em>
        <em>text</em>.</body></html>

        Otherwise the attribute will be set to `value`:

        >>> print html | Transformer('body/em').attr('class', 'emphasis')
        <html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>

        If `value` is a callable it will be called with the attribute name and
        the `START` event for the matching element. Its return value will then
        be used to set the attribute:

        >>> def print_attr(name, event):
        ...     attrs = event[1][1]
        ...     print attrs
        ...     return attrs.get(name)
        >>> print html | Transformer('body/em').attr('class', print_attr)
        Attrs([(QName(u'class'), u'before')])
        Attrs()
        <html><head><title>Some Title</title></head><body>Some <em class="before">body</em> <em>text</em>.</body></html>
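
        The three cases can be summarised with a small sketch that works on a
        list of ``(name, value)`` pairs instead of Genshi's `Attrs`. The
        ``set_attr`` helper is hypothetical and only models the behaviour
        described above:

```python
def set_attr(attrs, name, value, event=None):
    # A callable computes the value from the attribute name and START event
    if callable(value):
        value = value(name, event)
    # None deletes the attribute
    if value is None:
        return [(n, v) for n, v in attrs if n != name]
    # Replace an existing attribute (keeping its position is a choice of this
    # sketch), otherwise add it at the end
    if any(n == name for n, _ in attrs):
        return [(n, value if n == name else v) for n, v in attrs]
    return attrs + [(name, value)]
```

        For example, ``set_attr([('class', 'before')], 'class', None)`` returns
        an empty list, matching the first doctest above.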

        :param name: the name of the attribute
        :param value: the value that should be set for the attribute; ``None``
                      deletes the attribute, and a callable is invoked to
                      compute the value
        :rtype: `Transformer`
        """
        return self.apply(AttrTransformation(name, value))

    #{ Buffer operations

    def copy(self, buffer):
        """Copy selection into buffer.

        >>> from genshi.builder import tag
        >>> buffer = StreamBuffer()
        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('title/text()').copy(buffer) \\
        ...     .end().select('body').prepend(tag.h1(buffer))
        <html><head><title>Some Title</title></head><body><h1>Some
        Title</h1>Some <em>body</em> text.</body></html>

        To ensure that a transformation can be reused deterministically, the
        contents of ``buffer`` are replaced by the ``copy()`` operation:

        >>> print buffer
        Some Title
        >>> print html | Transformer('head/title/text()').copy(buffer) \\
        ...     .end().select('body/em').copy(buffer).end().select('body') \\
        ...     .prepend(tag.h1(buffer))
        <html><head><title>Some
        Title</title></head><body><h1><em>body</em></h1>Some <em>body</em>
        text.</body></html>
        >>> print buffer
        <em>body</em>

        Element attributes can also be copied for later use:

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body><em>Some</em> <em class="before">body</em>'
        ...             '<em>text</em>.</body></html>')
        >>> buffer = StreamBuffer()
        >>> def apply_attr(name, event):
        ...     return list(buffer)[0][1][1].get('class')
        >>> print html | Transformer('body/em[@class]/@class').copy(buffer) \\
        ...     .end().select('body/em[not(@class)]').attr('class', apply_attr)
        <html><head><title>Some Title</title></head><body><em class="before">Some</em> <em class="before">body</em><em class="before">text</em>.</body></html>

        :param buffer: the `StreamBuffer` in which the selection should be
                       stored
        :rtype: `Transformer`
        :note: this transformation will buffer the entire input stream
        """
        return self.apply(CopyTransformation(buffer))

    def cut(self, buffer):
        """Copy selection into buffer and remove the selection from the stream.

        >>> from genshi.builder import tag
        >>> buffer = StreamBuffer()
        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('.//em/text()').cut(buffer) \\
        ...     .end().select('.//em').after(tag.h1(buffer))
        <html><head><title>Some Title</title></head><body>Some
        <em/><h1>body</h1> text.</body></html>

        :param buffer: the `StreamBuffer` in which the selection should be
                       stored
        :rtype: `Transformer`
        :note: this transformation will buffer the entire input stream
        """
        return self.apply(CutTransformation(buffer))

    #{ Miscellaneous operations

    def filter(self, filter):
        """Apply a normal stream filter to the selection. The filter is called
        once for each contiguous block of marked events.

        >>> from genshi.filters.html import HTMLSanitizer
        >>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
        ...             '</script> and some more text</body></html>')
        >>> print html | Transformer('body/*').filter(HTMLSanitizer())
        <html><body>Some text and some more text</body></html>
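
        The grouping into contiguous blocks can be pictured with
        `itertools.groupby`. The sketch below is a hypothetical
        re-implementation for illustration only (`FilterTransformation` uses an
        explicit queue instead), and it records which blocks a filter receives:

```python
from itertools import groupby


def filter_blocks(stream, stream_filter):
    # Split the stream into runs of marked and unmarked events
    for is_marked, group in groupby(stream, key=lambda pair: bool(pair[0])):
        events = [event for _, event in group]
        if is_marked:
            # Each marked run is passed to the filter in one call, and the
            # filtered events come back marked OUTSIDE
            for event in stream_filter(events):
                yield 'OUTSIDE', event
        else:
            for event in events:
                yield None, event


calls = []


def record(events):
    # Stand-in stream filter that records its input and returns it unchanged
    calls.append(list(events))
    return events


marked = [(None, 'a'), ('ENTER', 'b'), ('EXIT', 'c'), (None, 'd'),
          ('OUTSIDE', 'e')]
output = list(filter_blocks(marked, record))
# calls == [['b', 'c'], ['e']]
```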

        :param filter: The stream filter to apply.
        :rtype: `Transformer`
        """
        return self.apply(FilterTransformation(filter))

    def map(self, function, kind):
        """Apply a function to the ``data`` element of events of ``kind`` in
        the selection.

        >>> html = HTML('<html><head><title>Some Title</title></head>'
        ...             '<body>Some <em>body</em> text.</body></html>')
        >>> print html | Transformer('head/title').map(unicode.upper, TEXT)
        <html><head><title>SOME TITLE</title></head><body>Some <em>body</em>
        text.</body></html>
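
        Here is a minimal sketch of the same rule on plain tuples. The
        ``map_events`` name and the string event kinds are stand-ins, not
        Genshi API. As in `MapTransformation`, passing ``None`` as the kind
        applies the function to selected events of every kind:

```python
def map_events(stream, function, kind):
    for mark, (event_kind, data, pos) in stream:
        # Only selected events of the requested kind (or any kind) change
        if mark and kind in (None, event_kind):
            data = function(data)
        yield mark, (event_kind, data, pos)


marked = [
    (None, ('TEXT', 'Some ', None)),
    ('ENTER', ('START', 'title', None)),
    ('INSIDE', ('TEXT', 'Some Title', None)),
    ('EXIT', ('END', 'title', None)),
]
texts = [event[1] for _, event in map_events(marked, str.upper, 'TEXT')]
# texts == ['Some ', 'title', 'SOME TITLE', 'title']
```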

        :param function: the function to apply
        :param kind: the kind of event the function should be applied to, or
                     ``None`` to apply it to selected events of every kind
        :rtype: `Transformer`
        """
        return self.apply(MapTransformation(function, kind))

    def substitute(self, pattern, replace, count=1):
        """Replace text matching a regular expression.

        Refer to the documentation for ``re.sub()`` for details.

        >>> html = HTML('<html><body>Some text, some more text and '
        ...             '<b>some bold text</b></body></html>')
        >>> print html | Transformer('body').substitute('(?i)some', 'SOME')
        <html><body>SOME text, some more text and <b>SOME bold text</b></body></html>
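
        Note that ``count`` applies to each text fragment separately, not to the
        whole selection. That is why only the first ``some`` of the first text
        node is replaced above. The same effect with plain `re` on the two text
        fragments of the example (``count=0`` means no limit in ``re.sub``):

```python
import re

pattern = re.compile('(?i)some')
fragments = ['Some text, some more text and ', 'some bold text']

# count=1: at most one replacement per fragment
limited = [pattern.sub('SOME', text, 1) for text in fragments]
# count=0: replace every occurrence
unlimited = [pattern.sub('SOME', text, 0) for text in fragments]
```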

        :param pattern: A regular expression object or string.
        :param replace: Replacement pattern.
        :param count: Number of replacements to make in each text fragment.
        :rtype: `Transformer`
        """
        return self.apply(SubstituteTransformation(pattern, replace, count))

    def rename(self, name):
        """Rename matching elements.

        >>> html = HTML('<html><body>Some text, some more text and '
        ...             '<b>some bold text</b></body></html>')
        >>> print html | Transformer('body/b').rename('strong')
        <html><body>Some text, some more text and <strong>some bold text</strong></body></html>

        :param name: the new element name
        :rtype: `Transformer`
        """
        return self.apply(RenameTransformation(name))

    def trace(self, prefix='', fileobj=None):
        """Print events as they pass through the transform.

        >>> html = HTML('<body>Some <em>test</em> text</body>')
        >>> print html | Transformer('em').trace()
        (None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
        (None, ('TEXT', u'Some ', (None, 1, 6)))
        ('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
        ('INSIDE', ('TEXT', u'test', (None, 1, 15)))
        ('EXIT', ('END', QName(u'em'), (None, 1, 19)))
        (None, ('TEXT', u' text', (None, 1, 24)))
        (None, ('END', QName(u'body'), (None, 1, 29)))
        <body>Some <em>test</em> text</body>

        :param prefix: a string to prefix each event with in the output
        :param fileobj: the writable file-like object to write to; defaults to
                        the standard output stream
        :rtype: `Transformer`
        """
        return self.apply(TraceTransformation(prefix, fileobj=fileobj))

    # Internal methods

    def _mark(self, stream):
        # Mark every event OUTSIDE so that the initial selection tests them all
        for event in stream:
            yield OUTSIDE, event

    def _unmark(self, stream):
        for mark, event in stream:
            # Drop pseudo-events such as selected attributes (whose kind is
            # None) and strip the marks from everything else
            if event[0] is not None:
                yield event


class SelectTransformation(object):
    """Select and mark events that match an XPath expression."""

    def __init__(self, path):
        """Create selection.

        :param path: an XPath expression (as string) or a `Path` object
        """
        if not isinstance(path, Path):
            path = Path(path)
        self.path = path

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        namespaces = {}
        variables = {}
        test = self.path.test()
        stream = iter(stream)
        for mark, event in stream:
            if mark is None:
                yield mark, event
                continue
            result = test(event, namespaces, variables)
            # XXX This is effectively genshi.core._ensure() for transform
            # streams.
            if result is True:
                if event[0] is START:
                    yield ENTER, event
                    depth = 1
                    while depth > 0:
                        mark, subevent = stream.next()
                        if subevent[0] is START:
                            depth += 1
                        elif subevent[0] is END:
                            depth -= 1
                        if depth == 0:
                            yield EXIT, subevent
                        else:
                            yield INSIDE, subevent
                        test(subevent, namespaces, variables, updateonly=True)
                else:
                    yield OUTSIDE, event
            elif isinstance(result, Attrs):
                # XXX Selected *attributes* are given a "kind" of None to
                # indicate they are not really part of the stream.
                yield ATTR, (None, (QName(event[1][0] + '@*'), result), event[2])
                yield None, event
            elif result:
                yield None, (TEXT, unicode(result), (None, -1, -1))
            else:
                yield None, event


class InvertTransformation(object):
    """Invert selection so that marked events become unmarked, and vice versa.

    Specifically, all input marks are converted to null marks, and all input
    null marks are converted to `OUTSIDE` marks.
    """

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark:
                yield None, event
            else:
                yield OUTSIDE, event


class EndTransformation(object):
    """End the current selection."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            yield OUTSIDE, event


class EmptyTransformation(object):
    """Empty selected elements of all content."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark not in (INSIDE, OUTSIDE):
                yield mark, event


class RemoveTransformation(object):
    """Remove selection from the stream."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark is None:
                yield mark, event


class UnwrapTransformation(object):
    """Remove outermost enclosing elements from selection."""

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for mark, event in stream:
            if mark not in (ENTER, EXIT):
                yield mark, event


class WrapTransformation(object):
    """Wrap selection in an element."""

    def __init__(self, element):
        """Create the transform.

        :param element: either a tag name (as string) or an `Element` object
        """
        if isinstance(element, Element):
            self.element = element
        else:
            self.element = Element(element)

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        stream = iter(stream)
        for mark, event in stream:
            if mark:
                element = list(self.element.generate())
                for prefix in element[:-1]:
                    yield None, prefix
                yield mark, event
                while True:
                    try:
                        mark, event = stream.next()
                    except StopIteration:
                        # The selection extends to the end of the stream:
                        # close the wrapper element and stop, instead of
                        # looping on the stale event
                        yield None, element[-1]
                        return
                    if not mark:
                        break
                    yield mark, event
                yield None, element[-1]
                yield mark, event
            else:
                yield mark, event


class TraceTransformation(object):
    """Print events as they pass through the transform."""

    def __init__(self, prefix='', fileobj=None):
        """Trace constructor.

        :param prefix: text to prefix each traced line with.
        :param fileobj: the writable file-like object to write to
        """
        self.prefix = prefix
        self.fileobj = fileobj or sys.stdout

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: the marked event stream to filter
        """
        for event in stream:
            print>>self.fileobj, self.prefix + str(event)
            yield event


class FilterTransformation(object):
    """Apply a normal stream filter to the selection. The filter is called once
    for each contiguous block of marked events."""

    def __init__(self, filter):
        """Create the transform.

        :param filter: The stream filter to apply.
        """
        self.filter = filter

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: The marked event stream to filter
        """
        def flush(queue):
            # Run the filter over the queued block of marked events; its
            # output is re-marked as OUTSIDE
            if queue:
                for event in self.filter(queue):
                    yield OUTSIDE, event
                del queue[:]

        queue = []
        for mark, event in stream:
            if mark:
                queue.append(event)
            else:
                for queue_event in flush(queue):
                    yield queue_event
                yield None, event
        for event in flush(queue):
            yield event


class MapTransformation(object):
    """Apply a function to the `data` element of events of ``kind`` in the
    selection.
    """

    def __init__(self, function, kind):
        """Create the transform.

        :param function: the function to apply; the function must take one
                         argument, the `data` element of each selected event
        :param kind: the stream event ``kind`` to apply the `function` to, or
                     ``None`` to apply it to selected events of every kind
        """
        self.function = function
        self.kind = kind

    def __call__(self, stream):
        """Apply the transform filter to the marked stream.

        :param stream: The marked event stream to filter
        """
        for mark, (kind, data, pos) in stream:
            if mark and self.kind in (None, kind):
                yield mark, (kind, self.function(data), pos)
            else:
                yield mark, (kind, data, pos)


athomas@533: class SubstituteTransformation(object):
athomas@533: """Replace text matching a regular expression.
athomas@533:
athomas@533: Refer to the documentation for ``re.sub()`` for details.
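
This standalone sketch shows the intended behaviour. It assumes that only selected ``TEXT`` events are rewritten, as with the other transformations here. The event tuples and ``substitute_selected`` are hypothetical simplifications.

```python
import re

TEXT = 'TEXT'  # hypothetical stand-in for Genshi's TEXT event kind


def substitute_selected(marked_stream, pattern, replace, count=1):
    # Accept either a compiled pattern or a pattern string.
    if isinstance(pattern, str):
        pattern = re.compile(pattern)
    for mark, (kind, data, pos) in marked_stream:
        # Only selected text is rewritten; count limits the replacements
        # made in each individual text fragment.
        if mark and kind == TEXT:
            data = pattern.sub(replace, data, count=count)
        yield mark, (kind, data, pos)


stream = [
    (None, (TEXT, 'some text, some more', None)),
    ('INSIDE', (TEXT, 'some bold, some more', None)),
]
result = list(substitute_selected(stream, '(?i)some', 'SOME'))
```

With the default ``count=1``, only the first match in each selected fragment is replaced. ``count=0`` replaces every match, as with ``re.sub()``.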
athomas@533: """
athomas@533: def __init__(self, pattern, replace, count=1):
athomas@533: """Create the transform.
athomas@533:
athomas@533: :param pattern: A regular expression object, or string.
athomas@533: :param replace: Replacement string or callable, as accepted by ``re.sub()``.
athomas@533: :param count: Number of replacements to make in each text fragment.
athomas@533: """
athomas@533: if isinstance(pattern, basestring):
athomas@533: self.pattern = re.compile(pattern)
athomas@533: else:
athomas@533: self.pattern = pattern
athomas@533: self.count = count
athomas@533: self.replace = replace
athomas@533:
athomas@533: def __call__(self, stream):
athomas@533: """Apply the transform filter to the marked stream.
athomas@533:
athomas@533: :param stream: The marked event stream to filter
athomas@533: """
athomas@533: for mark, (kind, data, pos) in stream:
athomas@533: if mark and kind is TEXT:
athomas@533: data = self.pattern.sub(self.replace, data, self.count)
athomas@533: yield mark, (kind, data, pos)
athomas@533:
athomas@533:
athomas@578: class RenameTransformation(object):
athomas@578: """Rename matching elements."""
athomas@578: def __init__(self, name):
athomas@578: """Create the transform.
athomas@578:
athomas@578: :param name: New element name.
athomas@578: """
athomas@578: self.name = QName(name)
athomas@578:
athomas@578: def __call__(self, stream):
athomas@578: """Apply the transform filter to the marked stream.
athomas@578:
athomas@578: :param stream: The marked event stream to filter
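
Renaming touches only the selected element's own START and END events. This sketch shows that with simplified, hypothetical event tuples and mark names as plain strings.

```python
def rename_selected(marked_stream, new_name):
    # Selected START events carry (tag, attrs) data and selected END events
    # carry just the tag, so both halves of the element are renamed.
    for mark, (kind, data, pos) in marked_stream:
        if mark == 'ENTER':
            data = (new_name, data[1])  # keep the attributes
        elif mark == 'EXIT':
            data = new_name
        yield mark, (kind, data, pos)


stream = [
    ('ENTER', ('START', ('b', [('class', 'x')]), None)),
    ('INSIDE', ('TEXT', 'bold', None)),
    ('EXIT', ('END', 'b', None)),
]
renamed = list(rename_selected(stream, 'strong'))
```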
athomas@578: """
athomas@578: for mark, (kind, data, pos) in stream:
athomas@578: if mark is ENTER:
athomas@578: data = self.name, data[1]
athomas@578: elif mark is EXIT:
athomas@578: data = self.name
athomas@578: yield mark, (kind, data, pos)
athomas@578:
athomas@578:
cmlenz@504: class InjectorTransformation(object):
cmlenz@501: """Abstract base class for transformations that inject content into a
cmlenz@501: stream.
cmlenz@501:
cmlenz@504: >>> class Top(InjectorTransformation):
cmlenz@501: ... def __call__(self, stream):
cmlenz@501: ... for event in self._inject():
cmlenz@501: ... yield event
cmlenz@501: ... for event in stream:
cmlenz@501: ... yield event
cmlenz@501: >>> html = HTML('Some test text')
athomas@533: >>> print html | Transformer('.//em').apply(Top('Prefix '))
cmlenz@501: Prefix Some test text
cmlenz@501: """
cmlenz@501: def __init__(self, content):
cmlenz@501: """Create a new injector.
cmlenz@501:
cmlenz@501: :param content: An iterable of Genshi stream events, or a string to be
cmlenz@501: injected.
cmlenz@501: """
cmlenz@501: self.content = content
cmlenz@501:
cmlenz@501: def _inject(self):
cmlenz@504: for event in _ensure(self.content):
cmlenz@504: yield None, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class ReplaceTransformation(InjectorTransformation):
cmlenz@501: """Replace selection with content."""
cmlenz@503:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
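
This standalone sketch of the replacement loop uses plain strings for marks and events. The injected content is given as a list of events. The real class also accepts a string, via ``_ensure``. ``replace_selected`` is a hypothetical name.

```python
def replace_selected(marked_stream, content):
    # content must be re-iterable (e.g. a list): it is emitted once for
    # every contiguous selection, and the selected events are dropped.
    it = iter(marked_stream)
    for mark, event in it:
        if mark is None:
            yield mark, event
            continue
        for new_event in content:
            yield None, new_event
        for mark, event in it:  # skip the rest of this selection
            if mark is None:
                yield mark, event
                break


stream = [(None, 'a'), ('ENTER', 'b'), ('INSIDE', 'c'), ('EXIT', 'd'), (None, 'e')]
replaced = list(replace_selected(stream, ['X']))
at_end = list(replace_selected([(None, 'a'), ('OUTSIDE', 't')], ['X']))
```

The sketch drains the selection with an inner ``for`` loop over the shared iterator instead of calling ``next()``. Under PEP 479 (Python 3.7 and later), a ``StopIteration`` that escapes a generator becomes a ``RuntimeError``. That matters when a selection ends the stream.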
cmlenz@501: """
cmlenz@501: for mark, event in stream:
cmlenz@501: if mark is not None:
cmlenz@501: for subevent in self._inject():
cmlenz@501: yield subevent
cmlenz@501: while True:
cmlenz@501: mark, event = stream.next()
cmlenz@501: if mark is None:
cmlenz@501: yield mark, event
cmlenz@501: break
cmlenz@501: else:
cmlenz@501: yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class BeforeTransformation(InjectorTransformation):
cmlenz@501: """Insert content before selection."""
cmlenz@503:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
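
This sketch injects content before each contiguous selection. Marks and events are plain strings, content is a list of events, and ``insert_before`` is a hypothetical name.

```python
def insert_before(marked_stream, content):
    it = iter(marked_stream)
    for mark, event in it:
        if mark is not None:
            for new_event in content:
                yield None, new_event
            yield mark, event
            for mark, event in it:  # pass the rest of the selection through
                if not mark:
                    break
                yield mark, event
            else:
                return  # the stream ended inside the selection
        yield mark, event


stream = [(None, 'a'), ('ENTER', 'b'), ('EXIT', 'c'), (None, 'd')]
before = list(insert_before(stream, ['X']))
at_end = list(insert_before([('OUTSIDE', 't')], ['X']))
```

The inner loop's ``else`` clause runs only when the stream is exhausted inside a selection. In that case no unselected event is left to pass through.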
cmlenz@501: """
cmlenz@501: for mark, event in stream:
athomas@575: if mark is not None:
cmlenz@501: for subevent in self._inject():
cmlenz@501: yield subevent
athomas@575: yield mark, event
athomas@575: while True:
athomas@575: mark, event = stream.next()
athomas@575: if not mark:
athomas@575: break
athomas@575: yield mark, event
cmlenz@501: yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class AfterTransformation(InjectorTransformation):
cmlenz@501: """Insert content after selection."""
cmlenz@503:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
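
This sketch uses the same simplified events as a standalone version of the loop below. It stores the first unselected event after a selection in ``following``, so that event can be emitted after the injected content. ``insert_after`` is a hypothetical name.

```python
def insert_after(marked_stream, content):
    it = iter(marked_stream)
    for mark, event in it:
        yield mark, event
        if not mark:
            continue
        following = None  # first unselected event after the selection, if any
        for mark, event in it:
            if not mark:
                following = (mark, event)
                break
            yield mark, event
        for new_event in content:
            yield None, new_event
        if following is not None:
            yield following


stream = [(None, 'a'), ('ENTER', 'b'), ('EXIT', 'c'), (None, 'd')]
after = list(insert_after(stream, ['X']))
at_end = list(insert_after([(None, 'a'), ('OUTSIDE', 't')], ['X']))
```

If the selection ends the stream, ``following`` stays ``None`` and nothing is emitted after the injected content. The last selected event is therefore not repeated.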
cmlenz@501: """
cmlenz@501: for mark, event in stream:
cmlenz@501: yield mark, event
cmlenz@501: if mark:
cmlenz@501: while True:
athomas@575: try:
athomas@575: mark, event = stream.next()
athomas@575: except StopIteration:
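athomas@575: # The selection ran to the end of the stream: inject the
athomas@575: # content and stop, rather than yielding the last selected
athomas@575: # event a second time.
athomas@575: for subevent in self._inject():
athomas@575: yield subevent
athomas@575: return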
cmlenz@501: if not mark:
cmlenz@501: break
cmlenz@501: yield mark, event
cmlenz@501: for subevent in self._inject():
cmlenz@501: yield subevent
cmlenz@501: yield mark, event
cmlenz@501:
cmlenz@501:
cmlenz@504: class PrependTransformation(InjectorTransformation):
cmlenz@501: """Prepend content to the inside of selected elements."""
cmlenz@503:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
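
A minimal sketch with plain-string marks and events shows where the content lands. ``prepend_inside`` is a hypothetical name.

```python
def prepend_inside(marked_stream, content):
    # Inject right after the opening event of each selected element
    # (ENTER), or right after a selected non-element event (OUTSIDE).
    for mark, event in marked_stream:
        yield mark, event
        if mark in ('ENTER', 'OUTSIDE'):
            for new_event in content:
                yield None, new_event


stream = [('ENTER', 'start'), ('INSIDE', 'text'), ('EXIT', 'end')]
prepended = list(prepend_inside(stream, ['X']))
```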
cmlenz@501: """
cmlenz@501: for mark, event in stream:
cmlenz@501: yield mark, event
cmlenz@502: if mark in (ENTER, OUTSIDE):
cmlenz@501: for subevent in self._inject():
cmlenz@501: yield subevent
cmlenz@501:
cmlenz@501:
cmlenz@504: class AppendTransformation(InjectorTransformation):
cmlenz@501: """Append content after the content of selected elements."""
cmlenz@503:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
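
This sketch uses the same plain-string stand-ins. It copies each selected element's content through and injects just before the element's EXIT event. ``append_inside`` is a hypothetical name. Holding the EXIT event in ``closing`` keeps the sketch from repeating an event if a stream lacks the closing event.

```python
def append_inside(marked_stream, content):
    it = iter(marked_stream)
    for mark, event in it:
        yield mark, event
        if mark == 'ENTER':
            closing = None
            for mark, event in it:  # copy the element's content
                if mark == 'EXIT':
                    closing = (mark, event)
                    break
                yield mark, event
            for new_event in content:  # inject just before the closing event
                yield None, new_event
            if closing is not None:
                yield closing


stream = [(None, 'a'), ('ENTER', 'start'), ('INSIDE', 'text'), ('EXIT', 'end'), (None, 'b')]
appended = list(append_inside(stream, ['X']))
```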
cmlenz@501: """
cmlenz@501: for mark, event in stream:
cmlenz@501: yield mark, event
cmlenz@502: if mark is ENTER:
cmlenz@501: while True:
cmlenz@501: mark, event = stream.next()
cmlenz@502: if mark is EXIT:
cmlenz@501: break
cmlenz@501: yield mark, event
cmlenz@501: for subevent in self._inject():
cmlenz@501: yield subevent
cmlenz@501: yield mark, event
cmlenz@501:
cmlenz@501:
athomas@517: class AttrTransformation(object):
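cmlenz@501: """Set or remove an attribute on selected elements."""
cmlenz@503:
cmlenz@503: def __init__(self, name, value):
cmlenz@501: """Create the transform.
cmlenz@501:
cmlenz@503: :param name: name of the attribute that should be set
cmlenz@503: :param value: the value to set; ``None`` removes the attribute, and a
cmlenz@503: callable is called as ``value(name, event)`` for each
cmlenz@503: selected ``START`` event, its result being used instead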
cmlenz@501: """
cmlenz@503: self.name = name
cmlenz@501: self.value = value
cmlenz@501:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@501: :param stream: The marked event stream to filter
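
The update applied to each selected START event can be sketched on its own. Here ``attrs`` is a list of ``(name, value)`` pairs, a hypothetical stand-in for ``Attrs``, and ``set_attr`` is a made-up name. The sketch assumes that replacing an existing attribute keeps its position.

```python
def set_attr(attrs, name, value, event=None):
    # attrs is a list of (name, value) pairs, a simplified stand-in for
    # Attrs; event is whatever a callable value should receive.
    if callable(value):
        value = value(name, event)
    if value is None:  # None removes the attribute
        return [(n, v) for n, v in attrs if n != name]
    if any(n == name for n, _ in attrs):  # replace an existing value
        return [(n, value if n == name else v) for n, v in attrs]
    return attrs + [(name, value)]  # otherwise add it at the end


print(set_attr([('class', 'a'), ('id', 'x')], 'class', None))  # [('id', 'x')]
```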
cmlenz@501: """
athomas@517: callable_value = callable(self.value)
cmlenz@501: for mark, (kind, data, pos) in stream:
cmlenz@502: if mark is ENTER:
athomas@517: if callable_value:
athomas@517: value = self.value(self.name, (kind, data, pos))
athomas@517: else:
athomas@517: value = self.value
athomas@517: if value is None:
athomas@517: attrs = data[1] - [QName(self.name)]
athomas@517: else:
athomas@517: attrs = data[1] | [(QName(self.name), value)]
athomas@517: data = (data[0], attrs)
cmlenz@501: yield mark, (kind, data, pos)
cmlenz@501:
cmlenz@501:
cmlenz@506: class StreamBuffer(Stream):
cmlenz@506: """Stream event buffer used for cut and copy transformations."""
cmlenz@506:
cmlenz@506: def __init__(self):
cmlenz@506: """Create the buffer."""
cmlenz@506: Stream.__init__(self, [])
cmlenz@506:
cmlenz@506: def append(self, event):
cmlenz@506: """Add an event to the buffer.
athomas@517:
cmlenz@506: :param event: the markup event to add
cmlenz@506: """
cmlenz@506: self.events.append(event)
cmlenz@506:
cmlenz@506: def reset(self):
cmlenz@506: """Reset the buffer so that it's empty."""
cmlenz@506: del self.events[:]
cmlenz@506:
cmlenz@506:
cmlenz@504: class CopyTransformation(object):
cmlenz@501: """Copy selected events into a buffer for later insertion."""
cmlenz@503:
cmlenz@501: def __init__(self, buffer):
cmlenz@506: """Create the copy transformation.
cmlenz@501:
cmlenz@506: :param buffer: the `StreamBuffer` in which the selection should be
cmlenz@506: stored
cmlenz@501: """
cmlenz@501: self.buffer = buffer
cmlenz@501:
cmlenz@501: def __call__(self, stream):
cmlenz@506: """Apply the transformation to the marked stream.
cmlenz@501:
cmlenz@503: :param stream: the marked event stream to filter
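
This sketch shows the copy semantics with a hypothetical ``Buffer`` class in place of ``StreamBuffer`` and plain-string events. The buffer is reset on every call, and the stream itself passes through unchanged.

```python
class Buffer(object):
    # Hypothetical minimal stand-in for StreamBuffer: a resettable list.
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def reset(self):
        del self.events[:]


def copy_selected(marked_stream, buffer):
    buffer.reset()  # each copy replaces whatever the buffer held before
    # Materialise the stream so it can be both buffered and returned.
    marked_stream = list(marked_stream)
    for mark, event in marked_stream:
        if mark:
            buffer.append(event)
    return marked_stream


buf = Buffer()
stream = [(None, 'a'), ('ENTER', 'b'), ('EXIT', 'c')]
out = copy_selected(stream, buf)
```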
cmlenz@501: """
cmlenz@506: self.buffer.reset()
cmlenz@501: stream = list(stream)
cmlenz@501: for mark, event in stream:
cmlenz@501: if mark:
cmlenz@501: self.buffer.append(event)
cmlenz@501: return stream
cmlenz@501:
cmlenz@501:
athomas@519: class CutTransformation(object):
cmlenz@501: """Cut selected events into a buffer for later insertion and remove the
cmlenz@503: selection.
cmlenz@503: """
cmlenz@503:
athomas@519: def __init__(self, buffer):
athomas@519: """Create the cut transformation.
athomas@519:
athomas@519: :param buffer: the `StreamBuffer` in which the selection should be
athomas@519: stored
athomas@519: """
athomas@519: self.buffer = buffer
athomas@519:
cmlenz@501: def __call__(self, stream):
cmlenz@501: """Apply the transform filter to the marked stream.
cmlenz@501:
cmlenz@503: :param stream: the marked event stream to filter
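
The ATTR handling described in the loop's comment can be sketched with simplified events. Marks and kinds are plain strings, attributes are lists of ``(name, value)`` pairs, and the shape of the ATTR event's data is a hypothetical simplification. ``cut_selected`` is a made-up name.

```python
def cut_selected(marked_stream, buffer):
    # buffer is any object with an append() method (a list works here).
    out = []
    strip = None  # attribute names to remove from the next START event
    for mark, (kind, data, pos) in marked_stream:
        if strip:
            assert kind == 'START'  # the event after ATTR is its element
            tag, attrs = data
            data = (tag, [(n, v) for n, v in attrs if n not in strip])
            strip = None
        if mark:
            buffer.append((kind, data, pos))
            if mark == 'ATTR':
                # ATTR events precede the START event they came from.
                strip = [name for name, _ in data[1]]
        else:
            out.append((mark, (kind, data, pos)))
    return out


stream = [
    ('ATTR', ('ATTR', (None, [('class', 'x')]), None)),
    (None, ('START', ('p', [('class', 'x'), ('id', 'y')]), None)),
    (None, ('TEXT', 'hi', None)),
    (None, ('END', 'p', None)),
]
cut = []
remaining = cut_selected(stream, cut)
```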
cmlenz@501: """
athomas@519: out_stream = []
athomas@519: attributes = None
athomas@519: for mark, (kind, data, pos) in stream:
athomas@519: if attributes:
athomas@519: assert kind is START
athomas@519: data = (data[0], data[1] - attributes)
athomas@519: attributes = None
athomas@519: if mark:
athomas@519: # There is some magic here: ATTR-marked events are pushed into
athomas@519: # the stream *before* the START event they originated from, so
athomas@519: # that cut() can strip those attributes from the following
athomas@519: # START event, as would be expected.
athomas@519: self.buffer.append((kind, data, pos))
athomas@519: if mark is ATTR:
athomas@519: attributes = [name for name, _ in data[1]]
athomas@519: else:
athomas@519: out_stream.append((mark, (kind, data, pos)))
athomas@519: return out_stream