# HG changeset patch
# User cmlenz
# Date 1176462156 0
# Node ID 90f5908cd10ac128851b52f335a835b6496eba39
# Parent  906b346513b6f0c93c910312a28e8b63529ff6f7
Add basic I18n/L10n functionality, based on GenshiRecipes/Localization.

diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -53,11 +53,16 @@
    functions were previously only available when using Genshi via the template
    engine plugin (for compatibility with Kid).
  * `style` attributes are no longer allowed by the `HTMLSanitizer` by default.
-   If it is explicitly added to the set of safe attributes, and unicode escapes
-   in the attribute value are handled correctly.
+   If they are explicitly added to the set of safe attributes, any unicode
+   escapes in the attribute value are now handled properly.
  * Namespace declarations on conditional elements (for example using a `py:if`
    directive`) are no longer moved to the following element when the element
    originally carrying the declaration is removed from the stream (ticket #107).
+ * Added basic built-in support for internationalizing templates by providing
+   a new `Translator` class that can both extract localizable strings from a
+   stream, and replace those strings with their localizations at render time.
+   The code for this was largely taken from previous work done by Matt Good
+   and David Fraser.
 
 
 Version 0.3.6
diff --git a/genshi/filters/__init__.py b/genshi/filters/__init__.py
new file mode 100644
--- /dev/null
+++ b/genshi/filters/__init__.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2007 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://genshi.edgewall.org/wiki/License.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://genshi.edgewall.org/log/.
+
+"""Implementation of a number of stream filters."""
+
+from genshi.filters.html import HTMLFormFiller, HTMLSanitizer
+from genshi.filters.i18n import Translator
+
+__docformat__ = 'restructuredtext en'
diff --git a/genshi/filters.py b/genshi/filters/html.py
rename from genshi/filters.py
rename to genshi/filters/html.py
diff --git a/genshi/filters/i18n.py b/genshi/filters/i18n.py
new file mode 100644
--- /dev/null
+++ b/genshi/filters/i18n.py
@@ -0,0 +1,231 @@
+"""Utilities for internationalization and localization of templates."""
+
+try:
+    frozenset
+except NameError:
+    from sets import ImmutableSet as frozenset
+from gettext import gettext
+from opcode import opmap
+import re
+
+from genshi.core import Attrs, START, END, TEXT
+from genshi.template.base import Template, EXPR, SUB
+from genshi.template.markup import EXEC
+
+LOAD_NAME = chr(opmap['LOAD_NAME'])
+LOAD_CONST = chr(opmap['LOAD_CONST'])
+CALL_FUNCTION = chr(opmap['CALL_FUNCTION'])
+BINARY_ADD = chr(opmap['BINARY_ADD'])
+
+
+class Translator(object):
+    """Can extract and translate localizable strings from markup streams and
+    templates.
+
+    For example, assume the following template:
+
+    >>> from genshi.template import MarkupTemplate
+    >>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
+    ...   <head>
+    ...     <title>Example</title>
+    ...   </head>
+    ...   <body>
+    ...     <h1>Example</h1>
+    ...     <p>${_("Hello, %(name)s") % dict(name=username)}</p>
+    ...   </body>
+    ... </html>''', filename='example.html')
+
+    For demonstration, we define a dummy ``gettext``-style function with a
+    hard-coded translation table, and pass that to the `Translator` initializer:
+
+    >>> def pseudo_gettext(string):
+    ...     return {
+    ...         'Example': 'Beispiel',
+    ...         'Hello, %(name)s': 'Hallo, %(name)s'
+    ...     }[string]
+    >>> translator = Translator(pseudo_gettext)
+
+    Next, the translator needs to be prepended to any already defined filters
+    on the template:
+
+    >>> tmpl.filters.insert(0, translator)
+
+    When generating the template output, our hard-coded translations should be
+    applied as expected:
+
+    >>> print tmpl.generate(username='Hans', _=pseudo_gettext)
+    <html>
+      <head>
+        <title>Beispiel</title>
+      </head>
+      <body>
+        <h1>Beispiel</h1>
+        <p>Hallo, Hans</p>
+      </body>
+    </html>
+    """
+
+    IGNORE_TAGS = frozenset(['script', 'style'])
+    INCLUDE_ATTRS = frozenset(['title', 'alt'])
+
+    def __init__(self, translate=gettext, ignore_tags=IGNORE_TAGS,
+                 include_attrs=INCLUDE_ATTRS):
+        """Initialize the translator.
+
+        :param translate: the translation function, for example ``gettext`` or
+                          ``ugettext``.
+        :param ignore_tags: a set of tag names that should not be localized
+        :param include_attrs: a set of attribute names that should be localized
+        """
+        self.gettext = translate
+        self.ignore_tags = ignore_tags
+        self.include_attrs = include_attrs
+
+    def __call__(self, stream, ctxt=None, search_text=True):
+        skip = 0
+
+        for kind, data, pos in stream:
+
+            # skip chunks that should not be localized
+            if skip:
+                if kind is START:
+                    tag, attrs = data
+                    tag = tag.localname
+                    if tag in self.ignore_tags:
+                        skip += 1
+                elif kind is END:
+                    if data.localname in self.ignore_tags:
+                        skip -= 1
+                yield kind, data, pos
+                continue
+
+            # handle different events that can be localized
+            if kind is START:
+                tag, attrs = data
+                if tag.localname in self.ignore_tags:
+                    skip += 1
+                    yield kind, data, pos
+                    continue
+
+                new_attrs = []
+                changed = False
+                for name, value in attrs:
+                    if name in self.include_attrs:
+                        if isinstance(value, basestring):
+                            newval = self.gettext(value)
+                        else:
+                            newval = list(self(value, ctxt, search_text=name in self.include_attrs))
+                        if newval != value:
+                            value = newval
+                            changed = True
+                    new_attrs.append((name, value))
+                if changed:
+                    attrs = Attrs(new_attrs)
+
+                yield kind, (tag, attrs), pos
+
+            elif kind is TEXT:
+                text = data.strip()
+                if text:
+                    data = data.replace(text, self.gettext(text))
+                yield kind, data, pos
+
+            elif kind is SUB:
+                subkind, substream = data
+                new_substream = list(self(substream, ctxt))
+                yield kind, (subkind, new_substream), pos
+
+            else:
+                yield kind, data, pos
+
+    def extract(self, stream, gettext_functions=('_', 'gettext', 'ngettext')):
+        """Extract localizable strings from the given template stream.
+
+        For every string found, this function yields a ``(lineno, message)``
+        tuple.
+
+        :param stream: the event stream to extract strings from; can be a
+                       regular stream or a template stream
+
+        >>> from genshi.template import MarkupTemplate
+        >>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
+        ...   <head>
+        ...     <title>Example</title>
+        ...   </head>
+        ...   <body>
+        ...     <h1>Example</h1>
+        ...     <p>${_("Hello, %(name)s") % dict(name=username)}</p>
+        ...   </body>
+        ... </html>''', filename='example.html')
+        >>> for lineno, message in Translator().extract(tmpl.stream):
+        ...     print "Line %d: %r" % (lineno, message)
+        Line 3: u'Example'
+        Line 6: u'Example'
+        Line 7: u'Hello, %(name)s'
+        """
+        tagname = None
+        skip = 0
+
+        for kind, data, pos in stream:
+            if skip:
+                if kind is START:
+                    tag, attrs = data
+                    if tag.localname in self.ignore_tags:
+                        skip += 1
+                if kind is END:
+                    tag = data
+                    if tag.localname in self.ignore_tags:
+                        skip -= 1
+                continue
+
+            if kind is START:
+                tag, attrs = data
+                if tag.localname in self.ignore_tags:
+                    skip += 1
+                    continue
+
+                for name, value in attrs:
+                    if name in self.include_attrs:
+                        if isinstance(value, basestring):
+                            text = value.strip()
+                            if text:
+                                yield pos[1], text
+                        else:
+                            for lineno, text in self.extract(value, gettext_functions):
+                                yield lineno, text
+
+            elif kind is TEXT:
+                text = data.strip()
+                if text and filter(None, [ch.isalpha() for ch in text]):
+                    yield pos[1], text
+
+            elif kind is EXPR or kind is EXEC:
+                consts = dict([(n, chr(i) + '\x00') for i, n in
+                               enumerate(data.code.co_consts)])
+                gettext_locs = [consts[n] for n in gettext_functions
+                                if n in consts]
+                ops = [
+                    LOAD_CONST, '(', '|'.join(gettext_locs), ')',
+                    CALL_FUNCTION, '.\x00',
+                    '((?:', BINARY_ADD, '|', LOAD_CONST, '.\x00)+)'
+                ]
+                for _, opcodes in re.findall(''.join(ops), data.code.co_code):
+                    strings = []
+                    opcodes = iter(opcodes)
+                    for opcode in opcodes:
+                        if opcode == BINARY_ADD:
+                            arg = strings.pop()
+                            strings[-1] += arg
+                        else:
+                            arg = data.code.co_consts[ord(opcodes.next())]
+                            opcodes.next() # skip second byte
+                            if not isinstance(arg, basestring):
+                                break
+                            strings.append(unicode(arg))
+                    for string in strings:
+                        yield pos[1], string
+
+            elif kind is SUB:
+                subkind, substream = data
+                for lineno, text in self.extract(substream, gettext_functions):
+                    yield lineno, text
diff --git a/genshi/filters/tests/__init__.py b/genshi/filters/tests/__init__.py
new file mode 100644
--- /dev/null
+++ b/genshi/filters/tests/__init__.py
@@ -0,0 +1,25 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2007 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://genshi.edgewall.org/wiki/License.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://genshi.edgewall.org/log/.
+
+import doctest
+import unittest
+
+def suite():
+    from genshi.filters.tests import html, i18n
+    suite = unittest.TestSuite()
+    suite.addTest(html.suite())
+    suite.addTest(i18n.suite())
+    return suite
+
+if __name__ == '__main__':
+    unittest.main(defaultTest='suite')
diff --git a/genshi/tests/filters.py b/genshi/filters/tests/html.py
rename from genshi/tests/filters.py
rename to genshi/filters/tests/html.py
--- a/genshi/tests/filters.py
+++ b/genshi/filters/tests/html.py
@@ -14,9 +14,8 @@
 import doctest
 import unittest
 
-from genshi import filters
 from genshi.input import HTML, ParseError
-from genshi.filters import HTMLFormFiller, HTMLSanitizer
+from genshi.filters.html import HTMLFormFiller, HTMLSanitizer
 
 
 class HTMLFormFillerTestCase(unittest.TestCase):
@@ -384,7 +383,7 @@
 
 def suite():
     suite = unittest.TestSuite()
-    suite.addTest(doctest.DocTestSuite(filters))
+    suite.addTest(doctest.DocTestSuite(HTMLFormFiller.__module__))
     suite.addTest(unittest.makeSuite(HTMLFormFillerTestCase, 'test'))
     suite.addTest(unittest.makeSuite(HTMLSanitizerTestCase, 'test'))
     return suite
diff --git a/genshi/filters/tests/i18n.py b/genshi/filters/tests/i18n.py
new file mode 100644
--- /dev/null
+++ b/genshi/filters/tests/i18n.py
@@ -0,0 +1,30 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2007 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://genshi.edgewall.org/wiki/License.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://genshi.edgewall.org/log/.
+
+import doctest
+import unittest
+
+from genshi.filters.i18n import Translator
+
+
+class TranslatorTestCase(unittest.TestCase):
+    pass
+
+
+def suite():
+    suite = unittest.TestSuite()
+    suite.addTests(doctest.DocTestSuite(Translator.__module__))
+    return suite
+
+if __name__ == '__main__':
+    unittest.main(defaultTest='suite')
diff --git a/genshi/template/tests/__init__.py b/genshi/template/tests/__init__.py
--- a/genshi/template/tests/__init__.py
+++ b/genshi/template/tests/__init__.py
@@ -14,7 +14,6 @@
 import doctest
 import unittest
 
-
 def suite():
     from genshi.template.tests import base, directives, eval, interpolation, \
                                       loader, markup, plugin, text
diff --git a/genshi/tests/__init__.py b/genshi/tests/__init__.py
--- a/genshi/tests/__init__.py
+++ b/genshi/tests/__init__.py
@@ -15,8 +15,8 @@
 
 def suite():
     import genshi
-    from genshi.tests import builder, core, filters, input, output, path, \
-                             util
+    from genshi.tests import builder, core, input, output, path, util
+    from genshi.filters import tests as filters
     from genshi.template import tests as template
 
     suite = unittest.TestSuite()
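
Note on the filtering technique: the `Translator` filter walks an event stream, keeps a nesting counter while it is inside an ignored element such as `script` or `style`, and otherwise swaps the stripped text for its translation while preserving the surrounding whitespace. The following is only a minimal sketch of that idea, not the code from this changeset. It assumes plain `(kind, data)` tuples with string tag names instead of Genshi's `QName` objects and positions, it leaves attributes out, and the name `translate_events` is hypothetical.

```python
START, END, TEXT = 'START', 'END', 'TEXT'
IGNORE_TAGS = frozenset(['script', 'style'])


def translate_events(events, translate, ignore_tags=IGNORE_TAGS):
    """Yield (kind, data) events with translatable text replaced.

    Hypothetical stand-in for the filter in the patch: tags are plain
    strings here, and attribute handling is omitted.
    """
    skip = 0  # nesting depth inside ignored elements
    for kind, data in events:
        if skip:
            # Inside an ignored element: pass events through untouched
            # and only track how deeply ignored tags are nested.
            if kind == START and data in ignore_tags:
                skip += 1
            elif kind == END and data in ignore_tags:
                skip -= 1
            yield kind, data
        elif kind == START and data in ignore_tags:
            skip += 1
            yield kind, data
        elif kind == TEXT:
            text = data.strip()
            if text:
                # Translate the stripped text, keep leading/trailing whitespace.
                data = data.replace(text, translate(text))
            yield kind, data
        else:
            yield kind, data


# Assumed toy catalog; unknown strings fall back to themselves.
catalog = {'Example': 'Beispiel'}
events = [
    (START, 'p'), (TEXT, '  Example\n'), (END, 'p'),
    (START, 'script'), (TEXT, 'Example'), (END, 'script'),
]
result = list(translate_events(events, lambda s: catalog.get(s, s)))
```

In this sketch the text inside `p` is translated with its whitespace intact, while the identical string inside `script` passes through unchanged because the counter is non-zero at that point.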