annotate babel/messages/pofile.py @ 151:12e5f21dfcda

Respect charset specified in PO headers in `read_po()`. Fixes #17.
author cmlenz
date Wed, 20 Jun 2007 20:31:24 +0000
parents 9e3d2b227ec3
children b5659b7779be
rev   line source
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
1 # -*- coding: utf-8 -*-
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
2 #
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
3 # Copyright (C) 2007 Edgewall Software
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
4 # All rights reserved.
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
5 #
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
6 # This software is licensed as described in the file COPYING, which
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
7 # you should have received as part of this distribution. The terms
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
8 # are also available at http://babel.edgewall.org/wiki/License.
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
9 #
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
10 # This software consists of voluntary contributions made by many
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
11 # individuals. For the exact contribution history, see the revision
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
12 # history and logs, available at http://babel.edgewall.org/log/.
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
13
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
14 """Reading and writing of files in the ``gettext`` PO (portable object)
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
15 format.
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
16
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
17 :see: `The Format of PO Files
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
18 <http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files>`_
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
19 """
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
20
7
8d7b3077e6d1 * The creation-date header in generated PO files now includes the timezone offset.
cmlenz
parents: 3
diff changeset
21 from datetime import date, datetime
136
9e3d2b227ec3 More fixes for Windows compatibility:
cmlenz
parents: 122
diff changeset
22 import os
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
23 import re
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
24 try:
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
25 set
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
26 except NameError:
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
27 from sets import Set as set
105
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
28 from textwrap import wrap
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
29
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
30 from babel import __version__ as VERSION
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
31 from babel.messages.catalog import Catalog
99
b6b5992daa6c Renamed `LOCAL` to `LOCALTZ`.
cmlenz
parents: 98
diff changeset
32 from babel.util import LOCALTZ
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
33
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
34 __all__ = ['escape', 'normalize', 'read_po', 'write_po']
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
35
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
36 def read_po(fileobj):
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
37 """Read messages from a ``gettext`` PO (portable object) file from the given
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
38 file-like object and return a `Catalog`.
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
39
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
40 >>> from StringIO import StringIO
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
41 >>> buf = StringIO('''
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
42 ... #: main.py:1
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
43 ... #, fuzzy, python-format
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
44 ... msgid "foo %(name)s"
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
45 ... msgstr ""
23
f828705c3bce Change pot header's first line, "Translations Template for %%(project)s." instead of "SOME DESCRIPTIVE TITLE.". '''`project`''' and '''`version`''' now default to '''PROJECT''' and '''VERSION''' respectively. Fixed a bug regarding '''Content-Transfer-Encoding''', it shouldn't be the charset, and we're defaulting to `8bit` untill someone complains.
palgarvio
parents: 19
diff changeset
46 ...
96
6c07c38e23aa Updated `read_po` to add user comments besides just auto comments.
palgarvio
parents: 86
diff changeset
47 ... # A user comment
6c07c38e23aa Updated `read_po` to add user comments besides just auto comments.
palgarvio
parents: 86
diff changeset
48 ... #. An auto comment
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
49 ... #: main.py:3
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
50 ... msgid "bar"
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
51 ... msgid_plural "baz"
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
52 ... msgstr[0] ""
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
53 ... msgstr[1] ""
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
54 ... ''')
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
55 >>> catalog = read_po(buf)
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
56 >>> catalog.revision_date = datetime(2007, 04, 01)
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
57
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
58 >>> for message in catalog:
69
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
59 ... if message.id:
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
60 ... print (message.id, message.string)
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
61 ... print ' ', (message.locations, message.flags)
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
62 ... print ' ', (message.user_comments, message.auto_comments)
151
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
63 (u'foo %(name)s', '')
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
64 ([(u'main.py', 1)], set([u'fuzzy', u'python-format']))
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
65 ([], [])
151
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
66 ((u'bar', u'baz'), ('', ''))
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
67 ([(u'main.py', 3)], set([]))
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
68 ([u'A user comment'], [u'An auto comment'])
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
69
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
70 :param fileobj: the file-like object to read the PO file from
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
71 :return: an iterator over ``(message, translation, location)`` tuples
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
72 :rtype: ``iterator``
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
73 """
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
74 catalog = Catalog()
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
75
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
76 messages = []
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
77 translations = []
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
78 locations = []
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
79 flags = []
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
80 user_comments = []
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
81 auto_comments = []
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
82 in_msgid = in_msgstr = False
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
83
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
84 def _add_message():
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
85 translations.sort()
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
86 if len(messages) > 1:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
87 msgid = tuple([denormalize(m) for m in messages])
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
88 else:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
89 msgid = denormalize(messages[0])
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
90 if len(translations) > 1:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
91 string = tuple([denormalize(t[1]) for t in translations])
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
92 else:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
93 string = denormalize(translations[0][1])
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
94 catalog.add(msgid, string, list(locations), set(flags),
110
ecc04be42086 Fixed a bug introduced in [106].
palgarvio
parents: 108
diff changeset
95 list(auto_comments), list(user_comments))
86
8a703ecdba91 Some cosmetic changes for the new translator comments support.
cmlenz
parents: 82
diff changeset
96 del messages[:]; del translations[:]; del locations[:];
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
97 del flags[:]; del auto_comments[:]; del user_comments[:]
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
98
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
99 for line in fileobj.readlines():
151
12e5f21dfcda Respect charset specified in PO headers in `read_po()`. Fixes #17.
cmlenz
parents: 136
diff changeset
100 line = line.strip().decode(catalog.charset)
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
101 if line.startswith('#'):
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
102 in_msgid = in_msgstr = False
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
103 if messages:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
104 _add_message()
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
105 if line[1:].startswith(':'):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
106 for location in line[2:].lstrip().split():
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
107 filename, lineno = location.split(':', 1)
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
108 locations.append((filename, int(lineno)))
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
109 elif line[1:].startswith(','):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
110 for flag in line[2:].lstrip().split(','):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
111 flags.append(flag.strip())
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
112 elif line[1:].startswith('.'):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
113 # These are called auto-comments
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
114 comment = line[2:].strip()
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
115 if comment:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
116 # Just check that we're not adding empty comments
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
117 auto_comments.append(comment)
122
03f106700f02 Added tests for `new_catalog` distutils command.
cmlenz
parents: 110
diff changeset
118 else:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
119 # These are called user comments
122
03f106700f02 Added tests for `new_catalog` distutils command.
cmlenz
parents: 110
diff changeset
120 user_comments.append(line[1:].strip())
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
121 else:
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
122 if line.startswith('msgid_plural'):
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
123 in_msgid = True
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
124 msg = line[12:].lstrip()
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
125 messages.append(msg)
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
126 elif line.startswith('msgid'):
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
127 in_msgid = True
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
128 if messages:
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
129 _add_message()
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
130 messages.append(line[5:].lstrip())
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
131 elif line.startswith('msgstr'):
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
132 in_msgid = False
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
133 in_msgstr = True
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
134 msg = line[6:].lstrip()
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
135 if msg.startswith('['):
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
136 idx, msg = msg[1:].split(']')
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
137 translations.append([int(idx), msg.lstrip()])
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
138 else:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
139 translations.append([0, msg])
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
140 elif line.startswith('"'):
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
141 if in_msgid:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
142 messages[-1] += u'\n' + line.rstrip()
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
143 elif in_msgstr:
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
144 translations[-1][1] += u'\n' + line.rstrip()
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
145
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
146 if messages:
66
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
147 _add_message()
d1a7425739d3 `read_po` now returns a `Catalog`.
cmlenz
parents: 58
diff changeset
148 return catalog
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
149
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
150 WORD_SEP = re.compile('('
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
151 r'\s+|' # any whitespace
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
152 r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|' # hyphenated words
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
153 r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w)' # em-dash
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
154 ')')
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
155
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
156 def escape(string):
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
157 r"""Escape the given string so that it can be included in double-quoted
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
158 strings in ``PO`` files.
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
159
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
160 >>> escape('''Say:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
161 ... "hello, world!"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
162 ... ''')
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
163 '"Say:\\n \\"hello, world!\\"\\n"'
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
164
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
165 :param string: the string to escape
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
166 :return: the escaped string
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
167 :rtype: `str` or `unicode`
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
168 """
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
169 return '"%s"' % string.replace('\\', '\\\\') \
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
170 .replace('\t', '\\t') \
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
171 .replace('\r', '\\r') \
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
172 .replace('\n', '\\n') \
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
173 .replace('\"', '\\"')
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
174
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
175 def unescape(string):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
176 r"""Reverse escape the given string.
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
177
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
178 >>> print unescape('"Say:\\n \\"hello, world!\\"\\n"')
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
179 Say:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
180 "hello, world!"
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
181 <BLANKLINE>
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
182
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
183 :param string: the string to unescape
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
184 :return: the unescaped string
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
185 :rtype: `str` or `unicode`
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
186 """
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
187 return string[1:-1].replace('\\\\', '\\') \
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
188 .replace('\\t', '\t') \
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
189 .replace('\\r', '\r') \
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
190 .replace('\\n', '\n') \
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
191 .replace('\\"', '\"')
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
192
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
193 def normalize(string, width=76):
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
194 r"""Convert a string into a format that is appropriate for .po files.
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
195
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
196 >>> print normalize('''Say:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
197 ... "hello, world!"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
198 ... ''', width=None)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
199 ""
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
200 "Say:\n"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
201 " \"hello, world!\"\n"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
202
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
203 >>> print normalize('''Say:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
204 ... "Lorem ipsum dolor sit amet, consectetur adipisicing elit, "
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
205 ... ''', width=32)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
206 ""
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
207 "Say:\n"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
208 " \"Lorem ipsum dolor sit "
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
209 "amet, consectetur adipisicing"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
210 " elit, \"\n"
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
211
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
212 :param string: the string to normalize
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
213 :param width: the maximum line width; use `None`, 0, or a negative number
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
214 to completely disable line wrapping
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
215 :return: the normalized string
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
216 :rtype: `unicode`
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
217 """
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
218 if width and width > 0:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
219 lines = []
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
220 for idx, line in enumerate(string.splitlines(True)):
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
221 if len(escape(line)) > width:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
222 chunks = WORD_SEP.split(line)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
223 chunks.reverse()
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
224 while chunks:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
225 buf = []
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
226 size = 2
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
227 while chunks:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
228 l = len(escape(chunks[-1])) - 2
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
229 if size + l < width:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
230 buf.append(chunks.pop())
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
231 size += l
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
232 else:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
233 if not buf:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
234 # handle long chunks by putting them on a
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
235 # separate line
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
236 buf.append(chunks.pop())
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
237 break
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
238 lines.append(u''.join(buf))
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
239 else:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
240 lines.append(line)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
241 else:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
242 lines = string.splitlines(True)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
243
69
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
244 if len(lines) <= 1:
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
245 return escape(string)
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
246
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
247 # Remove empty trailing line
69
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
248 if lines and not lines[-1]:
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
249 del lines[-1]
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
250 lines[-1] += '\n'
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
251 return u'""\n' + u'\n'.join([escape(l) for l in lines])
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
252
108
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
253 def denormalize(string):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
254 r"""Reverse the normalization done by the `normalize` function.
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
255
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
256 >>> print denormalize(r'''""
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
257 ... "Say:\n"
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
258 ... " \"hello, world!\"\n"''')
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
259 Say:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
260 "hello, world!"
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
261 <BLANKLINE>
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
262
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
263 >>> print denormalize(r'''""
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
264 ... "Say:\n"
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
265 ... " \"Lorem ipsum dolor sit "
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
266 ... "amet, consectetur adipisicing"
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
267 ... " elit, \"\n"''')
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
268 Say:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
269 "Lorem ipsum dolor sit amet, consectetur adipisicing elit, "
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
270 <BLANKLINE>
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
271
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
272 :param string: the string to denormalize
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
273 :return: the denormalized string
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
274 :rtype: `unicode` or `str`
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
275 """
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
276 if string.startswith('""'):
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
277 lines = []
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
278 for line in string.splitlines()[1:]:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
279 lines.append(unescape(line))
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
280 return ''.join(lines)
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
281 else:
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
282 return unescape(string)
8ea225f33f28 Fix for #16: the header message (`msgid = ""`) is now treated specially by `read_po` and `Catalog`.
cmlenz
parents: 107
diff changeset
283
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
284 def write_po(fileobj, catalog, width=76, no_location=False, omit_header=False,
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
285 sort_output=False, sort_by_file=False):
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
286 r"""Write a ``gettext`` PO (portable object) template file for a given
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
287 message catalog to the provided file-like object.
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
288
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
289 >>> catalog = Catalog()
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
290 >>> catalog.add(u'foo %(name)s', locations=[('main.py', 1)],
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
291 ... flags=('fuzzy',))
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
292 >>> catalog.add((u'bar', u'baz'), locations=[('main.py', 3)])
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
293 >>> from StringIO import StringIO
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
294 >>> buf = StringIO()
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
295 >>> write_po(buf, catalog, omit_header=True)
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
296 >>> print buf.getvalue()
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
297 #: main.py:1
8
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
298 #, fuzzy, python-format
ff5481545bfd Add basic PO file parsing, and change the PO writing procedure to also take flags (such as "python-format" or "fuzzy").
cmlenz
parents: 7
diff changeset
299 msgid "foo %(name)s"
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
300 msgstr ""
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
301 <BLANKLINE>
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
302 #: main.py:3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
303 msgid "bar"
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
304 msgid_plural "baz"
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
305 msgstr[0] ""
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
306 msgstr[1] ""
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
307 <BLANKLINE>
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
308 <BLANKLINE>
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
309
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
310 :param fileobj: the file-like object to write to
69
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
311 :param catalog: the `Catalog` instance
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
312 :param width: the maximum line width for the generated output; use `None`,
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
313 0, or a negative number to completely disable line wrapping
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
314 :param no_location: do not emit a location comment for every message
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
315 :param omit_header: do not include the ``msgid ""`` entry at the top of the
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
316 output
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
317 """
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
318 def _normalize(key):
104
57d2f21a1fcc Project name and version, and the charset are available via the `Catalog` object, and do not need to be passed to `write_pot()`.
cmlenz
parents: 99
diff changeset
319 return normalize(key, width=width).encode(catalog.charset,
57d2f21a1fcc Project name and version, and the charset are available via the `Catalog` object, and do not need to be passed to `write_pot()`.
cmlenz
parents: 99
diff changeset
320 'backslashreplace')
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
321
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
322 def _write(text):
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
323 if isinstance(text, unicode):
104
57d2f21a1fcc Project name and version, and the charset are available via the `Catalog` object, and do not need to be passed to `write_pot()`.
cmlenz
parents: 99
diff changeset
324 text = text.encode(catalog.charset)
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
325 fileobj.write(text)
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
326
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
327 messages = list(catalog)
73
5d8e87acdcc7 Implemented message sorting, see #7.
palgarvio
parents: 70
diff changeset
328 if sort_output:
5d8e87acdcc7 Implemented message sorting, see #7.
palgarvio
parents: 70
diff changeset
329 messages.sort(lambda x,y: cmp(x.id, y.id))
5d8e87acdcc7 Implemented message sorting, see #7.
palgarvio
parents: 70
diff changeset
330 elif sort_by_file:
5d8e87acdcc7 Implemented message sorting, see #7.
palgarvio
parents: 70
diff changeset
331 messages.sort(lambda x,y: cmp(x.locations, y.locations))
70
620fdd25657a Add back POT header broken in previous check-in.
cmlenz
parents: 69
diff changeset
332
73
5d8e87acdcc7 Implemented message sorting, see #7.
palgarvio
parents: 70
diff changeset
333 for message in messages:
69
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
334 if not message.id: # This is the header "message"
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
335 if omit_header:
1d8e81bfedf9 Enhance catalog to also manage the MIME headers.
cmlenz
parents: 66
diff changeset
336 continue
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
337 comment_header = catalog.header_comment
105
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
338 if width and width > 0:
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
339 lines = []
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
340 for line in comment_header.splitlines():
105
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
341 lines += wrap(line, width=width, subsequent_indent='# ',
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
342 break_long_words=False)
106
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
343 comment_header = u'\n'.join(lines) + u'\n'
2a00e352c986 Merged `write_pot` and `write_po` functions by moving more functionality to the `Catalog` class. This is certainly not perfect yet, but moves us in the right direction.
cmlenz
parents: 105
diff changeset
344 _write(comment_header)
104
57d2f21a1fcc Project name and version, and the charset are available via the `Catalog` object, and do not need to be passed to `write_pot()`.
cmlenz
parents: 99
diff changeset
345
107
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
346 if message.user_comments:
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
347 for comment in message.user_comments:
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
348 for line in wrap(comment, width, break_long_words=False):
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
349 _write('# %s\n' % line.strip())
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
350
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
351 if message.auto_comments:
4b42e23644e5 `Message`, `read_po` and `write_po` now all handle user/auto comments correctly.
palgarvio
parents: 106
diff changeset
352 for comment in message.auto_comments:
105
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
353 for line in wrap(comment, width, break_long_words=False):
82
f421e5576d26 Added support for translator comments at the API and frontends levels.(See #12, item 1). Updated docs and tests accordingly.
palgarvio
parents: 81
diff changeset
354 _write('#. %s\n' % line.strip())
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
355
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
356 if not no_location:
136
9e3d2b227ec3 More fixes for Windows compatibility:
cmlenz
parents: 122
diff changeset
357 locs = u' '.join([u'%s:%d' % (filename.replace(os.sep, '/'), lineno)
9e3d2b227ec3 More fixes for Windows compatibility:
cmlenz
parents: 122
diff changeset
358 for filename, lineno in message.locations])
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
359 if width and width > 0:
105
abd3a594dab4 Implement wrapping of header comments in PO(T) output. Related to #14.
cmlenz
parents: 104
diff changeset
360 locs = wrap(locs, width, break_long_words=False)
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
361 for line in locs:
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
362 _write('#: %s\n' % line.strip())
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
363 if message.flags:
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
364 _write('#%s\n' % ', '.join([''] + list(message.flags)))
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
365
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
366 if isinstance(message.id, (list, tuple)):
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
367 _write('msgid %s\n' % _normalize(message.id[0]))
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
368 _write('msgid_plural %s\n' % _normalize(message.id[1]))
70
620fdd25657a Add back POT header broken in previous check-in.
cmlenz
parents: 69
diff changeset
369 for i, string in enumerate(message.string):
620fdd25657a Add back POT header broken in previous check-in.
cmlenz
parents: 69
diff changeset
370 _write('msgstr[%d] %s\n' % (i, _normalize(message.string[i])))
3
e9eaddab598e Import of initial code base.
cmlenz
parents:
diff changeset
371 else:
58
068952b4d4c0 Add actual data structures for handling message catalogs, so that more code can be reused here between the frontends.
cmlenz
parents: 57
diff changeset
372 _write('msgid %s\n' % _normalize(message.id))
70
620fdd25657a Add back POT header broken in previous check-in.
cmlenz
parents: 69
diff changeset
373 _write('msgstr %s\n' % _normalize(message.string or ''))
26
93eaa2f4a0a2 Reimplement line wrapping for PO writing (as the `textwrap` module is too destructive with white space) and move it to the `normalize` function (which was already doing some handling of line breaks).
cmlenz
parents: 25
diff changeset
374 _write('\n')
Copyright (C) 2012-2017 Edgewall Software