babel/old/mirror: scripts/import

annotate scripts/import_cldr.py @ 129:d6aef0675953

Add a couple of CLI tests.

author	cmlenz
date	Mon, 18 Jun 2007 17:05:00 +0000
parents	a72de8971819
children	d0cd235ede46

rev	line source
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	1 #!/usr/bin/env python
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	2 # -- coding: utf-8 --
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	3 #
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	4 # Copyright (C) 2007 Edgewall Software
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	5 # All rights reserved.
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	6 #
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	7 # This software is licensed as described in the file COPYING, which
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	8 # you should have received as part of this distribution. The terms
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	9 # are also available at http://babel.edgewall.org/wiki/License.
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	10 #
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	11 # This software consists of voluntary contributions made by many
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	12 # individuals. For the exact contribution history, see the revision
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	13 # history and logs, available at http://babel.edgewall.org/log/.
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	14
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	15 import copy
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	16 from optparse import OptionParser
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	17 import os
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	18 import pickle
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	19 import sys
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	20 try:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	21 from xml.etree.ElementTree import parse
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	22 except ImportError:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	23 from elementtree.ElementTree import parse
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	24
67 ad48b95af0d9 Add Babel soruce path to CLDR import script automatically for asmodai ;-). cmlenz parents: 36 diff changeset	25 # Make sure we're using Babel source, and not some previously installed version
ad48b95af0d9 Add Babel soruce path to CLDR import script automatically for asmodai ;-). cmlenz parents: 36 diff changeset	26 sys.path.insert(0, os.path.join(os.path.dirname(sys.argv[0]), '..'))
ad48b95af0d9 Add Babel soruce path to CLDR import script automatically for asmodai ;-). cmlenz parents: 36 diff changeset	27
11 11f64b232b04 Add basic support for number format patterns. jonas parents: 10 diff changeset	28 from babel import dates, numbers
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	29
17 aa33ad077d24 Minor date formatting improvements. cmlenz parents: 15 diff changeset	30 weekdays = {'mon': 0, 'tue': 1, 'wed': 2, 'thu': 3, 'fri': 4, 'sat': 5,
aa33ad077d24 Minor date formatting improvements. cmlenz parents: 15 diff changeset	31 'sun': 6}
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	32
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	33 try:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	34 any
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	35 except NameError:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	36 def any(iterable):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	37 return filter(None, list(iterable))
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	38
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	39 def _text(elem):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	40 buf = [elem.text or '']
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	41 for child in elem:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	42 buf.append(_text(child))
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	43 buf.append(elem.tail or '')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	44 return u''.join(filter(None, buf)).strip()
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	45
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	46 def main():
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	47 parser = OptionParser(usage='%prog path/to/cldr')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	48 options, args = parser.parse_args()
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	49 if len(args) != 1:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	50 parser.error('incorrect number of arguments')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	51
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	52 srcdir = args[0]
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	53 destdir = os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])),
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	54 '..', 'babel', 'localedata')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	55
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	56 sup = parse(os.path.join(srcdir, 'supplemental', 'supplementalData.xml'))
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	57
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	58 # build a territory containment mapping for inheritance
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	59 regions = {}
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	60 for elem in sup.findall('//territoryContainment/group'):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	61 regions[elem.attrib['type']] = elem.attrib['contains'].split()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	62
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	63 # Resolve territory containment
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	64 territory_containment = {}
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	65 region_items = regions.items()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	66 region_items.sort()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	67 for group, territory_list in region_items:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	68 for territory in territory_list:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	69 containers = territory_containment.setdefault(territory, set([]))
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	70 if group in territory_containment:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	71 containers \|= territory_containment[group]
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	72 containers.add(group)
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	73
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	74 filenames = os.listdir(os.path.join(srcdir, 'main'))
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	75 filenames.remove('root.xml')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	76 filenames.sort(lambda a,b: len(a)-len(b))
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	77 filenames.insert(0, 'root.xml')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	78
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	79 dicts = {}
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	80
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	81 for filename in filenames:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	82 print>>sys.stderr, 'Processing input file %r' % filename
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	83 stem, ext = os.path.splitext(filename)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	84 if ext != '.xml':
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	85 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	86
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	87 tree = parse(os.path.join(srcdir, 'main', filename))
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	88 data = {}
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	89
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	90 language = None
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	91 elem = tree.find('//identity/language')
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	92 if elem is not None:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	93 language = elem.attrib['type']
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	94 print>>sys.stderr, ' Language: %r' % language
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	95
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	96 territory = None
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	97 elem = tree.find('//identity/territory')
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	98 if elem is not None:
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	99 territory = elem.attrib['type']
15 b47c34d42eda Extended and documented `LazyProxy`. cmlenz parents: 11 diff changeset	100 else:
b47c34d42eda Extended and documented `LazyProxy`. cmlenz parents: 11 diff changeset	101 territory = '001' # world
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	102 print>>sys.stderr, ' Territory: %r' % territory
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	103 regions = territory_containment.get(territory, [])
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	104 print>>sys.stderr, ' Regions: %r' % regions
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	105
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	106 # <localeDisplayNames>
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	107
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	108 territories = data.setdefault('territories', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	109 for elem in tree.findall('//territories/territory'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	110 if 'draft' in elem.attrib and elem.attrib['type'] in territories:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	111 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	112 territories[elem.attrib['type']] = _text(elem)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	113
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	114 languages = data.setdefault('languages', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	115 for elem in tree.findall('//languages/language'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	116 if 'draft' in elem.attrib and elem.attrib['type'] in languages:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	117 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	118 languages[elem.attrib['type']] = _text(elem)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	119
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	120 variants = data.setdefault('variants', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	121 for elem in tree.findall('//variants/variant'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	122 if 'draft' in elem.attrib and elem.attrib['type'] in variants:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	123 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	124 variants[elem.attrib['type']] = _text(elem)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	125
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	126 scripts = data.setdefault('scripts', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	127 for elem in tree.findall('//scripts/script'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	128 if 'draft' in elem.attrib and elem.attrib['type'] in scripts:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	129 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	130 scripts[elem.attrib['type']] = _text(elem)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	131
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	132 # <dates>
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	133
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	134 week_data = data.setdefault('week_data', {})
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	135 supelem = sup.find('//weekData')
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	136
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	137 for elem in supelem.findall('minDays'):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	138 territories = elem.attrib['territories'].split()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	139 if territory in territories or any([r in territories for r in regions]):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	140 week_data['min_days'] = int(elem.attrib['count'])
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	141
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	142 for elem in supelem.findall('firstDay'):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	143 territories = elem.attrib['territories'].split()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	144 if territory in territories or any([r in territories for r in regions]):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	145 week_data['first_day'] = weekdays[elem.attrib['day']]
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	146
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	147 for elem in supelem.findall('weekendStart'):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	148 territories = elem.attrib['territories'].split()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	149 if territory in territories or any([r in territories for r in regions]):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	150 week_data['weekend_start'] = weekdays[elem.attrib['day']]
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	151
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	152 for elem in supelem.findall('weekendEnd'):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	153 territories = elem.attrib['territories'].split()
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	154 if territory in territories or any([r in territories for r in regions]):
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	155 week_data['weekend_end'] = weekdays[elem.attrib['day']]
0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	156
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	157 time_zones = data.setdefault('time_zones', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	158 for elem in tree.findall('//timeZoneNames/zone'):
30 9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	159 info = {}
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	160 city = elem.findtext('exemplarCity')
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	161 if city:
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	162 info['city'] = unicode(city)
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	163 for child in elem.findall('long/*'):
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	164 info.setdefault('long', {})[child.tag] = unicode(child.text)
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	165 for child in elem.findall('short/*'):
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	166 info.setdefault('short', {})[child.tag] = unicode(child.text)
9a00ac84004c Import basic timezone info from CLDR (see #3). Still missing a couple other pieces in the puzzle. cmlenz parents: 28 diff changeset	167 time_zones[elem.attrib['type']] = info
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	168
36 2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	169 zone_aliases = data.setdefault('zone_aliases', {})
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	170 if stem == 'root':
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	171 for elem in sup.findall('//timezoneData/zoneFormatting/zoneItem'):
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	172 if 'aliases' in elem.attrib:
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	173 canonical_id = elem.attrib['type']
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	174 for alias in elem.attrib['aliases'].split():
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	175 zone_aliases[alias] = canonical_id
2e143f1a0003 Extended time-zone support. cmlenz parents: 35 diff changeset	176
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	177 for calendar in tree.findall('//calendars/calendar'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	178 if calendar.attrib['type'] != 'gregorian':
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	179 # TODO: support other calendar types
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	180 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	181
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	182 months = data.setdefault('months', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	183 for ctxt in calendar.findall('months/monthContext'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	184 ctxts = months.setdefault(ctxt.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	185 for width in ctxt.findall('monthWidth'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	186 widths = ctxts.setdefault(width.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	187 for elem in width.findall('month'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	188 if 'draft' in elem.attrib and int(elem.attrib['type']) in widths:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	189 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	190 widths[int(elem.attrib.get('type'))] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	191
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	192 days = data.setdefault('days', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	193 for ctxt in calendar.findall('days/dayContext'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	194 ctxts = days.setdefault(ctxt.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	195 for width in ctxt.findall('dayWidth'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	196 widths = ctxts.setdefault(width.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	197 for elem in width.findall('day'):
10 0ca5dd65594f Pull in some supplemental data from the CLDR, for things like the first day of the week. cmlenz parents: 3 diff changeset	198 dtype = weekdays[elem.attrib['type']]
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	199 if 'draft' in elem.attrib and dtype in widths:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	200 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	201 widths[dtype] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	202
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	203 quarters = data.setdefault('quarters', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	204 for ctxt in calendar.findall('quarters/quarterContext'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	205 ctxts = quarters.setdefault(ctxt.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	206 for width in ctxt.findall('quarterWidth'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	207 widths = ctxts.setdefault(width.attrib['type'], {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	208 for elem in width.findall('quarter'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	209 if 'draft' in elem.attrib and int(elem.attrib['type']) in widths:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	210 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	211 widths[int(elem.attrib.get('type'))] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	212
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	213 eras = data.setdefault('eras', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	214 for width in calendar.findall('eras/*'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	215 ewidth = {'eraNames': 'wide', 'eraAbbr': 'abbreviated'}[width.tag]
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	216 widths = eras.setdefault(ewidth, {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	217 for elem in width.findall('era'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	218 if 'draft' in elem.attrib and int(elem.attrib['type']) in widths:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	219 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	220 widths[int(elem.attrib.get('type'))] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	221
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	222 # AM/PM
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	223 periods = data.setdefault('periods', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	224 for elem in calendar.findall('am'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	225 if 'draft' in elem.attrib and elem.tag in periods:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	226 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	227 periods[elem.tag] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	228 for elem in calendar.findall('pm'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	229 if 'draft' in elem.attrib and elem.tag in periods:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	230 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	231 periods[elem.tag] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	232
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	233 date_formats = data.setdefault('date_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	234 for elem in calendar.findall('dateFormats/dateFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	235 if 'draft' in elem.attrib and elem.attrib.get('type') in date_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	236 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	237 try:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	238 date_formats[elem.attrib.get('type')] = \
11 11f64b232b04 Add basic support for number format patterns. jonas parents: 10 diff changeset	239 dates.parse_pattern(unicode(elem.findtext('dateFormat/pattern')))
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	240 except ValueError, e:
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	241 print>>sys.stderr, 'ERROR: %s' % e
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	242
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	243 time_formats = data.setdefault('time_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	244 for elem in calendar.findall('timeFormats/timeFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	245 if 'draft' in elem.attrib and elem.attrib.get('type') in time_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	246 continue
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	247 try:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	248 time_formats[elem.attrib.get('type')] = \
11 11f64b232b04 Add basic support for number format patterns. jonas parents: 10 diff changeset	249 dates.parse_pattern(unicode(elem.findtext('timeFormat/pattern')))
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	250 except ValueError, e:
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	251 print>>sys.stderr, 'ERROR: %s' % e
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	252
35 0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	253 datetime_formats = data.setdefault('datetime_formats', {})
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	254 for elem in calendar.findall('dateTimeFormats/dateTimeFormatLength'):
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	255 if 'draft' in elem.attrib and elem.attrib.get('type') in datetime_formats:
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	256 continue
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	257 try:
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	258 datetime_formats[elem.attrib.get('type')] = \
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	259 unicode(elem.findtext('dateTimeFormat/pattern'))
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	260 except ValueError, e:
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	261 print>>sys.stderr, 'ERROR: %s' % e
0505d666fa1f * Import datetime patterns from CLDR. cmlenz parents: 30 diff changeset	262
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	263 # <numbers>
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	264
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	265 number_symbols = data.setdefault('number_symbols', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	266 for elem in tree.findall('//numbers/symbols/*'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	267 number_symbols[elem.tag] = unicode(elem.text)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	268
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	269 decimal_formats = data.setdefault('decimal_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	270 for elem in tree.findall('//decimalFormats/decimalFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	271 if 'draft' in elem.attrib and elem.attrib.get('type') in decimal_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	272 continue
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	273 pattern = unicode(elem.findtext('decimalFormat/pattern'))
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	274 decimal_formats[elem.attrib.get('type')] = numbers.parse_pattern(pattern)
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	275
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	276 scientific_formats = data.setdefault('scientific_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	277 for elem in tree.findall('//scientificFormats/scientificFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	278 if 'draft' in elem.attrib and elem.attrib.get('type') in scientific_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	279 continue
127 a72de8971819 Add currency formatting. cmlenz parents: 67 diff changeset	280 pattern = unicode(elem.findtext('scientificFormat/pattern'))
a72de8971819 Add currency formatting. cmlenz parents: 67 diff changeset	281 scientific_formats[elem.attrib.get('type')] = numbers.parse_pattern(pattern)
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	282
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	283 currency_formats = data.setdefault('currency_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	284 for elem in tree.findall('//currencyFormats/currencyFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	285 if 'draft' in elem.attrib and elem.attrib.get('type') in currency_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	286 continue
127 a72de8971819 Add currency formatting. cmlenz parents: 67 diff changeset	287 pattern = unicode(elem.findtext('currencyFormat/pattern'))
a72de8971819 Add currency formatting. cmlenz parents: 67 diff changeset	288 currency_formats[elem.attrib.get('type')] = numbers.parse_pattern(pattern)
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	289
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	290 percent_formats = data.setdefault('percent_formats', {})
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	291 for elem in tree.findall('//percentFormats/percentFormatLength'):
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	292 if 'draft' in elem.attrib and elem.attrib.get('type') in percent_formats:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	293 continue
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	294 pattern = unicode(elem.findtext('percentFormat/pattern'))
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	295 percent_formats[elem.attrib.get('type')] = numbers.parse_pattern(pattern)
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	296
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	297 currency_names = data.setdefault('currency_names', {})
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	298 currency_symbols = data.setdefault('currency_symbols', {})
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	299 for elem in tree.findall('//currencies/currency'):
28 695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	300 name = elem.findtext('displayName')
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	301 if name:
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	302 currency_names[elem.attrib['type']] = unicode(name)
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	303 symbol = elem.findtext('symbol')
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	304 if symbol:
695884591af6 * Reduce size of locale data pickles by only storing the data provided by each locale itself, and merging inherited data at runtime. cmlenz parents: 24 diff changeset	305 currency_symbols[elem.attrib['type']] = unicode(symbol)
3 e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	306
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	307 dicts[stem] = data
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	308 outfile = open(os.path.join(destdir, stem + '.dat'), 'wb')
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	309 try:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	310 pickle.dump(data, outfile, 2)
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	311 finally:
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	312 outfile.close()
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	313
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	314 if __name__ == '__main__':
e9eaddab598e Import of initial code base. cmlenz parents: diff changeset	315 main()

Mercurial > babel > old > mirror

annotate scripts/import_cldr.py @ 129:d6aef0675953