qbJzuB5@=Bi>t=;b@jToZ;&o0WJO3zYuMmFEvJ?-M`de=&ZN#U%6+(EHaA-oc3+{+T5i)&y;~4&l8z@dv#TaLXE32W(9yol9-(W
z4%8mDqNWXtbvp(kt&(k19KfTeYFX`7N!s{J=RUeS1Ys(@+GRerX{$!&0Cl*wkWzPJ
zXyEs@?Yv{0((wZLUN=B$&{t=qA6BvhGFz*LJhM;uG#*!ac!O8NM?3QawD~KVEzNeT
zB=+>h!|a~>MivlA+R!>4yZT&81jMrwafYw(Astare2OQ_T2|%fp0)`2ZOPyuorZs8
zw?OOC;R>{HZ}!%%HDdAA^INSn*|rS3)C;+-R7asj$7^445VZOmU@p!|mHupdVT<}P
zc=C59CHQ?Vh<~Wh&Ak{?7-;?AIQynP4>oQ)F(Kh$=BNSmK0)q7cLdzTcNv$euBH?^
z0hmsjlue@)D-|CRK_xTk9nt6Ym90!c=FYH!3hPb$Tx)%U_kU=+alrMNv^D4a$`4Zk
zGD-{tr7sumKOh@=loh@83XuDkxSY6ZzfXG`!ahd4;NwBgk_382_r>uSaT`c2^1))X
z;+pVfHjdcB*-tvdLAzH=okHb@bRr0NUmR(9M9tv^AHUVnE#LLHC62CfVKBWx1Ue-x
z*|)os&*K3j(ED-5EMT*NoX*dkIO%-4KJkj%M#Ztv$Vuh+!J8Fy%@@Scnt>p8zR43$
zeKnyi@idp=V~hje9`=Kx@Y|La9h9YKh&a@5QTM{@P6lE9A0O-^UOkR?NMhVxTs^_;
zQPmtcKDL<4DMsd3%!{lfN0={f>>`qL{M;9wmX>CLJ#08~G;zq-P&C^bllG*!_Oj*F
zIm-y*Rl`fnfL$;2cc7ijOr=+Y#;dT?Z~GiOp@%@3l@=cx?NKa(0>~#U$<;vWRIy<6
z(k{;DT%4V;(|5ei43wjkFjEfc45Vf~8$rgU3DGZMv=ODKHF@motC^#wNz*arD5npf
zY3fXaO{i8`)1A3S+`*i@4CQTlBW!}jf(RZPmVDYP^ZdoUJO(M`{(aD`ch}GW@OWN)
zBK8oEL5?&b6%@mGsQy-OT~p~hTRr~3VKzvu{_x2X2|xL!Tq
zFVt*m_TE|yhE%JMP40Np?dR|3N2
z2N~qz3BiC*y_v7)H#=&4{g|KcxXVzL9X`j#LD2Xs^le>DIyTm^2Jg*ua>fv4Putv~3?&If{MrYTB&xZ1r
zCHA!;QKGipm(5gbn%|99h>`710YLIhs{#V+-013r$gh$WxyH-qV@|R&%@m2G%eLA{?R(Q6
z_=P>Jgg~dCr>y;HA~a(Uo1C?i84lnuHTY8FTM9?}vlV8^iDG12czzvz-A*ViJz&Dm
zZ*cm_4$^|WL1N*5(3;F;
z5UY=H{lqK(RNZTFqihm&I6j^}jHN0?#}sUIQIaMBbl{GBg-@S8v4J6*&BeCDP;_p*
zOt~alA?n2l4Y^Y*wM6N$uLD8vJ=!NM!|QK2)4kn1y%s*R^}Nm}NG})~TAe$yIx3{E
zvmLpsgOUFkpP2r*>XcNFE97=t-3lSqtA19~cF_bt6K>hQhJw&hB=Lzi{%4-LQoW+<
zxB&XyrI*Cq?eB}qla+dt{%A5mciV;P-0fckfqr|ZC@0VTl@@-_Qg}a^=UwS+zBJrj
zc=hAAo?;8t-wdQZ_w)h(Na7PL!{he4qCKAn`zuw{Q0V_kY)0l-P;QM*EXjod()8O5
zMacU9DKXtpG}}LSZPak9x5U0&)oIkJG;CUs?XCGqT2E7N
zKkd79Qg=S@T!q^&FUYxEw!W&C?TQh39;q6D31$hl^H9zSUiOJ8{c#eDM5Rts#Phb}
zFdYElN&Q%=t+V;=Hq{xu|?aAoJeh_`|io~
zej@Y=>Q$NXjbUc65qjT+>^Q>U1N~5-xdlKjVn0A&68pt?z4eXg-#41N97%ICM&d|-
zPCDm1at9IoSy7*1C{Vbj7?i}534)Ajfs1P*4V6jwtMg%6&I_GM*owAyHD61ek#>%a
zq$iX$Xs2byCPrTt$LM|$+u?aWj_ROT#DVYq+L{jtf^Amj=Be;sTF`+&AUf2Haw(p9*8sb|0)i`|F
zF-a?BQ->EiIzC}0-laVy(uB*KY@y1o#NDLEuLcjqVdgeUf@eh(GTscnBnItG1YF)5
z$y)R61oASvj~bXSA_CfY;Jx)l1QcLpDvZa$*B>lG={HIo0S|SP>!4vvp@rQ^>^PJ*
zr|>M|&R&BCRJ)TI#yguZ>DaTJJ*+YVSy3rLQN!UGC!Qy2R>z;Gc!19GEif+r4aiu<
zytD%DCwlr?E#d|aLsn`&fwip?ln8ArB9N45#w4N#$}N+J?`>L$2skhfUa
zUZV9kEBv&M+nl0o@v{v%c-F4bf8|TV#d)70J^1OJpJ8#|0gdPsbQ#Ad#zv
z;qx+Q$woZbm?2>
z3oJmHd6mmMyq1~%iYwL7GZI%Eq5}3F%v19X2h2w%V;#?WXDOG}m)t(qFIP+PO}RMq
zubd-oyt};ZHFys5a^Sty+134pz%BnH%%lbN0n4c%yzF5@jHoApKZX)`qgWGhtD+AGBe)86Uf+kg+#;DW{*!&)@7F)^V!vtm>_&oqVH_d87ET5ii6%$N+`4Y;1>SPXSu>n3WM1A`1n8i;Li&LCd$Mq)qm++$bQ~}S*Scy1B)G@
z-c0etZKh4*!MOArJuK*2AIwdUIKAqO61+_Esv0+#=asZQwm5^JXvgafZDMv$lPSc(UqQS7(mrzhq^xt7
zk<;pO{#^WG18UtR8qYmkHQ`@|S4vfAox4QaArcML7bH_n`Kh}eEbFldxj){o84`O-
zT}fON$(jq9#PRygK8-ag2UI+%iC-XxW{Ym+WQCH2nQNn(<89sEi8x@p)Ar2xs07
zCGPW4Zto)-rM?W_6E~KpTkezf_9cIU6>4AKL4V+=-;lrY%#-bq5JwYyVY7
zp1jVM)eM@7{{cP(Bi1c%9hN(VPU6!teS2@?2S61-ksA{;T{3S}iSdVl8HmYbFTNInp`^Qh
zpMUofuH|=Up^(olDGE^^?+Cz`3D+z%P5ud+b!PqDu-JHqJTYYR3!gzRm6w2!_$Mlp
zO^X9n$ZSv;c4o>L6e76lLK0tW4p(rNvv-w+N>y{sCqY3JP`qW0F^IL%qMTmdjo`g>
z&T8pVg)GcSQh+)sRv$NOd2}#3cLeTM^KQqypgtDy7vfcM0Db|GW9Pv?vPn5Tj9_<(
zb9i)+*d5)uGIKlD*g2GwMqmOvtbVopQRMjR
z*I1Va-ux3D2jQ4F1uFlqllNqbZKy)mL$x{r4&btBF!fWD^?OnT&yxb55dnNeeJcVW
zD?lQkMi&ecd1|E)%ZY_?tQ%dT!7vgiU?ZNjILyh}2A<}GPO#g3=wBU$+1EI6nmW?v
z*0~|@QFrj&+pZKmXAw!b>-*FdHr92ZR{z2YJ_;Gp{eBT@j11sAA6j^rkSb`2Zzs;<
zwYnCYn9^1Dx452U*f-zs;_p8z4aLec1
zxGZ|E5~JgFnGFrK^Yyx*(H-#L(=FTgzc9q^`!4fes9^%=mk>KoaDeQ;7j%|m=q9{f
zSFZ%u_-5V&i?ci;4Y#{7f@l1_sx}I%ZMh+QhG=>^iB)%J28HJ3#Z_f0D^p9m41x+wT90dkpi{)M#fY!=N>2TtG2O#&CvI$^~AeS@U|9Mx(UGll+(H8ryn+WKm8
zPNDUptx4xq&YRp9q5dW7(=Hu^^}jb-JY>Hl7!%UeHC7`bPCxIS#ONN?3#G#wU0{Gc
z+Q^}+9|nbtn17zEcVQGQNVLBmV6qxt6Kg}c_g5}r*Nlu9F6S#c17pLp177_O;z7!N$EU+c=xxVolM
z%d;Z1v44N@lAB~+x>_kjCgEKdQr7^t`zz{7$rqWK^i_~y
z9t(OHHu4-vIV9?WK0xB~j;&W3rcRY7a8Vd1YlGQZz$DCFWxhN2M&ON%wz_+><7Z
zIc-O@0xGfgg;fpeYgTlwGS@Oj)-*nTou)Hz=--2$HQy`K|CyB3;BL2C+}mLAzphwB
zr02<>zf@V??z$}u9>%(7{XdmkcQjmGzdmXr2(JXuB}zn(-ogYC36UtHcO!ZtVPo6he)k3!(zm!)Kr9N<7
zMmw8ets=&C9zX<6rp3hcrh8kP9OWvlO3K;S>3dToBbfcSHl-=AkngJXa|#*P$XtxZbie4?>BBK8Y*2{35;YK^N(Z<4Zer*qn%0yqD4c)e
zY+=Y%7H*0&i92wol6%8tt0Q>nEI{&o*Lxi8UP@-1cqHe>hdX2lN
zBJc@eWPlf9>u5-P%Cxu8u+}Fl#s#GiA&!j?g9W3|`RM%o{Lu%_aXr*|lU0J7=JiA%
zEWXV9?KL_syj>|Vujy2cN86n&%=NQ!4w2h%UDA`~t8
zmZIdV%nB)ax&s2nVZHi67CLPaF8q}d?p!u*AosH=4smATZS_{>ZwU|3RZNJ$AC>iu
zeWc&6#ftGpm^SXtId&DA@tGLeTHXp8T*bgP4Pr&~e{im1%J}(~{B1g^wolNoW+A%(TM__0=w{RP^eV>Y
z?{4(}gsCp9#0rOxaH(sIltDsM*XJ&aM0bC@=qE^F{&$8vR>9POfyMkbL@wn|H@ic)
z?rkX9WxuK)H%K2XjY8&WH#jdio^%GF%o&D181X%OdN~DLC-(p>^qb9P`k1#4t}|Ze
zZ58Jdb|>8`ZidX?J*yUXyw=8-OnfUyz8un=bn5c&?C$@E!cSI(h#a@ZkiC;Hg4|id
z<6~=x6;;CimjW^G1FkzolLh>KE=$SiFF^r2jd8jBcj^}6q%;ag?Zr|sE~`&XZrsvaB-aTD$>W@hXAcia51I}qaEn!vX&cCm)VtX`jV3(3jXf`TA*D5jQZ
zG7vR!y2kkM@C~0^3=9*4gIOLUx=ck+J2%2xWc{S1HmDcyLeMdF&
z1gIkZD`N0Ft$+>>%%>@Z(i?-JEaw0T-)$Q0GIfsvd1!*lx}vfq7x$HQ!D{(WZ=RZv
z++MD+8KQ`lx43(eOvGwdR#2k!XYS!s0e<_Jzn0$Zk}kGV0S&f3GHQ|8Ki%xle>cq0
z61u)e=wG@mzW)Y0#Otp{yj*J4e_iS%-bC8zrwim29|bLpr}no3YmK5A4-U3!EV_tA
zMwe)@I@M*1Tf!7BLWSBbv)vCQ_>+15RYUvF_mG~`lip2dx2A#!nVgY$^=q>l!Lr!P
zGZcCxUX@>@X?xd-(G`C7IU8k@G>-YFzn-K)W{DxTZmnVk-cN>_5TR!&7kdvkeLE`s
zE?;fSaV9@VrL<*FV#6kB!|?DD)i?&-%a^yaexxPWiyspPqkNk~0ZGd{#
z9NLIZaP7Ce>YH)?*8S<{Z(Kwfn_@Px%wl)3Wf?SaA0jz!_DSfamfCccsrj9&&?EoE
zsk#;Je&mg_Ogbz}sqE<2?@%}Yc_QQH4;_Nojy}PP
zsXzRlo0>(V_Y91B_!y(8Ekb?fC^I*zMiqQqF$|#0wU!7RP;!k>DucXbZs(4(W2$vt
z{+G52ZomDs)6rvV2KO+zc=Fe>A?cXzmr6qdrmkez=_vu>EzVWPiCEUu@`6@y^1x^!
zKVK0qI{t2jlso?5C&qbDas!nA-!?o4*5tAz3;{(G%TG8~J@~UK-b1FK5c7+1A)KKv-XsLmW}E59vqsehxWbS^|};F#8^M85nw-mmo|?8u)YM4=3)b40|9MmaUaYz
zB$(p23Dyh}W?t(AKF+VSvPONY%*&@)i!!2#z{r5^*-b~wQ1&pIYF#F+OVR+#igR6YNVC`uB7>jw?#M{
zJ+F^q0<|m%OdJ_~X2|`d+dP>3rbV_x0F3w1f9qUfZE}p%o-@cKRk3~EQfBnlET5RQ
zsux^gKDu%`uHkrrOeSiWMk>u#qqZbp`%bT5pq`}7{LT%bjbM%zlcqmSer7jIa`h5GegeY9WN
zvB3~+h!kh!pkU@myw96{ykjtZ=9K;>^71TGAhEK1&@-j))1p?w#A?#I3zhxcZFW&X
z1i~*sajA3P(N~WBp+kY4m5T;;FQfMvFOT2XB_I;
z+M0%FKaKTWk-$$}gF_|SUAkPG%C^Hr&uU|Gl?h_v!w=~j{mK^{=O$*^g6eK6+67&K
zA_v$`E#}fQP9_icTK1-ow&W%mUV!0NR>eKAMRV&c_`DEU&$z+G_ltt;;G!nVYLm}i
zVzkOsHq(k+-=_|B_O0po-BeppkjgS7q+gGXM+#&ABdOra`8E8HQ)nLP=**Z~W;-l9
z=d-Z2&rpqc%%-yTzMoU7y&68!snKS)$*5DCf4aVAp5{
z943b;%smKIt93`(JNjXk*W{<1GuxrrIP<|e7%W1%T^$IiKJGZ5xDUhhOszk((-c88
zMJ{;%poH=QKjGbul~1U!vdivEGswn+??bmhlm{B$=gHmLVH^yq_N+0eI=#`#&kRiDnElU@Ft-CG_CwI4r
zc)s`Jy>M-XpQ~1&Lrxa!B~t}4#n@o_U_bU3hm!|T3=LIMv3cvFStWAT6{p-1hCXp1
zU>%S0N-j2cyLVSXp=$4YKk0_;#hz$P>0!F=O@`oE&%Gb?)4+AF!P5Mu`*>u+CO$ML
z5UI_y=LTRm(~~BnjJmapzMbSO6(+RjvAqy%cqLT+0(X+0bbEjaHhoPvN#Hpj*8IAA
z&_X*hVSJ2swf}XRY1HBsz`N;inC+ezKmB0Nc8@gB**owJbDK{Opl12`)qoK3RY89D
zx<-J-O=9*9J<@&klxPXY0Jh2H$GdL8Y&7DUAV)Ic7au)WOe^3_6$pG>`S{An3gXug
z*`#phkoIgSk5g2uedz9(b&1#u5acWBWp^Fxwg(}QI|lwyqXc}NS$WgOSR8eR8;T_$
z9x$aK6r%(d1Xnl3#6aEnqNi$XmBr`@$4Yc%MqEN)%_sU?@#kR42G~a5xS3Ef-+-vk
za3v8vY!X(vgK?0QJDQ?Dc#Jw5we`B#<>?hpm7?SSfH_e;L#>IPijI2Mbsx`{z}P(J
zS`=modZpgEY;Dtdj)qns>yPBH00LWX-t@9WTFb04&4AzK55Gth36ldpMMDqOtp(k+
zuV`>WHbkOuM%VW+ZA?h7mB6#S?8qCoa^k+$!=hl^^XRmU6DW
zA4_8??gL4;q*_-;*Jm}8&S35c*IVa0|T=A5i
zDK(+QbDbgFp-)8EA#-LHJ&e*-b2_kGle4z$Uuh031GJ8c&|%orB(|IbOoDgt8EKs2
zCY0>E0or>phY(c)6M&zIbj7F=QTH)KP#zR~j!ERz9W=~D<_~m7|V50R<1SlS7c+$6G0NUYQV$SF%i2k6h?rjOC0oZF?A59Ji)@-Sri?L7-
z%H3WW6n{x@_2gZ>`Ice9mF)Zp%K}+uqpE?topT;7*Fv^l5ll8+@|CX>5-FU?`v_lU
zhAZJQ>_FE2Hrm7_I-w!YbcPmZ#KQ4o3ZO&Xo9Rsu-bPTb*m_K!PgzMyc%)|w~OGQfZT@p{U2y0Fuqa;0-`uFpSwKCss_4_RTxatJ5N
zS*50|BK(&!4hn$;T+SqU;&+bq)eI@mX{!ubaZpC8jWIWN_=o(@ooWaKprC5^nEm31
z0*X6W*1zi!Kvr@}G~OR`0NIS}YW%=sU+4G9&q`3uQ^P;D|l8#-!PjY_S?+%QKDMREh4r2r=8+U2EOC>Lmg?F-!&NB`Z;1Pm;-(UC3)?}mgp>FJGiqw
zK5Og^lOYmVgayBpABJ%s(hGXedy>mkFxo(z%`FD+85j)9o7Me{*n8&HJ#y!Zl-g*u
z1+C`djWa`T!`|+chN!tsxzYD82EJsa?T|seUMC>>9z;dgFMZW{hK
zskJk{E(rFJz?khuX%Oj^tACqC(^xkoBhsWu9j4jLmipezR6AlrQPPh+gn{$4Joy(6
zEaAeRK5rI7yp!HC6z8}?S3|oKe>pt=Gv`rm;46}c)RH^`B4lYkSw?@DRkc}@Jc8c>
z#C+Gt?8tm#gdXzy@IQ;EOSXJ$ZK-zJbFD
+>>> from markup.input import XML
+>>> doc = XML('My document')
+
+This results in a `Stream` object that can be used in a number of ways.
+
+>>> doc.render(method='html', encoding='utf-8')
+'My document'
+
+>>> from markup.input import HTML
+>>> doc = HTML('My document')
+>>> doc.render(method='html', encoding='utf-8')
+'My document'
+
+>>> title = doc.select('head/title')
+>>> title.render(method='html', encoding='utf-8')
+'<title>My document</title>'
+
+
+Markup streams can also be generated programmatically using the
+`markup.builder` module:
+
+>>> from markup.builder import tag
+>>> doc = tag.DOC(tag.TITLE('My document'), lang='en')
+>>> doc.generate().render(method='html')
+'<doc lang="en"><title>My document</title></doc>'
+
+"""
+
+from markup.core import *
+from markup.input import XML, HTML
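
The event-stream idea in the package docstring above can be sketched without the package itself. The following is a minimal, self-contained Python 3 illustration, not the code added by this patch: the string event kinds and the `events` and `render` helpers are hypothetical names, and the `(kind, data, position)` tuple shape is assumed from the `Stream` docstring in `markup/core.py`.

```python
from html import escape

# Event kinds, modelled here as plain strings (the patch uses StreamEventKind objects).
START, END, TEXT = "start", "end", "text"


def events():
    """Yield (kind, data, position) tuples describing a small document."""
    yield START, ("doc", [("lang", "en")]), (-1, -1)
    yield START, ("title", []), (-1, -1)
    yield TEXT, "My document", (-1, -1)
    yield END, "title", (-1, -1)
    yield END, "doc", (-1, -1)


def render(stream):
    """Serialize an iterable of events to a string, escaping text and attributes."""
    parts = []
    for kind, data, _pos in stream:
        if kind == START:
            tag, attrib = data
            attrs = "".join(
                ' %s="%s"' % (name, escape(str(value), quote=True))
                for name, value in attrib
            )
            parts.append("<%s%s>" % (tag, attrs))
        elif kind == END:
            parts.append("</%s>" % data)
        elif kind == TEXT:
            parts.append(escape(str(data), quote=False))
    return "".join(parts)


print(render(events()))
```

Because the serializer only consumes an iterable of tuples, the same function works whether the events come from a parser, a builder, or a filter wrapped around either.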
diff --git a/markup/builder.py b/markup/builder.py
new file mode 100644
--- /dev/null
+++ b/markup/builder.py
@@ -0,0 +1,178 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+from markup.core import Attributes, QName, Stream
+
+__all__ = ['Fragment', 'Element', 'tag']
+
+
+class Fragment(object):
+ __slots__ = ['children']
+
+ def __init__(self):
+ self.children = []
+
+ def append(self, node):
+ """Append an element or string as child node."""
+ if isinstance(node, (Element, basestring, int, float, long)):
+ # For objects of a known/primitive type, we avoid the check for
+ # whether it is iterable for better performance
+ self.children.append(node)
+ elif isinstance(node, Fragment):
+ self.children += node.children
+ elif node is not None:
+ try:
+ children = iter(node)
+ except TypeError:
+ self.children.append(node)
+ else:
+ for child in children:
+ self.append(child)
+
+ def __add__(self, other):
+ return Fragment()(self, other)
+
+ def __call__(self, *args):
+ for arg in args:
+ self.append(arg)
+ return self
+
+ def generate(self):
+ """Return a `Stream` of the markup events for the child nodes."""
+ def _generate():
+ for child in self.children:
+ if isinstance(child, Fragment):
+ for event in child.generate():
+ yield event
+ else:
+ yield Stream.TEXT, child, (-1, -1)
+ return Stream(_generate())
+
+ def __iter__(self):
+ return iter(self.generate())
+
+ def __str__(self):
+ return str(self.generate())
+
+ def __unicode__(self):
+ return unicode(self.generate())
+
+
+class Element(Fragment):
+ """Simple XML output generator based on the builder pattern.
+
+ Construct XML elements by passing the tag name to the constructor:
+
+ >>> print Element('strong')
+ <strong/>
+
+ Attributes can be specified using keyword arguments. The values of the
+ arguments will be converted to strings and any special XML characters
+ escaped:
+
+ >>> print Element('textarea', rows=10, cols=60)
+ <textarea rows="10" cols="60"/>
+ >>> print Element('span', title='1 < 2')
+ <span title="1 &lt; 2"/>
+ >>> print Element('span', title='"baz"')
+ <span title="&#34;baz&#34;"/>
+
+ The " character is escaped using a numerical entity.
+ The order in which attributes are rendered is undefined.
+
+ If an attribute value evaluates to `None`, that attribute is not included
+ in the output:
+
+ >>> print Element('a', name=None)
+ <a/>
+
+ Attribute names that conflict with Python keywords can be specified by
+ appending an underscore:
+
+ >>> print Element('div', class_='warning')
+ <div class="warning"/>
+
+ Nested elements can be added to an element using call notation. The
+ same call also accepts keyword arguments for adding attributes, as one
+ would do in the constructor.
+
+ >>> print Element('ul')(Element('li'), Element('li'))
+ <ul><li/><li/></ul>
+ >>> print Element('a')('Label')
+ <a>Label</a>
+ >>> print Element('a')('Label', href="target")
+ <a href="target">Label</a>
+
+ Text nodes can be nested in an element by adding strings instead of
+ elements. Any special characters in the strings are escaped automatically:
+
+ >>> print Element('em')('Hello world')
+ <em>Hello world</em>
+ >>> print Element('em')(42)
+ <em>42</em>
+ >>> print Element('em')('1 < 2')
+ <em>1 &lt; 2</em>
+
+ This technique also allows mixed content:
+
+ >>> print Element('p')('Hello ', Element('b')('world'))
+ <p>Hello <b>world</b></p>
+
+ Elements can also be combined with other elements or strings using the
+ addition operator, which results in a `Fragment` object that contains the
+ operands:
+
+ >>> print Element('br') + 'some text' + Element('br')
+ <br/>some text<br/>
+
+ Elements with a namespace can be generated using the `Namespace` and/or
+ `QName` classes:
+
+ >>> from markup.core import Namespace
+ >>> xhtml = Namespace('http://www.w3.org/1999/xhtml')
+ >>> print Element(xhtml.html, lang='en')
+ <html lang="en" xmlns="http://www.w3.org/1999/xhtml"/>
+ """
+ __slots__ = ['tag', 'attrib']
+
+ def __init__(self, tag_, **attrib):
+ Fragment.__init__(self)
+ self.tag = QName(tag_)
+ self.attrib = Attributes()
+ self(**attrib)
+
+ def __call__(self, *args, **kwargs):
+ for attr, value in kwargs.items():
+ if value is None:
+ continue
+ attr = attr.rstrip('_').replace('_', '-')
+ self.attrib.set(attr, value)
+ return Fragment.__call__(self, *args)
+
+ def generate(self):
+ """Return a `Stream` of the start tag, child events and end tag."""
+ def _generate():
+ yield Stream.START, (self.tag, self.attrib), (-1, -1)
+ for kind, data, pos in Fragment.generate(self):
+ yield kind, data, pos
+ yield Stream.END, self.tag, (-1, -1)
+ return Stream(_generate())
+
+
+class ElementFactory(object):
+
+ def __getattribute__(self, name):
+ return Element(name.lower())
+
+
+tag = ElementFactory()
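
The flattening rules in `Fragment.append()` are easy to misread in diff form, so here is a minimal Python 3 sketch of the same idea. It is a simplified stand-in, not the patched module: it stores attributes in a plain `dict` instead of `Attributes`, omits `QName` and stream generation, and uses `str` where the Python 2 code uses `basestring`.

```python
class Fragment:
    """A flat list of child nodes."""

    def __init__(self):
        self.children = []

    def append(self, node):
        if isinstance(node, (Element, str, int, float)):
            # Known leaf types are stored as-is, skipping the iterability test.
            self.children.append(node)
        elif isinstance(node, Fragment):
            # A bare fragment is dissolved into its children.
            self.children.extend(node.children)
        elif node is not None:
            try:
                children = iter(node)
            except TypeError:
                self.children.append(node)
            else:
                # Iterables are flattened recursively, one child at a time.
                for child in children:
                    self.append(child)

    def __call__(self, *args):
        for arg in args:
            self.append(arg)
        return self


class Element(Fragment):
    """A fragment with a tag name and attributes."""

    def __init__(self, tag, **attrib):
        super().__init__()
        self.tag = tag
        self.attrib = {}
        self(**attrib)

    def __call__(self, *args, **kwargs):
        for name, value in kwargs.items():
            if value is None:
                continue  # attributes set to None are dropped
            # `class_` becomes `class`, `http_equiv` becomes `http-equiv`
            self.attrib[name.rstrip('_').replace('_', '-')] = value
        return super().__call__(*args)


ul = Element('ul', class_='nav', id=None)('a', [Element('li'), None, 42], ('b',))
```

Note that the recursive step must pass each `child` to `append()`, not the iterator itself. Passing the iterator would re-enter `append()` with an object that is already being consumed.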
diff --git a/markup/core.py b/markup/core.py
new file mode 100644
--- /dev/null
+++ b/markup/core.py
@@ -0,0 +1,309 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+"""Core classes for markup processing."""
+
+import htmlentitydefs
+import re
+from StringIO import StringIO
+
+__all__ = ['Stream', 'Markup', 'escape', 'unescape', 'Namespace', 'QName']
+
+
+class StreamEventKind(object):
+ """A kind of event on an XML stream."""
+
+ __slots__ = ['name']
+
+ def __init__(self, name):
+ self.name = name
+
+ def __repr__(self):
+ return self.name
+
+
+class Stream(object):
+ """Represents a stream of markup events.
+
+ This class is basically an iterator over the events.
+
+ Also provided are ways to serialize the stream to text. The `serialize()`
+ method will return an iterator over generated strings, while `render()`
+ returns the complete generated text at once. Both accept various parameters
+ that impact the way the stream is serialized.
+
+ Stream events are tuples of the form:
+
+ (kind, data, position)
+
+ where `kind` is the event kind (such as `START`, `END`, `TEXT`, etc), `data`
+ depends on the kind of event, and `position` is a `(line, offset)` tuple
+ that contains the location of the original element or text in the input.
+ """
+ __slots__ = ['events']
+
+ START = StreamEventKind('start') # a start tag
+ END = StreamEventKind('end') # an end tag
+ TEXT = StreamEventKind('text') # literal text
+ EXPR = StreamEventKind('expr') # an expression
+ SUB = StreamEventKind('sub') # a "subprogram"
+ PROLOG = StreamEventKind('prolog') # XML prolog
+ DOCTYPE = StreamEventKind('doctype') # doctype declaration
+ START_NS = StreamEventKind('start-ns') # start namespace mapping
+ END_NS = StreamEventKind('end-ns') # end namespace mapping
+ PI = StreamEventKind('pi') # processing instruction
+ COMMENT = StreamEventKind('comment') # comment
+
+ def __init__(self, events):
+ """Initialize the stream with a sequence of markup events.
+
+ @param events: a sequence or iterable providing the events
+ """
+ self.events = events
+
+ def __iter__(self):
+ return iter(self.events)
+
+ def render(self, method='xml', encoding='utf-8', **kwargs):
+ """Return a string representation of the stream.
+
+ @param method: determines how the stream is serialized; can be either
+ 'xml' or 'html', or a custom `Serializer` subclass
+ @param encoding: how the output string should be encoded; if set to
+ `None`, this method returns a `unicode` object
+
+ Any additional keyword arguments are passed to the serializer, and thus
+ depend on the `method` parameter value.
+ """
+ retval = u''.join(self.serialize(method=method, **kwargs))
+ if encoding is not None:
+ return retval.encode(encoding)
+ return retval
+
+ def select(self, path):
+ """Return a new stream that contains the events matching the given
+ XPath expression.
+
+ @param path: a string containing the XPath expression
+ """
+ from markup.path import Path
+ path = Path(path)
+ return path.select(self)
+
+ def serialize(self, method='xml', **kwargs):
+ """Generate strings corresponding to a specific serialization of the
+ stream.
+
+ Unlike the `render()` method, this method is a generator that returns
+ the serialized output incrementally, as opposed to returning a single
+ string.
+
+ @param method: determines how the stream is serialized; can be either
+ 'xml' or 'html', or a custom `Serializer` subclass
+ """
+ from markup import output
+ cls = method
+ if isinstance(method, basestring):
+ cls = {'xml': output.XMLSerializer,
+ 'html': output.HTMLSerializer}[method]
+ else:
+ assert issubclass(cls, output.Serializer)
+ serializer = cls(**kwargs)
+ return serializer.serialize(self)
+
+ def __str__(self):
+ return self.render()
+
+ def __unicode__(self):
+ return self.render(encoding=None)
+
+
+class Attributes(list):
+
+ def __init__(self, attrib=None):
+ list.__init__(self, map(lambda (k, v): (QName(k), v), attrib or []))
+
+ def __contains__(self, name):
+ return name in [attr for attr, value in self]
+
+ def get(self, name, default=None):
+ for attr, value in self:
+ if attr == name:
+ return value
+ return default
+
+ def set(self, name, value):
+ for idx, (attr, _) in enumerate(self):
+ if attr == name:
+ self[idx] = (attr, value)
+ break
+ else:
+ self.append((QName(name), value))
+
+
+class Markup(unicode):
+ """Marks a string as being safe for inclusion in HTML/XML output without
+ needing to be escaped.
+ """
+ def __new__(self, text='', *args):
+ if args:
+ text %= tuple([escape(arg) for arg in args])
+ return unicode.__new__(self, text)
+
+ def __add__(self, other):
+ return Markup(unicode(self) + Markup.escape(other))
+
+ def __mod__(self, args):
+ if not isinstance(args, (list, tuple)):
+ args = [args]
+ return Markup(unicode.__mod__(self,
+ tuple([escape(arg) for arg in args])))
+
+ def __mul__(self, num):
+ return Markup(unicode(self) * num)
+
+ def join(self, seq):
+ return Markup(unicode(self).join([Markup.escape(item) for item in seq]))
+
+ def stripentities(self, keepxmlentities=False):
+ """Return a copy of the text with any character or numeric entities
+ replaced by the equivalent Unicode characters.
+
+ If the `keepxmlentities` parameter is provided and evaluates to `True`,
+ the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left
+ intact.
+ """
+ def _replace_entity(match):
+ if match.group(1): # numeric entity
+ ref = match.group(1)
+ if ref[0] in 'xX':
+ ref = int(ref[1:], 16)
+ else:
+ ref = int(ref, 10)
+ return unichr(ref)
+ else: # character entity
+ ref = match.group(2)
+ if keepxmlentities and ref in ('amp', 'apos', 'gt', 'lt', 'quot'):
+ return '&%s;' % ref
+ try:
+ codepoint = htmlentitydefs.name2codepoint[ref]
+ return unichr(codepoint)
+ except KeyError:
+ if keepxmlentities:
+ return '&%s;' % ref
+ else:
+ return ref
+ return Markup(re.sub(r'&(?:#((?:\d+)|(?:[xX][0-9a-fA-F]+));?|(\w+);)',
+ _replace_entity, self))
+
+ def striptags(self):
+ """Return a copy of the text with all XML/HTML tags removed."""
+ return Markup(re.sub(r'<[^>]*?>', '', self))
+
+ def escape(cls, text, quotes=True):
+ """Create a Markup instance from a string and escape special characters
+ it may contain (<, >, & and \").
+
+ If the `quotes` parameter is set to `False`, the \" character is left
+ as is. Escaping quotes is generally only required for strings that are
+ to be used in attribute values.
+ """
+ if isinstance(text, cls):
+ return text
+ text = unicode(text)
+ if not text:
+ return cls()
+ text = text.replace('&', '&amp;') \
+ .replace('<', '&lt;') \
+ .replace('>', '&gt;')
+ if quotes:
+ text = text.replace('"', '&#34;')
+ return cls(text)
+ escape = classmethod(escape)
+
+ def unescape(self):
+ """Reverse-escapes &, <, > and \" and returns a `unicode` object."""
+ if not self:
+ return ''
+ return unicode(self).replace('&#34;', '"') \
+ .replace('&gt;', '>') \
+ .replace('&lt;', '<') \
+ .replace('&amp;', '&')
+
+ def plaintext(self, keeplinebreaks=True):
+ """Returns the text as a `unicode` object with all entities and tags removed."""
+ text = unicode(self.striptags().stripentities())
+ if not keeplinebreaks:
+ text = text.replace('\n', ' ')
+ return text
+
+ def sanitize(self):
+ from markup.filters import HTMLSanitizer
+ from markup.input import HTMLParser
+ sanitize = HTMLSanitizer()
+ text = self.stripentities(keepxmlentities=True)
+ return Stream(sanitize(HTMLParser(StringIO(text)), None))
+
+
+escape = Markup.escape
+
+def unescape(text):
+ """Reverse-escapes &, <, > and \" and returns a `unicode` object."""
+ if not isinstance(text, Markup):
+ return text
+ return text.unescape()
+
+
+class Namespace(object):
+
+ def __init__(self, uri):
+ self.uri = uri
+
+ def __getitem__(self, name):
+ return QName(self.uri + '}' + name)
+
+ __getattr__ = __getitem__
+
+ def __repr__(self):
+ return '<Namespace "%s">' % self.uri
+
+ def __str__(self):
+ return self.uri
+
+ def __unicode__(self):
+ return unicode(self.uri)
+
+
+class QName(unicode):
+ """A qualified element or attribute name.
+
+ The unicode value of instances of this class contains the qualified name of
+ the element or attribute, in the form `{namespace}localname`. The namespace
+ URI can be obtained through the additional `namespace` attribute, while the
+ local name can be accessed through the `localname` attribute.
+ """
+ __slots__ = ['namespace', 'localname']
+
+ def __new__(cls, qname):
+ if isinstance(qname, QName):
+ return qname
+
+ parts = qname.split('}', 1)
+ if qname.find('}') > 0:
+ self = unicode.__new__(cls, '{' + qname)
+ self.namespace = parts[0]
+ self.localname = parts[1]
+ else:
+ self = unicode.__new__(cls, qname)
+ self.namespace = None
+ self.localname = qname
+ return self
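
The order of replacements in `Markup.escape()` and `Markup.unescape()` matters: the ampersand has to be escaped first and unescaped last, otherwise already-escaped text is converted twice. The Python 3 sketch below shows this with plain `str` functions. It is an illustration under that assumption, not the `Markup` class itself, and the two function names simply mirror the methods in this file.

```python
def escape(text, quotes=True):
    """Escape &, <, > and (optionally) the double quote character."""
    text = (str(text).replace('&', '&amp;')   # must run first
                     .replace('<', '&lt;')
                     .replace('>', '&gt;'))
    if quotes:
        text = text.replace('"', '&#34;')
    return text


def unescape(text):
    """Reverse escape(); the ampersand is restored last."""
    return (text.replace('&#34;', '"')
                .replace('&gt;', '>')
                .replace('&lt;', '<')
                .replace('&amp;', '&'))          # must run last


sample = '1 < 2 & "x"'
print(escape(sample))
print(unescape(escape(sample)) == sample)
```

If `&amp;` were restored first in `unescape()`, the input `&amp;lt;` would become `&lt;` and then `<`, which is one level of unescaping too many.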
diff --git a/markup/eval.py b/markup/eval.py
new file mode 100644
--- /dev/null
+++ b/markup/eval.py
@@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import __builtin__
+import compiler
+import operator
+
+from markup.core import Stream
+
+__all__ = ['Expression']
+
+
+class Expression(object):
+ """Evaluates Python expressions used in templates.
+
+ >>> data = dict(test='Foo', items=[1, 2, 3], dict={'some': 'thing'})
+ >>> Expression('test').evaluate(data)
+ 'Foo'
+ >>> Expression('items[0]').evaluate(data)
+ 1
+ >>> Expression('items[-1]').evaluate(data)
+ 3
+ >>> Expression('dict["some"]').evaluate(data)
+ 'thing'
+
+ As in JavaScript, expressions in templates can use dot notation
+ (attribute access) to look up items in mappings:
+
+ >>> Expression('dict.some').evaluate(data)
+ 'thing'
+
+ This also works the other way around: item access can be used to access
+ any object attribute (meaning there's no use for `getattr()` in templates):
+
+ >>> class MyClass(object):
+ ... myattr = 'Bar'
+ >>> data = dict(mine=MyClass(), key='myattr')
+ >>> Expression('mine.myattr').evaluate(data)
+ 'Bar'
+ >>> Expression('mine["myattr"]').evaluate(data)
+ 'Bar'
+ >>> Expression('mine[key]').evaluate(data)
+ 'Bar'
+
+ Most of the standard Python operators are also available to template
+ expressions. Bitwise operators (including inversion and shifting) are not
+ supported.
+
+ >>> Expression('1 + 1').evaluate(data)
+ 2
+ >>> Expression('3 - 1').evaluate(data)
+ 2
+ >>> Expression('1 * 2').evaluate(data)
+ 2
+ >>> Expression('4 / 2').evaluate(data)
+ 2
+ >>> Expression('4 // 3').evaluate(data)
+ 1
+ >>> Expression('4 % 3').evaluate(data)
+ 1
+ >>> Expression('2 ** 3').evaluate(data)
+ 8
+ >>> Expression('not True').evaluate(data)
+ False
+ >>> Expression('True and False').evaluate(data)
+ False
+ >>> Expression('True or False').evaluate(data)
+ True
+ >>> Expression('1 == 3').evaluate(data)
+ False
+ >>> Expression('1 != 3 == 3').evaluate(data)
+ True
+
+ Built-in functions such as `len()` are also available in template
+ expressions:
+
+ >>> data = dict(items=[1, 2, 3])
+ >>> Expression('len(items)').evaluate(data)
+ 3
+ """
+ __slots__ = ['source', 'ast']
+ __visitors = {}
+
+ def __init__(self, source):
+ self.source = source
+ self.ast = None
+
+ def evaluate(self, data, default=None):
+ if not self.ast:
+ self.ast = compiler.parse(self.source, 'eval')
+ retval = self._visit(self.ast.node, data)
+ if retval is not None:
+ return retval
+ return default
+
+ def __repr__(self):
+ return '<Expression "%s">' % self.source
+
+ # AST traversal
+
+ def _visit(self, node, data):
+ v = self.__visitors.get(node.__class__)
+ if not v:
+ v = getattr(self, '_visit_%s' % node.__class__.__name__.lower())
+ self.__visitors[node.__class__] = v
+ return v(node, data)
+
+ def _visit_expression(self, node, data):
+ for child in node.getChildNodes():
+ return self._visit(child, data)
+
+ # Functions & Accessors
+
+ def _visit_callfunc(self, node, data):
+ func = self._visit(node.node, data)
+ if func is None:
+ return None
+ args = [self._visit(arg, data) for arg in node.args
+ if not isinstance(arg, compiler.ast.Keyword)]
+ kwargs = dict([(arg.name, self._visit(arg.expr, data)) for arg
+ in node.args if isinstance(arg, compiler.ast.Keyword)])
+ return func(*args, **kwargs)
+
+ def _visit_getattr(self, node, data):
+ obj = self._visit(node.expr, data)
+ try:
+ return getattr(obj, node.attrname)
+ except AttributeError, e:
+ try:
+ return obj[node.attrname]
+ except (KeyError, TypeError):
+ return None
+
+ def _visit_slice(self, node, data):
+ obj = self._visit(node.expr, data)
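+ # Test against None explicitly so that a bound of 0 is not discarded
+ lower = upper = None
+ if node.lower is not None:
+ lower = self._visit(node.lower, data)
+ if node.upper is not None:
+ upper = self._visit(node.upper, data)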
+ return obj[lower:upper]
+
+ def _visit_subscript(self, node, data):
+ obj = self._visit(node.expr, data)
+ subs = map(lambda sub: self._visit(sub, data), node.subs)
+ if len(subs) == 1:
+ subs = subs[0]
+ try:
+ return obj[subs]
+ except (KeyError, IndexError, TypeError):
+ try:
+ return getattr(obj, subs)
+ except (AttributeError, TypeError):
+ return None
+
+ # Operators
+
+ def _visit_and(self, node, data):
+ return reduce(lambda x, y: x and y,
+ [self._visit(n, data) for n in node.nodes])
+
+ def _visit_or(self, node, data):
+ return reduce(lambda x, y: x or y,
+ [self._visit(n, data) for n in node.nodes])
+
+ _OP_MAP = {'==': operator.eq, '!=': operator.ne,
+ '<': operator.lt, '<=': operator.le,
+ '>': operator.gt, '>=': operator.ge,
+ 'in': lambda x, y: operator.contains(y, x)}
+ def _visit_compare(self, node, data):
+ # Chained comparisons are evaluated left to right, as in Python:
+ # `a != b == c` means `a != b and b == c`
+ lval = self._visit(node.expr, data)
+ for op, rnode in node.ops:
+ rval = self._visit(rnode, data)
+ if not self._OP_MAP[op](lval, rval):
+ return False
+ lval = rval
+ return True
+
+ def _visit_add(self, node, data):
+ return self._visit(node.left, data) + self._visit(node.right, data)
+
+ def _visit_div(self, node, data):
+ return self._visit(node.left, data) / self._visit(node.right, data)
+
+ def _visit_floordiv(self, node, data):
+ return self._visit(node.left, data) // self._visit(node.right, data)
+
+ def _visit_mod(self, node, data):
+ return self._visit(node.left, data) % self._visit(node.right, data)
+
+ def _visit_mul(self, node, data):
+ return self._visit(node.left, data) * self._visit(node.right, data)
+
+ def _visit_power(self, node, data):
+ return self._visit(node.left, data) ** self._visit(node.right, data)
+
+ def _visit_sub(self, node, data):
+ return self._visit(node.left, data) - self._visit(node.right, data)
+
+ def _visit_not(self, node, data):
+ return not self._visit(node.expr, data)
+
+ def _visit_unaryadd(self, node, data):
+ return +self._visit(node.expr, data)
+
+ def _visit_unarysub(self, node, data):
+ return -self._visit(node.expr, data)
+
+ # Identifiers & Literals
+
+ def _visit_name(self, node, data):
+ val = data.get(node.name)
+ if val is None:
+ val = getattr(__builtin__, node.name, None)
+ return val
+
+ def _visit_const(self, node, data):
+ return node.value
+
+ def _visit_dict(self, node, data):
+ return dict([(self._visit(k, data), self._visit(v, data))
+ for k, v in node.items])
+
+ def _visit_tuple(self, node, data):
+ return tuple([self._visit(n, data) for n in node.nodes])
+
+ def _visit_list(self, node, data):
+ return [self._visit(n, data) for n in node.nodes]
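
The visitor dispatch used by `Expression` can be tried out on a current interpreter with the standard `ast` module. The sketch below is a Python 3.8+ approximation and not a port of this file: the `compiler` package used above no longer exists in Python 3, `MiniExpression` is a hypothetical name, and only a handful of node types are handled.

```python
import ast
import builtins
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod,
            ast.Pow: operator.pow}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos,
              ast.Not: operator.not_}
_CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
            ast.Lt: operator.lt, ast.LtE: operator.le,
            ast.Gt: operator.gt, ast.GtE: operator.ge,
            ast.In: lambda x, y: x in y}


class MiniExpression:
    """Evaluate a small subset of Python expressions against a dict."""

    def __init__(self, source):
        self.tree = ast.parse(source, mode='eval')

    def evaluate(self, data):
        return self._visit(self.tree.body, data)

    def _visit(self, node, data):
        # Dispatch on the node class name, e.g. BinOp -> _visit_binop
        method = getattr(self, '_visit_' + type(node).__name__.lower())
        return method(node, data)

    def _visit_constant(self, node, data):
        return node.value

    def _visit_name(self, node, data):
        if node.id in data:
            return data[node.id]
        return getattr(builtins, node.id, None)  # fall back to built-ins

    def _visit_attribute(self, node, data):
        obj = self._visit(node.value, data)
        try:
            return getattr(obj, node.attr)
        except AttributeError:
            try:
                return obj[node.attr]  # dot notation on mappings
            except (KeyError, TypeError):
                return None

    def _visit_index(self, node, data):
        # Only reached on Python 3.8, where subscripts are wrapped in Index
        return self._visit(node.value, data)

    def _visit_subscript(self, node, data):
        obj = self._visit(node.value, data)
        key = self._visit(node.slice, data)
        try:
            return obj[key]
        except (KeyError, IndexError, TypeError):
            try:
                return getattr(obj, key)  # item notation on attributes
            except (AttributeError, TypeError):
                return None

    def _visit_binop(self, node, data):
        return _BIN_OPS[type(node.op)](self._visit(node.left, data),
                                       self._visit(node.right, data))

    def _visit_unaryop(self, node, data):
        return _UNARY_OPS[type(node.op)](self._visit(node.operand, data))

    def _visit_compare(self, node, data):
        left = self._visit(node.left, data)
        for op, right_node in zip(node.ops, node.comparators):
            right = self._visit(right_node, data)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right  # chained: a != b == c is a != b and b == c
        return True

    def _visit_call(self, node, data):
        func = self._visit(node.func, data)
        if func is None:
            return None
        return func(*[self._visit(arg, data) for arg in node.args])


data = {'items': [1, 2, 3], 'dict': {'some': 'thing'}}
print(MiniExpression('dict.some').evaluate(data))
```

Unsupported node types raise `AttributeError` from the dispatch lookup, which keeps the set of permitted constructs explicit.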
diff --git a/markup/filters.py b/markup/filters.py
new file mode 100644
--- /dev/null
+++ b/markup/filters.py
@@ -0,0 +1,319 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+"""Implementation of a number of stream filters."""
+
+try:
+ frozenset
+except NameError:
+ from sets import ImmutableSet as frozenset
+import re
+
+from markup.core import Attributes, Markup, Stream
+from markup.path import Path
+
+__all__ = ['EvalFilter', 'IncludeFilter', 'MatchFilter', 'WhitespaceFilter',
+ 'HTMLSanitizer']
+
+
+class EvalFilter(object):
+ """Responsible for evaluating expressions in a template."""
+
+ def __call__(self, stream, ctxt=None):
+ for kind, data, pos in stream:
+
+ if kind is Stream.START:
+ # Attributes may still contain expressions in start tags at
+ # this point, so do some evaluation
+ tag, attrib = data
+ new_attrib = []
+ for name, substream in attrib:
+ if isinstance(substream, basestring):
+ value = substream
+ else:
+ values = []
+ for subkind, subdata, subpos in substream:
+ if subkind is Stream.EXPR:
+ values.append(subdata.evaluate(ctxt))
+ else:
+ values.append(subdata)
+ value = [unicode(x) for x in values if x is not None]
+ if not value:
+ continue
+ new_attrib.append((name, ''.join(value)))
+ yield kind, (tag, Attributes(new_attrib)), pos
+
+ elif kind is Stream.EXPR:
+ result = data.evaluate(ctxt)
+ if result is None:
+ continue
+
+ # First check for a string, otherwise the iterable
+ # test below succeeds, and the string will be
+ # chopped up into characters
+ if isinstance(result, basestring):
+ yield Stream.TEXT, result, pos
+ else:
+ # Test if the expression evaluated to an
+ # iterable, in which case we yield the
+ # individual items
+ try:
+ yield Stream.SUB, ([], iter(result)), pos
+ except TypeError:
+ # Neither a string nor an iterable, so just
+ # pass it through
+ yield Stream.TEXT, unicode(result), pos
+
+ else:
+ yield kind, data, pos
+
+
+class IncludeFilter(object):
+ """Template filter providing (very) basic XInclude support
+ (see http://www.w3.org/TR/xinclude/) in templates.
+ """
+
+ _NAMESPACE = 'http://www.w3.org/2001/XInclude'
+
+ def __init__(self, loader):
+ """Initialize the filter.
+
+ @param loader: the `TemplateLoader` to use for resolving references to
+ external template files
+ """
+ self.loader = loader
+
+ def __call__(self, stream, ctxt=None):
+ """Filter the stream, processing any XInclude directives it may
+ contain.
+
+ @param ctxt: the template context
+ @param stream: the markup event stream to filter
+ """
+ from markup.template import TemplateError, TemplateNotFound
+
+ in_fallback = False
+ include_href, fallback_stream = None, None
+ indent = 0
+
+ for kind, data, pos in stream:
+
+ if kind is Stream.START and data[0].namespace == self._NAMESPACE \
+ and not in_fallback:
+ tag, attrib = data
+ if tag.localname == 'include':
+ include_href = attrib.get('href')
+ indent = pos[1]
+ elif tag.localname == 'fallback':
+ in_fallback = True
+ fallback_stream = []
+
+ elif kind is Stream.END and data.namespace == self._NAMESPACE:
+ if data.localname == 'include':
+ try:
+ if not include_href:
+ raise TemplateError('Include misses required '
+ 'attribute "href"')
+ template = self.loader.load(include_href)
+ for ikind, idata, ipos in template.generate(ctxt):
+ # Fixup indentation of included markup
+ if ikind is Stream.TEXT:
+ idata = idata.replace('\n', '\n' + ' ' * indent)
+ yield ikind, idata, ipos
+
+ # If the included template defines any filters added at
+ # runtime (such as py:match templates), those need to be
+ # applied to the including template, too.
+ for filter_ in template.filters:
+ stream = filter_(stream, ctxt)
+
+ except TemplateNotFound:
+ if fallback_stream is None:
+ raise
+ for event in fallback_stream:
+ yield event
+
+ include_href = None
+ fallback_stream = None
+ indent = 0
+ break
+ elif data.localname == 'fallback':
+ in_fallback = False
+
+ elif in_fallback:
+ fallback_stream.append((kind, data, pos))
+
+ elif kind is Stream.START_NS and data[1] == self._NAMESPACE:
+ continue
+
+ else:
+ yield kind, data, pos
+ else:
+ # The loop exited normally, so there shouldn't be further events to
+ # process
+ return
+
+ for event in self(stream, ctxt):
+ yield event
+
+
+class MatchFilter(object):
+ """A filter that delegates to a given handler function when the input stream
+ matches some path expression.
+ """
+
+ def __init__(self, path, handler):
+ self.path = Path(path)
+ self.handler = handler
+
+ def __call__(self, stream, ctxt=None):
+ test = self.path.test()
+ for kind, data, pos in stream:
+ result = test(kind, data, pos)
+ if result is True:
+ content = [(kind, data, pos)]
+ depth = 1
+ while depth > 0:
+ ev = stream.next()
+ if ev[0] is Stream.START:
+ depth += 1
+ elif ev[0] is Stream.END:
+ depth -= 1
+ content.append(ev)
+ test(*ev)
+
+ yield (Stream.SUB,
+ ([lambda stream, ctxt: self.handler(content, ctxt)], []),
+ pos)
+ else:
+ yield kind, data, pos
+
+
+class WhitespaceFilter(object):
+ """A filter that removes extraneous white space from the stream.
+
+ Todo:
+ * Support for xml:space
+ """
+
+ _TRAILING_SPACE = re.compile('[ \t]+(?=\n)')
+ _LINE_COLLAPSE = re.compile('\n{2,}')
+
+ def __call__(self, stream, ctxt=None):
+ textbuf = []
+ prev_kind = None
+ for kind, data, pos in stream:
+ if kind is Stream.TEXT:
+ textbuf.append(data)
+ elif prev_kind is Stream.TEXT:
+ text = ''.join(textbuf)
+ text = self._TRAILING_SPACE.sub('', text)
+ text = self._LINE_COLLAPSE.sub('\n', text)
+ yield Stream.TEXT, text, pos
+ del textbuf[:]
+ prev_kind = kind
+ if kind is not Stream.TEXT:
+ yield kind, data, pos
+
+ if textbuf:
+ text = self._LINE_COLLAPSE.sub('\n', ''.join(textbuf))
+ yield Stream.TEXT, text, pos
+
+
+class HTMLSanitizer(object):
+ """A filter that removes potentially dangerous HTML tags and attributes
+ from the stream.
+ """
+
+ _SAFE_TAGS = frozenset(['a', 'abbr', 'acronym', 'address', 'area', 'b',
+ 'big', 'blockquote', 'br', 'button', 'caption', 'center', 'cite',
+ 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt',
+ 'em', 'fieldset', 'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
+ 'hr', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'map',
+ 'menu', 'ol', 'optgroup', 'option', 'p', 'pre', 'q', 's', 'samp',
+ 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'table',
+ 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'tt', 'u',
+ 'ul', 'var'])
+
+ _SAFE_ATTRS = frozenset(['abbr', 'accept', 'accept-charset', 'accesskey',
+ 'action', 'align', 'alt', 'axis', 'border', 'cellpadding',
+ 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class',
+ 'clear', 'cols', 'colspan', 'color', 'compact', 'coords', 'datetime',
+ 'dir', 'disabled', 'enctype', 'for', 'frame', 'headers', 'height',
+ 'href', 'hreflang', 'hspace', 'id', 'ismap', 'label', 'lang',
+ 'longdesc', 'maxlength', 'media', 'method', 'multiple', 'name',
+ 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev',
+ 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
+ 'span', 'src', 'start', 'style', 'summary', 'tabindex', 'target',
+ 'title', 'type', 'usemap', 'valign', 'value', 'vspace', 'width'])
+ _URI_ATTRS = frozenset(['action', 'background', 'dynsrc', 'href', 'lowsrc',
+ 'src'])
+ _SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])
+
+ def __call__(self, stream, ctxt=None):
+ waiting_for = None
+
+ for kind, data, pos in stream:
+ if kind is Stream.START:
+ if waiting_for:
+ continue
+ tag, attrib = data
+ if tag not in self._SAFE_TAGS:
+ waiting_for = tag
+ continue
+
+ new_attrib = []
+ for attr, value in attrib:
+ if attr not in self._SAFE_ATTRS:
+ continue
+ elif attr in self._URI_ATTRS:
+ # Don't allow URI schemes such as "javascript:"
+ if self._get_scheme(value) not in self._SAFE_SCHEMES:
+ continue
+ elif attr == 'style':
+ # Remove dangerous CSS declarations from inline styles
+ decls = []
+ for decl in filter(None, value.split(';')):
+ is_evil = False
+ if 'expression' in decl:
+ is_evil = True
+ for m in re.finditer(r'url\s*\(([^)]+)', decl):
+ if self._get_scheme(m.group(1)) not in self._SAFE_SCHEMES:
+ is_evil = True
+ break
+ if not is_evil:
+ decls.append(decl.strip())
+ if not decls:
+ continue
+ value = '; '.join(decls)
+ new_attrib.append((attr, value))
+
+ yield kind, (tag, new_attrib), pos
+
+ elif kind is Stream.END:
+ tag = data
+ if waiting_for:
+ if waiting_for == tag:
+ waiting_for = None
+ else:
+ yield kind, data, pos
+
+ else:
+ if not waiting_for:
+ yield kind, data, pos
+
+ def _get_scheme(self, text):
+ if ':' not in text:
+ return None
+ chars = [char for char in text.split(':', 1)[0] if char.isalnum()]
+ return ''.join(chars).lower()
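Review note on `HTMLSanitizer._get_scheme` above: a minimal standalone sketch of the scheme check, written for Python 3 rather than the Python 2 this patch targets. `is_safe_uri` is a hypothetical helper name that does not appear in the patch; the scheme set is copied from `_SAFE_SCHEMES`.

```python
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])


def get_scheme(text):
    """Return the lowercased URI scheme of `text`, or None if it has none."""
    if ':' not in text:
        return None
    # Keep only alphanumeric characters, so that whitespace or control
    # characters embedded in the scheme ("java\tscript:") cannot hide it.
    chars = [char for char in text.split(':', 1)[0] if char.isalnum()]
    return ''.join(chars).lower()


def is_safe_uri(value):
    """Hypothetical helper: True if the URI scheme is on the whitelist."""
    return get_scheme(value) in SAFE_SCHEMES
```

Because `None` is in the whitelist, relative references with no scheme are accepted, while anything with an unlisted scheme is rejected.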
diff --git a/markup/input.py b/markup/input.py
new file mode 100644
--- /dev/null
+++ b/markup/input.py
@@ -0,0 +1,202 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+from xml.parsers import expat
+try:
+ frozenset
+except NameError:
+ from sets import ImmutableSet as frozenset
+import HTMLParser as html
+import htmlentitydefs
+import re
+from StringIO import StringIO
+
+from markup.core import Attributes, Markup, QName, Stream
+
+
+class XMLParser(object):
+ """Generator-based XML parser based on roughly equivalent code in
+ Kid/ElementTree."""
+
+ def __init__(self, source):
+ self.source = source
+
+ # Setup the Expat parser
+ parser = expat.ParserCreate('utf-8', '}')
+ parser.buffer_text = True
+ parser.returns_unicode = True
+ parser.StartElementHandler = self._handle_start
+ parser.EndElementHandler = self._handle_end
+ parser.CharacterDataHandler = self._handle_data
+ parser.XmlDeclHandler = self._handle_prolog
+ parser.StartDoctypeDeclHandler = self._handle_doctype
+ parser.StartNamespaceDeclHandler = self._handle_start_ns
+ parser.EndNamespaceDeclHandler = self._handle_end_ns
+ parser.ProcessingInstructionHandler = self._handle_pi
+ parser.CommentHandler = self._handle_comment
+ parser.DefaultHandler = self._handle_other
+
+ # Location reporting is only supported in Python >= 2.4
+ if not hasattr(parser, 'CurrentLineNumber'):
+ self.getpos = self._getpos_unknown
+
+ self.expat = parser
+ self.queue = []
+
+ def __iter__(self):
+ bufsize = 4 * 1024 # 4K
+ done = False
+ while True:
+ while not done and len(self.queue) == 0:
+ data = self.source.read(bufsize)
+ if data == '': # end of data
+ if hasattr(self, 'expat'):
+ self.expat.Parse('', True)
+ del self.expat # get rid of circular references
+ done = True
+ else:
+ self.expat.Parse(data, False)
+ for event in self.queue:
+ yield event
+ self.queue = []
+ if done:
+ break
+
+ def _getpos_unknown(self):
+ return (-1, -1)
+
+ def getpos(self):
+ return self.expat.CurrentLineNumber, self.expat.CurrentColumnNumber
+
+ def _handle_start(self, tag, attrib):
+ self.queue.append((Stream.START, (QName(tag), Attributes(attrib.items())),
+ self.getpos()))
+
+ def _handle_end(self, tag):
+ self.queue.append((Stream.END, QName(tag), self.getpos()))
+
+ def _handle_data(self, text):
+ self.queue.append((Stream.TEXT, text, self.getpos()))
+
+ def _handle_prolog(self, version, encoding, standalone):
+ self.queue.append((Stream.PROLOG, (version, encoding, standalone),
+ self.getpos()))
+
+ def _handle_doctype(self, name, sysid, pubid, has_internal_subset):
+ self.queue.append((Stream.DOCTYPE, (name, pubid, sysid), self.getpos()))
+
+ def _handle_start_ns(self, prefix, uri):
+ self.queue.append((Stream.START_NS, (prefix or '', uri), self.getpos()))
+
+ def _handle_end_ns(self, prefix):
+ self.queue.append((Stream.END_NS, prefix or '', self.getpos()))
+
+ def _handle_pi(self, target, data):
+ self.queue.append((Stream.PI, (target, data), self.getpos()))
+
+ def _handle_comment(self, text):
+ self.queue.append((Stream.COMMENT, text, self.getpos()))
+
+ def _handle_other(self, text):
+ if text.startswith('&'):
+ # deal with undefined entities
+ try:
+ text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
+ self.queue.append((Stream.TEXT, text, self.getpos()))
+ except KeyError:
+ lineno, offset = self.getpos()
+ raise expat.error("undefined entity %s: line %d, column %d" %
+ (text, lineno, offset))
+
+
+def XML(text):
+ return Stream(list(XMLParser(StringIO(text))))
+
+
+class HTMLParser(html.HTMLParser):
+ """Parser for HTML input based on the Python `HTMLParser` module.
+
+ This class provides the same interface for generating stream events as
+ `XMLParser`, and attempts to automatically balance tags.
+ """
+
+ _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
+ 'hr', 'img', 'input', 'isindex', 'link', 'meta',
+ 'param'])
+
+ def __init__(self, source):
+ html.HTMLParser.__init__(self)
+ self.source = source
+ self.queue = []
+ self._open_tags = []
+
+ def __iter__(self):
+ bufsize = 4 * 1024 # 4K
+ done = False
+ while True:
+ while not done and len(self.queue) == 0:
+ data = self.source.read(bufsize)
+ if data == '': # end of data
+ self.close()
+ done = True
+ else:
+ self.feed(data)
+ for kind, data, pos in self.queue:
+ yield kind, data, pos
+ self.queue = []
+ if done:
+ open_tags = self._open_tags
+ open_tags.reverse()
+ for tag in open_tags:
+ yield Stream.END, QName(tag), pos
+ break
+
+ def handle_starttag(self, tag, attrib):
+ pos = self.getpos()
+ self.queue.append((Stream.START, (QName(tag), Attributes(attrib)), pos))
+ if tag in self._EMPTY_ELEMS:
+ self.queue.append((Stream.END, QName(tag), pos))
+ else:
+ self._open_tags.append(tag)
+
+ def handle_endtag(self, tag):
+ if tag not in self._EMPTY_ELEMS:
+ pos = self.getpos()
+ while self._open_tags:
+ open_tag = self._open_tags.pop()
+ if open_tag.lower() == tag.lower():
+ break
+ self.queue.append((Stream.END, QName(open_tag), pos))
+ self.queue.append((Stream.END, QName(tag), pos))
+
+ def handle_data(self, text):
+ self.queue.append((Stream.TEXT, text, self.getpos()))
+
+ def handle_charref(self, name):
+ self.queue.append((Stream.TEXT, Markup('&#%s;' % name), self.getpos()))
+
+ def handle_entityref(self, name):
+ self.queue.append((Stream.TEXT, Markup('&%s;' % name), self.getpos()))
+
+ def handle_pi(self, data):
+ target, data = data.split(None, 1)
+ data = data.rstrip('?')
+ self.queue.append((Stream.PI, (target.strip(), data.strip()),
+ self.getpos()))
+
+ def handle_comment(self, text):
+ self.queue.append((Stream.COMMENT, text, self.getpos()))
+
+
+def HTML(text):
+ return Stream(list(HTMLParser(StringIO(text))))
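Review note on the tag balancing in `HTMLParser` above: a minimal standalone sketch of the same open-tag stack, assuming a simplified token format of `('start', name)` and `('end', name)` pairs instead of the patch's stream events. The `balance` function and the shortened `EMPTY_ELEMS` set are illustrative only, and the code is written for Python 3.

```python
EMPTY_ELEMS = frozenset(['br', 'hr', 'img', 'input'])


def balance(tokens):
    """Return a balanced copy of a list of ('start'|'end', name) tokens."""
    open_tags = []
    out = []
    for kind, name in tokens:
        if kind == 'start':
            out.append(('start', name))
            if name in EMPTY_ELEMS:
                # Empty elements are closed immediately
                out.append(('end', name))
            else:
                open_tags.append(name)
        elif kind == 'end' and name not in EMPTY_ELEMS:
            # Close every element opened after the one being closed
            while open_tags:
                open_tag = open_tags.pop()
                if open_tag.lower() == name.lower():
                    break
                out.append(('end', open_tag))
            out.append(('end', name))
    # At end of input, close whatever is still open, innermost first
    for name in reversed(open_tags):
        out.append(('end', name))
    return out
```

As in the patch, an end tag with no matching start tag pops the whole stack, so stray end tags close every open element.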
diff --git a/markup/output.py b/markup/output.py
new file mode 100644
--- /dev/null
+++ b/markup/output.py
@@ -0,0 +1,199 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+"""This module provides different kinds of serialization methods for XML event
+streams.
+"""
+
+try:
+ frozenset
+except NameError:
+ from sets import ImmutableSet as frozenset
+
+from markup.core import Markup, QName, Stream
+from markup.filters import WhitespaceFilter
+
+__all__ = ['Serializer', 'XMLSerializer', 'HTMLSerializer']
+
+
+class Serializer(object):
+ """Base class for serializers."""
+
+ def serialize(self, stream):
+ raise NotImplementedError
+
+
+class XMLSerializer(Serializer):
+ """Produces XML text from an event stream.
+
+ >>> from markup.builder import tag
+ >>> elem = tag.DIV(tag.A(href='foo'), tag.BR, tag.HR(noshade=True))
+ >>> print ''.join(XMLSerializer().serialize(elem.generate()))
+
+ """
+
+ def serialize(self, stream):
+ ns_attrib = []
+ ns_mapping = {}
+
+ stream = PushbackIterator(stream)
+ for kind, data, pos in stream:
+
+ if kind is Stream.DOCTYPE:
+ # FIXME: what if there's no system or public ID in the input?
+ yield Markup('<!DOCTYPE %s PUBLIC "%s" "%s">\n' % data)
+
+ elif kind is Stream.START_NS:
+ prefix, uri = data
+ if uri not in ns_mapping:
+ ns_mapping[uri] = prefix
+ if not prefix:
+ ns_attrib.append((QName('xmlns'), uri))
+ else:
+ ns_attrib.append((QName('xmlns:%s' % prefix), uri))
+
+ elif kind is Stream.START:
+ tag, attrib = data
+
+ tagname = tag.localname
+ if tag.namespace:
+ try:
+ prefix = ns_mapping[tag.namespace]
+ if prefix:
+ tagname = prefix + ':' + tag.localname
+ except KeyError:
+ ns_attrib.append((QName('xmlns'), tag.namespace))
+ buf = ['<', tagname]
+
+ if ns_attrib:
+ attrib.extend(ns_attrib)
+ ns_attrib = []
+ for attr, value in attrib:
+ attrname = attr.localname
+ if attr.namespace:
+ try:
+ prefix = ns_mapping[attr.namespace]
+ except KeyError:
+ # FIXME: synthesize a prefix for the attribute?
+ prefix = ''
+ if prefix:
+ attrname = prefix + ':' + attrname
+ buf.append(' %s="%s"' % (attrname, Markup.escape(value)))
+
+ kind, data, pos = stream.next()
+ if kind is Stream.END:
+ buf.append('/>')
+ else:
+ buf.append('>')
+ stream.pushback((kind, data, pos))
+
+ yield Markup(''.join(buf))
+
+ elif kind is Stream.END:
+ tag = data
+ tagname = tag.localname
+ if tag.namespace:
+ prefix = ns_mapping[tag.namespace]
+ if prefix:
+ tagname = prefix + ':' + tag.localname
+ yield Markup('</%s>' % tagname)
+
+ elif kind is Stream.TEXT:
+ yield Markup.escape(data, quotes=False)
+
+
+class HTMLSerializer(Serializer):
+ """Produces HTML text from an event stream.
+
+ >>> from markup.builder import tag
+ >>> elem = tag.DIV(tag.A(href='foo'), tag.BR, tag.HR(noshade=True))
+ >>> print ''.join(HTMLSerializer().serialize(elem.generate()))
+
+ """
+
+ NAMESPACE = 'http://www.w3.org/1999/xhtml'
+
+ _EMPTY_ELEMS = frozenset(['area', 'base', 'basefont', 'br', 'col', 'frame',
+ 'hr', 'img', 'input', 'isindex', 'link', 'meta',
+ 'param'])
+ _BOOLEAN_ATTRS = frozenset(['selected', 'checked', 'compact', 'declare',
+ 'defer', 'disabled', 'ismap', 'multiple',
+ 'nohref', 'noresize', 'noshade', 'nowrap'])
+
+ def serialize(self, stream):
+ ns_mapping = {}
+
+ stream = PushbackIterator(stream)
+ for kind, data, pos in stream:
+
+ if kind is Stream.DOCTYPE:
+ yield Markup('<!DOCTYPE %s PUBLIC "%s" "%s">\n' % data)
+
+ elif kind is Stream.START_NS:
+ prefix, uri = data
+ if uri not in ns_mapping:
+ ns_mapping[uri] = prefix
+
+ elif kind is Stream.START:
+ tag, attrib = data
+ if tag.namespace and tag.namespace != self.NAMESPACE:
+ continue # not in the HTML namespace, so don't emit
+ buf = ['<', tag.localname]
+ for attr, value in attrib:
+ if attr.namespace and attr.namespace != self.NAMESPACE:
+ continue # not in the HTML namespace, so don't emit
+ if attr.localname in self._BOOLEAN_ATTRS:
+ if value:
+ buf.append(' %s' % attr.localname)
+ else:
+ buf.append(' %s="%s"' % (attr.localname,
+ Markup.escape(value)))
+
+ if tag.localname in self._EMPTY_ELEMS:
+ kind, data, pos = stream.next()
+ if kind is not Stream.END:
+ stream.pushback((kind, data, pos))
+
+ yield Markup(''.join(buf + ['>']))
+
+ elif kind is Stream.END:
+ tag = data
+ if tag.namespace and tag.namespace != self.NAMESPACE:
+ continue # not in the HTML namespace, so don't emit
+ yield Markup('</%s>' % tag.localname)
+
+ elif kind is Stream.TEXT:
+ yield Markup.escape(data, quotes=False)
+
+
+class PushbackIterator(object):
+ """A simple wrapper for iterators that allows pushing items back on the
+ queue via the `pushback()` method.
+
+ That can effectively be used to peek at the next item."""
+ __slots__ = ['iterable', 'buf']
+
+ def __init__(self, iterable):
+ self.iterable = iter(iterable)
+ self.buf = []
+
+ def __iter__(self):
+ return self
+
+ def next(self):
+ if self.buf:
+ return self.buf.pop(0)
+ return self.iterable.next()
+
+ def pushback(self, item):
+ self.buf.append(item)
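Review note on `PushbackIterator` above: a minimal sketch of how the pushback buffer gives one-item lookahead, which is what lets `XMLSerializer` emit `<tag/>` for an element with no content. This is a Python 3 port (`__next__` instead of `next`), and `collapse_empty` with its `(kind, name)` event tuples is a hypothetical simplification, not the patch's serializer.

```python
class PushbackIterator:
    """Iterator wrapper that allows items to be pushed back."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.buf = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.buf:
            return self.buf.pop(0)
        return next(self.iterable)

    def pushback(self, item):
        self.buf.append(item)


def collapse_empty(events):
    """Serialize (kind, name) events, using <name/> for empty elements."""
    stream = PushbackIterator(events)
    out = []
    for kind, name in stream:
        if kind == 'start':
            following = next(stream, None)  # peek at the next event
            if following is not None and following[0] == 'end':
                out.append('<%s/>' % name)  # the end event is consumed
            else:
                out.append('<%s>' % name)
                if following is not None:
                    stream.pushback(following)  # not ours, put it back
        elif kind == 'end':
            out.append('</%s>' % name)
        else:
            out.append(name)
    return ''.join(out)
```

Unlike the serializer in the patch, the sketch passes a default to `next()` so that a start event at the very end of the stream does not raise `StopIteration`.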
diff --git a/markup/path.py b/markup/path.py
new file mode 100644
--- /dev/null
+++ b/markup/path.py
@@ -0,0 +1,308 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+"""Basic support for evaluating XPath expressions against streams."""
+
+import re
+
+from markup.core import QName, Stream
+
+__all__ = ['Path']
+
+_QUOTES = (("'", "'"), ('"', '"'))
+
+class Path(object):
+ """Basic XPath support on markup event streams.
+
+ >>> from markup.input import XML
+
+ Selecting specific tags:
+
+ >>> Path('root').select(XML('')).render()
+ ''
+ >>> Path('//root').select(XML('')).render()
+ ''
+
+ Using wildcards for tag names:
+
+ >>> Path('*').select(XML('')).render()
+ ''
+ >>> Path('//*').select(XML('')).render()
+ ''
+
+ Selecting attribute values:
+
+ >>> Path('@foo').select(XML('')).render()
+ ''
+ >>> Path('@foo').select(XML('')).render()
+ 'bar'
+
+ Selecting descendants:
+
+ >>> Path("root/*").select(XML('')).render()
+ ''
+ >>> Path("root/bar").select(XML('')).render()
+ ''
+ >>> Path("root/baz").select(XML('')).render()
+ ''
+ >>> Path("root/foo/*").select(XML('')).render()
+ ''
+
+ Selecting text nodes:
+ >>> Path("item/text()").select(XML('- Foo
')).render()
+ 'Foo'
+ >>> Path("item/text()").select(XML('- Foo
- Bar
')).render()
+ 'FooBar'
+
+ Skipping ancestors:
+
+ >>> Path("foo/bar").select(XML('')).render()
+ ''
+ >>> Path("foo/*").select(XML('')).render()
+ ''
+ >>> Path("root/bar").select(XML('')).render()
+ ''
+ >>> Path("root/bar").select(XML('')).render()
+ ''
+ >>> Path("root/*/bar").select(XML('')).render()
+ ''
+ >>> Path("root//bar").select(XML('')).render()
+ ''
+ >>> Path("root//bar").select(XML('')).render()
+ ''
+
+ Using simple attribute predicates:
+ >>> Path("root/item[@important]").select(XML(' ')).render()
+ ' '
+ >>> Path('root/item[@important="very"]').select(XML(' ')).render()
+ ' '
+ >>> Path("root/item[@important='very']").select(XML(' ')).render()
+ ''
+ >>> Path("root/item[@important!='very']").select(
+ ... XML(' ')).render()
+ ' '
+ """
+
+ _TOKEN_RE = re.compile('(::|\.\.|\(\)|[/.:\[\]\(\)@=!])|'
+ '([^/:\[\]\(\)@=!\s]+)|'
+ '\s+')
+
+ def __init__(self, text):
+ self.source = text
+
+ steps = []
+ cur_op = ''
+ cur_tag = ''
+ in_predicate = False
+ for op, tag in self._TOKEN_RE.findall(text):
+ if op:
+ if op == '[':
+ in_predicate = True
+ elif op == ']':
+ in_predicate = False
+ elif op.startswith('('):
+ if cur_tag == 'text':
+ steps[-1] = (False, self.fn_text(), [])
+ else:
+ raise NotImplementedError('XPath function "%s" not '
+ 'supported' % cur_tag)
+ else:
+ cur_op += op
+ cur_tag = ''
+ else:
+ closure = cur_op in ('', '//')
+ if cur_op == '@':
+ if tag == '*':
+ node_test = self.any_attribute()
+ else:
+ node_test = self.attribute_by_name(tag)
+ else:
+ if tag == '*':
+ node_test = self.any_element()
+ elif in_predicate:
+ if len(tag) > 1 and (tag[0], tag[-1]) in _QUOTES:
+ node_test = self.literal_string(tag[1:-1])
+ if cur_op == '=':
+ node_test = self.op_eq(steps[-1][2][-1], node_test)
+ steps[-1][2].pop()
+ elif cur_op == '!=':
+ node_test = self.op_neq(steps[-1][2][-1], node_test)
+ steps[-1][2].pop()
+ else:
+ node_test = self.element_by_name(tag)
+ if in_predicate:
+ steps[-1][2].append(node_test)
+ else:
+ steps.append([closure, node_test, []])
+ cur_op = ''
+ cur_tag = tag
+ self.steps = steps
+
+ def __repr__(self):
+ return '<%s "%s">' % (self.__class__.__name__, self.source)
+
+ def select(self, stream):
+ stream = iter(stream)
+ def _generate(tests):
+ test = self.test()
+ for kind, data, pos in stream:
+ result = test(kind, data, pos)
+ if result is True:
+ yield kind, data, pos
+ depth = 1
+ while depth > 0:
+ ev = stream.next()
+ if ev[0] is Stream.START:
+ depth += 1
+ elif ev[0] is Stream.END:
+ depth -= 1
+ yield ev
+ test(*ev)
+ elif result:
+ yield result
+ return Stream(_generate(self.steps))
+
+ def test(self):
+ stack = [0] # stack of cursors into the location path
+
+ def _test(kind, data, pos):
+ #print '\nTracker %r test [%s] %r' % (self, kind, data)
+
+ if not stack:
+ return False
+
+ if kind is Stream.END:
+ stack.pop()
+ return None
+
+ if kind is Stream.START:
+ stack.append(stack[-1])
+
+ matched = False
+ closure, node_test, predicates = self.steps[stack[-1]]
+
+ #print ' Testing against %r' % node_test
+ matched = node_test(kind, data, pos)
+ if matched and predicates:
+ for predicate in predicates:
+ if not predicate(kind, data, pos):
+ matched = None
+ break
+
+ if matched:
+ if stack[-1] == len(self.steps) - 1:
+ #print ' Last step %r... returned %r' % (node_test, matched)
+ return matched
+
+ #print ' Matched intermediate step %r... proceed to next step %r' % (node_test, self.steps[stack[-1] + 1])
+ stack[-1] += 1
+
+ elif kind is Stream.START and not closure:
+ # FIXME: If this step is not a closure, it cannot be matched
+ # until the current element is closed... so we need to
+ # move the cursor back to the last closure and retest
+ # that against the current element
+ closures = [step for step in self.steps[:stack[-1]] if step[0]]
+ closures.reverse()
+ for closure, node_test, predicates in closures:
+ stack[-1] -= 1
+ if closure:
+ matched = node_test(kind, data, pos)
+ if matched:
+ stack[-1] += 1
+ break
+
+ return None
+
+ return _test
+
+ class any_element(object):
+ def __call__(self, kind, data, pos):
+ if kind is Stream.START:
+ return True
+ return None
+ def __repr__(self):
+ return '<%s>' % self.__class__.__name__
+
+ class element_by_name(object):
+ def __init__(self, name):
+ self.name = QName(name)
+ def __call__(self, kind, data, pos):
+ if kind is Stream.START:
+ return data[0].localname == self.name
+ return None
+ def __repr__(self):
+ return '<%s "%s">' % (self.__class__.__name__, self.name)
+
+ class any_attribute(object):
+ def __call__(self, kind, data, pos):
+ if kind is Stream.START:
+ text = ''.join([val for name, val in data[1]])
+ if text:
+ return Stream.TEXT, text, pos
+ return None
+ return None
+ def __repr__(self):
+ return '<%s>' % (self.__class__.__name__)
+
+ class attribute_by_name(object):
+ def __init__(self, name):
+ self.name = QName(name)
+ def __call__(self, kind, data, pos):
+ if kind is Stream.START:
+ if self.name in data[1]:
+ return Stream.TEXT, data[1].get(self.name), pos
+ return None
+ return None
+ def __repr__(self):
+ return '<%s "%s">' % (self.__class__.__name__, self.name)
+
+ class fn_text(object):
+ def __call__(self, kind, data, pos):
+ if kind is Stream.TEXT:
+ return kind, data, pos
+ return None
+ def __repr__(self):
+ return '<%s>' % (self.__class__.__name__)
+
+ class literal_string(object):
+ def __init__(self, value):
+ self.value = value
+ def __call__(self, kind, data, pos):
+ return Stream.TEXT, self.value, (-1, -1)
+ def __repr__(self):
+ return '<%s>' % (self.__class__.__name__)
+
+ class op_eq(object):
+ def __init__(self, lval, rval):
+ self.lval = lval
+ self.rval = rval
+ def __call__(self, kind, data, pos):
+ lval = self.lval(kind, data, pos)
+ rval = self.rval(kind, data, pos)
+ return (lval and lval[1]) == (rval and rval[1])
+ def __repr__(self):
+ return '<%s %r = %r>' % (self.__class__.__name__, self.lval,
+ self.rval)
+
+ class op_neq(object):
+ def __init__(self, lval, rval):
+ self.lval = lval
+ self.rval = rval
+ def __call__(self, kind, data, pos):
+ lval = self.lval(kind, data, pos)
+ rval = self.rval(kind, data, pos)
+ return (lval and lval[1]) != (rval and rval[1])
+ def __repr__(self):
+ return '<%s %r != %r>' % (self.__class__.__name__, self.lval,
+ self.rval)
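Review note on `Path._TOKEN_RE` above: a minimal sketch showing the `(op, tag)` pairs the tokenizer regex produces, which `Path.__init__` then folds into steps. The pattern is the one from the patch rewritten with raw strings; `tokenize` is a hypothetical helper, and dropping the empty pairs produced by whitespace runs is an addition of this sketch, not something the patch does.

```python
import re

TOKEN_RE = re.compile(r'(::|\.\.|\(\)|[/.:\[\]\(\)@=!])|'
                      r'([^/:\[\]\(\)@=!\s]+)|'
                      r'\s+')


def tokenize(path):
    """Split a path expression into (operator, name) pairs.

    Exactly one element of each pair is non-empty. Whitespace matches the
    third alternative and yields ('', ''), which is filtered out here.
    """
    return [(op, tag) for op, tag in TOKEN_RE.findall(path) if op or tag]
```

Note that `//` arrives as two separate `/` tokens; the constructor in the patch concatenates consecutive operators into `cur_op` to recognise it.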
diff --git a/markup/template.py b/markup/template.py
new file mode 100644
--- /dev/null
+++ b/markup/template.py
@@ -0,0 +1,780 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+"""Template engine that is compatible with Kid (http://kid.lesscode.org) to a
+certain extent.
+
+Differences include:
+ * No generation of Python code for a template; the template is "interpreted"
+ * No support for processing instructions
+ * Expressions are evaluated in a more flexible manner, meaning you can use e.g.
+ attribute access notation to access items in a dictionary, etc
+ * Use of XInclude and match templates instead of Kid's py:extends/py:layout
+ directives
+ * Real (thread-safe) search path support
+ * No dependency on ElementTree (due to the lack of pos info)
+ * The original pos of parse events is kept throughout the processing
+ pipeline, so that errors can be tracked back to a specific line/column in
+ the template file
+ * py:match directives use (basic) XPath expressions to match against input
+ nodes, making match templates more powerful while keeping the syntax simple
+
+Todo items:
+ * XPath support needs a real implementation
+ * Improved error reporting
+ * Support for using directives as elements and not just as attributes, reducing
+ the need for wrapper elements with py:strip=""
+ * Support for py:choose/py:when/py:otherwise (similar to XSLT)
+ * Support for list comprehensions and generator expressions in expressions
+
+Random thoughts:
+ * Is there any need to support py:extends and/or py:layout?
+ * Could we generate byte code from expressions?
+"""
+
+import compiler
+from itertools import chain
+import os
+import re
+from StringIO import StringIO
+
+from markup.core import Attributes, Stream
+from markup.eval import Expression
+from markup.filters import EvalFilter, IncludeFilter, MatchFilter, \
+ WhitespaceFilter
+from markup.input import HTML, XMLParser, XML
+
+__all__ = ['Context', 'BadDirectiveError', 'TemplateError',
+ 'TemplateSyntaxError', 'TemplateNotFound', 'Template',
+ 'TemplateLoader']
+
+
+class TemplateError(Exception):
+ """Base exception class for errors related to template processing."""
+
+
+class TemplateSyntaxError(TemplateError):
+ """Exception raised when an expression in a template causes a Python syntax
+ error."""
+
+ def __init__(self, message, filename='', lineno=-1, offset=-1):
+ if isinstance(message, SyntaxError) and message.lineno is not None:
+ message = str(message).replace(' (line %d)' % message.lineno, '')
+ TemplateError.__init__(self, message)
+ self.filename = filename
+ self.lineno = lineno
+ self.offset = offset
+
+
+class BadDirectiveError(TemplateSyntaxError):
+ """Exception raised when an unknown directive is encountered when parsing
+ a template.
+
+ An unknown directive is any attribute using the namespace for directives,
+ with a local name that doesn't match any registered directive.
+ """
+
+ def __init__(self, name, filename='', lineno=-1):
+ TemplateSyntaxError.__init__(self, 'Bad directive "%s"' % name.localname,
+ filename, lineno)
+
+
+class TemplateNotFound(TemplateError):
+ """Exception raised when a specific template file could not be found."""
+
+ def __init__(self, name, search_path):
+ TemplateError.__init__(self, 'Template "%s" not found' % name)
+ self.search_path = search_path
+
+
+class Context(object):
+ """A container for template input data.
+
+ A context provides a stack of scopes. Template directives such as loops can
+ push a new scope on the stack with data that should only be available
+ inside the loop. When the loop terminates, that scope can get popped off
+ the stack again.
+
+ >>> ctxt = Context(one='foo', other=1)
+ >>> ctxt.get('one')
+ 'foo'
+ >>> ctxt.get('other')
+ 1
+ >>> ctxt.push(one='frost')
+ >>> ctxt.get('one')
+ 'frost'
+ >>> ctxt.get('other')
+ 1
+ >>> ctxt.pop()
+ >>> ctxt.get('one')
+ 'foo'
+ """
+
+ def __init__(self, **data):
+ self.stack = [data]
+
+ def __getitem__(self, key):
+ """Get a variable's value, starting at the current context and going
+ upward.
+ """
+ return self.get(key)
+
+ def __repr__(self):
+ return repr(self.stack)
+
+ def __setitem__(self, key, value):
+ """Set a variable in the current context."""
+ self.stack[0][key] = value
+
+ def get(self, key):
+ for frame in self.stack:
+ if key in frame:
+ return frame[key]
+
+ def push(self, **data):
+ self.stack.insert(0, data)
+
+ def pop(self):
+ assert self.stack, 'Pop from empty context stack'
+ self.stack.pop(0)
+
+
+class Directive(object):
+ """Abstract base class for template directives.
+
+ A directive is basically a callable that takes two parameters: `ctxt` is
+ the template data context, and `stream` is an iterable over the events that
+ the directive applies to.
+
+ Directives can be "anonymous" or "registered". Registered directives can be
+ applied by the template author using an XML attribute with the
+ corresponding name in the template. Such directives should be subclasses of
+ this base class that can be instantiated with three parameters: `template`
+ is the `Template` instance, `value` is the value of the directive
+ attribute, and `pos` is the position of the directive in the template.
+
+ Anonymous directives are simply functions conforming to the protocol
+ described above, and can only be applied programmatically (for example by
+ template filters).
+ """
+ __slots__ = ['expr']
+
+ def __init__(self, template, value, pos):
+ self.expr = value and Expression(value) or None
+
+ def __call__(self, stream, ctxt):
+ raise NotImplementedError
+
+ def __repr__(self):
+ expr = ''
+ if self.expr is not None:
+ expr = ' "%s"' % self.expr.source
+ return '<%s%s>' % (self.__class__.__name__, expr)
+
+
+class AttrsDirective(Directive):
+ """Implementation of the `py:attrs` template directive.
+
+ The value of the `py:attrs` attribute should be a dictionary. The keys and
+ values of that dictionary will be added as attributes to the element:
+
+ >>> ctxt = Context(foo={'class': 'collapse'})
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+
+ If the value evaluates to `None` (or any other non-truth value), no
+ attributes are added:
+
+ >>> ctxt = Context(foo=None)
+ >>> print tmpl.generate(ctxt)
+
+ """
+ def __call__(self, stream, ctxt):
+ kind, (tag, attrib), pos = stream.next()
+ attrs = self.expr.evaluate(ctxt)
+ if attrs:
+ attrib = attrib[:]
+ for name, value in attrs.items():
+ if value is not None:
+ value = unicode(value).strip()
+ attrib.append((name, value))
+ yield kind, (tag, Attributes(attrib)), pos
+ for event in stream:
+ yield event
+
+
+class ContentDirective(Directive):
+ """Implementation of the `py:content` template directive.
+
+ This directive replaces the content of the element with the result of
+ evaluating the value of the `py:content` attribute:
+
+ >>> ctxt = Context(bar='Bye')
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+ """
+ def __call__(self, stream, ctxt):
+ kind, data, pos = stream.next()
+ if kind is Stream.START:
+ yield kind, data, pos # emit start tag
+ yield Stream.EXPR, self.expr, pos
+ previous = None
+ try:
+ while True:
+ previous = stream.next()
+ except StopIteration:
+ if previous is not None:
+ yield previous
+
+
+class DefDirective(Directive):
+ """Implementation of the `py:def` template directive.
+
+ This directive can be used to create "Named Template Functions", which
+ are template snippets that are not actually output during normal
+ processing, but rather can be expanded from expressions in other places
+ in the template.
+
+ A named template function can be used just like a normal Python function
+ from template expressions:
+
+ >>> ctxt = Context(bar='Bye')
+ >>> tmpl = Template('''
+ ...
+ ... ${greeting}, ${name}!
+ ...
+ ... ${echo('hi', name='you')}
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+
+ >>> ctxt = Context(bar='Bye')
+ >>> tmpl = Template('''
+ ...
+ ... ${greeting}, ${name}!
+ ...
+ ...
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+ """
+ __slots__ = ['name', 'args', 'defaults', 'stream']
+
+ def __init__(self, template, args, pos):
+ Directive.__init__(self, template, None, pos)
+ ast = compiler.parse(args, 'eval').node
+ self.args = []
+ self.defaults = {}
+ if isinstance(ast, compiler.ast.CallFunc):
+ self.name = ast.node.name
+ for arg in ast.args:
+ if isinstance(arg, compiler.ast.Keyword):
+ self.args.append(arg.name)
+ self.defaults[arg.name] = arg.expr.value
+ else:
+ self.args.append(arg.name)
+ else:
+ self.name = ast.name
+ self.stream = []
+
+ def __call__(self, stream, ctxt):
+ self.stream = list(stream)
+ ctxt[self.name] = lambda *args, **kwargs: self._exec(ctxt, *args,
+ **kwargs)
+ return []
+
+ def _exec(self, ctxt, *args, **kwargs):
+ scope = {}
+ args = list(args) # make mutable
+ for name in self.args:
+ if args:
+ scope[name] = args.pop(0)
+ else:
+ scope[name] = kwargs.pop(name, self.defaults.get(name))
+ ctxt.push(**scope)
+ for event in self.stream:
+ yield event
+ ctxt.pop()
+
+
+class ForDirective(Directive):
+ """Implementation of the `py:for` template directive.
+
+ >>> ctxt = Context(items=[1, 2, 3])
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+ """
+ __slots__ = ['targets']
+
+ def __init__(self, template, value, pos):
+ targets, expr_source = value.split(' in ', 1)
+ self.targets = [str(name.strip()) for name in targets.split(',')]
+ Directive.__init__(self, template, expr_source, pos)
+
+ def __call__(self, stream, ctxt):
+ iterable = self.expr.evaluate(ctxt, [])
+ if iterable is not None:
+ stream = list(stream)
+ for item in iter(iterable):
+ if len(self.targets) == 1:
+ item = [item]
+ scope = {}
+ for idx, name in enumerate(self.targets):
+ scope[name] = item[idx]
+ ctxt.push(**scope)
+ for event in stream:
+ yield event
+ ctxt.pop()
+
+ def __repr__(self):
+ return '<%s "%s in %s">' % (self.__class__.__name__,
+ ', '.join(self.targets), self.expr.source)
+
+
+class IfDirective(Directive):
+ """Implementation of the `py:if` template directive.
+
+ >>> ctxt = Context(foo=True, bar='Hello')
+ >>> tmpl = Template('''
+ ... ${bar}
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+ Hello
+
+ """
+ def __call__(self, stream, ctxt):
+ if self.expr.evaluate(ctxt):
+ return stream
+ return []
+
+
+class MatchDirective(Directive):
+ """Implementation of the `py:match` template directive.
+
+ >>> ctxt = Context()
+ >>> tmpl = Template('''
+ ...
+ ... Hello ${select('@name')}
+ ...
+ ...
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+
+ Hello Dude
+
+
+ """
+ __slots__ = ['path', 'stream']
+
+ def __init__(self, template, value, pos):
+ Directive.__init__(self, template, None, pos)
+ template.filters.append(MatchFilter(value, self._handle_match))
+ self.path = value
+ self.stream = []
+
+ def __call__(self, stream, ctxt):
+ self.stream = list(stream)
+ return []
+
+ def __repr__(self):
+ return '<%s "%s">' % (self.__class__.__name__, self.path)
+
+ def _handle_match(self, orig_stream, ctxt):
+ ctxt.push(select=lambda path: Stream(orig_stream).select(path))
+ for event in self.stream:
+ yield event
+ ctxt.pop()
+
+
+class ReplaceDirective(Directive):
+ """Implementation of the `py:replace` template directive.
+
+ >>> ctxt = Context(bar='Bye')
+ >>> tmpl = Template('''
+ ... Hello
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+ Bye
+
+
+ This directive is equivalent to `py:content` combined with `py:strip`,
+ providing a less verbose way to achieve the same effect:
+
+ >>> ctxt = Context(bar='Bye')
+ >>> tmpl = Template('''
+ ... Hello
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+ Bye
+
+ """
+ def __call__(self, stream, ctxt):
+ kind, data, pos = stream.next()
+ yield Stream.EXPR, self.expr, pos
+
+
+class StripDirective(Directive):
+ """Implementation of the `py:strip` template directive.
+
+ When the value of the `py:strip` attribute evaluates to `True`, the element
+ is stripped from the output:
+
+ >>> ctxt = Context()
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+ foo
+
+
+ On the other hand, when the attribute evaluates to `False`, the element is
+ not stripped:
+
+ >>> ctxt = Context()
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+
+ Leaving the attribute value empty is equivalent to a truth value:
+
+ >>> ctxt = Context()
+ >>> tmpl = Template('''''')
+ >>> print tmpl.generate(ctxt)
+
+ foo
+
+
+ This directive is particularly interesting for named template functions or
+ match templates that do not generate a top-level element:
+
+ >>> ctxt = Context()
+ >>> tmpl = Template('''
+ ...
+ ... ${what}
+ ...
+ ... ${echo('foo')}
+ ...
''')
+ >>> print tmpl.generate(ctxt)
+
+ foo
+
+ """
+ def __call__(self, stream, ctxt):
+ if self.expr:
+ strip = self.expr.evaluate(ctxt)
+ else:
+ strip = True
+ if strip:
+ stream.next() # skip start tag
+ # can ignore StopIteration since it will just break from this
+ # generator
+ previous = stream.next()
+ for event in stream:
+ yield previous
+ previous = event
+ else:
+ for event in stream:
+ yield event
+
+
+class Template(object):
+ """Can parse a template and transform it into the corresponding output
+ based on context data.
+ """
+ NAMESPACE = 'http://purl.org/kid/ns#'
+
+ directives = [('def', DefDirective),
+ ('match', MatchDirective),
+ ('for', ForDirective),
+ ('if', IfDirective),
+ ('replace', ReplaceDirective),
+ ('content', ContentDirective),
+ ('attrs', AttrsDirective),
+ ('strip', StripDirective)]
+ _dir_by_name = dict(directives)
+ _dir_order = [directive[1] for directive in directives]
+
+ def __init__(self, source, filename=None):
+ """Initialize a template from either a string or a file-like object."""
+ if isinstance(source, basestring):
+ self.source = StringIO(source)
+ else:
+ self.source = source
+ self.filename = filename or ''
+
+ self.pre_filters = [EvalFilter()]
+ self.filters = []
+ self.post_filters = [WhitespaceFilter()]
+ self.parse()
+
+ def __repr__(self):
+ return '<%s "%s">' % (self.__class__.__name__,
+ os.path.basename(self.filename))
+
+ def parse(self):
+ """Parse the template.
+
+ The parsing stage parses the XML template and constructs a list of
+ directives that will be executed in the render stage. The input is
+ split up into literal output (markup that does not depend on the
+ context data) and actual directives (commands or variable
+ substitution).
+ """
+ stream = [] # list of events of the "compiled" template
+ dirmap = {} # temporary mapping of directives to elements
+ ns_prefix = {}
+ depth = 0
+
+ for kind, data, pos in XMLParser(self.source):
+
+ if kind is Stream.START_NS:
+ # Strip out the namespace declaration for template directives
+ prefix, uri = data
+ if uri == self.NAMESPACE:
+ ns_prefix[prefix] = uri
+ else:
+ stream.append((kind, data, pos))
+
+ elif kind is Stream.END_NS:
+ if data in ns_prefix:
+ del ns_prefix[data]
+ else:
+ stream.append((kind, data, pos))
+
+ elif kind is Stream.START:
+ # Record any directive attributes in start tags
+ tag, attrib = data
+ directives = []
+ new_attrib = []
+ for name, value in attrib:
+ if name.namespace == self.NAMESPACE:
+ cls = self._dir_by_name.get(name.localname)
+ if cls is None:
+ raise BadDirectiveError(name, self.filename, pos[0])
+ else:
+ directives.append(cls(self, value, pos))
+ else:
+ value = list(self._interpolate(value, *pos))
+ new_attrib.append((name, value))
+ if directives:
+ directives.sort(lambda a, b: cmp(self._dir_order.index(a.__class__),
+ self._dir_order.index(b.__class__)))
+ dirmap[(depth, tag)] = (directives, len(stream))
+
+ stream.append((kind, (tag, Attributes(new_attrib)), pos))
+ depth += 1
+
+ elif kind is Stream.END:
+ depth -= 1
+ stream.append((kind, data, pos))
+
+ # If there have been directive attributes with the corresponding
+ # start tag, move the events in between into a "subprogram"
+ if (depth, data) in dirmap:
+ directives, start_offset = dirmap.pop((depth, data))
+ substream = stream[start_offset:]
+ stream[start_offset:] = [(Stream.SUB,
+ (directives, substream), pos)]
+
+ elif kind is Stream.TEXT:
+ for kind, data, pos in self._interpolate(data, *pos):
+ stream.append((kind, data, pos))
+
+ else:
+ stream.append((kind, data, pos))
+
+ self.stream = stream
+
+ def generate(self, ctxt):
+ """Transform the template based on the given context data."""
+
+ def _transform(stream):
+ # Apply pre and runtime filters
+ for filter_ in chain(self.pre_filters, self.filters):
+ stream = filter_(iter(stream), ctxt)
+
+ try:
+ for kind, data, pos in stream:
+
+ if kind is Stream.SUB:
+ # This event is a list of directives and a list of
+ # nested events to which those directives should be
+ # applied
+ directives, substream = data
+ directives = directives[::-1] # reversed copy; in-place reverse would flip the stored order on every generate() call
+ for directive in directives:
+ substream = directive(iter(substream), ctxt)
+ for event in _transform(iter(substream)):
+ yield event
+
+ else:
+ yield kind, data, pos
+ except SyntaxError, err:
+ raise TemplateSyntaxError(err, self.filename, pos[0],
+ pos[1] + (err.offset or 0))
+
+ stream = _transform(self.stream)
+
+ # Apply post-filters
+ for filter_ in self.post_filters:
+ stream = filter_(iter(stream), ctxt)
+
+ return Stream(stream)
+
+ _FULL_EXPR_RE = re.compile(r'(?>> import tempfile
+ >>> fd, path = tempfile.mkstemp(suffix='.html', prefix='template')
+ >>> os.write(fd, '$var
')
+ 11
+ >>> os.close(fd)
+
+ The template loader accepts a list of directory paths that are then used
+ when searching for template files, in the given order:
+
+ >>> loader = TemplateLoader([os.path.dirname(path)])
+
+ The `load()` method first checks the template cache to see whether the requested
+ template has already been loaded. If not, it attempts to locate the
+ template file, and returns the corresponding `Template` object:
+
+ >>> template = loader.load(os.path.basename(path))
+ >>> isinstance(template, Template)
+ True
+
+ Template instances are cached: requesting a template with the same name
+ results in the same instance being returned:
+
+ >>> loader.load(os.path.basename(path)) is template
+ True
+ """
+ def __init__(self, search_path=None, auto_reload=False):
+ """Create the template loader.
+
+ @param search_path: a list of absolute path names that should be
+ searched for template files
+ @param auto_reload: whether to check the last modification time of
+ template files, and reload them if they have changed
+ """
+ self.search_path = search_path
+ if self.search_path is None:
+ self.search_path = []
+ self.auto_reload = auto_reload
+ self._cache = {}
+ self._mtime = {}
+
+ def load(self, filename):
+ """Load the template with the given name.
+
+ This method searches the search path trying to locate a template
+ matching the given name. If no such template is found, a
+ `TemplateNotFound` exception is raised. Otherwise, a `Template` object
+ representing the requested template is returned.
+
+ Template searches are cached to avoid having to parse the same template
+ file more than once. Thus, subsequent calls of this method with the
+ same template file name will return the same `Template` object.
+
+ @param filename: the relative path of the template file to load
+ """
+ filename = os.path.normpath(filename)
+ try:
+ tmpl = self._cache[filename]
+ if not self.auto_reload or \
+ os.path.getmtime(tmpl.filename) == self._mtime[filename]:
+ return tmpl
+ except (KeyError, OSError): # not cached yet, or the cached file has vanished
+ pass
+ for dirname in self.search_path:
+ filepath = os.path.join(dirname, filename)
+ try:
+ fileobj = file(filepath, 'rt')
+ try:
+ tmpl = Template(fileobj, filename=filepath)
+ tmpl.pre_filters.append(IncludeFilter(self))
+ finally:
+ fileobj.close()
+ self._cache[filename] = tmpl
+ self._mtime[filename] = os.path.getmtime(filepath)
+ return tmpl
+ except IOError:
+ continue
+ raise TemplateNotFound(filename, self.search_path)
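Note on the loader above: the caching and reload logic in `TemplateLoader.load()` can be illustrated in isolation. The following is a minimal standalone sketch, not the code from this patch. It is written for Python 3, caches raw file contents instead of `Template` objects, and uses the hypothetical names `CachingLoader` and `LookupError` where the patch uses `TemplateLoader` and `TemplateNotFound`.

```python
import os


class CachingLoader(object):
    """Hypothetical stand-in for a search-path loader with an mtime check."""

    def __init__(self, search_path=None, auto_reload=False):
        self.search_path = list(search_path or [])
        self.auto_reload = auto_reload
        self._cache = {}  # normalized name -> (filepath, mtime, content)

    def load(self, filename):
        filename = os.path.normpath(filename)
        cached = self._cache.get(filename)
        if cached is not None:
            filepath, mtime, content = cached
            if not self.auto_reload:
                return content
            try:
                # Serve from the cache only while the file is unchanged.
                if os.path.getmtime(filepath) == mtime:
                    return content
            except OSError:
                pass  # file disappeared; search the path again below
        for dirname in self.search_path:
            filepath = os.path.join(dirname, filename)
            try:
                mtime = os.path.getmtime(filepath)
                with open(filepath) as fileobj:
                    content = fileobj.read()
            except (IOError, OSError):
                continue  # not in this directory; try the next one
            self._cache[filename] = (filepath, mtime, content)
            return content
        raise LookupError(filename)
```

With `auto_reload` left off, a second `load()` of the same name returns the cached object without touching the file system. With it on, each call costs one `os.path.getmtime()` lookup.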
diff --git a/markup/tests/__init__.py b/markup/tests/__init__.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/__init__.py
@@ -0,0 +1,32 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+import unittest
+
+def suite():
+ import markup
+ from markup.tests import builder, core, eval, input, output, path, template
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(markup))
+ suite.addTest(builder.suite())
+ suite.addTest(core.suite())
+ suite.addTest(eval.suite())
+ suite.addTest(input.suite())
+ suite.addTest(output.suite())
+ suite.addTest(path.suite())
+ suite.addTest(template.suite())
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/builder.py b/markup/tests/builder.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/builder.py
@@ -0,0 +1,40 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+from HTMLParser import HTMLParseError
+import unittest
+
+from markup.builder import Element, tag
+from markup.core import Stream
+
+
+class ElementFactoryTestCase(unittest.TestCase):
+
+ def test_link(self):
+ link = tag.A(href='#', title='Foo', accesskey=None)('Bar')
+ bits = iter(link.generate())
+ self.assertEqual((Stream.START, ('a', [('href', "#"), ('title', "Foo")]),
+ (-1, -1)), bits.next())
+ self.assertEqual((Stream.TEXT, u'Bar', (-1, -1)), bits.next())
+ self.assertEqual((Stream.END, 'a', (-1, -1)), bits.next())
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(Element.__module__))
+ suite.addTest(unittest.makeSuite(ElementFactoryTestCase, 'test'))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/core.py b/markup/tests/core.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/core.py
@@ -0,0 +1,189 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+from HTMLParser import HTMLParseError
+import unittest
+
+from markup.core import Markup, escape, unescape
+
+
+class MarkupTestCase(unittest.TestCase):
+
+ def test_escape(self):
+ markup = escape('"&"')
+ assert isinstance(markup, Markup)
+ self.assertEquals('<b>"&"</b>', markup)
+
+ def test_escape_noquotes(self):
+ markup = escape('"&"', quotes=False)
+ assert isinstance(markup, Markup)
+ self.assertEquals('<b>"&"</b>', markup)
+
+ def test_unescape_markup(self):
+ string = '"&"'
+ markup = Markup.escape(string)
+ assert isinstance(markup, Markup)
+ self.assertEquals(string, unescape(markup))
+
+ def test_add_str(self):
+ markup = Markup('foo') + '
'
+ assert isinstance(markup, Markup)
+ self.assertEquals('foo<br/>', markup)
+
+ def test_add_markup(self):
+ markup = Markup('foo') + Markup('
')
+ assert isinstance(markup, Markup)
+ self.assertEquals('foo
', markup)
+
+ def test_add_reverse(self):
+ markup = 'foo' + Markup('bar')
+ assert isinstance(markup, unicode)
+ self.assertEquals('foobar', markup)
+
+ def test_mod(self):
+ markup = Markup('%s') % '&'
+ assert isinstance(markup, Markup)
+ self.assertEquals('&', markup)
+
+ def test_mod_multi(self):
+ markup = Markup('%s %s') % ('&', 'boo')
+ assert isinstance(markup, Markup)
+ self.assertEquals('& boo', markup)
+
+ def test_mul(self):
+ markup = Markup('foo') * 2
+ assert isinstance(markup, Markup)
+ self.assertEquals('foofoo', markup)
+
+ def test_join(self):
+ markup = Markup('
').join(['foo', '', Markup('')])
+ assert isinstance(markup, Markup)
+ self.assertEquals('foo
<bar />
', markup)
+
+ def test_stripentities_all(self):
+ markup = Markup('& j').stripentities()
+ assert isinstance(markup, Markup)
+ self.assertEquals('& j', markup)
+
+ def test_stripentities_keepxml(self):
+ markup = Markup('fo
o').striptags()
+ assert isinstance(markup, Markup)
+ self.assertEquals('foo', markup)
+
+ def test_striptags_empty(self):
+ markup = Markup('
').striptags()
+ assert isinstance(markup, Markup)
+ self.assertEquals('', markup)
+
+ def test_striptags_mid(self):
+ markup = Markup('fo
o').striptags()
+ assert isinstance(markup, Markup)
+ self.assertEquals('foo', markup)
+
+ def test_sanitize_unchanged(self):
+ markup = Markup('fo
o')
+ self.assertEquals('fo
o', str(markup.sanitize()))
+
+ def test_sanitize_escape_text(self):
+ markup = Markup('fo&')
+ self.assertEquals('fo&', str(markup.sanitize()))
+ markup = Markup('<foo>')
+ self.assertEquals('<foo>', str(markup.sanitize()))
+
+ def test_sanitize_entityref_text(self):
+ markup = Markup('foö')
+ self.assertEquals(u'foö', unicode(markup.sanitize()))
+
+ def test_sanitize_escape_attr(self):
+ markup = Markup('')
+ self.assertEquals('', str(markup.sanitize()))
+
+ def test_sanitize_close_empty_tag(self):
+ markup = Markup('fo
o')
+ self.assertEquals('fo
o', str(markup.sanitize()))
+
+ def test_sanitize_invalid_entity(self):
+ markup = Markup('&junk;')
+ self.assertEquals('&junk;', str(markup.sanitize()))
+
+ def test_sanitize_remove_script_elem(self):
+ markup = Markup('')
+ self.assertEquals('', str(markup.sanitize()))
+ markup = Markup('')
+ self.assertEquals('', str(markup.sanitize()))
+ markup = Markup('alert("foo")')
+ self.assertRaises(HTMLParseError, markup.sanitize().render)
+ markup = Markup('')
+ self.assertRaises(HTMLParseError, markup.sanitize().render)
+
+ def test_sanitize_remove_onclick_attr(self):
+ markup = Markup('')
+ self.assertEquals('', str(markup.sanitize()))
+
+ def test_sanitize_remove_style_scripts(self):
+ # Inline style with url() using javascript: scheme
+ markup = Markup('')
+ self.assertEquals('
', str(markup.sanitize()))
+ # Inline style with url() using javascript: scheme, using control char
+ markup = Markup('
')
+ self.assertEquals('
', str(markup.sanitize()))
+ # Inline style with url() using javascript: scheme, in quotes
+ markup = Markup('
')
+ self.assertEquals('
', str(markup.sanitize()))
+ # IE expressions in CSS not allowed
+ markup = Markup('
')
+ self.assertEquals('
', str(markup.sanitize()))
+ markup = Markup('
')
+ self.assertEquals('
', str(markup.sanitize()))
+
+ def test_sanitize_remove_src_javascript(self):
+ markup = Markup('
\')
')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Case-insensitive protocol matching
+ markup = Markup('
\')
')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Grave accents (not parsed)
+ markup = Markup('

')
+ self.assertRaises(HTMLParseError, markup.sanitize().render)
+ # Protocol encoded using UTF-8 numeric entities
+ markup = Markup('

')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Protocol encoded using UTF-8 numeric entities without a semicolon
+ # (which is allowed because the max number of digits is used)
+ markup = Markup('

')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Protocol encoded using UTF-8 numeric hex entities without a semicolon
+ # (which is allowed because the max number of digits is used)
+ markup = Markup('

')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Embedded tab character in protocol
+ markup = Markup('
;\')
')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+ # Embedded tab character in protocol, but encoded this time
+ markup = Markup('
;\')
')
+ self.assertEquals('
![]()
', str(markup.sanitize()))
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(unittest.makeSuite(MarkupTestCase, 'test'))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
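Note on the tests above: the `test_add_*`, `test_mod*` and `test_join` cases suggest that plain strings combined with a `Markup` instance are escaped, while `str + Markup` yields a plain string. The following is a minimal sketch of that behaviour under those assumptions. It is written for Python 3, the class name `SafeText` is hypothetical, and it relies on the standard library's `html.escape` instead of the patch's own `escape` implementation.

```python
from html import escape as html_escape


class SafeText(str):
    """Hypothetical string type whose content is treated as already escaped."""

    def __add__(self, other):
        # Escape the right-hand operand unless it is already SafeText.
        return SafeText(str(self) + escape(other))

    def __mod__(self, args):
        if not isinstance(args, tuple):
            args = (args,)
        return SafeText(str.__mod__(self, tuple(escape(a) for a in args)))

    def join(self, seq):
        return SafeText(str.join(self, [escape(item) for item in seq]))


def escape(text, quotes=True):
    """Return `text` as SafeText, escaping it unless it already is SafeText."""
    if isinstance(text, SafeText):
        return text
    return SafeText(html_escape(str(text), quote=quotes))
```

Because `SafeText` does not define `__radd__`, an expression such as `'foo' + SafeText('bar')` falls back to ordinary string concatenation and produces a plain `str`.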
diff --git a/markup/tests/eval.py b/markup/tests/eval.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/eval.py
@@ -0,0 +1,26 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+import unittest
+
+from markup import eval
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(eval))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/input.py b/markup/tests/input.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/input.py
@@ -0,0 +1,31 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import unittest
+
+from markup.core import Stream
+from markup.input import XMLParser
+
+
+class XMLParserTestCase(unittest.TestCase):
+ pass
+
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(unittest.makeSuite(XMLParserTestCase, 'test'))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/output.py b/markup/tests/output.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/output.py
@@ -0,0 +1,26 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+import unittest
+import sys
+
+from markup import output
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(output))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/path.py b/markup/tests/path.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/path.py
@@ -0,0 +1,26 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+import unittest
+
+from markup import path
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(path))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
diff --git a/markup/tests/template.py b/markup/tests/template.py
new file mode 100644
--- /dev/null
+++ b/markup/tests/template.py
@@ -0,0 +1,111 @@
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Edgewall Software
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+import doctest
+import unittest
+import sys
+
+from markup.core import Stream
+from markup.template import BadDirectiveError, Context, Template, \
+ TemplateSyntaxError
+
+
+class TemplateTestCase(unittest.TestCase):
+
+ def test_interpolate_string(self):
+ parts = list(Template._interpolate('bla'))
+ self.assertEqual(1, len(parts))
+ self.assertEqual(Stream.TEXT, parts[0][0])
+ self.assertEqual('bla', parts[0][1])
+
+ def test_interpolate_simple(self):
+ parts = list(Template._interpolate('${bla}'))
+ self.assertEqual(1, len(parts))
+ self.assertEqual(Stream.EXPR, parts[0][0])
+ self.assertEqual('bla', parts[0][1].source)
+
+ def test_interpolate_escaped(self):
+ parts = list(Template._interpolate('$${bla}'))
+ self.assertEqual(1, len(parts))
+ self.assertEqual(Stream.TEXT, parts[0][0])
+ self.assertEqual('${bla}', parts[0][1])
+
+ def test_interpolate_short(self):
+ parts = list(Template._interpolate('$bla'))
+ self.assertEqual(1, len(parts))
+ self.assertEqual(Stream.EXPR, parts[0][0])
+ self.assertEqual('bla', parts[0][1].source)
+
+ def test_interpolate_mixed1(self):
+ parts = list(Template._interpolate('$foo bar $baz'))
+ self.assertEqual(3, len(parts))
+ self.assertEqual(Stream.EXPR, parts[0][0])
+ self.assertEqual('foo', parts[0][1].source)
+ self.assertEqual(Stream.TEXT, parts[1][0])
+ self.assertEqual(' bar ', parts[1][1])
+ self.assertEqual(Stream.EXPR, parts[2][0])
+ self.assertEqual('baz', parts[2][1].source)
+
+ def test_interpolate_mixed2(self):
+ parts = list(Template._interpolate('foo $bar baz'))
+ self.assertEqual(3, len(parts))
+ self.assertEqual(Stream.TEXT, parts[0][0])
+ self.assertEqual('foo ', parts[0][1])
+ self.assertEqual(Stream.EXPR, parts[1][0])
+ self.assertEqual('bar', parts[1][1].source)
+ self.assertEqual(Stream.TEXT, parts[2][0])
+ self.assertEqual(' baz', parts[2][1])
+
+ def test_bad_directive_error(self):
+ xml = '
'
+ try:
+ tmpl = Template(xml, 'test.html')
+ except BadDirectiveError, e:
+ self.assertEqual('test.html', e.filename)
+ if sys.version_info[:2] >= (2, 4):
+ self.assertEqual(1, e.lineno)
+
+ def test_directive_value_syntax_error(self):
+ xml = '
'
+ tmpl = Template(xml, 'test.html')
+ try:
+ list(tmpl.generate(Context()))
+ self.fail('Expected TemplateSyntaxError')
+ except TemplateSyntaxError, e:
+ self.assertEqual('test.html', e.filename)
+ if sys.version_info[:2] >= (2, 4):
+ self.assertEqual(1, e.lineno)
+ # We don't really care about the offset here, do we?
+
+ def test_expression_syntax_error(self):
+ xml = '
\n Foo ${bar"}\n
'
+ tmpl = Template(xml, filename='test.html')
+ ctxt = Context(bar='baz')
+ try:
+ list(tmpl.generate(ctxt))
+ self.fail('Expected TemplateSyntaxError')
+ except TemplateSyntaxError, e:
+ self.assertEqual('test.html', e.filename)
+ if sys.version_info[:2] >= (2, 4):
+ self.assertEqual(2, e.lineno)
+ self.assertEqual(10, e.offset)
+
+
+def suite():
+ suite = unittest.TestSuite()
+ suite.addTest(doctest.DocTestSuite(Template.__module__))
+ suite.addTest(unittest.makeSuite(TemplateTestCase, 'test'))
+ return suite
+
+if __name__ == '__main__':
+ unittest.main(defaultTest='suite')
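Note on the `test_interpolate_*` cases above: they pin down how a string is split into literal text and expressions (`$name`, `${expr}`, and `$$` as an escape). The following is a minimal standalone sketch of such a splitter, not the `Template._interpolate` method from this patch. The regular expression and the plain `'TEXT'` / `'EXPR'` markers are assumptions chosen for illustration, and expressions are returned as source strings, not `Expression` objects.

```python
import re

# Assumed simplified grammar: "$$" is a literal dollar sign, "${...}" is a
# full expression, "$name" is a short-form variable reference.
_TOKEN_RE = re.compile(r'\$\$|\$\{([^}]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)')


def interpolate(text):
    """Split `text` into a list of ('TEXT', str) and ('EXPR', source) pairs."""
    parts = []
    literal = []  # pieces of the literal run currently being collected
    pos = 0
    for match in _TOKEN_RE.finditer(text):
        literal.append(text[pos:match.start()])
        pos = match.end()
        if match.group(0) == '$$':
            literal.append('$')  # escaped dollar stays in the literal run
            continue
        run = ''.join(literal)
        if run:
            parts.append(('TEXT', run))
        literal = []
        source = match.group(1)
        if source is None:
            source = match.group(2)
        parts.append(('EXPR', source))
    literal.append(text[pos:])
    run = ''.join(literal)
    if run:
        parts.append(('TEXT', run))
    return parts
```

Listing the `\$\$` alternative first in the pattern is what keeps `$${bla}` from being read as an expression: the escape consumes both dollar signs before the `${...}` branch gets a chance to match.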
diff --git a/setup.cfg b/setup.cfg
new file mode 100644
--- /dev/null
+++ b/setup.cfg
@@ -0,0 +1,3 @@
+[egg_info]
+tag_build = dev
+tag_svn_revision = true
diff --git a/setup.py b/setup.py
new file mode 100755
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,25 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Copyright (C) 2006 Christopher Lenz
+# All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution. The terms
+# are also available at http://trac.edgewall.com/license.html.
+#
+# This software consists of voluntary contributions made by many
+# individuals. For the exact contribution history, see the revision
+# history and logs, available at http://projects.edgewall.com/trac/.
+
+from setuptools import setup, find_packages
+
+setup(
+ name='Markup', version='0.1',
+ description='Toolkit for stream-based generation of markup for the web',
+ author='Christopher Lenz', author_email='cmlenz@gmx.net',
+ license='BSD', url='http://markup.cmlenz.net/',
+ packages=find_packages(exclude=['*.tests*']),
+ test_suite = 'markup.tests.suite',
+ zip_safe = True
+)