pytst: ouch!
I had no time to cover this earlier (what with my new fatherhood and all), but on the 1st of September I was contacted by Stani, of SPE fame. He was interested in using pytst to implement a mass search/replace function. He had tried the regex way, but to no avail. I gave him a bit of source code to do this with pytst, and he came back to me with his own version:
    import re
    import cStringIO
    import tst

    DELIMITERS = ' .,?!@():"\' '
    # The capturing group makes split() return the delimiters as well.
    reDELIMITERS = re.compile('([%s])'%DELIMITERS)

    class MultiReplaceTST(tst.TST):
        def __call__(self,input_string):
            output = cStringIO.StringIO()
            for source_string, status, replace_with in self.scan_with_stop_chars(input_string,DELIMITERS,tst.TupleListAction()):
                if status>0:
                    # A key matched: emit its replacement.
                    output.write(replace_with)
                else:
                    # No match: copy the source text through unchanged.
                    output.write(source_string)
            return output.getvalue()

    class MultiReplaceDict(dict):
        def __call__(self,input_string):
            input_string = reDELIMITERS.split(input_string)
            output = cStringIO.StringIO()
            for word in input_string:
                try:
                    output.write(self[word])
                except KeyError:
                    # Not a key (this includes the delimiters): keep as is.
                    output.write(word)
            return output.getvalue()
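To see why the dict version gives back the text intact, note that reDELIMITERS wraps its character class in a capturing group, so split() returns the delimiters as list items too. Here is a minimal sketch of that idea ported to Python 3. The port is mine, not Stani's code: cStringIO is swapped for str.join, and the replacement table is made up for illustration.

```python
import re

DELIMITERS = ' .,?!@():"\' '
# The parentheses form a capturing group, so split() keeps the delimiters.
reDELIMITERS = re.compile('([%s])' % DELIMITERS)


class MultiReplaceDict(dict):
    """Python 3 port of the dict-based replacer (illustrative only)."""

    def __call__(self, input_string):
        # Words and delimiters alternate in the split result; any token
        # that is not a key (delimiters included) passes through unchanged.
        return ''.join(self.get(token, token)
                       for token in reDELIMITERS.split(input_string))


# Hypothetical replacement table, for illustration only.
replace = MultiReplaceDict({'foo': 'bar', 'spam': 'eggs'})
tokens = reDELIMITERS.split('foo, spam!')
result = replace('foo, spam! (foobar)')
print(tokens)
print(result)
```

Only whole words between delimiters are looked up, so 'foobar' is left alone even though it starts with the key 'foo'.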
Well, it’s hard to admit, but the second version, using a plain Python dict and re, is apparently ten times faster than the first version using pytst… Ouch!
I’ll have to test it myself, but I think one of the culprits (besides me) is SWIG. As I wrote in a previous post, profiling pytst showed that the hotspots were almost all in the C/C++ wrapping code generated by SWIG. This is very irritating, so I now feel the urge to reimplement pytst without using SWIG.
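For anyone wanting to check the speed claim, timeit is probably the quickest route. The sketch below is an assumption of mine, not the benchmark Stani ran: dict_replacer and time_replacer are hypothetical helpers written for Python 3, and since pytst is not imported here, only the dict side is measured. Any other callable taking a string, such as a MultiReplaceTST instance, could be passed to time_replacer in the same way.

```python
import re
import timeit

SPLITTER = re.compile('([ .,?!@():"\'])')


def dict_replacer(table):
    """Build a callable doing whole-word replacement via dict lookups."""
    def replace(text):
        return ''.join(table.get(tok, tok) for tok in SPLITTER.split(text))
    return replace


def time_replacer(replacer, text, number=20):
    """Return the best of 3 total times, in seconds, for `number` calls."""
    return min(timeit.repeat(lambda: replacer(text), number=number, repeat=3))


# Made-up sample input and replacement table, for illustration only.
sample = 'foo, spam! (foobar) ' * 1000
calls = 20
seconds = time_replacer(dict_replacer({'foo': 'bar', 'spam': 'eggs'}),
                        sample, number=calls)
print('dict + re: %.4f s for %d calls' % (seconds, calls))
```

Taking the minimum over a few repeats reduces noise from whatever else the machine is doing, which matters when the two candidates differ by a constant factor rather than by orders of magnitude.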