
pytst: ouch!

I had no time to cover this earlier (what with my new fatherhood and all), but on the 1st of September I was contacted by Stani, of SPE fame. He wanted to use pytst to implement a mass search/replace function, having tried the regex way to no avail. I gave him a bit of source code to do this with pytst, and he came back to me with his own version:

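import re
import cStringIO  # Python 2 code, as in the original
import tst        # pytst's module, providing tst.TST below

# Characters that separate words. The capturing group in the regex makes
# re.split() keep the delimiters, so they are copied to the output unchanged.
DELIMITERS = ' .,?!@():"\' '
reDELIMITERS = re.compile('([%s])' % DELIMITERS)

class MultiReplaceTST(tst.TST):
    """Mass replace backed by a pytst ternary search tree (key -> replacement)."""
    def __call__(self, input_string):
        output = cStringIO.StringIO()
        # Scan with DELIMITERS as stop characters; status > 0 marks a chunk
        # that matched a key, in which case replace_with is its value.
        for source_string, status, replace_with in self.scan_with_stop_chars(input_string, DELIMITERS, tst.TupleListAction()):
            if status > 0:
                output.write(replace_with)
            else:
                output.write(source_string)
        return output.getvalue()

class MultiReplaceDict(dict):
    """Mass replace backed by a plain Python dict (word -> replacement)."""
    def __call__(self, input_string):
        # Split into alternating words and delimiters, then look up each token.
        tokens = reDELIMITERS.split(input_string)
        output = cStringIO.StringIO()
        for word in tokens:
            try:
                output.write(self[word])
            except KeyError:
                # Not a key: copy the token unchanged.
                output.write(word)
        return output.getvalue()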

Well, it’s hard to admit, but the second version, using a Python dict and the re module, is apparently ten times faster than the first one, which uses pytst… Ouch!
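To put a figure like "ten times faster" to the test, a small timeit harness is enough. The sketch below is written for Python 3, so the dict-based version is ported with io.StringIO in place of cStringIO and dict.get in place of try/except; time_replacer is a hypothetical helper name, and the sample vocabulary and text are made up for illustration. Since it accepts any callable, the same helper could in principle time a MultiReplaceTST instance wherever pytst is installed.

```python
import io
import re
import timeit

# Same delimiter set as in the post; the capturing group keeps delimiters in split().
DELIMITERS = ' .,?!@():"\' '
reDELIMITERS = re.compile('([%s])' % DELIMITERS)


class MultiReplaceDict(dict):
    """Dict-based mass replace, ported to Python 3."""

    def __call__(self, text):
        out = io.StringIO()
        for token in reDELIMITERS.split(text):
            # Replace the token if it is a key, otherwise copy it unchanged.
            out.write(self.get(token, token))
        return out.getvalue()


def time_replacer(replacer, text, repeat=5, number=100):
    """Return the best per-call time, in seconds, over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: replacer(text), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Hypothetical sample data.
    replacer = MultiReplaceDict({"colour": "color", "favour": "favor"})
    print(replacer("colour, favour."))  # color, favor.
    text = "My favourite colour (and favour!) is blue. " * 1000
    print("%.6f s per call" % time_replacer(replacer, text))
```

Keeping the best of several runs (the minimum of timeit.repeat) rather than the average reduces noise from other processes, which matters when comparing two implementations on the same input.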

I’ll have to test it myself, but I think one of the culprits (besides me :) is SWIG. As I wrote in a previous post, profiling pytst showed that the hotspots were almost all in the C/C++ wrapping code generated by SWIG. This is very irritating, so I now feel the urge to reimplement pytst without SWIG.
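To see where the time goes in such a replacer, Python's cProfile can be wrapped around a single call. The sketch below (Python 3; replace_words and profile_call are hypothetical names) profiles a dict-based replacer written as a plain function and returns the report sorted by internal time (tottime). With a wrapped extension module, the calls that cross into the wrapper show up as their own rows, so their share of tottime gives a rough idea of the wrapping overhead.

```python
import cProfile
import io
import pstats
import re

DELIMITERS = ' .,?!@():"\' '
reDELIMITERS = re.compile('([%s])' % DELIMITERS)


def replace_words(mapping, text):
    """Dict-based mass replace, as a plain function so it appears by name in the profile."""
    return "".join(mapping.get(tok, tok) for tok in reDELIMITERS.split(text))


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("tottime").print_stats(top)
    return result, report.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data.
    result, report = profile_call(replace_words, {"colour": "color"}, "colour, colour. " * 10000)
    print(report)
```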