
pytst 1.01

Download it here.

What’s new?

  • There is at last a nice test suite, which I use to ensure that there are no regressions and that the SWIG and Boost.Python versions stay in sync. The test suite also records performance data in a CSV file, which I then analyse with Excel to check whether a change brings a performance improvement or a regression.
  • A long-standing bug in close_match is finally fixed. I always knew it was there but could never reproduce it reliably. Luckily, the new test suite let me track it down.
  • WARNING: the DictAction collector has changed. It now retains the lowest distance whenever a (key, value) pair is seen multiple times, for instance in close_match. The close_match algorithm now stores distances (the lower, the better) rather than remaining distances (the higher, the better). For instance, if you look for all entries within a maximum distance of 1 from "123", you will now get "122" with a distance of 1 instead of a remaining distance of 0. This is much more intuitive than before.
  • Speaking of collectors, you know, those ListAction, TupleListAction and DictAction classes? Well, they are not the only way to collect data from the TST. There are now true Python iterators! Instead of using walk or close_match, you can use the default iterator, the iterator([root]) method or the close_match_iterator(string, max_distance) method.
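
The new DictAction rule can be illustrated without pytst at all. The sketch below assumes that close_match measures a Levenshtein edit distance (an assumption; the metric is not named here), and `edit_distance` and `collect_lowest_distance` are hypothetical helpers, not part of the tst module. When a key shows up several times, only its lowest distance is kept; the old remaining distance is simply max_distance minus that distance.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def collect_lowest_distance(matches):
    """Mimic the new DictAction rule (hypothetical helper): from
    (key, value, distance) triples, possibly repeating a key,
    keep only the lowest distance per key."""
    result = {}
    for key, value, distance in matches:
        if key not in result or distance < result[key][1]:
            result[key] = (value, distance)
    return result


max_distance = 1
d = edit_distance("123", "122")
print(d)                 # distance reported now: 1
print(max_distance - d)  # old "remaining distance": 0

# Hypothetical raw matcher output (max_distance = 2): a tree walk may reach
# "122" through different edit paths, hence with different distances.
raw = [("122", "b", 2), ("123", "a", 0), ("122", "b", 1)]
print(collect_lowest_distance(raw))  # {'122': ('b', 1), '123': ('a', 0)}
```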

Examples of iterator usage:

from tst import *
t = TST()

# ... filling t...

# BEFORE:
contents = t.walk(None, DictAction())

# NOW:
contents = {}
for key, value in t:
    contents[key] = value

# BEFORE:
# All words beginning with "foo":
from_foo = t.walk(None, DictAction(), "foo")

# NOW:
from_foo = {}
for key, value in t.iterator("foo"):
    from_foo[key] = value

# BEFORE:
# All words close to "foo":
close_to_foo = t.close_match("foo", 1, None, DictAction())

# NOW:
close_to_foo = {}
for key, value in t.close_match_iterator("foo", 1):
    if key in close_to_foo:
        # the close match iterator can return the same
        # key multiple times
        print("%s already inserted" % key)
    else:
        close_to_foo[key] = value
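
If you want to drop the repeated keys while still streaming the results (say, to write them out rather than build a dictionary), the duplicate check from the last loop can be moved into a small generator. `unique_by_key` is a hypothetical helper, not part of tst, and the list below only stands in for the output of close_match_iterator.

```python
def unique_by_key(pairs):
    """Lazily yield each (key, value) pair whose key has not been seen
    yet; later pairs with an already-seen key are skipped."""
    seen = set()
    for key, value in pairs:
        if key not in seen:
            seen.add(key)
            yield key, value


# Stand-in for t.close_match_iterator("foo", 1), which may repeat keys.
matches = iter([("foo", 1), ("fo", 2), ("foo", 1), ("fox", 3)])
close_to_foo = dict(unique_by_key(matches))
print(close_to_foo)  # {'foo': 1, 'fo': 2, 'fox': 3}
```

If all you need is a dictionary, and assuming a repeated key always comes back with the same value (which is what a key-to-value tree suggests), `dict(t.close_match_iterator("foo", 1))` should give the same result, since a later duplicate just overwrites an identical entry.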

What’s the point if the new code is longer? Well, if your tree is big, you may save some memory by iterating over the tree rather than collecting all the data in a list, list of tuples or dictionary and THEN processing it. That’s your choice! Performance-wise, I’d say the collectors should be faster, but I still need to check this.
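
The memory argument rests on how Python generators work: a traversal written with `yield` hands out one (key, value) pair at a time instead of building the whole collection first. Below is a minimal pure-Python sketch of a ternary search tree exposing a default iterator and an `iterator(root)` prefix method in that style. It is only an illustration: pytst itself is built as an extension module through SWIG or Boost.Python, and `TinyTST`, `Node` and the method bodies here are hypothetical, not its actual code.

```python
class Node:
    __slots__ = ("char", "low", "eq", "high", "value", "has_value")

    def __init__(self, char):
        self.char = char
        self.low = self.eq = self.high = None
        self.value = None
        self.has_value = False


class TinyTST:
    """Hypothetical minimal ternary search tree (non-empty string keys)."""

    def __init__(self):
        self.root = None

    def __setitem__(self, key, value):
        if not key:
            raise KeyError("empty keys are not supported")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.char:
            node.low = self._insert(node.low, key, i, value)
        elif ch > node.char:
            node.high = self._insert(node.high, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value, node.has_value = value, True
        return node

    def _walk(self, node, prefix):
        # In-order traversal (low, this node, eq, high) yields keys in sorted
        # order, one pair at a time: no list of all entries is ever built.
        if node is None:
            return
        yield from self._walk(node.low, prefix)
        key = prefix + node.char
        if node.has_value:
            yield key, node.value
        yield from self._walk(node.eq, key)
        yield from self._walk(node.high, prefix)

    def __iter__(self):
        return self._walk(self.root, "")

    def iterator(self, root=""):
        """Yield the (key, value) pairs whose key starts with root."""
        if not root:
            yield from self._walk(self.root, "")
            return
        node, i = self.root, 0
        while node is not None:
            ch = root[i]
            if ch < node.char:
                node = node.low
            elif ch > node.char:
                node = node.high
            elif i + 1 < len(root):
                node, i = node.eq, i + 1
            else:
                # Found the node ending the prefix: emit it, then its subtree.
                if node.has_value:
                    yield root, node.value
                yield from self._walk(node.eq, root)
                return


t = TinyTST()
for word in ["foo", "food", "fool", "bar", "fo"]:
    t[word] = len(word)

print(next(iter(t)))            # ('bar', 3): only the first key is visited
print(dict(t.iterator("foo")))  # {'foo': 3, 'food': 4, 'fool': 4}
```

Because `_walk` is a generator, stopping early (as `next(iter(t))` does) leaves the rest of the tree untouched, which is where the savings over collecting everything first would come from.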