pytst 1.01
Download it here.
What’s new ?
- There is at last a nice test suite which I use to ensure that there are no regressions and that the SWIG version and the Boost.Python version stay in sync. The test suite also records performance data in a CSV file which I can then analyse with Excel to check whether a change brings a performance improvement or a regression.
- A creeping bug in close_match is definitely fixed. I always knew it was there but could not properly reproduce it. Luckily, with the new test suite I was able to track it down.
- WARNING : there is a change in the DictAction collector : it now retains the lowest distance whenever a key, value pair is seen multiple times, for instance in close_match. The close_match algorithm now stores distances (the lower the better) rather than remaining distances (the higher the better). For instance, if you were looking for all entries with a maximum distance of 1 from "123", you will now get "122" with a distance of 1 instead of a remaining distance of 0. This is much more intuitive than before.
- Speaking about collectors, you know, those ListAction, TupleListAction, DictAction classes ? Well, those are not the only way to collect data from the TST. There are now true Python iterators ! Instead of using walk or close_match, you can use the default iterator, the iterator([root]) method or the close_match_iterator(string, max_distance) method.
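To make the DictAction change above more concrete, here is a small pure-Python sketch that does not use pytst at all. It assumes the distance is a plain Levenshtein edit distance (this post does not say so explicitly), and collect_lowest is a hypothetical stand-in for a collector that keeps the lowest distance per key.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (assumed metric, for illustration only)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def collect_lowest(matches):
    """Hypothetical collector: keep the lowest distance seen for each key."""
    result = {}
    for key, distance in matches:
        if key not in result or distance < result[key]:
            result[key] = distance
    return result


max_distance = 1
distance = edit_distance("123", "122")   # what is stored now
remaining = max_distance - distance      # what was stored before

# The same key may be reported several times with different distances.
seen = [("122", 1), ("123", 1), ("123", 0), ("122", 1)]
lowest = collect_lowest(seen)
```

With these inputs, distance is what the new version reports for "122" and remaining is the value the old version would have stored, while lowest keeps a single entry per key.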
Examples of iterator usage :
from tst import *
t = TST()
# ... filling t...
# BEFORE :
contents = t.walk(None,DictAction())
# NOW :
content = {}
for key, value in t:
    content[key] = value
# BEFORE :
# All words beginning with "foo" :
from_foo = t.walk(None,DictAction(),"foo")
# NOW :
from_foo = {}
for key, value in t.iterator("foo"):
    from_foo[key] = value
# BEFORE :
# All words close to "foo" :
close_to_foo = t.close_match("foo",1,None,DictAction())
# NOW :
close_to_foo = {}
for key, value in t.close_match_iterator("foo", 1):
    if key in close_to_foo:
        # the close match iterator can return the same
        # key multiple times
        print("%s already inserted" % key)
    else:
        close_to_foo[key] = value
What’s the point if the new code is bigger ? Well, if your tree is big, you may save some memory by iterating over the tree and processing each entry as it comes, rather than by collecting all the data in a list / list of tuples / dictionary THEN processing it. That’s your choice ! Performance-wise, I’d say the collectors should be faster, but I still need to check this.
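The memory argument can be sketched without pytst. In the snippet below, entries is a hypothetical generator standing in for a tree iterator; it is not part of the tst module. The first variant materialises every pair before processing, the second handles one pair at a time and stores nothing.

```python
def entries(n):
    """Hypothetical stand-in for iterating over a TST: yields (key, value) lazily."""
    for i in range(n):
        yield "key%06d" % i, i


# Collector style: build the whole dictionary first, then process it.
collected = dict(entries(1000))
total_collected = sum(collected.values())

# Iterator style: process each pair as it arrives, keeping only the running total.
total_streamed = 0
for key, value in entries(1000):
    total_streamed += value
```

Both variants compute the same total, but only the first one holds all 1000 pairs in memory at once. Which one is faster on a real tree is a separate question that this sketch does not answer.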