Categories

pytst-0.50

As promised, I’ve released a very preliminary version of pytst, a Ternary Search Tree (TST or trie) implementation in C++ with a Python interface (built with SWIG). Download it here if you dare !

Basically, it behaves like a dictionary, but the keys can only be plain strings (sorry, not unicode strings yet). So why bother ? Because TSTs are a lot smarter than dictionaries when it comes to :

  • Prefix-matching : find the longest entry in the TST which is a prefix to a given string. Handy for things like scanners, url matchers and so on.
  • Scanning (corolary of the previous one) : using the Aho-Corasick algorithm, you can implement pretty efficient scanners with a TST. The good thing it that it can scale up to tens of thousands of entries and still perform well.
  • Spelling correctors : find a set of entries which spelling is close to a given string. The distance used is the Levenshtein distance.

I have been using this package in production for nearly a year now, without any problem (except a surprise due to SWIG directors when I switched to Python 2.4, but this is fixed now). I release this under the LGPL. Incidentally, this is my first release under a license, up until now I was pretty much releasing my dirty works in the public domain where it could safely be ignored :) . I am NOT satisfied with the packaging (especially the tests), the code layout (it’s been so long I had not been writing C++ that I forgot a whole bunch of coding conventions) and so on, but I decided to release this as is, and see what happens.

Uh, yes, there is no documentation for now. It’s part of the « let’s see what happens » scheme, you see. If someone really is interested by this package, I’d be delighted to hear about you ! This would be a real motivation to write a documentation ;) .

View Comments à to “pytst-0.50”

  • Nicolas Lehuen’s ternary search tree

    I’m keeping a bookmark of Nicolas Lehuen’s pytst ternary search tree library. I don’t need one right this second, but TST’s are very useful (look at Nicolas’ description to see why). It’s LGPL open source, written in C++. If I…

  • Blue Sky On Mars:

    <!– TB –>

    Nicolas Lehuen's ternary search tree

    I'm keeping a bookmark of Nicolas Lehuen's pytst ternary search tree library. I don't need one right this second, but TST's are very useful (look at Nicolas' description to see why). It's LGPL open source, written in C++. If I…

Laisser un commentaire

blog comments powered by Disqus