Basically, it behaves like a dictionary, but the keys can only be plain strings (sorry, no Unicode strings yet). So why bother? Because TSTs are a lot smarter than dictionaries when it comes to:
- Prefix matching: find the longest entry in the TST that is a prefix of a given string. Handy for things like scanners, URL matchers and so on.
- Scanning (a corollary of the previous one): using the Aho-Corasick algorithm, you can implement pretty efficient scanners with a TST. The good thing is that it scales up to tens of thousands of entries and still performs well.
- Spelling correction: find the set of entries whose spelling is close to a given string, using the Levenshtein distance.
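To make the first two points concrete, here is a minimal pure-Python sketch of a ternary search tree with dict-style access and longest-prefix matching. The `TST` class and its method names are my illustration of the data structure, not the package's actual API (which is a SWIG-wrapped C++ extension):

```python
class Node:
    """One TST node: a character, three children, and an optional value."""
    __slots__ = ("char", "lo", "eq", "hi", "value", "has_value")

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None
        self.value = None
        self.has_value = False


class TST:
    """A toy ternary search tree with dict-like access (sketch only)."""

    def __init__(self):
        self.root = None

    def __setitem__(self, key, value):
        if not key:
            raise KeyError("empty keys are not supported")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        char = key[i]
        if node is None:
            node = Node(char)
        if char < node.char:
            node.lo = self._insert(node.lo, key, i, value)
        elif char > node.char:
            node.hi = self._insert(node.hi, key, i, value)
        elif i < len(key) - 1:
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
            node.has_value = True
        return node

    def __getitem__(self, key):
        node = self._find(self.root, key, 0)
        if node is None or not node.has_value:
            raise KeyError(key)
        return node.value

    def _find(self, node, key, i):
        if node is None:
            return None
        char = key[i]
        if char < node.char:
            return self._find(node.lo, key, i)
        if char > node.char:
            return self._find(node.hi, key, i)
        if i < len(key) - 1:
            return self._find(node.eq, key, i + 1)
        return node

    def longest_prefix(self, s):
        """Return the longest stored key that is a prefix of s, or None."""
        best = None
        node, i = self.root, 0
        while node is not None and i < len(s):
            char = s[i]
            if char < node.char:
                node = node.lo
            elif char > node.char:
                node = node.hi
            else:
                if node.has_value:
                    best = s[:i + 1]
                i += 1
                node = node.eq
        return best
```

For example, after storing `"http"` and `"https"`, `longest_prefix("https://example.com")` walks the tree once and returns `"https"` — this is the operation a URL matcher would lean on.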
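For the scanning point, here is a rough self-contained sketch of the Aho-Corasick idea: build a trie of the entries, add failure links with a breadth-first pass, then scan the input in a single pass, reporting every `(position, entry)` match. The function names are mine, not the package's, and I use a plain dict-based trie rather than a TST to keep the sketch short:

```python
from collections import deque

def build_automaton(entries):
    # Each state is a dict: outgoing transitions, a failure link,
    # and the list of entries that end at this state.
    states = [{"next": {}, "fail": 0, "out": []}]
    for entry in entries:
        state = 0
        for char in entry:
            if char not in states[state]["next"]:
                states.append({"next": {}, "fail": 0, "out": []})
                states[state]["next"][char] = len(states) - 1
            state = states[state]["next"][char]
        states[state]["out"].append(entry)

    # Breadth-first pass to compute failure links.
    queue = deque(states[0]["next"].values())
    while queue:
        state = queue.popleft()
        for char, child in states[state]["next"].items():
            queue.append(child)
            fail = states[state]["fail"]
            while fail and char not in states[fail]["next"]:
                fail = states[fail]["fail"]
            states[child]["fail"] = states[fail]["next"].get(char, 0)
            # Inherit matches that end at the failure state.
            states[child]["out"] += states[states[child]["fail"]]["out"]
    return states

def scan(states, text):
    """One pass over text; returns (start_position, entry) pairs."""
    matches = []
    state = 0
    for i, char in enumerate(text):
        while state and char not in states[state]["next"]:
            state = states[state]["fail"]
        state = states[state]["next"].get(char, 0)
        for entry in states[state]["out"]:
            matches.append((i - len(entry) + 1, entry))
    return matches
```

The scan cost is proportional to the length of the input plus the number of matches, independent of how many entries were loaded, which is why this approach scales to tens of thousands of entries.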
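And for the spelling-correction point, a self-contained sketch of the Levenshtein metric and a naive closeness filter. Note that a tree-based implementation can prune whole subtrees once the distance bound is exceeded, which this entry-by-entry version cannot; it only illustrates the metric. `close_matches` is a name I made up for the sketch:

```python
def levenshtein(a, b):
    """Edit distance between a and b, one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def close_matches(entries, word, max_distance=1):
    """Entries within max_distance edits of word (naive linear scan)."""
    return [e for e in entries if levenshtein(e, word) <= max_distance]
```

For instance, `levenshtein("kitten", "sitting")` is 3, and filtering a word list with `max_distance=1` returns the near-misses a spelling corrector would suggest.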
I have been using this package in production for nearly a year now, without any problem (except a surprise due to SWIG directors when I switched to Python 2.4, but that is fixed now). I am releasing this under the LGPL. Incidentally, this is my first release under a license; up until now I was pretty much releasing my dirty work into the public domain, where it could safely be ignored. I am NOT satisfied with the packaging (especially the tests), the code layout (it has been so long since I last wrote C++ that I forgot a whole bunch of coding conventions) and so on, but I decided to release it as is and see what happens.
Uh, yes, there is no documentation for now. It's part of the "let's see what happens" scheme, you see. If someone is really interested in this package, I'd be delighted to hear from you! That would be a real motivation to write some documentation.