pytst 1.18
I’ve just released pytst 1.18 which fixes a bug reported by Keith Davidson. The bug was in the scan method and appeared in some very specific yet totally mundane circumstances : if you scanned an input string of a very specific length with regard to the content of the tree, then the first scan would be ok, but the next ones not. This was something that was not in the unit tests and that managed to survive many years before being reported.
The code for the scan method (in tst.h) was one of the most complicated piece of code in this project, and I had quite a hard time figuring out what was happening. In the process, I began to realise that a rewrite would be necessary, so I cleaned the slate and entirely re-designed the algorithm, this time with careful planning.
While doing this, I realised that the old algorithm was much more wrong than I thought. There is a catch with the new algorithm, however : it has to do some backtracking. That is to say, in some condition, the algorithm has to go back and read again some part of the input string.
The old algorithm never had to look back – but the old algorithm would miss some matches, if you had dictionary entries that included multiple smaller dictionary entries. For example, in a dictionary with {'111','222','111222111'}, if you provided the input string '11122211' (notice the missing 1 at the end), then you would have a match on '111', then a non match on '22211'. The match on '222' would be missed.
For the moment, I prefer to release pytst 1.18 with an algorithm that is correct yet not optimal, and try to find a non-backtracking solution later. From my measurements, the impact on performance is around 10% but YMMV.
For the record, the new algorithm looks like this :
There is backtracking in to places : (J) and (K) where the input string index i is modified. To prevent backtracking, I have to find a way to analyse the tree so that I emit any match included in the current state then change to a proper state. It’s what Aho-Corasick is supposed to do, but I need to have a better look at it since I obviously got it wrong the first time.
You can download pytst 1.18 here.
-
http://twitter.com/ivan_bezdomny Nikolai Yakovenko
-
http://twitter.com/nlehuen Nicolas Lehuen
-
dbg
-
dbg
-
http://nicolas.lehuen.com/ Nicolas Lehuen
