I have added a DFA state machine to the re2net library, as described here. This method computes the DFA states on demand, which makes subsequent matches with the same instance faster. The crude benchmark below shows run times for the NFA (as in part 1), DFA’s first run, DFA’s second run, and C#‘s Regex library as a comparison.
The results below show that the DFA is about an order of magnitude slower than the NFA on the first run (as expected, since the cache is being computed), but an order of magnitude faster on the second run (since the cache is being used).
Until now, this has been an academic exercise to learn C# and build a simple regex parser. With that done, the next step is to support the full regex syntax. The RE2 library is this project, but I’m going to use Go’s regexp package since I think the code will be easier to read, and it’s the same implementation. This is all assuming I maintain interest.
Column headers, all times in seconds.
n, NFA, DFA, DFA2, C# Regex: