Compiler From Scratch: Phase 1 - Tokenizer Generator 014: Regex code gen/testing, starting lazy eval

6 months ago


Technology Software & Development Programming C Compilers Tokenizer Lexer NFA DFA

Streamed on 2024-10-18 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

Last week we debugged the NFA/DFA not picking up the "ANY" operator correctly and fixed it all the way through to the DFA. I didn't like the way it printed in the DFA table or the DFA Graphviz plot, so I fixed that first thing. That was followed by a small refactor of function names left over from last week's output improvements. Just a couple of small, easy tasks to get warmed up.

After that it was time to do code generation for the ANY operator, which was easier than expected. It basically came down to putting "true" inside the if condition. A little debugging around the Regex for strings and it was good to go. Next I generated code for the StartOfLine and EndOfLine anchors. The main effort was adding two functions to the ChrRef, which are then called from the tokenizer. Since the anchor operators don't consume a character, there was a bit of fiddling to suppress the "munch" function call in these cases. Just a bit more finagling around detecting the end of input and all was well.
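To make the difference between consuming and non-consuming checks concrete, here is a minimal sketch in C. It is not the code from the stream: the ChrRef layout, the function names (chr_at_start_of_line, chr_at_end_of_line, chr_munch, try_edge) and the EdgeKind enum are all hypothetical, and it assumes that ANY matches every character, newline included. The only points it tries to show are that the anchors test the position without munching, and that ANY has no character comparison at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ChrRef: a cursor into the input text. */
typedef struct {
    const char *text;
    size_t      len;
    size_t      pos;
} ChrRef;

/* Anchor check: true at the very start of input or right after a newline. */
static bool chr_at_start_of_line(const ChrRef *r) {
    assert(r != NULL);
    return r->pos == 0 || r->text[r->pos - 1] == '\n';
}

/* Anchor check: true at the end of input or right before a newline. */
static bool chr_at_end_of_line(const ChrRef *r) {
    assert(r != NULL);
    return r->pos >= r->len || r->text[r->pos] == '\n';
}

/* "Munch": consume one character. Returns false at end of input. */
static bool chr_munch(ChrRef *r) {
    assert(r != NULL);
    if (r->pos >= r->len) return false;
    r->pos++;
    return true;
}

/* Hypothetical edge kinds a generated tokenizer might branch on. */
typedef enum { EDGE_CHAR, EDGE_ANY, EDGE_START_OF_LINE, EDGE_END_OF_LINE } EdgeKind;

/* Try to follow one edge. Returns true if the edge matched. */
static bool try_edge(ChrRef *r, EdgeKind kind, char c) {
    switch (kind) {
    case EDGE_START_OF_LINE:
        return chr_at_start_of_line(r);   /* anchor: no munch */
    case EDGE_END_OF_LINE:
        return chr_at_end_of_line(r);     /* anchor: no munch */
    case EDGE_ANY:
        /* The character test is effectively "true"; only running out of
           input can make this fail. */
        return chr_munch(r);
    case EDGE_CHAR:
        if (r->pos < r->len && r->text[r->pos] == c) return chr_munch(r);
        return false;
    }
    return false;
}
```

With the input "a\nb", a StartOfLine edge succeeds at position 0 and leaves the position unchanged, while an ANY edge succeeds and advances to position 1. At the end of the input, EndOfLine still succeeds but ANY fails because there is nothing left to munch.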

I tested a couple of possible error cases with the new processing in place. One turned out to be fine; the other (an unterminated string at the end of input) caused a crash, as expected. This was noted in the TODO for future attention.

And finally it was time to start building out the lazy tokenizer processing. There were a few opportunities to refactor and clean up other code as we went through this exercise. The high-level methods for the lazy processing are put together well enough to compile, but changes are still needed to get the whole thing working end-to-end, mostly in the helper functions. We'll pick up there next week.
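For anyone wondering what lazy processing can look like, here is a small sketch in C. It assumes "lazy" means that tokens are scanned only when the caller asks for one, rather than tokenizing the whole input up front. The stream's design may well differ. Every name here (LazyTokenizer, lazy_peek, lazy_next, scan_one, the TokKind values) is made up for illustration, and the scanner is a hand-written toy rather than generated DFA code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_NUMBER, TOK_IDENT, TOK_OTHER } TokKind;

typedef struct {
    TokKind     kind;
    const char *start;   /* points into the original input, no copy */
    size_t      len;
} Token;

typedef struct {
    const char *text;
    size_t      len;
    size_t      pos;
    bool        has_peeked;  /* is 'peeked' holding an unconsumed token? */
    Token       peeked;
    size_t      scan_calls;  /* counts scanner runs, to show the laziness */
} LazyTokenizer;

static void lazy_init(LazyTokenizer *t, const char *text, size_t len) {
    assert(t != NULL && text != NULL);
    t->text = text;
    t->len = len;
    t->pos = 0;
    t->has_peeked = false;
    t->peeked = (Token){ TOK_EOF, text, 0 };
    t->scan_calls = 0;
}

/* Toy scanner: skips whitespace, then reads one number, identifier,
   or single other character. */
static Token scan_one(LazyTokenizer *t) {
    t->scan_calls++;
    while (t->pos < t->len && isspace((unsigned char)t->text[t->pos])) t->pos++;
    Token tok = { TOK_EOF, t->text + t->pos, 0 };
    if (t->pos >= t->len) return tok;

    size_t begin = t->pos;
    unsigned char c = (unsigned char)t->text[t->pos];
    if (isdigit(c)) {
        tok.kind = TOK_NUMBER;
        while (t->pos < t->len && isdigit((unsigned char)t->text[t->pos])) t->pos++;
    } else if (isalpha(c) || c == '_') {
        tok.kind = TOK_IDENT;
        while (t->pos < t->len &&
               (isalnum((unsigned char)t->text[t->pos]) || t->text[t->pos] == '_'))
            t->pos++;
    } else {
        tok.kind = TOK_OTHER;
        t->pos++;
    }
    tok.len = t->pos - begin;
    return tok;
}

/* Look at the next token without consuming it; scans only if needed. */
static Token lazy_peek(LazyTokenizer *t) {
    if (!t->has_peeked) {
        t->peeked = scan_one(t);
        t->has_peeked = true;
    }
    return t->peeked;
}

/* Consume and return the next token. */
static Token lazy_next(LazyTokenizer *t) {
    Token tok = lazy_peek(t);
    t->has_peeked = false;
    return tok;
}
```

Nothing is scanned at init time. Peeking twice in a row runs the scanner once, and a parser that stops early never pays for the tokens it did not request.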
