Compiler From Scratch: Phase 1 - Tokenizer Generator 021: Using tokenizer in the tokenizer generator

1 month ago


Technology Software & Development Programming C Compilers Tokenizer Lexer NFA DFA

Streamed on 2024-12-06 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

The tokenizer generator has to parse two files (so far): the project file and the tokenizer definition file. If we can generate a tokenizer, why not use it to parse those files? There's no reason not to, so I started that task today. The switch involves some back and forth: the file parsing and the set of supported tokens both change, and they have to stay in sync well enough that everything still builds and runs along the way.

Things were going reasonably smoothly until I ran into a bug. I thought it was one I had been anticipating for quite a while, an ordering problem, but it turned out to be something else: our DFA isn't formed quite correctly. Some states should be merged, for example when two keywords start with the same letter. Other states need a reasonable fallback when a match doesn't complete (or when the input continues past the expected end), as in the case of a KEYWORD whose text could also begin an identifier. If the KEYWORD doesn't match exactly, the tokenizer can and should cross over into the identifier track. That isn't happening right now.
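
To make the expected behavior concrete, here is a minimal hand-written sketch in C. It is not the generator's code or its output: the keywords `if` and `int`, the state names, and `scan_token` are all assumptions chosen for illustration. It shows a single shared state for the common prefix `i`, and keyword states that fall through into the identifier track when the input diverges or keeps going.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; TOK_NONE means "no token matched". */
typedef enum { TOK_NONE, TOK_IDENT, TOK_KW_IF, TOK_KW_INT } TokenKind;

/* Hand-built DFA states (illustrative). S_DEAD means no transition exists. */
enum { S_START, S_I, S_IF, S_IN, S_INT, S_IDENT, S_DEAD };

/* Token accepted in each state, indexed by state. */
static const TokenKind accepts[] = {
    TOK_NONE,   /* S_START */
    TOK_IDENT,  /* S_I: "i" alone is an identifier */
    TOK_KW_IF,  /* S_IF */
    TOK_IDENT,  /* S_IN: "in" alone is an identifier */
    TOK_KW_INT, /* S_INT */
    TOK_IDENT,  /* S_IDENT */
    TOK_NONE    /* S_DEAD */
};

static int is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_ident_char(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* One DFA transition. Every keyword state can still move to S_IDENT. */
static int step(int state, char c) {
    if (state == S_DEAD || !is_ident_char(c)) return S_DEAD;
    switch (state) {
    case S_START:
        if (c == 'i') return S_I; /* merged prefix of "if" and "int" */
        return is_ident_start(c) ? S_IDENT : S_DEAD;
    case S_I:
        if (c == 'f') return S_IF;
        if (c == 'n') return S_IN;
        return S_IDENT;
    case S_IN:
        return c == 't' ? S_INT : S_IDENT;
    default: /* S_IF, S_INT, S_IDENT: any further character is identifier */
        return S_IDENT;
    }
}

/* Longest-match scan: remember the last accepting state that was seen. */
TokenKind scan_token(const char *text, size_t *length) {
    assert(text != NULL && length != NULL);
    int state = S_START;
    TokenKind best = TOK_NONE;
    size_t best_len = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        state = step(state, text[i]);
        if (state == S_DEAD) break;
        if (accepts[state] != TOK_NONE) {
            best = accepts[state];
            best_len = i + 1;
        }
    }
    *length = best_len;
    return best;
}
```

With this table, `scan_token("if", &n)` yields the keyword, while `"iffy"` and `"index"` come back as identifiers because the keyword states hand off to `S_IDENT` instead of dead-ending.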

I started looking into this bug but didn't have time to finish. Today's stream was a bit short, and the code involved is stuff I haven't looked at in a long time, so much of the session went to remembering how it works and reasoning about what the fix should be. We'll finish debugging this next week.
