Compiler From Scratch: Phase 1 - Tokenizer Generator 012: Debugging NFA to DFA conversion



Technology Programming C Compilers Tokenizer Lexer NFA DFA

Streamed on 2024-10-04 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

Short stream.

Last time we were blocked from testing the newline functionality in the tokenizer because the newline characters weren't in the DFA correctly. There was a token effort to debug the issue, but I didn't get very far at the end of the last stream. That brings us to today, when I got serious about it. With lots of debugging print statements, I found the problem in the NFA to DFA conversion process: when a state was being added to the closure it was already marked as "visited", so its children were not processed on the next time through the loop. A quick test and clearing that flag fixed the main issue.
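For anyone following along, here is a minimal sketch of how a stale "visited" flag can be avoided when building an epsilon closure. It is not the code from the stream: `NfaState`, `epsilon_closure`, and the fixed-size arrays are all made up for illustration, and resetting every flag at the start of each closure is just one possible way to clear it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 16
#define MAX_EPS 4

/* Hypothetical NFA state: only epsilon edges are modelled here. */
typedef struct {
    int  eps[MAX_EPS];  /* targets of epsilon transitions */
    int  eps_count;
    bool visited;       /* scratch flag used while building a closure */
} NfaState;

/* Writes the epsilon closure of `start` into `out` and returns its size.
   `out` doubles as the worklist: entries at index >= i are still pending. */
int epsilon_closure(NfaState *states, int state_count, int start, int *out)
{
    assert(start >= 0 && start < state_count && state_count <= MAX_STATES);

    /* Clear flags left over from an earlier closure. Without this, a state
       that is still marked would be skipped and its children never added. */
    for (int i = 0; i < state_count; i++) {
        states[i].visited = false;
    }

    int count = 0;
    out[count++] = start;
    states[start].visited = true;

    for (int i = 0; i < count; i++) {
        const NfaState *s = &states[out[i]];
        for (int e = 0; e < s->eps_count; e++) {
            int target = s->eps[e];
            if (!states[target].visited) {
                states[target].visited = true;
                out[count++] = target;
            }
        }
    }
    return count;
}
```

With a chain 0 -> 1 -> 2 of epsilon edges, calling this for state 0 and then for state 1 gives {0, 1, 2} and then {1, 2}. If the flags were not cleared between the two calls, the second closure would stop at state 1 because state 2 would still look visited.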

While the DFA was looking better, the pattern for the newline character didn't look right. I changed the newline pattern from "([\n\r]|(\r\n))" to "(\n|(\r\n?))" and was much happier with the results. Those changes let us finally test the newline function in the tokenizer. The tokens are getting the right line numbers now. I tried to trim the newline characters off the end of the line as tracked by TxtBuf, but didn't have time to finish that off.
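Both patterns accept the same three sequences (\n, \r, and \r\n); the new one just spells it out more directly. As a rough illustration of what "(\n|(\r\n?))" matches and how that feeds line numbers, here is a hand-written sketch. It is not the generated DFA or the tokenizer from the stream, and `match_newline` and `line_of_offset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the newline sequence at the start of s, or 0 if there is none.
   Mirrors (\n|(\r\n?)): a lone \n, a lone \r, or \r followed by \n. */
size_t match_newline(const char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0) return 0;
    if (s[0] == '\n') return 1;
    if (s[0] == '\r') return (len > 1 && s[1] == '\n') ? 2 : 1;
    return 0;
}

/* 1-based line number of the byte at `offset`. A newline sequence only
   counts once it lies entirely before the offset, so \r\n counts once. */
size_t line_of_offset(const char *text, size_t len, size_t offset)
{
    size_t line = 1;
    size_t i = 0;
    while (i < len && i < offset) {
        size_t n = match_newline(text + i, len - i);
        if (n > 0 && i + n <= offset) {
            line++;
            i += n;
        } else {
            i++;
        }
    }
    return line;
}
```

For the text "a\r\nb\nc\rd", the characters a, b, c, and d land on lines 1, 2, 3, and 4, with the \r\n pair counted as a single line break.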

I also did a quick test of the string regex pattern, but didn't have time to dig in, although I have some ideas to check. Because the stream was short, I left off with a couple of loose ends that we'll pick up next week.
