Compiler From Scratch: Phase 1 - Tokenizer Generator 013: Line tracking TxtBuf, fixing ANY in regex

Technology Software & Development Programming C Compilers Tokenizer Lexer NFA DFA

Streamed on 2024-10-11 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

Last week I fixed the newline issue on the regex side, but at the end of that session the TxtBuf still wasn't tracking lines correctly, so that's what I started with today. It needed to trim trailing whitespace (which had an off-by-one error), handle the last line of the file as a special case, and cope with empty lines. With those fixes in place the line tracking works great, and we can now print out a line of text and underline the token in that context.
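To make the line-tracking cases concrete, here is a minimal sketch in C. It is not the TxtBuf code from the stream: `LineIndex`, `line_index_build`, `line_index_find`, and `line_index_underline` are hypothetical names, and the fixed `MAX_LINES` cap is an assumption to keep the example short. It covers the same cases: trailing whitespace, a last line with no newline, and empty lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 1024 /* assumed cap, only to keep the sketch simple */

/* Hypothetical line index; not the stream's actual TxtBuf layout. */
typedef struct {
    const char *text;
    size_t len;
    size_t start[MAX_LINES]; /* offset of the first byte of each line */
    size_t end[MAX_LINES];   /* one past the last non-whitespace byte */
    size_t count;
} LineIndex;

static void line_index_build(LineIndex *ix, const char *text)
{
    size_t len = strlen(text);
    size_t start = 0;
    ix->text = text;
    ix->len = len;
    ix->count = 0;
    for (size_t i = 0; i <= len && ix->count < MAX_LINES; i++) {
        int at_end = (i == len);
        if (!at_end && text[i] != '\n') continue;
        /* A final newline (or an empty file) does not open another line. */
        if (at_end && start == len) break;
        size_t end = i;
        /* Trim trailing whitespace; "end > start" guards the off-by-one
           on empty lines, where there is nothing to trim. */
        while (end > start && (text[end - 1] == ' ' ||
                               text[end - 1] == '\t' ||
                               text[end - 1] == '\r'))
            end--;
        ix->start[ix->count] = start;
        ix->end[ix->count] = end;
        ix->count++;
        start = i + 1;
    }
}

/* Binary search for the line whose start is the last one <= offset. */
static size_t line_index_find(const LineIndex *ix, size_t offset)
{
    size_t lo = 0, hi = ix->count;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->start[mid] <= offset) lo = mid; else hi = mid;
    }
    return lo;
}

/* Writes "<line>\n<padding>^^^" into out; returns the 1-based line number. */
static size_t line_index_underline(const LineIndex *ix, size_t tok_off,
                                   size_t tok_len, char *out, size_t out_size)
{
    assert(ix->count > 0 && tok_off < ix->len && out_size > 0);
    size_t ln = line_index_find(ix, tok_off);
    size_t s = ix->start[ln], e = ix->end[ln], n = 0;
    for (size_t i = s; i < e && n + 1 < out_size; i++) out[n++] = ix->text[i];
    if (n + 1 < out_size) out[n++] = '\n';
    /* Keep tabs as tabs so the marker lines up under the token. */
    for (size_t i = s; i < tok_off && n + 1 < out_size; i++)
        out[n++] = (ix->text[i] == '\t') ? '\t' : ' ';
    for (size_t i = 0; i < tok_len && n + 1 < out_size; i++) out[n++] = '^';
    out[n] = '\0';
    return ln + 1;
}
```

Given the text `"int x = 1;  \n\nreturn x"`, the index holds three lines (the middle one empty), and underlining the token at offset 21 yields the third line with a `^` under the final `x`.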

After that it was back to tackling the string regex. I was mostly sure the problem had to do with the regex ANY operator. But as I worked through it, I was depressed by how bad the output I had to look at was: inconsistent, incomplete, and difficult to escape when embedding the text. So I backed up to an old TechDebt item on the TODO list and went to town on it. It took a bit of doing, but the output on the console and in the NFA/DFA graphs is now consistent and much more readable, with proper escaping.
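One way to get consistent escaping across console and graph output is to route every character through a single routine. The sketch below is an assumption about how that could look, not the code written on stream: `escape_byte` and `escape_text` are hypothetical names, and the `for_dot` flag assumes the NFA/DFA graphs are emitted as Graphviz DOT labels, where backslashes and double quotes need one more level of escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape one byte for display. out must hold at least 5 bytes.
   Returns the number of characters written, excluding the NUL. */
static size_t escape_byte(unsigned char c, char *out)
{
    switch (c) {
    case '\n': strcpy(out, "\\n");  return 2;
    case '\t': strcpy(out, "\\t");  return 2;
    case '\r': strcpy(out, "\\r");  return 2;
    case '\\': strcpy(out, "\\\\"); return 2;
    case '"':  strcpy(out, "\\\""); return 2;
    default: break;
    }
    if (c >= 0x20 && c < 0x7f) { /* printable ASCII passes through */
        out[0] = (char)c;
        out[1] = '\0';
        return 1;
    }
    /* Everything else becomes \xHH (4 characters plus the NUL). */
    return (size_t)snprintf(out, 5, "\\x%02X", (unsigned)c);
}

/* Escape len bytes of src into out. With for_dot set, every backslash and
   double quote in the escaped form is escaped once more (assumed DOT label
   rules). Stops before a byte whose escape would not fit, so the output
   never ends in a partial escape. Returns the length written. */
static size_t escape_text(const char *src, size_t len, int for_dot,
                          char *out, size_t out_size)
{
    assert(out_size > 0);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char tmp[5];
        size_t t = escape_byte((unsigned char)src[i], tmp);
        size_t need = t;
        if (for_dot)
            for (size_t j = 0; j < t; j++)
                if (tmp[j] == '\\' || tmp[j] == '"') need++;
        if (n + need + 1 > out_size) break; /* keep room for the NUL */
        for (size_t j = 0; j < t; j++) {
            if (for_dot && (tmp[j] == '\\' || tmp[j] == '"')) out[n++] = '\\';
            out[n++] = tmp[j];
        }
    }
    out[n] = '\0';
    return n;
}
```

Because both outputs share `escape_byte`, a newline shows up as `\n` everywhere; only the extra DOT layer differs between the console and the graph.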

With better output to look at, it was time to dig into the string regex. I identified the issue with the empty exclusion (the regex ANY operator, ".") and had to patch around it in a few places. Ultimately it wasn't too bad to get the NFA and DFA generating correctly. The code generation doesn't work with these updates yet, but that is a problem to be solved next week.
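To illustrate why an empty exclusion is an easy case to get wrong, here is a small sketch of a character class stored as a bitset plus a negation flag. This is an assumed representation, not necessarily the one used in the project, and `CharClass`, `class_any`, and `class_is_dead` are hypothetical names. ANY is modelled as "exclude nothing", so its bitset is empty just like a class that matches nothing; only the flag tells them apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical character class: one bit per byte value. */
typedef struct {
    uint8_t bits[32];
    bool    negated; /* true: match every byte NOT listed in bits */
} CharClass;

static void class_clear(CharClass *c, bool negated)
{
    memset(c->bits, 0, sizeof c->bits);
    c->negated = negated;
}

static void class_add(CharClass *c, unsigned char ch)
{
    c->bits[ch >> 3] |= (uint8_t)(1u << (ch & 7));
}

static bool class_matches(const CharClass *c, unsigned char ch)
{
    bool in_set = ((c->bits[ch >> 3] >> (ch & 7)) & 1u) != 0;
    return in_set != c->negated;
}

/* ANY (".") as an exclusion with nothing excluded. In this sketch it
   matches every byte, newline included; that is an assumption. */
static CharClass class_any(void)
{
    CharClass c;
    class_clear(&c, true);
    return c;
}

/* True only when no byte can match. Checking for an all-zero bitset
   alone would wrongly report ANY as dead, so the flag must be consulted. */
static bool class_is_dead(const CharClass *c)
{
    uint8_t want = c->negated ? 0xFF : 0x00;
    for (size_t i = 0; i < sizeof c->bits; i++)
        if (c->bits[i] != want) return false;
    return true;
}
```

With this layout, an NFA builder that skips transitions whose class "is empty" has to call something like `class_is_dead` rather than test the bitset directly, otherwise every "." edge disappears from the graph.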
