Compiler From Scratch: Phase 1 - Tokenizer Generator 015: Finishing Lazy Token Evaluation


Streamed on 2024-10-25 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

Last week we started on the Lazy evaluation build option, but didn't have time to finish it. So today we finished it.

As I dug in, it became clear that I needed a way to track execution of the regex DFA states to see what was happening and when. So I started coding that in. However, that introduced a bug that caused things to crash so hard that there was no debug information. After a lot of printf debugging, I narrowed it down to a single snippet of text that was being written out to a file. For some reason, copying a single word out by offset using snprintf was exploding if the source text had a "%" anywhere in it. The "%" was not even in the offset/count region that was actually being copied. I replaced the call to snprintf with strncpy and everything worked again. It just makes me sad how much time it took to find the reason and the solution.
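The original call isn't shown here, so the exact cause of the crash is not something this sketch can pin down. What it does show is two ways of copying a word out by offset and count so that the source text is only ever treated as data. In standard C, snprintf gives "%" special meaning only in its format argument, so the second variant keeps the format a fixed literal. The helper names `copy_word` and `copy_word_fmt` are made up for illustration and are not from the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the stream): copy `count` bytes starting at
   `offset` in `src` into `dst` (capacity `dst_size`), always NUL-terminating.
   Returns the number of bytes copied, not counting the terminator. */
static size_t copy_word(char *dst, size_t dst_size,
                        const char *src, size_t offset, size_t count)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t src_len = strlen(src);
    if (offset > src_len) offset = src_len;                 /* clamp the start */
    if (count > src_len - offset) count = src_len - offset; /* stay inside src */
    if (count > dst_size - 1) count = dst_size - 1;         /* room for '\0'   */

    /* strncpy does not terminate when it copies exactly `count` bytes,
       so the terminator is written explicitly. */
    strncpy(dst, src + offset, count);
    dst[count] = '\0';
    return count;
}

/* Same idea with snprintf: the format string is a fixed literal, and the
   source text is passed only as an argument. "%.*s" takes the maximum
   number of bytes to read as an int precision. */
static int copy_word_fmt(char *dst, size_t dst_size,
                         const char *src, size_t offset, size_t count)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    assert(offset <= strlen(src));
    return snprintf(dst, dst_size, "%.*s", (int)count, src + offset);
}
```

With a source like "100% done", both helpers called with offset 5 and count 4 leave "done" in the destination buffer, and the "%" earlier in the text is never interpreted.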

Once we were back up and running there was a bit of fiddling to finish off Lazy token evaluation. There are a few differences in the way the process flows when batch processing vs lazy processing, but I found them one at a time (greatly aided by the trace log of the tokenizing process). A bit of testing with other performance flags, an update to the macros in the resulting file, and lazy processing is working reasonably well for all of the build options we tested.
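To make the batch vs lazy difference concrete, here is a minimal sketch. It is not the generated tokenizer from the stream: a tiny hand-written lexer stands in for the regex DFA, and every name in it (`Lexer`, `lexer_next`, `lexer_batch`, the `trace` field) is an assumption for illustration. Lazy evaluation produces one token per call and only advances as far as that token, while batch evaluation runs the same step to the end of input up front. The optional trace stream plays the role of the trace log.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    size_t  offset;  /* where the token starts in the source */
    size_t  length;  /* how many bytes it spans */
} Token;

typedef struct {
    const char *src;
    size_t      pos;    /* next unread byte */
    FILE       *trace;  /* optional trace log; NULL disables tracing */
} Lexer;

/* Lazy step: scan exactly one token on demand and stop. */
static Token lexer_next(Lexer *lx)
{
    const char *s = lx->src;
    while (s[lx->pos] != '\0' && isspace((unsigned char)s[lx->pos]))
        lx->pos++;

    Token t = { TOK_EOF, lx->pos, 0 };
    unsigned char c = (unsigned char)s[lx->pos];
    if (c == '\0') {
        /* end of input: keep TOK_EOF with length 0 */
    } else if (isalpha(c) || c == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)s[lx->pos]) || s[lx->pos] == '_')
            lx->pos++;
    } else if (isdigit(c)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)s[lx->pos]))
            lx->pos++;
    } else {
        t.kind = TOK_PUNCT;
        lx->pos++;
    }
    t.length = lx->pos - t.offset;

    if (lx->trace)
        fprintf(lx->trace, "token kind=%d offset=%zu length=%zu\n",
                (int)t.kind, t.offset, t.length);
    return t;
}

/* Batch: run the lazy step until EOF (or until `out` is full).
   Returns the number of tokens stored, including the EOF token. */
static size_t lexer_batch(Lexer *lx, Token *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t n = 0;
    while (n < cap) {
        out[n] = lexer_next(lx);
        if (out[n++].kind == TOK_EOF)
            break;
    }
    return n;
}
```

Building the batch path on top of the lazy step is one way to keep the two modes from drifting apart, and passing `stderr` or an open file as `trace` gives a per-token log that can be compared between them.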

Next week we need to write a script to test the different combinations more automatically and efficiently. There are a lot of build configuration options now.
