Compiler From Scratch: Phase 1 - Tokenizer Generator 007A: Code generating char/string references


Streamed on 2024-08-30 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

Interrupted stream today. My son unplugged the power to the router, so it took a few minutes to get the internet back up. There is a part B.

Today we generated more of the supporting code for the tokenizer we will be generating: one class to handle character references (ChrRef) and another to handle string references (StrRef). The main idea is to never copy strings or pass them around. A reference contains just enough information to point to a location in the text buffer (TxtBuf). The only time a reference gets turned into a string is when reporting errors, at which point speed no longer matters.
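To make the reference idea concrete, here is a minimal sketch in C++. It is not the code generated on stream: only the names TxtBuf, ChrRef and StrRef come from the description above, while the fields (data, size, pos, start, len) and the methods (get, equals, to_string) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical layout: the real generated classes may differ.
struct TxtBuf {
    const char* data;   // start of the loaded source text
    std::size_t size;   // number of bytes in the buffer
};

// Refers to one character: a buffer plus a byte offset into it.
struct ChrRef {
    const TxtBuf* buf;
    std::size_t pos;

    char get() const { return buf->data[pos]; }
};

// Refers to a run of characters: a buffer, a start offset, and a length.
struct StrRef {
    const TxtBuf* buf;
    std::size_t start;
    std::size_t len;

    // Compare against a literal in place, without building a std::string.
    bool equals(const char* lit, std::size_t lit_len) const {
        if (lit_len != len) return false;
        for (std::size_t i = 0; i < len; ++i) {
            if (buf->data[start + i] != lit[i]) return false;
        }
        return true;
    }

    // The only path that copies text; meant for error reporting.
    std::string to_string() const {
        return std::string(buf->data + start, len);
    }
};
```

In this sketch a StrRef is three machine words no matter how long the token is, so passing one around costs the same for a one-character operator as for a long identifier.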

Based on the project settings, you can generate these classes for one of three encodings: ASCII, Latin1, or UTF8. The UTF8 version will be the slowest, because it has to do quite a lot of verification for each character. The ASCII version will be pretty fast, but it still has to check that each character is in the range [0-127]. Latin1 will be the fastest, since there is no character checking.
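The difference in checking cost can be sketched roughly as follows. This is an illustration in C++, not the generated code: the function names valid_latin1, valid_ascii and valid_utf8 are hypothetical, and checking a whole byte range at once (rather than one character at a time inside the reference classes) is an assumption made to keep the example short.

```cpp
#include <cstddef>

// Latin1 (as assumed here): every byte is accepted, so there is nothing to check.
inline bool valid_latin1(const unsigned char*, std::size_t) { return true; }

// ASCII: one comparison per byte; each must be in 0-127.
inline bool valid_ascii(const unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] > 127) return false;
    }
    return true;
}

// UTF8: classify the lead byte, check each continuation byte, then reject
// overlong forms, surrogates, and code points above U+10FFFF.
inline bool valid_utf8(const unsigned char* p, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        unsigned char b = p[i];
        if (b < 0x80) { ++i; continue; }           // single-byte character
        std::size_t extra;                          // continuation bytes expected
        unsigned long cp;                           // decoded code point
        unsigned long min;                          // smallest value for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;                          // stray continuation or bad lead
        if (n - i <= extra) return false;           // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                 // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate range
        if (cp > 0x10FFFF) return false;            // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

The three functions line up with the ordering described above: no work for Latin1, one comparison per byte for ASCII, and several branches plus a decode step per character for UTF8.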
