Compiler From Scratch: Phase 1 - Tokenizer Generator 009: Generating DFA State code



Streamed on 2024-09-13 (https://www.twitch.tv/thediscouragerofhesitancy)

Zero Dependencies Programming!

I have to confess to cheating ... I did some work on the compiler off-stream. I created an asset pipeline for managing Code Snippets: large chunks of text that are the same no matter what. A Code Snippet can be divided into chunks by a line of "...", so I can write a chunk, emit custom code, and then move on to the next chunk. I also refactored TxtBuf and ChrRef to remove the circular dependency that would necessarily exist if I used ChrRef in a TxtBuf::Iterator. And now I have a TxtBuf::Iterator. But that was all the cheating, I swear! Everything else we do on-stream.
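To make the chunked Code Snippet idea concrete, here is a minimal sketch of how a snippet could be split at its "..." lines and then interleaved with generated code. The names SplitSnippet and Interleave are hypothetical and are not taken from the stream; the real asset pipeline may work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits snippet text into chunks at every line that is
// exactly "...". The separator lines themselves are dropped.
std::vector<std::string> SplitSnippet(const std::string& text) {
    std::vector<std::string> chunks(1);  // there is always at least one chunk
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line == "...") {
            chunks.emplace_back();       // separator: start the next chunk
        } else {
            chunks.back() += line;
            chunks.back() += '\n';
        }
    }
    return chunks;
}

// Hypothetical helper: writes chunk 0, then generated piece 0, then chunk 1,
// and so on. Assumes exactly one generated piece per "..." separator.
std::string Interleave(const std::vector<std::string>& chunks,
                       const std::vector<std::string>& generated) {
    assert(generated.size() + 1 == chunks.size());
    std::string out;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        out += chunks[i];
        if (i < generated.size()) out += generated[i];
    }
    return out;
}
```

With a snippet such as "struct Tokenizer {\n...\n};\n", SplitSnippet yields two chunks, and Interleave drops the custom code between them.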

Today we started emitting DFAStates into our Tokenizer. Each DFAState has a list of Transitions, and each Transition contains a list of conditions to check and the next DFAState to move to if those conditions are met. All the code emitted today went into simple methods, one per DFAState.
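As a rough illustration of "one method per DFAState", the sketch below emits a small function for a state from its list of Transitions. The struct layouts, the EmitState name, the shape of the generated function, and the choice to require every condition in a Transition to hold (joined with &&) are all assumptions made for this example, not the generator written on stream.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical data model. Each condition is stored as a C++ expression over
// the current character `c`.
struct Transition {
    std::vector<std::string> conditions;  // assumed: all must hold
    int next;                             // index of the next DFAState
};

struct DFAState {
    int id;
    std::vector<Transition> transitions;
};

// Emits one function per state: it returns the index of the next state, or -1
// when no Transition matches.
std::string EmitState(const DFAState& s) {
    assert(s.id >= 0);
    std::ostringstream out;
    out << "int State" << s.id << "(char c) {\n";
    for (const Transition& t : s.transitions) {
        out << "    if (";
        if (t.conditions.empty()) out << "true";  // unconditional transition
        for (std::size_t i = 0; i < t.conditions.size(); ++i) {
            if (i != 0) out << " && ";
            out << "(" << t.conditions[i] << ")";
        }
        out << ") return " << t.next << ";\n";
    }
    out << "    return -1; // no transition matched\n";
    out << "}\n";
    return out.str();
}
```

For a state 0 with a single Transition on the conditions c >= '0' and c <= '9' leading to state 1, this emits a State0 function containing one if statement that returns 1.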

But a lot of the complexity we will face comes from the PerformanceSwitches I am putting in. Whenever I come to a decision that I think will impact performance, I emit code for each alternative, wrapped in #ifdef/#endif, so we can change some defines in our build and test the alternatives. Some of the options I've thought up overlap, so we are going to be emitting way more code than we will end up with. As we go we will confirm the correctness of each PerformanceSwitch, but we won't profile until things are working end-to-end and we have a large body of text to test against ... maybe JSON? So, if this stage of development seems a little slow, remember, I am generating 96 (currently) permutations of the way this code could be written.
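A PerformanceSwitch of this kind could be emitted roughly as follows. This is only a sketch: the EmitSwitch name, the PERF_USE_LOOKUP_TABLE define in the usage note, and the use of #else to separate two alternatives are assumptions for illustration, and the real switches may have more than two options.

```cpp
#include <cassert>
#include <string>

// Hypothetical emitter: wraps two alternative code fragments so that a build
// define selects which one is compiled. Each fragment is expected to end with
// a newline.
std::string EmitSwitch(const std::string& define,
                       const std::string& whenDefined,
                       const std::string& otherwise) {
    assert(!define.empty());
    std::string out;
    out += "#ifdef " + define + "\n";
    out += whenDefined;
    out += "#else\n";
    out += otherwise;
    out += "#endif\n";
    return out;
}
```

Calling EmitSwitch("PERF_USE_LOOKUP_TABLE", tableCode, branchCode) would emit both versions, and building with or without -DPERF_USE_LOOKUP_TABLE then picks one. Every independent two-way switch doubles the number of buildable variants, which is how the permutation count climbs so quickly.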
