How to keep parser code and grammar definition in sync?

by Donentolon   Last Updated October 09, 2019 21:05 PM

I am working with a custom, fairly simple DSL for specifying how various scripts will be run. The DSL takes the form of config files that are easy for humans to read. They define which scripts must be executed, in what order, and how. I have a program that parses such a config file and executes the steps.

The config files are sometimes authored by other developers who are not familiar with the parser. They only need to know what rules the grammar has. These rules are simple, which reduces the learning curve for writing these config files.

The de facto definition of the grammar is of course my parser code. However, this is obviously not accessible to new developers. So the logical next step is to write a succinct description of the grammar, for example in EBNF. However, as I augment the grammar with new features, the EBNF document will become obsolete unless I remember to manually update it after every relevant code change. Sooner or later I will forget, and developers will be surprised that the actual parser behaves differently from what the documentation claims.

What strategy can I use to maintain documentation of my parser's grammar? My priorities are DRY and minimal investment of developer time (both mine and others').

  • Is it a good idea to try to write the EBNF as the ground truth grammar, and then somehow generate the parser logic automatically from this EBNF? How can I avoid making the parser much more complex and difficult to maintain?
  • Can I automatically spit out the EBNF from the code logic? Is this practical in Python?
  • Am I overthinking it? Maybe the best option is to just be disciplined about manually updating the docs after all.
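
As an illustration of the first two bullets, here is a minimal Python sketch in which one grammar table drives both the parser and the printed EBNF, so the two cannot drift apart. Everything in it is an assumption invented for illustration, not the actual DSL: the one-line `run <script> [after <script>]` syntax, the classes `Lit`, `Name`, `Seq`, `Opt`, and the functions `to_ebnf` and `parse_line`.

```python
import re
from dataclasses import dataclass

# Hypothetical toy DSL (not the real one): each line reads
#   run <script> [after <script>]
# Every grammar node knows how to print itself as EBNF and how to match tokens.

@dataclass(frozen=True)
class Lit:
    """A fixed keyword."""
    text: str
    def ebnf(self):
        return f'"{self.text}"'
    def match(self, tokens, pos):
        if pos < len(tokens) and tokens[pos] == self.text:
            return pos + 1, []
        return None

@dataclass(frozen=True)
class Name:
    """A captured identifier such as a script path."""
    label: str
    def ebnf(self):
        return self.label
    def match(self, tokens, pos):
        if pos < len(tokens) and re.fullmatch(r"[\w./-]+", tokens[pos]):
            return pos + 1, [(self.label, tokens[pos])]
        return None

@dataclass(frozen=True)
class Seq:
    """All parts, in order."""
    parts: tuple
    def ebnf(self):
        return ", ".join(p.ebnf() for p in self.parts)
    def match(self, tokens, pos):
        captured = []
        for part in self.parts:
            result = part.match(tokens, pos)
            if result is None:
                return None
            pos, found = result
            captured += found
        return pos, captured

@dataclass(frozen=True)
class Opt:
    """An optional part; matches nothing if the inner rule fails."""
    inner: object
    def ebnf(self):
        return f"[ {self.inner.ebnf()} ]"
    def match(self, tokens, pos):
        return self.inner.match(tokens, pos) or (pos, [])

# The single source of truth: edit this table and both outputs change.
GRAMMAR = {
    "step": Seq((Lit("run"), Name("script"),
                 Opt(Seq((Lit("after"), Name("dependency")))))),
}

def to_ebnf():
    """Render the grammar table as EBNF text for the documentation."""
    return "\n".join(f"{name} = {rule.ebnf()} ;" for name, rule in GRAMMAR.items())

def parse_line(line):
    """Parse one config line using the same grammar table."""
    tokens = line.split()
    result = GRAMMAR["step"].match(tokens, 0)
    if result is None or result[0] != len(tokens):
        raise ValueError(f"does not match grammar: {line!r}")
    return dict(result[1])

print(to_ebnf())
print(parse_line("run build.sh after fetch.sh"))
```

The sketch leaves `script` and `dependency` undefined in the EBNF output; a fuller version would also emit rules for such terminals. Whether this structure stays maintainable as the grammar grows is exactly the trade-off asked about above.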

