Practical LR Parser Generation
Joe Zimmerman

TL;DR
This paper introduces a novel approach to automatically generate efficient LR parsers for a broad class of programming languages, overcoming traditional limitations and demonstrating practical performance improvements over hand-written parsers.
Contribution
It presents new algorithms and extensions, including automata optimization, grammar transformation, and the XLR extension, enabling automatic parser generation for a wide range of languages.
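For readers unfamiliar with the table-driven LR parsing that this summary refers to, the following is a minimal sketch of a shift-reduce driver. It is not the paper's algorithm and does not implement automata optimization, grammar transformation, or XLR. The toy grammar, the hand-built ACTION and GOTO tables, and the name parse are all assumptions made for illustration.

```python
# Hypothetical toy grammar (not from the paper):
#   (1) E -> E "+" "n"
#   (2) E -> "n"
# The tables below were built by hand for this grammar; a parser
# generator would normally derive them automatically.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("E", 3), 2: ("E", 1)}


def parse(tokens):
    """Return the rule numbers applied, in order, or raise SyntaxError."""
    stack = [0]                      # stack of automaton states
    reductions = []
    stream = list(tokens) + ["$"]    # "$" marks end of input
    pos = 0
    while True:
        key = (stack[-1], stream[pos])
        if key not in ACTION:
            raise SyntaxError(f"unexpected {stream[pos]!r} at position {pos}")
        kind, arg = ACTION[key]
        if kind == "shift":
            stack.append(arg)        # consume the token, enter the new state
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]      # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                        # accept
            return reductions
```

Calling parse(["n", "+", "n"]) applies rule 2 and then rule 1, while an input such as ["n", "n"] raises SyntaxError because state 2 has no action for a second "n".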
Findings
Generated parsers are 1.2x faster than the hand-written Golang parser.
Generated parsers are 4.3x faster than the CPython parser.
The approach supports a broad class of practical grammars.
Abstract
Parsing is a fundamental building block in modern compilers, and for industrial programming languages, it is a surprisingly involved task. There are known approaches to generate parsers automatically, but the prevailing consensus is that automatic parser generation is not practical for real programming languages: LR/LALR parsers are considered to be far too restrictive in the grammars they support, and LR parsers are often considered too inefficient in practice. As a result, virtually all modern languages use recursive-descent parsers written by hand, a lengthy and error-prone process that dramatically increases the barrier to new programming language development. In this work we demonstrate that, contrary to the prevailing consensus, we can have the best of both worlds: for a very general, practical class of grammars -- a strict superset of Knuth's canonical LR -- we can generate…
Taxonomy
Topics: Logic, programming, and type systems · Software Testing and Debugging Techniques · Formal Methods in Verification
