A declarative extension of parsing expression grammars for recognizing most programming languages
Tetsuro Matsumura, Kimio Kuramitsu

TL;DR
This paper introduces Nez, an extension to Parsing Expression Grammars (PEGs) that recognizes complex programming-language syntax without semantic actions, improving parsing support for languages such as C, Python, and Ruby.
Contribution
The paper presents Nez, a declarative extension to PEGs, allowing recognition of PEG-hard syntax in programming languages without using semantic actions.
Findings
Nez can parse C, C#, Ruby, and Python syntax.
Nez handles PEG-hard syntax like typedef names, indentation, and HERE documents.
Nez extends PEGs with symbol tables and conditional parsing.
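The symbol-table idea can be shown with a minimal Python sketch. It assumes a toy C-like grammar with only `int`, `char`, `typedef` declarations, and variable declarations. A name matched in a typedef is stored in a table, and the type rule later accepts an identifier only if it is already in that table. PEGs backtrack, so the sketch snapshots the table size together with the input position and truncates the table when an alternative fails. All names (`State`, `typedef_decl`, `parse`, and so on) are hypothetical. This is an illustration of the technique, not Nez's syntax or the paper's implementation.

```python
import re
from dataclasses import dataclass, field

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
KEYWORDS = {"typedef", "int", "char"}

@dataclass
class State:
    text: str
    pos: int = 0
    # Symbol table of typedef names; a list so it can be truncated on backtrack.
    typedefs: list = field(default_factory=list)

    def mark(self):
        # Snapshot both the input position and the table size.
        return (self.pos, len(self.typedefs))

    def reset(self, m):
        # Backtracking restores the position AND drops names added since mark().
        self.pos, n = m
        del self.typedefs[n:]

def skip_ws(s):
    while s.pos < len(s.text) and s.text[s.pos].isspace():
        s.pos += 1

def word(s):
    """Match one identifier-shaped word (keywords included) after whitespace."""
    skip_ws(s)
    m = IDENT.match(s.text, s.pos)
    if m:
        s.pos = m.end()
        return m.group()
    return None

def punct(s, ch):
    skip_ws(s)
    if s.text.startswith(ch, s.pos):
        s.pos += len(ch)
        return True
    return False

def keyword(s, kw):
    m = s.mark()
    if word(s) == kw:
        return True
    s.reset(m)
    return False

def type_name(s):
    # Type <- 'int' / 'char' / <a name previously stored in the table>
    m = s.mark()
    w = word(s)
    if w in ("int", "char") or w in s.typedefs:
        return True
    s.reset(m)
    return False

def new_ident(s):
    m = s.mark()
    w = word(s)
    if w is not None and w not in KEYWORDS:
        return w
    s.reset(m)
    return None

def typedef_decl(s):
    # TypedefDecl <- 'typedef' Type <store Ident in table> ';'
    if not (keyword(s, "typedef") and type_name(s)):
        return False
    name = new_ident(s)
    if name is None:
        return False
    s.typedefs.append(name)  # stored before ';' is checked, so failure must roll back
    return punct(s, ";")

def var_decl(s):
    # VarDecl <- Type Ident ';'
    return type_name(s) and new_ident(s) is not None and punct(s, ";")

def statement(s):
    # Ordered choice: every alternative starts from the same snapshot.
    for alt in (typedef_decl, var_decl):
        m = s.mark()
        if alt(s):
            return True
        s.reset(m)
    return False

def parse(text):
    s = State(text)
    while statement(s):
        pass
    skip_ws(s)
    return s.pos == len(s.text), s.typedefs
```

With the table, `typedef int T; T x;` is accepted while a bare `T x;` is rejected, which is the context dependence that pure PEGs cannot express. Rolling back the table on failure keeps the stored names consistent with the backtracking parse.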
Abstract
Parsing Expression Grammars (PEGs) are a popular foundation for describing syntax. Unfortunately, several syntactic constructs in programming languages are still hard to recognize with pure PEGs. Notorious cases include typedef-defined names in C/C++, indentation-based code layout in Python, and HERE documents in many scripting languages. To recognize such PEG-hard syntax, we propose a declarative extension to PEGs. Here, "declarative" means that the extension requires no programmed semantic actions, which have traditionally been used to realize extended parsing behavior. Nez is our extended PEG language, which includes symbol tables and conditional parsing. This paper demonstrates that the Nez extensions can recognize many practical programming languages that involve PEG-hard syntax, such as C, C#, Ruby, and Python.
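A HERE document is PEG-hard because its terminator is not fixed by the grammar: the parser must remember the string matched after `<<` and later match exactly that string on a line of its own. The Python sketch below assumes a simplified Ruby-like form in which the body starts on the line right after `<<NAME`; the `heredoc` function and its return convention are hypothetical. It keeps the remembered delimiter in a local variable, which stands in for the kind of state the paper's symbol tables hold declaratively; it is not Nez's mechanism itself.

```python
import re
from typing import Optional, Tuple

# Assumed opener syntax: '<<' followed by a name, then a newline.
OPEN = re.compile(r"<<([A-Za-z_][A-Za-z0-9_]*)\n")

def heredoc(text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Match a HERE document starting at text[pos].

    Returns (body, end_pos) on success, where end_pos is just past the
    terminator line's text, or None if no HERE document matches.
    """
    m = OPEN.match(text, pos)
    if m is None:
        return None
    delim = m.group(1)  # remember the matched name; the grammar cannot fix it
    i = m.end()
    lines = []
    while i < len(text):
        j = text.find("\n", i)
        end = len(text) if j == -1 else j
        line = text[i:end]
        if line == delim:  # a line equal to the remembered string ends the body
            return "\n".join(lines), end
        lines.append(line)
        if j == -1:
            break
        i = j + 1
    return None  # the terminator never appeared
```

For example, `heredoc("<<EOS\nhello\nworld\nEOS\nrest")` returns `("hello\nworld", 21)`. A line such as `END2` does not close a document opened with `<<END`, because the comparison is against the exact remembered string.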
Taxonomy
Topics: Software Engineering Research · Logic, programming, and type systems · Software Testing and Debugging Techniques
