Computational Model for Parsing Expression Grammars

Alexander Rubtsov; Nikita Chudinov

arXiv:2406.14911·cs.FL·September 6, 2024

TL;DR

This paper introduces a computational model for Parsing Expression Grammars (PEGs), analyzing their structural properties and extending their formal framework with a new automaton model that supports efficient parsing.

Contribution

The paper presents a novel computational model for PEGs, proves key structural properties of PELs, and extends the automaton framework to improve parsing efficiency.

Findings

01

The class of PELs contains the Boolean closure of the regular closure of DCFLs

02

PELs are closed under left concatenation with the regular closure of DCFLs

03

Linear-time simulation algorithm for the extended automaton model

Abstract

We present a computational model for Parsing Expression Grammars (PEGs). The predecessors of PEGs, top-down parsing languages (TDPLs), were discovered by A. Birman and J. Ullman in the 1960s, and B. Ford showed in 2004 that both formalisms recognize the same class, named Parsing Expression Languages (PELs). A. Birman and J. Ullman established important properties: TDPLs generate any DCFL and some non-context-free languages such as $a^{n} b^{n} c^{n}$, and a linear-time parsing algorithm was constructed as well. But since this parsing algorithm was impractical in the 1960s, TDPLs were abandoned and later upgraded by B. Ford to PEGs, and the parsing algorithm was improved (from a practical point of view) as well. PEGs are now actively used in compilers (e.g., Python replaced its LL(1) parser with a PEG one) as well as in text processing. In this paper, we present a computational model for PEG, obtain…
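To make concrete the claim that PEGs recognize the non-context-free language $a^{n} b^{n} c^{n}$, here is a minimal sketch of a recognizer built from hand-written PEG combinators. It uses the well-known grammar usually given for this language (S ← &(A 'c') 'a'+ B !. with A ← 'a' A? 'b' and B ← 'b' B? 'c'), which is not taken from this paper. All helper names (`lit`, `seq`, `opt`, `plus`, `and_pred`, `not_pred`, `recognizes`) are hypothetical, and the sketch does not implement the paper's automaton model or any memoization.

```python
from typing import Callable, Optional

# A parsing expression maps (text, position) to the position after a
# successful match, or None on failure. (Illustrative encoding only.)
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    return lambda text, i: i + len(s) if text.startswith(s, i) else None


def seq(*parsers: Parser) -> Parser:
    """Match each parser in order; fail as soon as one fails."""
    def run(text: str, i: int) -> Optional[int]:
        pos: Optional[int] = i
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return run


def opt(p: Parser) -> Parser:
    """e? : try p; on failure succeed without consuming input."""
    def run(text: str, i: int) -> Optional[int]:
        j = p(text, i)
        return i if j is None else j
    return run


def plus(p: Parser) -> Parser:
    """e+ : match p one or more times, greedily."""
    def run(text: str, i: int) -> Optional[int]:
        j = p(text, i)
        if j is None:
            return None
        while True:
            k = p(text, j)
            if k is None or k == j:
                return j
            j = k
    return run


def and_pred(p: Parser) -> Parser:
    """&e : succeed iff p matches, consuming nothing."""
    return lambda text, i: i if p(text, i) is not None else None


def not_pred(p: Parser) -> Parser:
    """!e : succeed iff p fails, consuming nothing."""
    return lambda text, i: i if p(text, i) is None else None


def any_char(text: str, i: int) -> Optional[int]:
    """. : match any single character."""
    return i + 1 if i < len(text) else None


def A(text: str, i: int) -> Optional[int]:
    # A <- 'a' A? 'b'   (matches a^n b^n, n >= 1)
    return seq(lit("a"), opt(A), lit("b"))(text, i)


def B(text: str, i: int) -> Optional[int]:
    # B <- 'b' B? 'c'   (matches b^n c^n, n >= 1)
    return seq(lit("b"), opt(B), lit("c"))(text, i)


# S <- &(A 'c') 'a'+ B !.
S = seq(and_pred(seq(A, lit("c"))), plus(lit("a")), B, not_pred(any_char))


def recognizes(text: str) -> bool:
    """True iff text is a^n b^n c^n for some n >= 1."""
    return S(text, 0) is not None
```

The and-predicate checks that the a's and b's are balanced without consuming them, after which `B` checks that the b's and c's are balanced. Without memoization this naive recursive evaluation is not guaranteed to run in linear time on arbitrary grammars.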
