From Regexes to Parsing Expression Grammars
Sérgio Medeiros, Fabio Mascarenhas, Roberto Ierusalimschy

TL;DR
This paper formalizes regex pattern-matching by transforming regexes into Parsing Expression Grammars, providing a clear semantics, accommodating extensions, and enabling efficiency estimation and correctness verification.
Contribution
It introduces a formal transformation from regexes to PEGs, clarifying their semantics and supporting extensions and optimization.
Findings
Provides a formal semantics for regex extensions
Enables estimation of regex matcher efficiency
Supports correctness-preserving optimizations
Abstract
Most scripting languages nowadays use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an informal semantics that is different from the semantics of regular expressions, removing the commutativity of alternation and adding ad-hoc extensions that cannot be expressed by formalisms for efficient recognition of regular languages, such as deterministic finite automata. Parsing Expression Grammars are a formalism that can describe all deterministic context-free languages and has a simple computational model. In this paper, we present a formalization of regexes via transformation to Parsing Expression Grammars. The proposed transformation easily accommodates several of the common regex extensions, giving a formal meaning to them. It also provides a clear computational model that helps to estimate the efficiency of regex-based…
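The loss of commutativity mentioned in the abstract can be seen directly in a backtracking regex library. The sketch below uses Python's standard `re` module, plus a few hypothetical PEG-style combinators (`lit`, `seq`, `choice` are names invented here, not taken from the paper). It shows why a regex cannot be translated to a PEG operator by operator: PEG ordered choice commits to the first alternative that succeeds locally, whereas a regex backtracks into the alternation when the rest of the pattern fails. Pushing the rest of the pattern into each alternative restores the regex behaviour. This is only an illustration of the problem, under those assumptions, and does not reproduce the paper's transformation.

```python
import re

# Regex alternation is ordered, so swapping the alternatives changes the match.
first = re.match(r"a|ab", "ab").group()    # leftmost alternative 'a' wins
second = re.match(r"ab|a", "ab").group()   # now 'ab' is tried first

# Hypothetical PEG-style matchers: each takes (text, pos) and returns the
# position after the match, or None on failure.
def lit(s):
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match

def seq(*parts):
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match

def choice(*alternatives):
    # Ordered choice: commit to the first alternative that succeeds here,
    # with no later backtracking into the remaining alternatives.
    def match(text, pos):
        for alt in alternatives:
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return match

# Operator-by-operator reading of the regex (a|ab)c as the PEG (a / ab) c.
naive = seq(choice(lit("a"), lit("ab")), lit("c"))

# Same pattern with the continuation 'c' pushed into each alternative: a c / ab c.
distributed = choice(seq(lit("a"), lit("c")), seq(lit("ab"), lit("c")))

regex_end = re.match(r"(a|ab)c", "abc").end()   # regex backtracks and matches all of "abc"
naive_end = naive("abc", 0)                     # choice commits to 'a', then 'c' fails on 'b'
distributed_end = distributed("abc", 0)         # second alternative 'ab' 'c' succeeds
```

Here `naive_end` is `None` while both `regex_end` and `distributed_end` are `3`, which is the gap a regex-to-PEG transformation has to close.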
