Regular Expression Matching and Operational Semantics
Asiri Rathnayake (University of Birmingham, United Kingdom), Hayo Thielecke (University of Birmingham, United Kingdom)

TL;DR
This paper formalizes regular expression matching using operational semantics, deriving a series of abstract machines that move from the abstract definition of matching toward increasingly realistic implementations, including a parallel one for GPUs.
Contribution
It introduces a formal framework for regex matching via operational semantics and develops various abstract machines, including parallel implementations for high-performance computing.
Findings
Formal semantics for regex matching derived
Development of abstract machines including parallel GPU implementation
Preliminary experiments demonstrate feasibility of parallel regex matching
Abstract
Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers,…
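The continuation idea in the abstract can be illustrated with a small backtracking matcher. This is a minimal sketch, not the authors' abstract machine: the class names (`Eps`, `Char`, `Seq`, `Alt`, `Star`), the functions `match` and `accepts`, and the progress check on the Kleene star are all assumptions made for this example. The continuation `k` stands for "what remains to be matched after the current expression" and receives the input position reached so far.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Eps:
    """Matches the empty string."""


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Eps, Char, Seq, Alt, Star]
# A continuation takes the current input position and reports whether
# the rest of the match succeeds from there.
Cont = Callable[[int], bool]


def match(e: Regex, w: str, i: int, k: Cont) -> bool:
    """Match expression e against w starting at position i, then run k."""
    if isinstance(e, Eps):
        return k(i)
    if isinstance(e, Char):
        return i < len(w) and w[i] == e.c and k(i + 1)
    if isinstance(e, Seq):
        # Match the left part; the right part becomes part of the continuation.
        return match(e.left, w, i, lambda j: match(e.right, w, j, k))
    if isinstance(e, Alt):
        # Non-deterministic choice, resolved here by backtracking.
        return match(e.left, w, i, k) or match(e.right, w, i, k)
    if isinstance(e, Star):
        # Either leave the loop, or match the body once and loop again.
        # The j > i check (an assumption of this sketch) forces each
        # iteration to consume input, so nested stars cannot loop forever.
        return k(i) or match(e.body, w, i, lambda j: j > i and match(e, w, j, k))
    raise TypeError(f"not a regular expression: {e!r}")


def accepts(e: Regex, w: str) -> bool:
    """True if e matches the whole of w."""
    return match(e, w, 0, lambda i: i == len(w))
```

For example, `accepts(Seq(Char("a"), Star(Alt(Char("b"), Char("c")))), "abcb")` checks the input against `a(b|c)*`. The matcher interprets the expression directly and never builds a deterministic automaton, which is the on-the-fly style the abstract describes.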
