RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement and Lookarounds

Ian Erik Varatalu; Margus Veanes; Juhan-Peep Ernits

arXiv:2407.20479 · cs.FL · January 27, 2025

Open Access

TL;DR

RE# is a regex matching tool built on symbolic derivatives. It does not backtrack, supports complement, intersection and lookarounds in addition to the classical operators, and on the baseline benchmarks is over 71% faster than the next fastest regex engine in Rust.

Contribution

The paper develops the formal theory and an implementation of RE#, a backtracking-free matcher that supports complement, intersection and lookarounds, and shows that its main matching algorithm has input-linear complexity both in theory and experimentally.

Findings

01

On the baseline benchmarks, RE# is over 71% faster than the next fastest regex engine in Rust.

02

RE# outperforms all state-of-the-art engines on extensions of the benchmarks, often by several orders of magnitude.

03

The main matching algorithm has input-linear complexity both theoretically and experimentally.

Abstract

We present a tool and theory RE# for regular expression matching that is built on symbolic derivatives, does not use backtracking, and, in addition to the classical operators, also supports complement, intersection and lookarounds. We develop the theory formally and show that the main matching algorithm has input-linear complexity both in theory as well as experimentally. We apply thorough evaluation on popular benchmarks that show that RE# is over 71% faster than the next fastest regex engine in Rust on the baseline, and outperforms all state-of-the-art engines on extensions of the benchmarks often by several orders of magnitude.
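
To make the derivative idea in the abstract concrete, here is a minimal sketch of classical Brzozowski-style derivative matching extended with intersection and complement. It is an illustration under stated assumptions, not the RE# implementation: all names (`Re`, `deriv`, `nullable`, `matches`, `contains`) are hypothetical, characters are handled one at a time rather than symbolically, lookarounds are omitted, and only a few simplification rules are applied, so it does not carry the paper's linear-time guarantee.

```python
from dataclasses import dataclass

class Re:
    """Base class for regex terms (hypothetical names, for illustration only)."""

@dataclass(frozen=True)
class Empty(Re): pass          # matches no string
@dataclass(frozen=True)
class Eps(Re): pass            # matches only the empty string
@dataclass(frozen=True)
class Any(Re): pass            # matches any single character
@dataclass(frozen=True)
class Chr(Re):
    c: str
@dataclass(frozen=True)
class Cat(Re):
    l: Re
    r: Re
@dataclass(frozen=True)
class Alt(Re):
    l: Re
    r: Re
@dataclass(frozen=True)
class And(Re):                 # intersection
    l: Re
    r: Re
@dataclass(frozen=True)
class Not(Re):                 # complement
    r: Re
@dataclass(frozen=True)
class Star(Re):
    r: Re

def cat(l, r):
    """Concatenation with a few simplifications to keep terms small."""
    if isinstance(l, Empty) or isinstance(r, Empty):
        return Empty()
    if isinstance(l, Eps):
        return r
    if isinstance(r, Eps):
        return l
    return Cat(l, r)

def alt(l, r):
    """Union with a few simplifications to keep terms small."""
    if isinstance(l, Empty):
        return r
    if isinstance(r, Empty) or l == r:
        return l
    return Alt(l, r)

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Any, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.l) and nullable(r.r)
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    return not nullable(r.r)   # Not

def deriv(r, ch):
    """Derivative of r with respect to ch: what must match after reading ch."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Any):
        return Eps()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Cat):
        first = cat(deriv(r.l, ch), r.r)
        return alt(first, deriv(r.r, ch)) if nullable(r.l) else first
    if isinstance(r, Alt):
        return alt(deriv(r.l, ch), deriv(r.r, ch))
    if isinstance(r, And):
        return And(deriv(r.l, ch), deriv(r.r, ch))
    if isinstance(r, Not):
        return Not(deriv(r.r, ch))
    return cat(deriv(r.r, ch), r)  # Star

def matches(r, s):
    """Whole-string match: one derivative step per character, no backtracking."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

def contains(word):
    """Regex for 'any string containing word'."""
    lit = Eps()
    for c in reversed(word):
        lit = cat(Chr(c), lit)
    return cat(Star(Any()), cat(lit, Star(Any())))

# Strings that contain "ab" AND do NOT contain "ba".
ab_without_ba = And(contains("ab"), Not(contains("ba")))
```

With these definitions, `matches(ab_without_ba, "aab")` accepts while `matches(ab_without_ba, "abba")` rejects. The point of the sketch is that intersection and complement need no special machinery in a derivative-based matcher: the derivative simply distributes over `And` and `Not`, and the input is consumed in a single left-to-right pass.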

Taxonomy

Topics: Natural Language Processing Techniques