Fast Matching of Regular Patterns with Synchronizing Counting (Technical Report)
Lukáš Holík, Juraj Síč, Lenka Turoňová, Tomáš Vojnar

TL;DR
This paper introduces a new approach for fast matching of a broad class of regular expressions with counting, achieving linear time complexity and demonstrating practical coverage of common regex applications.
Contribution
It identifies the class of synchronizing regular expressions and improves a recent automata-based algorithm so that they can be matched efficiently, resolving a long-standing open problem for this class.
Findings
Matching time for synchronizing regexes is linear in the text length and independent of the repetition bounds.
Empirical evidence shows the class covers nearly all counting used in usual applications of regex matching.
The approach improves upon recent automata-based algorithms.
Abstract
Fast matching of regular expressions with bounded repetition, aka counting, such as (ab){50,100}, i.e., matching linear in the length of the text and independent of the repetition bounds, has been an open problem for at least two decades. We show that, for a wide class of regular expressions with counting, which we call synchronizing, fast matching is possible. We empirically show that the class covers nearly all counting used in usual applications of regex matching. This complexity result is based on an improvement and analysis of a recent matching algorithm that compiles regexes to deterministic counting-set automata (automata with registers that hold sets of numbers).
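To make the idea of a register that holds a set of numbers more concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it hard-codes a single illustrative pattern, `a.{lo,hi}b`, searched anywhere in the text, and makes no claim about whether that pattern belongs to the synchronizing class. The names `CountingSet` and `contains_match` are hypothetical. The point is only that a set of counter values can support "increment all" and "check the largest value" in constant time per character, so the work per character does not grow with the repetition bounds.

```python
from collections import deque


class CountingSet:
    """Set of counter values with O(1) increment-all, insert-zero, and max."""

    def __init__(self):
        # Step numbers at which counters were started, oldest first.
        self._births = deque()
        # Number of increments applied so far; value = _now - birth.
        self._now = 0

    def increment_all(self):
        # Every stored value grows by one without touching the queue.
        self._now += 1

    def insert_zero(self):
        # Add a counter with value 0 unless one is already present.
        if not self._births or self._births[-1] != self._now:
            self._births.append(self._now)

    def max(self):
        # The oldest counter always holds the largest value.
        return self._now - self._births[0] if self._births else None

    def drop_above(self, bound):
        # Discard counters whose value exceeds the upper bound.
        while self._births and self._now - self._births[0] > bound:
            self._births.popleft()


def contains_match(text, lo, hi):
    """Return True if text contains a substring matching a.{lo,hi}b."""
    counters = CountingSet()
    for ch in text:
        # Each value is the number of characters read since an earlier 'a'.
        # All stored values are <= hi, so checking the maximum suffices.
        biggest = counters.max()
        if ch == "b" and biggest is not None and biggest >= lo:
            return True
        counters.increment_all()
        counters.drop_above(hi)
        if ch == "a":
            counters.insert_zero()
    return False
```

For example, `contains_match("axaxxb", 2, 3)` tracks two counters at once (one per `a`) and reports a match through the second one. Because the values enter in increasing age, only the oldest counter ever needs to be inspected or dropped, which is what keeps each step constant-time in this sketch.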
Taxonomy
Topics: Semigroups and Automata Theory · DNA and Biological Computing · Algorithms and Data Compression
