Fast Matching of Regular Patterns with Synchronizing Counting (Technical Report)

Lukáš Holík; Juraj Síč; Lenka Turoňová; Tomáš Vojnar

arXiv:2301.12851 · cs.FL · January 31, 2023 · 1 citation



TL;DR

This paper shows that a broad class of regular expressions with counting can be matched in time linear in the length of the text and independent of the repetition bounds, and shows empirically that the class covers nearly all counting used in common regex applications.

Contribution

It identifies the class of synchronizing regular expressions and gives an improved algorithm for matching them efficiently, resolving for this class a problem that has been open for at least two decades.

Findings

01

For synchronizing regexes, matching time is linear in the length of the text and independent of the repetition bounds.

02

Empirical evidence shows that the class covers nearly all counting used in usual applications of regex matching.

03

The approach improves on and analyzes a recent algorithm that compiles regexes to deterministic counting-set automata.

Abstract

Fast matching of regular expressions with bounded repetition, aka counting, such as (ab){50,100}, i.e., matching linear in the length of the text and independent of the repetition bounds, has been an open problem for at least two decades. We show that, for a wide class of regular expressions with counting, which we call synchronizing, fast matching is possible. We empirically show that the class covers nearly all counting used in usual applications of regex matching. This complexity result is based on an improvement and analysis of a recent matching algorithm that compiles regexes to deterministic counting-set automata (automata with registers that hold sets of numbers).
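To make "registers that hold sets of numbers" concrete, here is a minimal Python sketch of one such register. It is an illustration under stated assumptions, not the algorithm from the paper: the class `CountingSet`, the function `match_ends`, and the example pattern `.*a.{lo,hi}` are hypothetical names and choices made for this sketch. The idea it shows is that a whole set of counter values can be incremented, extended with a fresh counter, and pruned in constant time per input character, so the work per character does not grow with the repetition bounds.

```python
from collections import deque


class CountingSet:
    """Set of distinct counter values in 0..upper with O(1) updates.

    Hypothetical illustration: values are stored relative to a shared
    offset, so incrementing every value only changes the offset.
    """

    def __init__(self, upper):
        self.upper = upper
        self.offset = 0
        # Stored entry e represents the value e + offset.
        # The largest (oldest) value is on the left, the smallest on the right.
        self.items = deque()

    def increment_all(self):
        # Every value grows by one; only the largest can exceed the bound.
        self.offset += 1
        if self.items and self.items[0] + self.offset > self.upper:
            self.items.popleft()

    def add_zero(self):
        # Insert a fresh counter with value 0 unless one is already present.
        if not self.items or self.items[-1] + self.offset != 0:
            self.items.append(-self.offset)

    def maximum(self):
        return self.items[0] + self.offset if self.items else None


def match_ends(text, lo, hi):
    """Return 1-based end positions i where text[:i] matches .*a.{lo,hi}."""
    counters = CountingSet(hi)
    ends = []
    for i, ch in enumerate(text, start=1):
        counters.increment_all()   # every open repetition consumes one character
        if ch == "a":
            counters.add_zero()    # a new repetition may start after this 'a'
        top = counters.maximum()
        if top is not None and top >= lo:
            ends.append(i)
    return ends


print(match_ends("xaxxx", 2, 3))
print(match_ends("aaxx", 2, 2))
```

Each character triggers at most one offset update, one insertion, and one removal, regardless of `lo` and `hi`. Which regexes admit this kind of constant-time treatment in general is the question the synchronizing class is meant to answer; this sketch hard-codes a single simple pattern and does not test for that property.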


Taxonomy

Topics: Semigroups and Automata Theory · DNA and Biological Computing · Algorithms and Data Compression