TL;DR
This paper characterizes the complexity of regular expression matching by the depth of the expression: depth-two expressions admit near-linear algorithms except for concatenations of stars, deeper expressions are SETH-hard in some cases, and the word break problem gets a faster algorithm.
Contribution
The paper gives a depth-based complexity dichotomy for regular expression matching and a faster algorithm for the word break problem.
Findings
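Depth-two expressions are solvable in near-linear time, except for concatenations of stars, which cannot be solved in strongly sub-quadratic time assuming SETH.
Every depth-three expression type is either solvable in strongly sub-quadratic time or, assuming SETH, cannot be solved in strongly sub-quadratic time.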
The runtime for the word break problem is improved from O(n√m) to O(nm^{0.44...}).
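To make the word break problem concrete, here is a minimal sketch of the textbook dynamic program for it. It is a baseline for illustration only: it is not the paper's improved algorithm and makes no attempt to reach either bound above. The function name `word_break` and the sample dictionary are assumptions chosen for this example.

```python
def word_break(text: str, dictionary: list[str]) -> bool:
    """Return True if text is a concatenation of zero or more dictionary words.

    Textbook dynamic program (illustrative baseline, not the paper's method).
    """
    words = {w for w in dictionary if w}      # ignore empty words
    lengths = sorted({len(w) for w in words})  # distinct word lengths, ascending
    n = len(text)
    # reachable[i] is True when the prefix text[:i] can be segmented.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially segmentable
    for i in range(1, n + 1):
        for k in lengths:
            if k > i:
                break  # longer words cannot end at position i
            if reachable[i - k] and text[i - k:i] in words:
                reachable[i] = True
                break
    return reachable[n]


if __name__ == "__main__":
    sample = ["a", "ab", "bc"]  # hypothetical dictionary
    print(word_break("abbc", sample))
    print(word_break("ba", sample))
```

The inner loop runs over distinct word lengths rather than over all words, so the work per text position depends on how many different lengths the dictionary contains.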
Abstract
Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an O(mn) running time (where m is the length of the pattern and n is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, the word break problem, etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions…
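The classic automaton-based approach mentioned in the abstract can be sketched as follows. This is a minimal illustration, not code from the paper. It assumes a well-formed pattern over literal characters with `|`, `*`, `+` and parentheses, and it tests membership of the whole text. All function names are hypothetical. The automaton has O(m) states and transitions, and each text character triggers one pass over them, which gives the O(mn) behavior.

```python
def build_nfa(pattern):
    """Thompson-style construction; assumes a well-formed pattern."""
    eps = []   # eps[s]: states reachable from s by an epsilon move
    step = []  # step[s]: (char, target) for a labeled move, else None
    pos = 0

    def new_state():
        eps.append([])
        step.append(None)
        return len(eps) - 1

    def parse_alt():
        nonlocal pos
        start, accept = parse_concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            start2, accept2 = parse_concat()
            s, a = new_state(), new_state()
            eps[s] += [start, start2]   # branch into either alternative
            eps[accept].append(a)
            eps[accept2].append(a)
            start, accept = s, a
        return start, accept

    def parse_concat():
        start = accept = new_state()    # fragment matching the empty string
        while pos < len(pattern) and pattern[pos] not in "|)":
            start2, accept2 = parse_repeat()
            eps[accept].append(start2)  # chain the next factor
            accept = accept2
        return start, accept

    def parse_repeat():
        nonlocal pos
        start, accept = parse_atom()
        while pos < len(pattern) and pattern[pos] in "*+":
            op = pattern[pos]
            pos += 1
            s, a = new_state(), new_state()
            eps[s].append(start)
            eps[accept] += [start, a]   # repeat the body or leave
            if op == "*":
                eps[s].append(a)        # star also allows zero repetitions
            start, accept = s, a
        return start, accept

    def parse_atom():
        nonlocal pos
        if pattern[pos] == "(":
            pos += 1
            fragment = parse_alt()
            pos += 1                    # skip the closing ")"
            return fragment
        s, a = new_state(), new_state()
        step[s] = (pattern[pos], a)     # single literal character
        pos += 1
        return s, a

    start, accept = parse_alt()
    return eps, step, start, accept


def closure(eps, states):
    """All states reachable from `states` using epsilon moves only."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def nfa_match(pattern, text):
    """Return True if the whole text belongs to the pattern's language."""
    eps, step, start, accept = build_nfa(pattern)
    current = closure(eps, {start})
    for ch in text:
        moved = {step[s][1] for s in current if step[s] and step[s][0] == ch}
        current = closure(eps, moved)
        if not current:
            return False
    return accept in current
```

For example, `nfa_match("(a|ab|bc)*", "abbc")` tracks the set of active states through the four characters and then checks whether the accepting state is among them.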
Videos
Which Regular Expression Patterns are Hard to Match? · YouTube
