Which Regular Expression Patterns are Hard to Match?

Arturs Backurs; Piotr Indyk

arXiv:1511.07070·cs.CC·September 28, 2016

TL;DR

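This paper characterizes the complexity of regular expression matching and membership testing by the depth of the expression when it is read as a formula. Most depth-two expressions admit near-linear-time algorithms, while the remaining cases cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). The paper also gives a faster algorithm for the word break problem.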

Contribution

The paper gives a complexity dichotomy for regular expression matching based on expression depth and a faster algorithm for the word break problem.

Findings

01

Depth-two expressions are solvable in near-linear time, except for concatenations of stars, which cannot be solved in strongly sub-quadratic time assuming SETH.

02

Every type of depth-three expression either admits a strongly sub-quadratic algorithm or cannot be solved in strongly sub-quadratic time assuming SETH.

03

The runtime for the word break problem is improved from roughly O(n√m) to O(nm^{0.44...}), where n is the length of the text and m is the length of the pattern.
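For readers unfamiliar with the word break problem (deciding whether a text is a concatenation of dictionary words), the following is a minimal sketch of the textbook dynamic program. It is not the paper's improved algorithm. The function name `word_break` and the example dictionaries are illustrative assumptions. The loop over distinct word lengths reflects the idea behind the roughly O(n√m) bound, since a dictionary of total size m has O(√m) distinct lengths, but this sketch compares substrings by slicing and hashing rather than in constant time, so it does not reach that bound.

```python
def word_break(text, dictionary):
    """Return True if `text` can be split into words from `dictionary`.

    Textbook dynamic program (illustrative sketch, not the paper's method):
    reachable[i] is True when text[:i] is a concatenation of dictionary words.
    """
    words = set(dictionary)
    # Only distinct, non-zero word lengths need to be tried at each position.
    lengths = sorted({len(w) for w in words if w})
    n = len(text)
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially segmentable
    for end in range(1, n + 1):
        for length in lengths:
            if length > end:
                break  # lengths are sorted, so longer ones cannot fit either
            if reachable[end - length] and text[end - length:end] in words:
                reachable[end] = True
                break
    return reachable[n]


if __name__ == "__main__":
    print(word_break("applepenapple", ["apple", "pen"]))
    print(word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]))
```

In regular expression terms, this tests membership of the text in a "star of an or of concatenations", for example (apple|pen)*.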

Abstract

Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, word break problem etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions…
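The classic automaton-based approach mentioned in the abstract can be illustrated with a small state-set simulation. This is a hedged sketch under stated assumptions: the NFA is written out by hand rather than built from an expression, and the names `epsilon_closure`, `nfa_accepts`, `delta`, and `eps` are hypothetical. Each text character triggers one pass over the active states, so with an automaton of size proportional to $m$ the total work is on the order of $mn$.

```python
def epsilon_closure(states, eps):
    """Return all states reachable from `states` via epsilon transitions."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in eps.get(state, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def nfa_accepts(text, start, accept, delta, eps):
    """Membership test: simulate the NFA by tracking the set of active states.

    delta maps (state, character) to a list of next states;
    eps maps a state to a list of epsilon-successors.
    """
    current = epsilon_closure({start}, eps)
    for ch in text:
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = epsilon_closure(moved, eps)
        if not current:
            return False  # no active state can consume the rest of the text
    return accept in current


# Hand-built NFA for the expression (ab)* (an illustrative assumption):
# state 0 is both the start and the accepting state.
DELTA = {(0, "a"): [1], (1, "b"): [0]}
EPS = {}

if __name__ == "__main__":
    print(nfa_accepts("abab", 0, 0, DELTA, EPS))
    print(nfa_accepts("aba", 0, 0, DELTA, EPS))
```

The sketch tests membership of the whole text; pattern matching (finding a matching substring) would additionally keep the start state active at every position.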
