On the Complexity of the Matching Problem of Regular Expressions with Backreferences
Soh Kumabe, Yuya Uezato

TL;DR
This paper studies the computational complexity of matching regular expressions with backreferences (REWBs), proving conditional hardness results and giving an improved algorithm for a restricted case.
Contribution
It establishes tight complexity bounds for REWB matching problems and introduces a more efficient algorithm for 1-use REWBs.
Findings
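Matching REWBs is conditionally hard: the lower bounds rest on the Strong Exponential Time Hypothesis (SETH) and on W[2]-related assumptions from parameterized complexity.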
The problem cannot be solved in near-linear time for certain REWB classes unless major conjectures fail.
An $O(n \log^2 n)$ algorithm is proposed for 1-use REWBs, improving on previous methods.
Abstract
ReDoS is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that are provably efficient, running in (near-)linear time in the length of the input string? For classical regular expressions (REGEX), Thompson's construction yields a linear-time algorithm. However, practical engines support powerful features such as backreferences, which strictly extend the expressive power of REGEX but unfortunately increase the risk of ReDoS attacks. This paper investigates the fine-grained…
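To make the abstract's point about expressive power concrete, here is a minimal sketch using Python's standard `re` module. The pattern and the function name `matches_an_b_an` are illustrative assumptions and do not come from the paper. The backreference `\1` forces the text after `b` to repeat whatever group 1 captured, so the pattern accepts exactly the strings of the form a^n b a^n, a language that no classical regular expression can describe.

```python
import re

# Hypothetical pattern for illustration (not from the paper):
# group 1 captures a run of 'a' characters, and \1 must repeat
# exactly the same text after the single 'b'.
PATTERN = re.compile(r"(a*)b\1")


def matches_an_b_an(text: str) -> bool:
    """Return True if text has the form a^n b a^n for some n >= 0."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["b", "aabaa", "aaba", "abaa"]:
        print(sample, matches_an_b_an(sample))
```

Python's `re` is a backtracking engine, so this sketch only shows what a backreference expresses; it says nothing about the running-time bounds the paper studies.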
