Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching
Martin Berglund, Frank Drewes, Brink van der Merwe

TL;DR
This paper defines a formal automata model of Java-style backtracking regex matching and proposes two static analyses that determine whether a regular expression admits a family of input strings on which matching takes exponential time.
Contribution
The paper provides a formal framework for reasoning about regex matching as performed in Java, and develops static analysis methods that detect regular expressions prone to exponential backtracking.
Findings
The automata model captures the aspects of Java regex matching needed to study it formally.
Static analysis can identify regexes with exponential worst-case runtime.
Framework helps improve regex engine reliability and security.
Abstract
We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time.
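To make the notion of "a family of strings which makes matching run in exponential time" concrete, here is a minimal sketch of a toy backtracking matcher with a step counter. It is not the automata model from the paper and not the implementation of java.util.regex; the class BacktrackDemo, the Node interface, and the helpers ch, alt, seq, star, and countSteps are all hypothetical names introduced for illustration. The sketch tries alternatives in order and backtracks on failure, which is the general behavior the paper studies. It assumes Java 11 or later (for String.repeat).

```java
import java.util.function.IntPredicate;

public class BacktrackDemo {
    // Counts single-character comparisons attempted by the toy matcher.
    static long steps;

    // A pattern node: tries to match at position i, then calls continuation k
    // with the position reached. Returning false triggers backtracking.
    interface Node {
        boolean match(String s, int i, IntPredicate k);
    }

    // Matches exactly the character c.
    static Node ch(char c) {
        return (s, i, k) -> {
            steps++;
            return i < s.length() && s.charAt(i) == c && k.test(i + 1);
        };
    }

    // Ordered alternation: try a first, fall back to b if a fails.
    static Node alt(Node a, Node b) {
        return (s, i, k) -> a.match(s, i, k) || b.match(s, i, k);
    }

    // Concatenation: match a, then b from where a stopped.
    static Node seq(Node a, Node b) {
        return (s, i, k) -> a.match(s, i, j -> b.match(s, j, k));
    }

    // Greedy star: prefer one more iteration, otherwise continue with k.
    // The check j > i rejects empty iterations so the loop terminates.
    static Node star(Node a) {
        return new Node() {
            @Override
            public boolean match(String s, int i, IntPredicate k) {
                return a.match(s, i, j -> j > i && this.match(s, j, k)) || k.test(i);
            }
        };
    }

    // Runs a full-string match attempt and returns the number of steps taken.
    static long countSteps(Node p, String s) {
        steps = 0;
        p.match(s, 0, j -> j == s.length());
        return steps;
    }

    public static void main(String[] args) {
        // Toy encoding of the ambiguous pattern (a|a)*b.
        Node p = seq(star(alt(ch('a'), ch('a'))), ch('b'));
        for (int n = 5; n <= 20; n += 5) {
            // Input a^n contains no 'b', so every alternative is explored.
            System.out.println(n + " -> " + countSteps(p, "a".repeat(n)));
        }
    }
}
```

In this toy matcher the strings a, aa, aaa, ... form such a family for (a|a)*b: each additional a roughly doubles the step count, because both branches of the alternation succeed locally and both must be explored before the match fails. A production engine may apply optimizations that change the behavior for this particular pattern, so the sketch illustrates the phenomenon rather than predicting what java.util.regex does on a given JDK.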
