Linear Matching of JavaScript Regular Expressions
Aurèle Barrière (EPFL), Clément Pit-Claudel (EPFL)

TL;DR
This paper analyzes JavaScript regex semantics, identifies a subset that can be matched in linear time, and introduces novel algorithms to improve correctness and efficiency, including support for lookarounds.
Contribution
The paper provides the first linear-time algorithms for matching lookarounds in JavaScript regexes and corrects previous misconceptions about regex complexity and semantics.
Findings
Identified a larger subset of JavaScript regexes matchable in linear time.
Developed novel algorithms supporting lookarounds in linear time.
Validated the algorithms in a prototype and integrated them into the V8 JavaScript engine.
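For readers unfamiliar with lookarounds, the snippet below is a minimal illustration of the four JavaScript lookaround forms mentioned in the findings. It only shows what the constructs mean; it says nothing about how the paper's algorithms match them, and the sample strings and variable names are made up for the example.

```javascript
// Positive lookahead: digits that are followed by "px" ("px" is not consumed).
const lookahead = "width: 12px".match(/\d+(?=px)/)[0];      // "12"

// Negative lookahead: "foo" only when it is NOT followed by "bar".
const negLookahead = /foo(?!bar)/.test("foobar");           // false

// Positive lookbehind: digits that are preceded by "$".
const lookbehind = "cost: $42".match(/(?<=\$)\d+/)[0];      // "42"

// Negative lookbehind: digits NOT preceded by "$".
// The match cannot start at "4" (preceded by "$"), so it starts at "2".
const negLookbehind = "cost: $42".match(/(?<!\$)\d+/)[0];   // "2"

console.log(lookahead, negLookahead, lookbehind, negLookbehind);
```

A lookaround checks the surrounding text without consuming it, which is why the matched substrings above exclude `px` and `$`.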
Abstract
Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languages, and the impact of these divergences on algorithmic design and worst-case matching complexity has seldom been investigated. This paper provides a novel perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. In the process, we discover several cases where state-of-the-art algorithms were either wrong (semantically incorrect), inefficient (suffering from superlinear complexity) or…
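The exponential blowup mentioned in the abstract can be illustrated with a toy model. The sketch below is not V8's implementation and is not taken from the paper; it is a simplified, hypothetical step counter (`countBacktrackingSteps`) that mimics how a backtracking matcher might explore the classic pattern `/^(a+)+$/` on an input of `n` copies of `a` followed by `!`, which can never match.

```javascript
// Simplified model (an assumption, not an actual engine) of backtracking
// on /^(a+)+$/ against "a".repeat(n) + "!".
// Each call picks how many 'a's the next iteration of (a+) consumes.
function countBacktrackingSteps(n) {
  let steps = 0;
  function tryFrom(pos) {
    // Greedy: try the longest run of 'a's first, then shorter ones.
    for (let len = n - pos; len >= 1; len--) {
      steps++;
      // `$` never matches because "!" follows the a's, so every way of
      // splitting the a's fails and the model tries the next alternative.
      if (tryFrom(pos + len)) return true;
    }
    return false;
  }
  tryFrom(0);
  return steps;
}

// Steps grow as 2^n - 1: prints 15, 255, 65535.
for (const n of [4, 8, 16]) {
  console.log(n, countBacktrackingSteps(n));
}
```

In this model each extra `a` doubles the work, whereas a linear-time matcher of the kind the paper targets would do work proportional to the input length.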
Taxonomy
Topics: Web Application Security Vulnerabilities · Software Testing and Debugging Techniques · Security and Verification in Computing
