Efficient Matching with Memoization for Regexes with Look-around and Atomic Grouping (Extended Version)
Hiroya Fujinami, Ichiro Hasuo

TL;DR
This paper introduces linear-time backtracking algorithms for extended regexes with look-around and atomic grouping, using optimized memoization to prevent catastrophic backtracking and improve performance in real-world applications.
Contribution
The paper extends Davis et al.'s linear-time regex matching algorithm to regexes with look-around and atomic grouping, using memoization tables whose range is trimmed to keep them small.
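For readers unfamiliar with the two extensions, the following sketch shows what look-around and atomic grouping mean in practice. It uses the syntax of Python's standard `re` module purely as an illustration; the paper's own formal syntax and semantics may differ in detail. The function names are hypothetical, and the atomic-group part assumes Python 3.11 or later, where `(?>...)` is supported.

```python
import re
import sys


def lookaround_demo():
    """Look-around asserts context without consuming input."""
    # Look-ahead: "foo" matches only if "bar" follows; "bar" is not consumed.
    ahead = re.search(r"foo(?=bar)", "foobar")
    # Look-behind: digits match only if a "$" immediately precedes them.
    behind = re.search(r"(?<=\$)\d+", "cost: $42")
    return ahead.group(0), ahead.end(), behind.group(0)


def atomic_demo():
    """An atomic group discards its backtracking points once it has matched."""
    if sys.version_info < (3, 11):
        return None  # atomic groups are not available in older `re` versions
    # Plain group: a* can give back one "a" so that the trailing "a" matches.
    plain = re.fullmatch(r"a*a", "aaa")
    # Atomic group: (?>a*) keeps all three "a"s, so the trailing "a" fails.
    atomic = re.fullmatch(r"(?>a*)a", "aaa")
    return plain is not None, atomic is not None
```

Because an atomic group changes which backtracking alternatives remain available, a memoization scheme designed for plain regexes cannot be reused unchanged, which is the gap the paper addresses.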
Findings
The proposed backtracking algorithms match regexes with look-around and atomic grouping in linear time.
Experiments show significant performance improvements over traditional backtracking.
Trimming the range of the memoization table reduces memory usage without sacrificing speed.
Abstract
Regular expression (regex) matching is fundamental in many applications, especially in web services. However, matching by backtracking -- preferred by most real-world implementations for its practical performance and backward compatibility -- can suffer from so-called catastrophic backtracking, which makes the number of backtracking steps super-linear and leads to the well-known ReDoS vulnerability. Inspired by a recent algorithm by Davis et al. that runs in linear time for (non-extended) regexes, we study efficient backtracking matching for regexes with two common extensions, namely look-around and atomic grouping. We present linear-time backtracking matching algorithms for these extended regexes. Their efficiency relies on memoization, much like the one by Davis et al.; we also strive for smaller memoization tables by carefully trimming their range. Our experiments -- we used some…
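The memoization idea for plain (non-extended) regexes can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it does not handle look-around or atomic grouping, and the instruction format, the hand-compiled program `PROG`, and the function `run` are all assumptions made for the example. The matcher backtracks over an NFA-like program and records every (instruction, position) pair it has tried. A pair is only ever revisited after the first attempt failed, so a revisit can fail immediately, which bounds the number of steps by the program size times the input length plus one.

```python
# Hand-compiled program for the regex (a|a)*b, matched against the whole input.
PROG = [
    ("split", 1, 6),  # 0: enter the loop body, or leave the loop
    ("split", 2, 4),  # 1: choose the first or the second alternative
    ("char", "a"),    # 2: first alternative
    ("jmp", 0),       # 3: back to the loop head
    ("char", "a"),    # 4: second alternative
    ("jmp", 0),       # 5: back to the loop head
    ("char", "b"),    # 6: after the loop
    ("match",),       # 7: succeed if the whole input is consumed
]


def run(prog, text, memoize):
    """Backtracking match; returns (matched, number of steps taken)."""
    visited = set()  # memoization table of (pc, pos) pairs already tried
    steps = 0

    def go(pc, pos):
        nonlocal steps
        if memoize:
            if (pc, pos) in visited:
                return False  # tried before without success: prune
            visited.add((pc, pos))
        steps += 1
        op = prog[pc]
        if op[0] == "char":
            return pos < len(text) and text[pos] == op[1] and go(pc + 1, pos + 1)
        if op[0] == "split":
            return go(op[1], pos) or go(op[2], pos)
        if op[0] == "jmp":
            return go(op[1], pos)
        return pos == len(text)  # "match" instruction

    return go(0, 0), steps
```

Calling `run(PROG, "a" * 12, False)` explores both alternatives at every position and takes a number of steps that grows exponentially with the input length, while `run(PROG, "a" * 12, True)` takes at most `len(PROG) * 13` steps. Both calls report a failed match.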
Taxonomy
Topics: Network Packet Processing and Optimization · Web Application Security Vulnerabilities · Software Testing and Debugging Techniques
