Towards the Systematic Testing of Regular Expression Engines
Berk Çakar, Dongyoon Lee, James C. Davis

TL;DR
This paper presents ReTest, a systematic testing framework for regex engines that combines grammar-aware fuzzing and metamorphic testing to improve bug detection and coverage.
Contribution
The work introduces ReTest, a novel framework that enhances regex engine testing through grammar-aware fuzzing and dialect-independent metamorphic testing.
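To illustrate the idea of dialect-independent metamorphic testing, here is a minimal Python sketch. It is not ReTest's implementation: the function names `relations` and `check` and the three rewrite rules are hypothetical, chosen only because they should preserve the matched language in most dialects. The point is that the engine is compared against itself on equivalent patterns, so no second engine or dialect is needed as an oracle.

```python
import re


def relations(pattern):
    """Yield (name, variant) pairs that should accept the same strings as
    `pattern`. These rewrites are illustrative assumptions, not the
    relations used by ReTest. They assume `pattern` has no backreferences
    or global inline flags."""
    yield "noncapturing-wrap", f"(?:{pattern})"
    yield "self-alternation", f"(?:{pattern})|(?:{pattern})"
    yield "empty-suffix", f"(?:{pattern})(?:)"


def check(pattern, inputs):
    """Return a list of (relation, variant, text) where the variant's
    full-match verdict differs from the original pattern's verdict."""
    violations = []
    for text in inputs:
        expected = re.fullmatch(pattern, text) is not None
        for name, variant in relations(pattern):
            actual = re.fullmatch(variant, text) is not None
            if actual != expected:
                violations.append((name, variant, text))
    return violations


if __name__ == "__main__":
    # An empty list means no relation was violated on these inputs.
    print(check("a(b|c)*d", ["ad", "abcd", "abx", ""]))
```

Any non-empty result would point to a pair of equivalent patterns on which the engine disagrees with itself, which is a candidate bug regardless of dialect.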
Findings
ReTest achieves 3x higher edge coverage than existing fuzzers.
ReTest identified three new memory safety defects in PCRE.
The study analyzed 1,007 regex engine bugs and 156 CVEs.
Abstract
Software engineers use regular expressions (regexes) across a wide range of domains and tasks. To support regexes, software projects must integrate a regex engine, whether provided natively by the language runtime (e.g., Python's re) or included as an external dependency (e.g., PCRE). However, these engines may contain bugs and introduce vulnerabilities. A common strategy for testing regex engines involves differential testing -- comparing outputs across different implementations. However, this approach is concerning because regex syntax and semantics vary significantly between dialects (e.g., POSIX vs. PCRE). Fuzzing is also utilized to ease testing of feature-rich regex implementations to expose defects, but naive byte-level mutations generate syntactically invalid inputs that exercise only parsing logic, not matching internals. In this work, we describe our progress towards ReTest,…
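The contrast between byte-level mutation and grammar-aware generation can be sketched in a few lines of Python. This is a toy under stated assumptions, not ReTest's generator: the grammar covers only a tiny regex subset, and `gen_regex`, `mutate_bytes`, and `compiles` are hypothetical helper names. Python's `re` stands in for the engine under test.

```python
import random
import re


def gen_regex(rng, depth=0):
    """Build a pattern from a toy grammar (assumed, not ReTest's), so every
    result is syntactically valid by construction."""
    if depth >= 3:
        return rng.choice("abc")
    kind = rng.choice(["lit", "concat", "alt", "star", "group", "class"])
    if kind == "lit":
        return rng.choice("abc")
    if kind == "concat":
        return gen_regex(rng, depth + 1) + gen_regex(rng, depth + 1)
    if kind == "alt":
        left, right = gen_regex(rng, depth + 1), gen_regex(rng, depth + 1)
        return f"(?:{left}|{right})"
    if kind == "star":
        return f"(?:{gen_regex(rng, depth + 1)})*"
    if kind == "group":
        return f"({gen_regex(rng, depth + 1)})"
    return "[a-c]"


def mutate_bytes(seed, rng, n_flips=2):
    """Naive byte-level mutation: overwrite random positions with random
    printable ASCII, with no knowledge of regex syntax."""
    data = bytearray(seed.encode("ascii"))
    for _ in range(n_flips):
        data[rng.randrange(len(data))] = rng.randrange(32, 127)
    return data.decode("ascii")


def compiles(pattern):
    """True if the engine accepts the pattern, i.e. it gets past parsing."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    grammar = [gen_regex(rng) for _ in range(200)]
    mutated = [mutate_bytes("(a|b)*c[a-c]+", rng) for _ in range(200)]
    print("grammar-valid:", sum(map(compiles, grammar)), "/ 200")
    print("mutated-valid:", sum(map(compiles, mutated)), "/ 200")
```

Patterns rejected at compile time never reach the matcher, which is why inputs that are valid by construction are better placed to exercise matching internals.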
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Web Application Security Vulnerabilities
