TL;DR
This paper introduces a complete method for accurately reasoning about JavaScript regular expressions in symbolic execution, improving test coverage and verification by modeling complex regex features.
Contribution
It presents the first comprehensive approach to handle JavaScript regex semantics in symbolic execution, including backreferences and greedy matching, with a practical implementation and evaluation.
Findings
Increased line coverage by up to 30% in JavaScript programs
Over 20% of analyzed packages use complex regex features
Effective modeling of regex operators improves test generation and verification
Abstract
Existing support for regular expressions in automated test generation or verification tools is lacking. Common aspects of regular expression engines found in mainstream programming languages, such as backreferences or greedy matching, are commonly ignored or imprecisely approximated, leading to poor test coverage or failed proofs. In this paper, we present the first complete strategy to faithfully reason about regular expressions in the context of symbolic execution, focusing on the operators found in JavaScript. We model regular expression operations using string constraints and classical regular expressions and use a refinement scheme to address the problem of matching precedence and greediness. Our survey of over 400,000 JavaScript packages from the NPM software repository shows that one fifth make use of complex regular expressions features. We implemented our model in a dynamic…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
