Solving String Constraints With Regex-Dependent Functions Through Transducers With Priorities And Variables
Taolue Chen, Alejandro Flores Lamas, Matthew Hague, Zhilei Han,, Denghang Hu, Shuanglong Kan, Anthony Widjaja Lin, Philipp Ruemmer, Zhilin Wu

TL;DR
This paper introduces a novel string solver based on prioritized streaming string transducers (PSST) that natively supports complex RegEx features like greedy/lazy semantics and capturing groups, validated through extensive experiments.
Contribution
It proposes the first formal string theory and solver supporting RegEx-dependent functions using PSSTs, combining priorities and variables for improved string constraint solving.
Findings
Validated the formal semantics with JavaScript RegEx behavior
Achieved significant improvements in precision and efficiency over existing methods
Demonstrated the solver's effectiveness on over 195,000 string constraints
Abstract
Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx) such as JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes appeals to string solvers natively supporting important features of RegEx, such a string solver is hitherto missing. In this paper, we propose the first string theory and string solver that natively provide such a support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which have previously been introduced in prioritized finite-state automata to capture greedy/lazy semantics,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Web Application Security Vulnerabilities · Web Data Mining and Analysis
