Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions
Louis G. Michael IV, James Donohue, James C. Davis, Dongyoon Lee, Francisco Servant

TL;DR
This study explores how developers make decisions when programming regexes, the difficulties they face, and the risks involved. It reveals widespread challenges and a lack of awareness of security issues, with implications for future tools and practices.
Contribution
The first comprehensive analysis of the regex development cycle, focusing on decision-making, difficulties, and risk awareness, based on surveys of and interviews with professional developers.
Findings
Regexes are hard to read, search, validate, and document.
Most developers are unaware of critical security risks in regexes.
Developers who know about risks often do not handle them effectively.
Abstract
Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, nor the difficulties that they face. In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are about serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from a diversity of backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the…
Taxonomy
Topics: Software Engineering Research · Web Application Security Vulnerabilities · Software Testing and Debugging Techniques
