Demystifying Regular Expression Bugs: A comprehensive study on regular expression bug causes, fixes, and testing
Peipei Wang, Chris Brown, Jamie A. Jennings, Kathryn T. Stolee

TL;DR
This empirical study analyzes 356 regex-related bugs from major open-source projects. It finds that incorrect regex behavior is the main root cause, that fixes are complex, and that test changes are often missing, which points to practical challenges in regex usage.
Contribution
The paper classifies regex bugs, their fixes, and the accompanying test changes, offering insight into how regex bugs arise in real-world projects and what makes them hard for developers to fix.
Findings
Incorrect regex behavior is the main root cause (46.3%).
Fixing regex bugs takes more time and more lines of code than fixing typical bugs.
Over half of regex pull requests lack test code changes.
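To make the first and third findings concrete, here is a minimal sketch of what an "incorrect regex behavior" bug and its fix might look like. The patterns, the `is_version` helper, and the sample strings are hypothetical illustrations and are not taken from the study's dataset.

```python
import re

# Hypothetical buggy pattern: the dot is unescaped, so it matches ANY character.
BUGGY = re.compile(r"^\d+.\d+$")

# Hypothetical fix: escape the dot so only a literal '.' is accepted.
FIXED = re.compile(r"^\d+\.\d+$")


def is_version(pattern, text):
    """Return True if the whole string matches the given compiled pattern."""
    return pattern.match(text) is not None


# Inputs chosen to expose the difference between the two patterns.
samples = ["1.2", "1x2", "123", "12"]

buggy_results = {s: is_version(BUGGY, s) for s in samples}
fixed_results = {s: is_version(FIXED, s) for s in samples}

for s in samples:
    print(f"{s!r}: buggy={buggy_results[s]}, fixed={fixed_results[s]}")
```

The buggy pattern accepts "1x2" and "123" because the unescaped dot matches any character. A regression test over inputs like these is the kind of test change the study reports as often missing from regex pull requests.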
Abstract
Regular expressions cause string-related bugs and open security vulnerabilities for DoS attacks. However, beyond ReDoS (Regular expression Denial of Service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 merged regex-related pull request bugs from Apache, Mozilla, Facebook, and Google GitHub repositories. We identify and classify the nature of the regular expression problems, the fixes, and the related changes in the test code. The most important findings in this paper are as follows: 1) incorrect regular expression behavior is the dominant root cause of regular expression bugs (165/356, 46.3%). The remaining root causes are incorrect API usage (9.3%) and other code issues that require regular expression changes in the fix (29.5%), 2) fixing regular…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Web Application Security Vulnerabilities
