How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset
Rafael-Michael Karampatsis, Charles Sutton

TL;DR
This paper introduces the ManySStuBs4J dataset, comprising over 150,000 single-statement bug fixes mined from 1,000 popular open-source Java projects, to estimate how often such bugs occur and what share of them matches a small set of bug templates.
Contribution
It provides a large, annotated dataset of simple bugs and analyzes their frequency and repairability, filling a gap in empirical data for program repair research.
Findings
About 33% of the single-statement bug fixes match one of the 16 predefined templates
Template-fitting bugs occur roughly once every 1,600-2,500 lines of code
The dataset enables future research in program repair and empirical software engineering
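To make these figures concrete, the short sketch below turns them into rough counts. It uses only the numbers quoted on this page. The 100,000-line project is a hypothetical input, and applying the reported average density uniformly to a single project is an assumption made for illustration, not a claim from the paper. Because "about 33%" is a rounded share, the resulting count is approximate.

```python
# Figures quoted in the summary; the share is rounded, so results are approximate.
TOTAL_FIXES = 153_652               # single-statement bug-fix changes in the dataset
TEMPLATE_SHARE = 0.33               # "about 33%" of fixes match a template
LOC_PER_BUG_RANGE = (1_600, 2_500)  # one template-fitting bug per this many lines


def template_matching_fixes(total=TOTAL_FIXES, share=TEMPLATE_SHARE):
    """Approximate number of fixes in the dataset that match a template."""
    return round(total * share)


def expected_bugs(project_loc, loc_per_bug_range=LOC_PER_BUG_RANGE):
    """Return (low, high) estimates of template-fitting bugs for a project.

    Assumes the reported average density holds for the project (illustrative only).
    """
    fewest_loc_per_bug, most_loc_per_bug = loc_per_bug_range
    return project_loc / most_loc_per_bug, project_loc / fewest_loc_per_bug


if __name__ == "__main__":
    print(template_matching_fixes())   # roughly 50,700 fixes
    low, high = expected_bugs(100_000)  # hypothetical 100,000-line project
    print(f"{low:.1f} to {high:.1f} template-fitting bugs")
```

Under these assumptions, roughly 50,700 of the fixes match a template, and a hypothetical 100,000-line project would be expected to contain somewhere between 40 and about 62 template-fitting bugs.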
Abstract
Program repair is an important but difficult software engineering problem. One way to achieve acceptable performance is to focus on classes of simple bugs, such as bugs with single statement fixes, or that match a small set of bug templates. However, it is very difficult to estimate the recall of repair techniques for simple bugs, as there are no datasets about how often the associated bugs occur in code. To fill this gap, we provide a dataset of 153,652 single statement bug-fix changes mined from 1,000 popular open-source Java projects, annotated by whether they match any of a set of 16 bug templates, inspired by state-of-the-art program repair techniques. In an initial analysis, we find that about 33% of the simple bug fixes match the templates, indicating that a remarkable number of single-statement bugs can be repaired with a relatively small set of templates. Further, we find that…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software Reliability and Analysis Research
