Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J
Victor Sobreira, Thomas Durieux, Fernanda Madeiral, Martin Monperrus, Marcelo A. Maia

TL;DR
This paper provides a detailed analysis of 395 patches from the Defects4J bug dataset, revealing key properties of bug fixes to aid researchers in understanding and comparing datasets and techniques.
Contribution
It introduces a comprehensive methodology for characterizing bug patches, combining automatic and manual analysis, and offers new insights into the properties of Defects4J patches.
Findings
Median patch size is four lines.
92% of patches modify only one file.
Top repair actions are addition of method calls, conditionals, and assignments.
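To make the two quantitative findings concrete, here is a minimal sketch of how patch size and file spread could be measured from a git-style unified diff. It is not the authors' tooling: the function name `patch_stats` and the example diff are hypothetical, and it assumes patch size is simply the number of added plus removed lines, which may differ from the paper's exact definition.

```python
from statistics import median


def patch_stats(diff_text):
    """Return (changed_lines, files_touched) for one git-style unified diff.

    Assumption: patch size = added lines + removed lines. Each file section
    is assumed to start with a "diff " header line, as git produces.
    """
    changed = 0
    files = set()
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False              # a new file section begins
        elif line.startswith("@@"):
            in_hunk = True               # hunk header: body lines follow
        elif not in_hunk and line.startswith("+++ "):
            files.add(line[4:].strip())  # path of the patched file
        elif in_hunk and line[:1] in ("+", "-"):
            changed += 1                 # an added or removed line
    return changed, len(files)


# Hypothetical one-file patch, for illustration only.
EXAMPLE = """diff --git a/Foo.java b/Foo.java
--- a/Foo.java
+++ b/Foo.java
@@ -10,3 +10,4 @@
 int x = get();
-if (x > 0) {
+if (x >= 0) {
+    log(x);
 }
"""

size, spread = patch_stats(EXAMPLE)
```

Applying `patch_stats` to every patch in a dataset and passing the sizes to `statistics.median` would give a median patch size; the share of patches whose second value is 1 would give the single-file percentage.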
Abstract
Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques as well as the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depend on understanding the properties of bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually…
