Mining Bug Repositories for Multi-Fault Programs

Dylan Callaghan; Bernd Fischer

arXiv:2403.19171 · cs.SE · April 11, 2024 · 1 citation

TL;DR

This paper extends bug datasets such as Defects4J and BugsInPy so that individual entries identify multiple faults, enabling a more realistic evaluation of debugging tools on real-world programs that contain several bugs at once.

Contribution

It introduces a method that identifies multiple bugs within a single dataset entry, using test case transplantation to expose the bugs and fault location translation to locate them, so that existing datasets support more comprehensive evaluation.

Findings

01 Created datasets with multiple bugs per entry

02 Maintained original dataset properties and usability

03 Enabled realistic multi-fault software evaluation

Abstract

Datasets such as Defects4J and BugsInPy that contain bugs from real-world software projects are necessary for a realistic evaluation of automated debugging tools. However, these datasets largely identify only a single bug in each entry, while real-world software projects (including those used in Defects4J and BugsInPy) typically contain multiple bugs at the same time. We lift this limitation and describe an extension to these datasets in which multiple bugs are identified in individual entries. We use test case transplantation to expose the bugs and fault location translation to locate them. We thus provide datasets of true multi-fault versions within real-world software projects, which maintain the properties and usability of the original datasets.
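As a rough illustration of what fault location translation could involve, the sketch below maps a faulty line number recorded for one version of a file onto another version by aligning the two versions with Python's standard `difflib`. This is only one plausible reading of the idea, not the authors' implementation; the function name `translate_line` and the sample file contents are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def translate_line(src: Sequence[str], dst: Sequence[str], line_no: int) -> Optional[int]:
    """Map a 1-based line number in `src` to the matching line in `dst`.

    Returns None when the line has no unchanged counterpart in `dst`.
    Hypothetical helper: assumes a plain line-level diff is good enough.
    """
    idx = line_no - 1
    matcher = SequenceMatcher(a=list(src), b=list(dst), autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only lines inside an unchanged block can be translated safely.
        if tag == "equal" and i1 <= idx < i2:
            return j1 + (idx - i1) + 1
    return None


# Made-up file contents: the version where the fault location was recorded...
recorded_version = [
    "def f(x):",
    "    return x + 1",
    "def g(y):",
    "    return y - 1",
]
# ...and another version of the same file with one added and one changed line.
other_version = [
    "import os",
    "def f(x):",
    "    return x + 1",
    "def g(y):",
    "    return y * 2",
]

shifted = translate_line(recorded_version, other_version, 2)
changed = translate_line(recorded_version, other_version, 4)
print(shifted, changed)
```

A line that was modified between the two versions yields `None` here, which a real pipeline would have to resolve in some other way, for example by manual inspection.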

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability