An Anatomy of 488 Faults from Defects4J Based on the Control- and Data-Flow Graph Representations of Programs
Alexandra van der Spuy, Bernd Fischer

TL;DR
This paper introduces a new fault classification scheme based on control- and data-flow graph representations, applied to 488 faults from Defects4J, revealing dominant fault types and aiding fault localization and repair.
Contribution
A novel fault classification scheme based on control- and data-flow graphs, applied to a large dataset, providing insights into fault types for improved debugging techniques.
Findings
Most faults are assigned between one and three classes.
Definition faults, a data-flow class, are the most common individual fault class.
The majority of faults are classified with at least one control-flow fault class.
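The last two findings can look contradictory at first: a data-flow class is the single most frequent class, yet most faults carry a control-flow class. Both can hold because a fault may be assigned several classes and the control-flow share is spread over six classes. The sketch below illustrates this with a hypothetical toy labelling. Only the class name "definition" comes from the source. The fault ids, the "cf_*" placeholder names, and all counts are assumptions made up for illustration and are not data from the paper.

```python
from collections import Counter

# Hypothetical toy labelling: fault id -> set of assigned classes.
# "definition" is the only class name taken from the source; the "cf_*"
# names are placeholders for control-flow classes, not the paper's names.
labels = {
    "fault-1": {"definition"},
    "fault-2": {"definition", "cf_a"},
    "fault-3": {"definition", "cf_b"},
    "fault-4": {"cf_a"},
    "fault-5": {"cf_b", "cf_c"},
    "fault-6": {"cf_c"},
    "fault-7": {"definition"},
}


def is_control_flow(cls: str) -> bool:
    """Placeholder convention: control-flow classes are prefixed with 'cf_'."""
    return cls.startswith("cf_")


# Per-class tally: how often each individual class is assigned.
class_counts = Counter(cls for classes in labels.values() for cls in classes)
most_common_class, _ = class_counts.most_common(1)[0]

# Per-fault tally: how many faults carry at least one control-flow class.
with_control_flow = sum(
    1 for classes in labels.values() if any(is_control_flow(c) for c in classes)
)
share_with_control_flow = with_control_flow / len(labels)

print(most_common_class, with_control_flow, len(labels))
```

In this toy labelling the per-class tally is led by "definition", while the per-fault tally shows that more than half of the faults include some control-flow class. The two statistics count different things, so neither finding rules out the other.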
Abstract
Software fault datasets such as Defects4J provide for each individual fault its location and repair, but do not characterize the faults. Current classifications use the repairs as proxies, but these do not capture the intrinsic nature of the fault. In this paper, we propose a new, direct fault classification scheme based on the control- and data-flow graph representations of programs. Our scheme comprises six control-flow and two data-flow fault classes. We manually apply this scheme to 488 faults from seven projects in the Defects4J dataset. We find that the majority of the faults are assigned between one and three classes. We also find that one of the data-flow fault classes (definition fault) is the most common individual class but that the majority of faults are classified with at least one control-flow fault class. Our proposed classification can be applied to other fault datasets…
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Fault Detection and Control Systems
