Coincidental Correctness in the Defects4J Benchmark
Rawad Abou Assi, Chadi Trad, Marwan Maalouf, and Wes Masri

TL;DR
This study investigates the prevalence and characteristics of coincidental correctness in the Defects4J benchmark, analyzing its impact across different testing levels and infection paths to improve fault localization and debugging.
Contribution
The paper provides the first comprehensive analysis of coincidental correctness in Defects4J, addressing its prevalence, influencing factors, and infection dynamics in real-world benchmark data.
Findings
Coincidental correctness (CC) is prevalent in Defects4J.
Testing levels influence CC occurrence.
Peculiar infection paths are induced by CC tests.
Abstract
Coincidental correctness (CC) arises when a defective program produces the correct output despite the fact that the defect within was exercised. Researchers have recognized the negative impact of coincidental correctness, and the authors have previously conducted a study demonstrating its prevalence in test suites. However, that study was limited to system tests and small subjects seeded with artificial defects. In this paper, we conduct a wider scope study of CC that addresses the following research questions in the context of the Defects4J benchmark: RQ1: Is CC prevalent in Defects4J? RQ2: Is CC affected by the testing levels in Defects4J? RQ3: Do CC tests induce peculiar infection paths in Defects4J? RQ4: Are the infections likely to be nullified within or outside the buggy method? ....
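The definition above can be made concrete with a small sketch. It is a toy illustration only, not taken from the paper or from Defects4J: the function names, the seeded defect (`a + b` in place of `a - b`), and the three labels are all hypothetical. Each test input is run against a correct and a buggy version of the same method, and both the intermediate state and the output are compared. A test fails when the outputs differ. It is coincidentally correct when the defective statement runs but the output still matches, either because the state was never infected or because the infection was nullified before the method returned.

```python
from typing import Tuple


def abs_diff_correct(a: int, b: int) -> Tuple[int, int]:
    """Reference version: returns (intermediate state, output) for |a - b|."""
    diff = a - b
    return diff, abs(diff)


def abs_diff_buggy(a: int, b: int) -> Tuple[int, int]:
    """Hypothetical defective version: '+' was written instead of '-'."""
    diff = a + b  # seeded defect, executed on every call
    return diff, abs(diff)


def classify(a: int, b: int) -> str:
    """Label one test input by comparing the buggy run with the reference run."""
    good_state, good_out = abs_diff_correct(a, b)
    bad_state, bad_out = abs_diff_buggy(a, b)
    if bad_out != good_out:
        # The infection propagated to the output.
        return "failing"
    if bad_state != good_state:
        # The state was infected, but abs() cancelled it inside the method.
        return "CC: infection nullified"
    # The defect ran, yet the intermediate state was identical.
    return "CC: defect executed, state not infected"


if __name__ == "__main__":
    for a, b in [(2, 3), (0, 3), (5, 0)]:
        print((a, b), classify(a, b))
```

In this sketch the input `(0, 3)` infects the intermediate value (`3` instead of `-3`), but `abs` cancels the difference inside the buggy method itself, which is the kind of within-method nullification that RQ4 asks about. The split between the two passing cases is this sketch's own labeling, chosen to show that exercising a defect does not necessarily corrupt the program state.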
Taxonomy
Topics: Software Testing and Debugging Techniques · Radiation Effects in Electronics · Parallel Computing and Optimization Techniques
