Coincidental Correctness in the Defects4J Benchmark

Rawad Abou Assi, Chadi Trad, Marwan Maalouf, and Wes Masri

arXiv:1808.09233 · cs.SE · January 31, 2019

TL;DR

This study investigates the prevalence and characteristics of coincidental correctness in the Defects4J benchmark, analyzing its impact across different testing levels and infection paths to improve fault localization and debugging.

Contribution

The paper provides the first comprehensive analysis of coincidental correctness in Defects4J, addressing its prevalence, influencing factors, and infection dynamics in real-world benchmark data.

Findings

01. Coincidental correctness (CC) is prevalent in Defects4J.

02. Testing levels influence CC occurrence.

03. Peculiar infection paths are induced by CC tests.

Abstract

Coincidental correctness (CC) arises when a defective program produces the correct output despite the fact that the defect within was exercised. Researchers have recognized the negative impact of coincidental correctness, and the authors have previously conducted a study demonstrating its prevalence in test suites. However, that study was limited to system tests and small subjects seeded with artificial defects. In this paper, we conduct a wider-scope study of CC that addresses the following research questions in the context of the Defects4J benchmark: RQ1: Is CC prevalent in Defects4J? RQ2: Is CC affected by the testing levels in Defects4J? RQ3: Do CC tests induce peculiar infection paths in Defects4J? RQ4: Are the infections likely to be nullified within or outside the buggy method? ....
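
To make the definition concrete, the following is a minimal sketch of coincidental correctness, not code from the paper or from Defects4J. The class and method names are hypothetical, and the defect (an off-by-one constant) is an assumption chosen purely for illustration. The defective line runs on every call and infects an intermediate value, but for some inputs the later comparison nullifies the infection inside the buggy method, so the output is still correct.

```java
public class CoincidentalCorrectnessSketch {

    // Hypothetical reference behavior: report whether x + 2 is positive.
    static boolean isPositiveAfterShiftCorrect(int x) {
        int shifted = x + 2;
        return shifted > 0;
    }

    // Hypothetical defective version: the constant should be 2, not 1.
    static boolean isPositiveAfterShiftBuggy(int x) {
        int shifted = x + 1;   // defect: executed on every call, so 'shifted' is always infected
        return shifted > 0;    // the comparison can mask the infected value
    }

    public static void main(String[] args) {
        // x = 5: 'shifted' is 6 instead of 7 (infected), yet both versions return true.
        // A test using this input passes although it exercised the defect: a CC test.
        System.out.println(isPositiveAfterShiftBuggy(5) == isPositiveAfterShiftCorrect(5));   // prints true

        // x = -1: 'shifted' is 0 instead of 1, the infection reaches the output,
        // and the test fails as one would hope.
        System.out.println(isPositiveAfterShiftBuggy(-1) == isPositiveAfterShiftCorrect(-1)); // prints false
    }
}
```

In this sketch, a test with input 5 is the kind that can mislead coverage-based fault localization, since it executes the faulty statement and still passes.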

Code & Models

Repositories

aub-software-testing/d4j_cc (Official)

Taxonomy

Topics: Software Testing and Debugging Techniques · Radiation Effects in Electronics · Parallel Computing and Optimization Techniques