Test suite effectiveness metric evaluation: what do we know and what should we do?
Peng Zhang, Yang Wang, Xutong Liu, Yibiao Yang, Yanhui Li, Lin Chen, Ziyuan Wang, Chang-ai Sun, Yuming Zhou

TL;DR
This paper introduces ASSENT, a framework for evaluating and comparing test suite effectiveness metrics by establishing clear ground truths and benchmark test suites. It reports that mutation scores are the most effective metrics and that metrics to evaluate (MTEs) tend to overestimate effectiveness.
Contribution
The paper proposes the ASSENT framework for rigorous evaluation of test suite effectiveness metrics, addressing inconsistencies and clarifying the meaning of real faults.
Findings
Mutation score metrics are most aligned with real faults.
Using mutants overestimates effectiveness by more than 20%.
ASSENT enables accurate comparison of effectiveness metrics.
Abstract
Comparing test suite effectiveness metrics has long been a research hotspot. However, prior studies reach different, sometimes contradictory, conclusions when comparing these metrics. The problem we find most troubling for our community is that researchers tend to oversimplify the description of the ground truth they use. For example, a common expression is that "we studied the correlation between real faults and the metric to evaluate (MTE)". However, the meaning of "real faults" is not clear-cut, so it needs to be scrutinized. Without this scrutiny, the conclusions can only be half understood. To tackle this challenge, we propose a framework, ASSENT (evAluating teSt Suite EffectiveNess meTrics), to guide follow-up research. In essence, ASSENT consists of three fundamental components: ground truth, benchmark test…
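To make the quoted phrase "correlation between real faults and the metric to evaluate" concrete, here is a minimal sketch of one way such a correlation could be computed. It is not the paper's procedure. The choice of Kendall's tau-b, the function name `kendall_tau_b`, and the sample numbers are all assumptions made for illustration.

```python
import math


def kendall_tau_b(metric_values, fault_counts):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Hypothetical helper for illustration: metric_values holds one MTE value
    per test suite, fault_counts holds the number of ground-truth faults
    each suite detects.
    """
    if len(metric_values) != len(fault_counts):
        raise ValueError("sequences must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(metric_values)
    for i in range(n):
        for j in range(i + 1, n):
            dx = metric_values[i] - metric_values[j]
            dy = fault_counts[i] - fault_counts[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counts toward neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1

    # Pairs that differ in x, times pairs that differ in y.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    denominator = math.sqrt(untied_x * untied_y)
    if denominator == 0:
        return 0.0  # correlation is undefined when one variable is constant
    return (concordant - discordant) / denominator


# Made-up data for four test suites (not taken from the paper).
mutation_scores = [0.2, 0.4, 0.6, 0.8]
faults_detected = [1, 2, 2, 4]
tau = kendall_tau_b(mutation_scores, faults_detected)
```

The result depends entirely on what is counted in `faults_detected`, which is the ambiguity in "real faults" that the abstract points to.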
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Software Engineering Techniques and Practices
