Using mutation testing to measure behavioural test diversity
Francisco Gomes de Oliveira Neto, Felix Dobslaw, Robert Feldt

TL;DR
This paper introduces mutation-testing-based measures that quantify behavioural test diversity. Used for test prioritisation, they outperform artefact-based diversity measures across multiple open-source projects.
Contribution
The paper proposes mutation-testing-derived metrics for behavioural diversity. They avoid the reliance on reliable test execution logs that limits history-based approaches, and they are applied to test suite prioritisation.
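To make the idea of a mutation-derived behavioural distance concrete, here is a minimal sketch. It assumes each test is summarised by the set of mutants it kills and uses Jaccard distance between those sets. Both the choice of distance and the names (`kills`, `jaccard_distance`, `pairwise_distances`) are illustrative assumptions, not the paper's own definitions.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(kills: dict) -> dict:
    """Map each unordered pair of tests to the distance between their kill sets."""
    return {
        (t1, t2): jaccard_distance(kills[t1], kills[t2])
        for t1, t2 in combinations(sorted(kills), 2)
    }


# Hypothetical kill data: test name -> set of mutant ids that the test kills.
kills = {
    "test_a": {"m1", "m2"},
    "test_b": {"m1", "m2"},
    "test_c": {"m3"},
    "test_d": {"m2", "m3"},
}

distances = pairwise_distances(kills)
```

Under this assumption, tests that kill exactly the same mutants (`test_a` and `test_b`) are behaviourally indistinguishable, while tests with disjoint kill sets (`test_a` and `test_c`) are maximally distant.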
Findings
Behavioural diversity (b-div) measures outperform artefact-based diversity (a-div) and random selection
Average APFD increase of 19% to 31% across projects
Effective in prioritizing tests for fault detection
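For readers unfamiliar with APFD (Average Percentage of Faults Detected), the sketch below computes it with the standard formula APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the position of the first test that detects fault i. The function name `apfd` and the example data are hypothetical, and the sketch assumes every fault of interest is detected by at least one test in the ordering.

```python
def apfd(order: list, detects: dict) -> float:
    """APFD of a test ordering.

    order:   test names in execution order
    detects: test name -> set of fault ids that the test detects
    """
    n = len(order)
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            # Record only the earliest position at which each fault is detected.
            first_pos.setdefault(fault, pos)
    m = len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one detected fault")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


# Hypothetical data: which faults each test detects.
detects = {"t1": {"f1"}, "t2": set(), "t3": {"f2", "f3"}}

early = apfd(["t3", "t1", "t2"], detects)  # fault-revealing tests run first
late = apfd(["t2", "t1", "t3"], detects)   # fault-revealing tests run last
```

An ordering that runs fault-revealing tests earlier gets a higher APFD, which is why the metric is commonly used to compare prioritisation techniques.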
Abstract
Diversity has been proposed as a key criterion to improve testing effectiveness and efficiency. It can be used to optimise large test repositories but also to visualise test maintenance issues and raise practitioners' awareness about waste in test artefacts and processes. Even though these diversity-based testing techniques aim to exercise diverse behavior in the system under test (SUT), the diversity has mainly been measured on and between artefacts (e.g., inputs, outputs or test scripts). Here, we introduce a family of measures to capture behavioural diversity (b-div) of test cases by comparing their executions and failure outcomes. Using failure information to capture the SUT behaviour has been shown to improve effectiveness of history-based test prioritisation approaches. However, history-based techniques require reliable test execution logs which are often not available or can be…
