FlakiMe: Laboratory-Controlled Test Flakiness Impact Assessment. A Case Study on Mutation Testing and Program Repair
Maxime Cordy, Renaud Rwemalika, Mike Papadakis, Mark Harman

TL;DR
This paper introduces FlakiMe, a platform for controlled assessment of test flakiness, revealing its significant impact on mutation testing and program repair, with insights into mitigation strategies.
Contribution
We present FlakiMe, a novel laboratory platform enabling controlled study of test flakiness effects on testing and repair techniques, providing new insights and mitigation approaches.
Findings
A 5% flakiness rate affects the mutation score only modestly (2-4%)
Flakiness can completely hinder program repair in 50% of cases
Minimal user feedback can reduce flakiness impact
Abstract
Much research on software testing makes an implicit assumption that test failures are deterministic such that they always witness the presence of the same defects. However, this assumption is not always true because some test failures are due to so-called flaky tests, i.e., tests with non-deterministic outcomes. Unfortunately, flaky tests have major implications for testing and test-dependent activities such as mutation testing and automated program repair. To deal with this issue, we introduce a test flakiness assessment and experimentation platform, called FlakiMe, that supports the seeding of a (controllable) degree of flakiness into the behaviour of a given test suite. Thereby, FlakiMe equips researchers with ways to investigate the impact of test flakiness on their techniques under laboratory-controlled conditions. We use FlakiMe to report results and insights from case studies…
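To make the idea of seeding a controllable degree of flakiness concrete, the following is a minimal Python sketch, not FlakiMe's actual implementation. It assumes the simplest possible model: each test run is turned into a failure with a fixed probability. The names `seed_flakiness`, `InjectedFlakyFailure`, and `observed_failure_rate` are hypothetical and do not come from the paper.

```python
import random
from functools import wraps


class InjectedFlakyFailure(AssertionError):
    """Raised when the wrapper forces a failure the test itself did not cause."""


def seed_flakiness(rate, rng):
    """Return a decorator that makes a test fail with probability `rate`.

    Assumption: flakiness is modelled as an independent coin flip per run.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")

    def decorator(test_fn):
        @wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Run the real test first so genuine failures still surface.
            result = test_fn(*args, **kwargs)
            # Then turn the pass into a failure with the configured probability.
            if rng.random() < rate:
                raise InjectedFlakyFailure(
                    f"injected flaky failure in {test_fn.__name__}"
                )
            return result

        return wrapper

    return decorator


def observed_failure_rate(test_fn, runs):
    """Run `test_fn` repeatedly and return the fraction of runs that failed."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures / runs


def always_passes():
    """A deterministic test used as the subject of the injection."""
    assert 1 + 1 == 2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed keeps the experiment repeatable
    flaky_test = seed_flakiness(0.05, rng)(always_passes)
    print(observed_failure_rate(flaky_test, 10_000))
```

Passing an explicitly seeded `random.Random` instance keeps the injected failures reproducible across experiments, which is the property a laboratory-controlled setup needs. A technique under study (for example a mutation-score computation) could then be run against the wrapped tests at several values of `rate`.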
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software Reliability and Analysis Research
