What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection
Annibale Panichella, Cynthia C. S. Liem

TL;DR
This paper critically examines how mutation testing for machine learning aligns with classical mutation testing principles, highlighting conceptual gaps and proposing improvements for better methodological consistency.
Contribution
The work offers a critical reflection on current ML mutation testing approaches, identifying misalignments with classical mutation testing theories and suggesting actionable improvements.
Findings
Current ML mutation testing blurs production and test code distinctions.
Classical mutation testing hypotheses do not directly apply to ML development.
Proposed action points aim to improve alignment with classical mutation testing paradigms.
Abstract
Mutation testing is a well-established technique for assessing a test suite's quality by injecting artificial faults into production code. In recent years, mutation testing has been extended to machine learning (ML) systems, and deep learning (DL) in particular; researchers have proposed approaches, tools, and statistically sound heuristics to determine whether mutants in DL systems are killed or not. However, as we will argue in this work, questions can be raised as to what extent currently used mutation testing techniques in DL are actually in line with the classical interpretation of mutation testing. We observe that ML model development resembles a test-driven development (TDD) process, in which a training algorithm ('programmer') generates a model ('program') that fits the data points ('test data') to labels ('implicit assertions'), up to a certain threshold. However, considering proposed…
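To make the classical notion of a "killed" mutant concrete, here is a minimal sketch that is not taken from the paper. The function is_adult, its mutant, and both test suites are hypothetical and chosen only for illustration. A fault is injected into the production code, and the test suite kills the mutant if at least one test that passes on the original fails on the mutant.

```python
from typing import Callable, List, Tuple

# A test case pairs an input with its expected output (the assertion).
TestCase = Tuple[int, bool]


def is_adult(age: int) -> bool:
    """Original production code (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: the relational operator >= is replaced with >."""
    return age > 18


def kills(
    tests: List[TestCase],
    original: Callable[[int], bool],
    mutant: Callable[[int], bool],
) -> bool:
    """Return True if the suite passes on the original but fails on the mutant."""
    # Classical mutation testing assumes the suite is green on the original.
    assert all(original(x) == expected for x, expected in tests)
    # The mutant is killed if at least one test now fails.
    return any(mutant(x) != expected for x, expected in tests)


# A suite without the boundary value cannot tell the two versions apart.
weak_suite: List[TestCase] = [(30, True), (5, False)]
# Adding the boundary input 18 exposes the injected fault.
strong_suite: List[TestCase] = weak_suite + [(18, True)]

weak_kills = kills(weak_suite, is_adult, is_adult_mutant)
strong_kills = kills(strong_suite, is_adult, is_adult_mutant)
print(f"weak suite kills mutant: {weak_kills}")
print(f"strong suite kills mutant: {strong_kills}")
```

In this classical setting the production code (is_adult) and the test code (the suites) are clearly separate artifacts, and the surviving mutant points at a gap in the tests. The abstract's argument is that this separation is less clear when the 'program' is a trained model and the 'tests' are data points with labels.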
Taxonomy
Topics: Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
