An Empirical Study of the Realism of Mutants in Deep Learning

Zaheed Ahmed; Philip Makedonski; Jens Grabowski

arXiv:2512.16741·cs.SE·December 19, 2025

Open Access

TL;DR

This paper empirically compares the realism of pre-training and post-training mutants in deep learning, finding pre-training mutants more closely resemble real faults but at higher computational cost.

Contribution

It introduces a statistical framework to evaluate the behavioral similarity of mutants to real faults in deep learning, providing the first empirical comparison of mutation approaches.

Findings

01

Pre-training mutants show higher realism than post-training mutants.

02

Pre-training mutants have stronger coupling and behavioral similarity to real faults.

03

Pre-training mutation is computationally expensive, which motivates the development of more efficient mutation operators.

Abstract

Mutation analysis is a well-established technique for assessing test quality in the traditional software development paradigm by injecting artificial faults into programs. Its application to deep learning (DL) has expanded beyond classical testing to support tasks such as fault localization, repair, data generation, and model robustness evaluation. The core assumption is that mutants behave similarly to real faults, an assumption well established in traditional software systems but largely unverified for DL. This study presents the first empirical comparison of pre-training and post-training mutation approaches in DL with respect to realism. We introduce a statistical framework to quantify their coupling strength and behavioral similarity to real faults using publicly available bug datasets: CleanML, DeepFD, DeepLocalize, and defect4ML. Mutants are generated using state-of-the-art…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability