DeepMutation: Mutation Testing of Deep Learning Systems
Lei Ma, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Felix Juefei-Xu, Chao Xie, Li Li, Yang Liu, Jianjun Zhao, Yadong Wang

TL;DR
DeepMutation introduces a mutation testing framework for deep learning systems. It injects faults at the source level (training data and training programs) and at the model level, then measures how well a test dataset detects those faults as an indicator of test data quality.
Contribution
It proposes novel mutation operators for deep learning, both at source and model levels, tailored to evaluate test data quality in DL systems.
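As a rough illustration of what a model-level mutation operator might look like, the sketch below perturbs a fraction of a model's weights with Gaussian noise, without retraining. The function name `gaussian_fuzz`, its parameters (`ratio`, `sigma`, `seed`), and the flat list representation of weights are assumptions made for this example; they are not taken from the paper's implementation.

```python
import random


def gaussian_fuzz(weights, ratio=0.1, sigma=0.1, seed=0):
    """Return a mutated copy of `weights` (hypothetical helper).

    A fraction `ratio` of the entries (at least one, if the list is
    non-empty) receives additive Gaussian noise with standard
    deviation `sigma`. The input list is left untouched.
    """
    rng = random.Random(seed)          # local RNG so mutants are reproducible
    mutated = list(weights)            # copy: the original model stays intact
    if not mutated:
        return mutated
    k = max(1, int(len(mutated) * ratio))
    for i in rng.sample(range(len(mutated)), k):
        mutated[i] += rng.gauss(0.0, sigma)
    return mutated


original = [0.5] * 20
mutant = gaussian_fuzz(original, ratio=0.1, sigma=0.1, seed=42)
changed = sum(1 for a, b in zip(original, mutant) if a != b)
```

Each call with a different seed yields a different mutant, so a set of mutant models can be produced from one trained model.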
Findings
Effective fault injection demonstrated on MNIST and CIFAR-10 datasets.
Test data quality can be quantitatively assessed through fault detection.
Framework enhances understanding of test suite effectiveness for DL models.
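One way to make "quantitatively assessed through fault detection" concrete is a class-based mutation score: for each mutant model, count the classes for which some test input is classified correctly by the original model but misclassified by the mutant, then average over mutants and classes. The sketch below assumes this formulation; the function name `mutation_score` and the list-based inputs are hypothetical, and the paper's exact metric may differ in detail.

```python
def mutation_score(labels, original_preds, mutant_preds_list, num_classes):
    """Fraction of (mutant, class) pairs killed by the test set (sketch).

    labels            -- ground-truth class of each test input
    original_preds    -- predictions of the unmutated model
    mutant_preds_list -- one prediction list per mutant model
    num_classes       -- number of classes in the task
    """
    if not mutant_preds_list or num_classes == 0:
        return 0.0
    # Only inputs the original model handles correctly can reveal a fault.
    passing = [i for i, (y, p) in enumerate(zip(labels, original_preds)) if y == p]
    killed_total = 0
    for mutant_preds in mutant_preds_list:
        # A class is killed if the mutant misclassifies any passing input of it.
        killed = {labels[i] for i in passing if mutant_preds[i] != labels[i]}
        killed_total += len(killed)
    return killed_total / (len(mutant_preds_list) * num_classes)


labels = [0, 1, 2, 2]
original_preds = [0, 1, 2, 1]          # the last input already fails, so it is ignored
mutants = [
    [0, 0, 2, 2],                      # kills class 1
    [1, 1, 0, 2],                      # kills classes 0 and 2
]
score = mutation_score(labels, original_preds, mutants, num_classes=3)
# 3 killed (mutant, class) pairs out of 2 * 3 possible -> 0.5
```

Under this formulation, a higher score suggests the test data exercises more of the model's behavior across classes.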
Abstract
Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Adversarial Robustness in Machine Learning
