Distortion and Faults in Machine Learning Software
Shin Nakajima

TL;DR
This paper hypothesizes that faults in machine learning programs manifest as distortions in trained models, and proposes measuring these distortions to detect hidden faults, demonstrated through MNIST dataset examples.
Contribution
It introduces a novel hypothesis linking model distortions to faults in learning programs and proposes a measurement approach for fault detection.
Findings
Distortions in trained models can indicate hidden faults.
Measuring model distortions helps in quality assurance of DNN software.
The approach is demonstrated on example cases using the MNIST dataset.
Abstract
Machine learning software, deep neural network (DNN) software in particular, discerns valuable information from a large dataset. The outcomes of such DNN programs depend on the quality of both the learning programs and the datasets. Unfortunately, the quality of datasets is difficult to define, because they are just samples. Quality assurance of DNN software is also difficult, because the resulting trained machine learning models are unknown prior to development, and validation is conducted only indirectly, in terms of prediction performance. This paper introduces the hypothesis that faults in the learning programs manifest themselves as distortions in trained machine learning models. Relative distortion degrees measured with appropriate observer functions may indicate that there are hidden faults. The proposal is demonstrated with example cases using the MNIST dataset.
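The idea of a relative distortion degree can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define the observer functions, so the L2 norm of the weights used here is a hypothetical stand-in, and a toy linear model trained on synthetic data replaces a DNN trained on MNIST. The names `train`, `observer`, `relative_distortion`, and the injected bug are all assumptions made for illustration.

```python
import math
import random


def train(inject_fault: bool, seed: int = 0, epochs: int = 50) -> list[float]:
    """Fit y = w . x by plain SGD on synthetic data; optionally inject a bug."""
    rng = random.Random(seed)
    true_w = [0.5, -1.0, 2.0]  # assumed ground truth for the toy problem
    data = []
    for _ in range(200):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))

    w = [0.0] * len(true_w)
    lr = 0.1
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(len(w)):
                if inject_fault and i == len(w) - 1:
                    continue  # injected bug: the last weight is never updated
                w[i] -= lr * err * x[i]
    return w


def observer(weights: list[float]) -> float:
    """Hypothetical observer function: L2 norm of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in weights))


def relative_distortion(reference: list[float], candidate: list[float]) -> float:
    """Relative difference of the observer value against a reference model."""
    ref = observer(reference)
    return abs(observer(candidate) - ref) / ref


reference_w = train(inject_fault=False)
faulty_w = train(inject_fault=True)

print("reference vs. itself:", relative_distortion(reference_w, reference_w))
print("reference vs. faulty:", relative_distortion(reference_w, faulty_w))
```

The faulty program still runs and still produces a model, so the bug stays hidden unless something inspects the trained weights; here the observer value of the faulty model differs from that of the reference model. A single scalar observer like this one can miss faults that leave the norm unchanged, which is why the choice of observer function matters.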
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
