Distortion and Faults in Machine Learning Software

Shin Nakajima

arXiv:1911.11596·cs.LG·November 27, 2019

TL;DR

This paper hypothesizes that faults in machine learning programs manifest as distortions in trained models, and proposes measuring these distortions to detect hidden faults, demonstrated through MNIST dataset examples.

Contribution

It introduces a novel hypothesis linking model distortions to faults in learning programs and proposes a measurement approach for fault detection.

Findings

1. Distortions in trained models can indicate hidden faults in the learning program.

2. Measuring model distortions supports quality assurance of DNN software.

3. The approach is demonstrated with MNIST dataset examples.

Abstract

Machine learning software, deep neural network (DNN) software in particular, discerns valuable information from a large dataset. The outcomes of such DNN programs depend on the quality of both the learning programs and the datasets. Unfortunately, the quality of datasets is difficult to define, because they are just samples. Quality assurance of DNN software is also difficult, because the resultant trained machine learning models are unknown prior to development, and validation is conducted only indirectly, in terms of prediction performance. This paper introduces a hypothesis that faults in the learning programs manifest themselves as distortions in trained machine learning models. Relative distortion degrees measured with appropriate observer functions may indicate that there are hidden faults. The proposal is demonstrated with example cases of the MNIST dataset.
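
The abstract does not define the observer functions or the distortion measure, so the following is only a minimal sketch of one possible reading, not the paper's method. It assumes an observer function is a scalar statistic computed over a model's flattened parameters, and that a relative distortion degree is the relative change in that statistic between a reference model and a candidate model. The names `mean_abs_observer` and `relative_distortion`, and the sample weights, are hypothetical.

```python
from statistics import fmean
from typing import Callable, Sequence

# Assumption: a trained model is represented here by its flattened parameters.
Weights = Sequence[float]


def mean_abs_observer(weights: Weights) -> float:
    """Hypothetical observer function: the average parameter magnitude."""
    return fmean([abs(w) for w in weights])


def relative_distortion(
    reference: Weights,
    candidate: Weights,
    observer: Callable[[Weights], float] = mean_abs_observer,
) -> float:
    """Relative change of the observed statistic, candidate versus reference.

    This definition is an illustrative assumption, not taken from the paper.
    """
    ref_value = observer(reference)
    if ref_value == 0.0:
        raise ValueError("observer value of the reference model is zero")
    return abs(observer(candidate) - ref_value) / abs(ref_value)


# Made-up parameters: the candidate's weights are twice the reference's.
reference_weights = [0.5, -0.5, 1.0, -1.0]
candidate_weights = [1.0, -1.0, 2.0, -2.0]

score = relative_distortion(reference_weights, candidate_weights)
print(f"relative distortion: {score:.2f}")
```

Under this reading, a score near zero would mean the candidate model looks like the reference through the chosen observer, while a large score would be a prompt to inspect the learning program more closely. Which observer is appropriate, and what threshold counts as large, are left open here.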

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification