Systematic Training and Testing for Machine Learning Using Combinatorial Interaction Testing

Tyler Cody; Erin Lanus; Daniel D. Doyle; Laura Freeman

arXiv:2201.12428 · cs.LG · February 1, 2022 · 1 cites

TL;DR

This paper uses combinatorial coverage to systematically select and characterize training and test data for machine learning models, and demonstrates on the MNIST dataset that the approach supports stress testing, robust training, and fine-tuning to new domains.

Contribution

It adapts combinatorial interaction testing for data selection in machine learning, focusing on input-output features rather than model internals, and addresses prior criticisms of coverage methods.

Findings

1. Combinatorial coverage can identify stress points in ML models.

2. Training sets selected via combinatorial coverage improve model robustness.

3. Coverage metrics are effective even without access to model internals.

Abstract

This paper demonstrates the systematic use of combinatorial coverage for selecting and characterizing test and training sets for machine learning models. The presented work adapts combinatorial interaction testing, which has been successfully leveraged to identify faults in software testing, to characterize data used in machine learning. The MNIST handwritten digits data is used to demonstrate that combinatorial coverage can be used to select test sets that stress machine learning model performance, to select training sets that lead to robust model performance, and to select data for fine-tuning models to new domains. Thus, the results posit combinatorial coverage as a holistic approach to training and testing for machine learning. In contrast to prior work, which has focused on the use of coverage with regard to the internals of neural networks, this paper considers coverage over…
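To make the notion of combinatorial coverage concrete, the sketch below computes a basic t-way coverage score: the fraction of all possible value combinations, across every group of t features, that actually appear in a data set. This is an illustration only. The function name `t_way_coverage`, the three features, and their value bins are hypothetical and are not taken from the paper, and the exact metric the authors use may differ from this basic definition.

```python
from itertools import combinations, product


def t_way_coverage(rows, domains, t=2):
    """Return the fraction of possible t-way value combinations present in rows.

    rows    -- list of tuples, one discretized feature vector per sample
    domains -- list of allowed values for each feature position
    t       -- interaction strength (number of features combined at once)
    """
    covered = 0
    total = 0
    for cols in combinations(range(len(domains)), t):
        # Every value combination that could occur for this group of features.
        possible = set(product(*(domains[c] for c in cols)))
        # The combinations that actually occur in the data.
        seen = {tuple(row[c] for c in cols) for row in rows}
        covered += len(seen & possible)
        total += len(possible)
    return covered / total


# Hypothetical features (assumed for illustration, not from the paper):
# digit label, brightness bin, stroke-thickness bin.
domains = [[0, 1, 2], ["dark", "light"], ["thin", "thick"]]
rows = [
    (0, "dark", "thin"),
    (1, "light", "thick"),
    (2, "dark", "thick"),
]

print(t_way_coverage(rows, domains, t=1))  # every single value appears
print(t_way_coverage(rows, domains, t=2))  # 9 of 16 possible pairs appear
```

In this toy data set every individual feature value is present, yet only 9 of the 16 possible pairs are. A test set chosen to include the missing pairs would exercise interactions the training data never showed, which is the intuition behind using coverage to select stress tests.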

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Machine Learning and Data Classification