Data Synthesis for Testing Black-Box Machine Learning Models

Diptikalyan Saha; Aniya Aggarwal; Sandeep Hans

arXiv:2111.02161·cs.LG·November 4, 2021

Open Access

TL;DR

This paper introduces an automated framework for synthesizing realistic test data to evaluate black-box machine learning models, aiming to improve testing coverage and trustworthiness.

Contribution

The paper presents a novel, model-agnostic data synthesis method that generates realistic, user-controllable test data for black-box ML models, with the goal of making testing more effective.

Findings

01 Effective in increasing test coverage

02 Generates realistic, user-controllable data

03 Demonstrated success across multiple models

Abstract

The increasing usage of machine learning models raises the question of the reliability of these models. The current practice of testing with limited data is often insufficient. In this paper, we provide a framework for automated test data synthesis to test black-box ML/DL models. We address an important challenge of generating realistic user-controllable data with model agnostic coverage criteria to test a varied set of properties, essentially to increase trust in machine learning models. We experimentally demonstrate the effectiveness of our technique.
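
As a rough illustration of the black-box setting the abstract describes, the sketch below synthesizes rows inside user-supplied feature ranges, queries a model only through its prediction function, and reports rows where a small perturbation flips the predicted label. It is not the authors' method: the uniform sampler stands in for realistic data synthesis, no coverage criterion is implemented, and the names `synthesize`, `find_violations`, `toy_model`, and the feature ranges are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def synthesize(constraints: Dict[str, Tuple[float, float]], n: int, seed: int = 0) -> List[Row]:
    """Draw n rows uniformly within user-supplied per-feature ranges.

    Assumption: uniform sampling is a placeholder for a realistic generator.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in constraints.items()}
        for _ in range(n)
    ]


def find_violations(
    predict: Callable[[Row], int], rows: List[Row], feature: str, delta: float
) -> List[Row]:
    """Return rows whose predicted label changes when `feature` is shifted by delta."""
    violations = []
    for row in rows:
        perturbed = dict(row)
        perturbed[feature] = row[feature] + delta
        # Only predictions are compared; the model's internals are never inspected.
        if predict(row) != predict(perturbed):
            violations.append(row)
    return violations


def toy_model(row: Row) -> int:
    """Hypothetical stand-in for a black-box classifier."""
    return int(row["income"] > 50_000)


# Hypothetical user-controlled ranges for two features.
constraints = {"age": (18, 90), "income": (10_000, 120_000)}
rows = synthesize(constraints, n=500)
violations = find_violations(toy_model, rows, feature="income", delta=100.0)
print(f"{len(violations)} of {len(rows)} synthesized rows flip label under perturbation")
```

Any callable that maps a row to a label can replace `toy_model`, which is what keeps the check model-agnostic: the test harness depends only on inputs and outputs.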

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Software Testing and Debugging Techniques