Defuse: Harnessing Unrestricted Adversarial Examples for Debugging Models Beyond Test Accuracy

Dylan Slack; Nathalie Rauschmayr; Krishnaram Kenthapadi

arXiv:2102.06162·cs.LG·February 12, 2021·1 citation

Open Access

TL;DR

Defuse automatically discovers, categorizes, and corrects model errors that do not appear in the test data. It uses a generative model to produce new misclassified examples and then fine-tunes the model on them.

Contribution

The paper introduces Defuse, a technique that uses generative models to find and fix model errors that are absent from the available data, extending debugging beyond aggregate test accuracy.

Findings

01

Defuse identifies specific regions of the generative model's latent space where model errors occur.

02

It categorizes errors into high-level bugs for targeted correction.

03

Applying Defuse improves classifier accuracy on discovered errors while preserving test performance.
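
Findings 01 and 02 describe grouping errors by where they fall in the generative model's latent space. The block below is only an illustrative sketch of that idea: it clusters the latent codes of misclassified samples with plain k-means. The use of k-means, the function name `cluster_error_codes`, and the toy latent codes are all assumptions made for illustration; the paper's actual categorization procedure may differ.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two latent codes of equal length.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(points):
    # Coordinate-wise mean of a non-empty list of latent codes.
    return tuple(sum(col) / len(points) for col in zip(*points))

def assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def cluster_error_codes(points, k, n_iters=100):
    """Hypothetical helper: k-means over latent codes of misclassified samples.

    Returns (centroids, labels); each cluster stands for one candidate "bug".
    """
    # Farthest-first initialization: deterministic, and avoids starting
    # two centroids inside the same well-separated group.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Leave an empty cluster's centroid where it was.
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Toy latent codes of misclassified samples forming two separated groups.
rng = random.Random(1)
group_a = [(rng.gauss(-3.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
group_b = [(rng.gauss(3.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
centroids, labels = cluster_error_codes(group_a + group_b, k=2)
```

Each resulting cluster could then be inspected and labeled as a unit, which is the kind of high-level grouping Finding 02 refers to.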

Abstract

We typically compute aggregate statistics on held-out test data to assess the generalization of machine learning models. However, statistics on test data often overstate model generalization, and thus, the performance of deployed machine learning models can be variable and untrustworthy. Motivated by these concerns, we develop methods to automatically discover and correct model errors beyond those available in the data. We propose Defuse, a method that generates novel model misclassifications, categorizes these errors into high-level model bugs, and efficiently labels and fine-tunes on the errors to correct them. To generate misclassified data, we propose an algorithm inspired by adversarial machine learning techniques that uses a generative model to find naturally occurring instances misclassified by a model. Further, we observe that the generative models have regions in their latent…
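
The abstract describes searching a generative model's latent space for naturally occurring instances the model misclassifies. The block below is a minimal sketch of that idea under stated assumptions. It uses plain random search in place of the paper's adversarially inspired algorithm. The `decode` and `classify` functions are hypothetical toy stand-ins for a trained generator and the model under test. A point is flagged when the prediction on a perturbed latent code differs from the prediction on its unperturbed seed.

```python
import random

# Hypothetical toy stand-ins (not from the paper): a 2-D latent space,
# a "decoder" mapping latent codes to inputs, and a model under test.
def decode(z):
    # Stand-in for a trained generative model's decoder.
    return (2.0 * z[0], z[1] + 0.5 * z[0])

def classify(x):
    # Stand-in for the classifier being debugged: a linear decision rule.
    return 1 if x[0] + x[1] > 1.0 else 0

def find_misclassification_candidates(n_seeds, n_trials, radius, seed=0):
    """Random search in latent space for decoded samples whose predicted
    label differs from the prediction on an unperturbed latent seed."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_seeds):
        # Draw a latent seed from a standard normal prior.
        z0 = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        y0 = classify(decode(z0))
        for _ in range(n_trials):
            # Perturb the seed within an L-infinity ball of the given radius.
            z = tuple(zi + rng.uniform(-radius, radius) for zi in z0)
            if classify(decode(z)) != y0:
                candidates.append((z0, z, y0))
                break  # keep at most one candidate per seed
    return candidates

candidates = find_misclassification_candidates(n_seeds=200, n_trials=50, radius=0.3)
print(f"{len(candidates)} candidate misclassifications flagged for labeling")
```

A prediction flip is only a candidate error, since the decoded sample may genuinely belong to the other class. This is why the pipeline described in the abstract labels the generated errors before fine-tuning on them.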

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification