Defuse: Harnessing Unrestricted Adversarial Examples for Debugging Models Beyond Test Accuracy
Dylan Slack, Nathalie Rauschmayr, Krishnaram Kenthapadi

TL;DR
Defuse automatically discovers, categorizes, and corrects classifier errors that held-out test data does not reveal. It uses a generative model to produce new misclassified inputs, groups them into high-level bugs, and fine-tunes the classifier on the labeled errors.
Contribution
The paper introduces Defuse, a technique that leverages generative models to find and fix unseen model errors, enhancing debugging beyond traditional test accuracy measures.
Findings
Defuse identifies specific regions of the generative model's latent space that decode to misclassified inputs.
It categorizes errors into high-level bugs for targeted correction.
Applying Defuse improves classifier accuracy on discovered errors while preserving test performance.
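The first two steps above (finding candidate errors in latent space, then grouping them) can be illustrated with a toy sketch. This is not the authors' implementation. The linear decoder, the linear classifier, the Gaussian perturbation, the helper names, and the use of plain k-means for grouping are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained decoder: 2-D latent code -> 4-D input.
W_DEC = np.array([[1.0, 0.0, 0.5, -0.5],
                  [0.0, 1.0, 0.5, 0.5]])
# Hypothetical stand-in for the classifier being debugged.
W_CLF = np.array([1.0, 1.0, 0.0, 0.0])


def decode(z):
    """Map latent codes of shape (n, 2) to inputs of shape (n, 4)."""
    return z @ W_DEC


def classify(x):
    """Predict a binary label for each input row."""
    return (x @ W_CLF > 0).astype(int)


def find_candidate_errors(z_seed, y_seed, n_samples=200, scale=0.3):
    """Perturb each seed latent code and keep the perturbed codes whose
    decoded input is predicted differently from the seed's label.
    These are only candidates: a perturbation can change the true label,
    so they would still need to be labeled before being used."""
    z_out, y_out = [], []
    for z, y in zip(z_seed, y_seed):
        z_pert = z + scale * rng.normal(size=(n_samples, z.shape[0]))
        mask = classify(decode(z_pert)) != y
        z_out.append(z_pert[mask])
        y_out.append(np.full(mask.sum(), y))
    return np.concatenate(z_out), np.concatenate(y_out)


def kmeans(points, k=2, n_iter=20):
    """Plain k-means, used here only as a simple way to group the
    candidate errors into regions of latent space."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


# Toy seed data: the assumed ground-truth label depends on the first latent axis.
z_seed = rng.normal(size=(20, 2))
y_seed = (z_seed[:, 0] > 0).astype(int)

z_err, y_err = find_candidate_errors(z_seed, y_seed)
labels, centers = kmeans(z_err, k=2)
print(len(z_err), "candidate errors grouped into", len(centers), "regions")
```

In this sketch each group of candidate errors would then be inspected and labeled, and the classifier fine-tuned on the labeled examples. That correction step is not shown here.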
Abstract
We typically compute aggregate statistics on held-out test data to assess the generalization of machine learning models. However, statistics on test data often overstate model generalization, and thus, the performance of deployed machine learning models can be variable and untrustworthy. Motivated by these concerns, we develop methods to automatically discover and correct model errors beyond those available in the data. We propose Defuse, a method that generates novel model misclassifications, categorizes these errors into high-level model bugs, and efficiently labels and fine-tunes on the errors to correct them. To generate misclassified data, we propose an algorithm inspired by adversarial machine learning techniques that uses a generative model to find naturally occurring instances misclassified by a model. Further, we observe that the generative models have regions in their latent…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
