Understanding Classifier Mistakes with Generative Models
Laëtitia Shao, Yang Song, Stefano Ermon

TL;DR
This paper uses generative models of classifier features to identify and analyze instances where deep neural networks are likely to make mistakes, including adversarial and out-of-distribution samples.
Contribution
It introduces a novel generative approach to detect potential classifier failures without relying on class labels, applicable to semi-supervised models.
Findings
Classification errors tend to occur when the extracted features are assigned low probability by the generative model.
The detection method identifies test-set mistakes, adversarial samples, and out-of-distribution (OOD) samples.
The approach is label-agnostic and applies to all three failure types.
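The first finding above suggests a simple detection rule: fit a density model to classifier features and flag inputs whose features score below a threshold. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single full-covariance Gaussian as a stand-in for the paper's generative model (which this summary does not specify), a threshold set at a low quantile of the training log-densities, and synthetic features in place of real classifier activations. All function names and parameters are hypothetical.

```python
import numpy as np


def fit_gaussian(features, ridge=1e-6):
    """Fit a full-covariance Gaussian to feature vectors (one per row).

    Assumption: a Gaussian stands in for the paper's generative model.
    """
    mean = features.mean(axis=0)
    # A small ridge keeps the covariance matrix invertible.
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, cov


def log_density(features, mean, cov):
    """Gaussian log-density of each row of `features`."""
    dim = mean.shape[0]
    diff = features - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row from the mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)


def flag_likely_failures(train_features, test_features, quantile=0.05):
    """Flag test features whose log-density falls below a training quantile.

    No class labels are used, only the features themselves.
    """
    mean, cov = fit_gaussian(train_features)
    threshold = np.quantile(log_density(train_features, mean, cov), quantile)
    return log_density(test_features, mean, cov) < threshold


# Synthetic stand-ins for classifier features (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 8))            # features seen during training
typical = rng.normal(size=(200, 8))           # same distribution as training
shifted = rng.normal(loc=6.0, size=(200, 8))  # far from the training features

typical_flags = flag_likely_failures(train, typical)
shifted_flags = flag_likely_failures(train, shifted)
print("fraction flagged (typical):", typical_flags.mean())
print("fraction flagged (shifted):", shifted_flags.mean())
```

The quantile controls the trade-off between missed failures and false alarms on ordinary inputs; a real deployment would tune it on held-out data and would likely need a more expressive density model than a single Gaussian.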
Abstract
Although deep neural networks are effective on supervised learning tasks, they have been shown to be brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. We propose a generative model of the features extracted by a classifier, and show, using rigorous hypothesis testing, that errors tend to occur when features are assigned low probability by our model. From this observation, we develop a detection criterion for samples on which a classifier is likely to fail at test time. In particular, we test against three different sources of classification failures: mistakes made on the test set due to poor model generalization, adversarial samples, and out-of-distribution samples. Our approach is agnostic to…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications
