TL;DR
This paper presents a black-box search method that uses adversarial perturbations to efficiently find high-confidence errors in neural network classifiers, uncovering errors at a higher rate than the model's confidence levels would predict.
Contribution
It introduces a search technique based on adversarial distance for discovering high-confidence errors in black-box models, making error discovery more efficient.
Findings
The method finds errors at rates higher than the model's reported confidence would predict.
It is query-efficient and effective in black-box settings.
Empirical results demonstrate improved error discovery over baseline methods.
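To make "higher than expected given confidence" concrete, here is a minimal Python sketch of one way to compare an observed error rate with the rate implied by the model's confidences. It is not the paper's method: it assumes the confidences are calibrated probabilities, so a prediction made with confidence c is expected to be wrong with probability 1 - c, and the function names are hypothetical.

```python
def expected_error_rate(confidences):
    """Error rate implied by the confidences, assuming they are calibrated."""
    if not confidences:
        raise ValueError("confidences must be non-empty")
    # A prediction with confidence c is expected to be wrong with probability 1 - c.
    return sum(1.0 - c for c in confidences) / len(confidences)


def observed_error_rate(predictions, labels):
    """Fraction of predictions that disagree with the true labels."""
    if not predictions or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def excess_error_rate(confidences, predictions, labels):
    """Observed minus expected error rate; positive means more errors than expected."""
    return observed_error_rate(predictions, labels) - expected_error_rate(confidences)


# Illustrative, made-up data: four predictions, each made with confidence 0.9.
confidences = [0.9, 0.9, 0.9, 0.9]
predictions = [1, 0, 1, 1]
labels = [1, 1, 0, 1]
excess = excess_error_rate(confidences, predictions, labels)
```

In this toy data, two of the four predictions are wrong while the confidences imply roughly one error in ten, so the excess is positive. A search strategy that surfaces instances like these is finding errors at a rate above what confidence alone predicts.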
Abstract
Given a deep neural network image classification model that we treat as a black box, and an unlabeled evaluation dataset, we develop an efficient strategy by which the classifier can be evaluated. Randomly sampling and labeling instances from an unlabeled evaluation dataset allows traditional performance measures like accuracy, precision, and recall to be estimated. However, random sampling may miss rare errors for which the model is highly confident in its prediction, but wrong. These high-confidence errors can represent costly mistakes, and therefore should be explicitly searched for. Past works have developed search techniques to find classification errors above a specified confidence threshold, but ignore the fact that errors should be expected at confidence levels anywhere below 100%. In this work, we investigate the problem of finding errors at rates greater than expected given…
