Understanding out-of-distribution accuracies through quantifying difficulty of test samples
Berfin Simsek, Melissa Hall, Levent Sagun

TL;DR
This paper introduces a confusion score metric to quantify image difficulty, helping to understand and predict the accuracy drop of neural networks on out-of-distribution datasets by analyzing test image characteristics.
Contribution
The paper proposes a label-free confusion score to measure image difficulty and links it to OOD accuracy drops, providing a new way to analyze model robustness.
Findings
High confusion scores correlate with accuracy drops on OOD data.
Images with high confusion scores often lack clear class-specific features.
The confusion score enables predicting OOD accuracy using only ID test labels.
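One way the last finding could work in practice is sketched below. This is an illustrative assumption, not the authors' procedure: it assumes accuracy within a confusion-score bin carries over from ID to OOD data, so OOD accuracy can be estimated by reweighting per-bin ID accuracy with the OOD bin proportions. The function names and the bin edges are hypothetical.

```python
from bisect import bisect_right

def per_bin_accuracy(scores, correct, edges):
    """Accuracy of labeled images in each confusion-score bin.

    `edges` are ascending interior bin boundaries (hypothetical choice).
    Returns None for bins that contain no images.
    """
    n_bins = len(edges) + 1
    hits = [0] * n_bins
    counts = [0] * n_bins
    for score, is_correct in zip(scores, correct):
        b = bisect_right(edges, score)
        counts[b] += 1
        hits[b] += int(is_correct)
    return [h / n if n else None for h, n in zip(hits, counts)]

def predict_ood_accuracy(id_scores, id_correct, ood_scores, edges):
    """Estimate OOD accuracy without OOD labels.

    Assumption (not taken from the paper): per-bin accuracy measured on
    the ID test set also holds for OOD images in the same bin.
    """
    id_acc = per_bin_accuracy(id_scores, id_correct, edges)
    ood_counts = [0] * (len(edges) + 1)
    for score in ood_scores:
        ood_counts[bisect_right(edges, score)] += 1
    total = 0.0
    for acc, n in zip(id_acc, ood_counts):
        if n == 0:
            continue
        if acc is None:
            raise ValueError("OOD images fall in a bin with no ID images")
        total += acc * n
    return total / len(ood_scores)

# Toy data: the OOD set has more high-confusion images than the ID set.
id_scores = [0.1, 0.1, 0.6, 0.6]
id_correct = [True, True, True, False]
ood_scores = [0.1, 0.7, 0.8, 0.9]
estimate = predict_ood_accuracy(id_scores, id_correct, ood_scores, edges=[0.5])
```

In this toy case the estimate falls below the ID accuracy only because a larger share of OOD images lands in the high-confusion bin, which is the mechanism the findings describe.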
Abstract
Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet}. To understand why a variety of models consistently make more mistakes in the OOD datasets, we propose a new metric to quantify the difficulty of the test images (either ID or OOD) that depends on the interaction of the training dataset and the model. In particular, we introduce the confusion score as a label-free measure of image difficulty which quantifies the amount of disagreement on a given test image based on the class conditional probabilities estimated by an ensemble of trained models. Using the confusion score, we investigate CIFAR-10 and its OOD derivatives. Next, by partitioning test and OOD datasets via their confusion…
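The abstract does not give the formula for the confusion score, so the following is only a minimal sketch of one plausible label-free disagreement measure. It assumes each model in the ensemble outputs a class-probability vector for the image, averages those vectors, and takes one minus the largest averaged probability. The function names and this specific definition are assumptions for illustration and may differ from the paper's.

```python
def mean_probs(ensemble_probs):
    """Average the class-probability vectors predicted by each model."""
    n_models = len(ensemble_probs)
    n_classes = len(ensemble_probs[0])
    return [
        sum(probs[c] for probs in ensemble_probs) / n_models
        for c in range(n_classes)
    ]

def confusion_score(ensemble_probs):
    """Hypothetical confusion score for a single image.

    Assumed definition (not from the paper): 1 - max_c mean_m p_m(c | x).
    It is 0 when every model puts all its mass on the same class and
    grows as the ensemble's averaged prediction spreads over classes.
    No ground-truth label is needed.
    """
    return 1.0 - max(mean_probs(ensemble_probs))

# Two models, three classes.
agree = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]      # models agree
disagree = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # models disagree

easy = confusion_score(agree)
hard = confusion_score(disagree)
```

Because the score uses only predicted probabilities, it can be computed on ID and OOD images alike, which is what makes it usable for partitioning both sets.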
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
