What classifiers know what they don't?
Mohamed Ishmael Belghazi, David Lopez-Paz

TL;DR
This paper introduces UIMNET, an ImageNet-scale benchmark for evaluating uncertainty estimates in deep image classifiers. It addresses the lack of realistic, large-scale testing of how classifiers behave on classes unseen during training.
Contribution
The paper presents UIMNET, an open-source benchmark that implements eight algorithms, six uncertainty measures, four in-domain metrics, and three out-domain metrics, together with an automated pipeline to train, calibrate, ensemble, select, and evaluate models.
Findings
Ensembles of ERM (empirical risk minimization) classifiers perform best for uncertainty estimation.
Single MIMO (multi-input multi-output) classifiers are competitive alternatives.
UIMNET facilitates reproducible and extensible research in uncertainty estimation.
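To make the ensemble finding concrete, here is a minimal sketch of how an ensemble can turn member predictions into an uncertainty score. It is not UIMNET code: the function names, the choice of entropy and one-minus-max-probability as measures, and the toy logits are all assumptions made for illustration. The sketch averages the softmax outputs of the members and scores the averaged distribution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average the softmax outputs of all ensemble members.

    member_logits: one list of class logits per member (hypothetical input format).
    """
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy in nats; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_prob_uncertainty(probs):
    """One minus the largest class probability; higher means more uncertain."""
    return 1.0 - max(probs)


# Toy logits for a 3-class problem and a 3-member ensemble (illustrative values).
agree = [[5.0, 0.0, 0.0], [4.5, 0.2, 0.1], [5.2, -0.1, 0.0]]
disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

p_agree = ensemble_predict(agree)
p_disagree = ensemble_predict(disagree)

print("agree:    entropy=%.3f" % entropy(p_agree))
print("disagree: entropy=%.3f" % entropy(p_disagree))
```

When the members agree, the averaged distribution stays peaked and both scores are low. When they disagree, as one might expect on an input from an unseen class, the average flattens and both scores rise. A threshold on such a score is one simple way to flag inputs the classifier should not be trusted on.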
Abstract
Being uncertain when facing the unknown is key to intelligent decision making. However, machine learning algorithms lack reliable estimates about their predictive uncertainty. This leads to wrong and overly-confident decisions when encountering classes unseen during training. Despite the importance of equipping classifiers with uncertainty estimates ready for the real world, prior work has focused on small datasets and little or no class discrepancy between training and testing data. To close this gap, we introduce UIMNET: a realistic, ImageNet-scale test-bed to evaluate predictive uncertainty estimates for deep image classifiers. Our benchmark provides implementations of eight state-of-the-art algorithms, six uncertainty measures, four in-domain metrics, three out-domain metrics, and a fully automated pipeline to train, calibrate, ensemble, select, and evaluate models. Our test-bed is…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
