Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov

TL;DR
This paper critically examines in-domain uncertainty estimation for deep learning image classification, highlights pitfalls in current metrics, and shows that many advanced ensembling methods perform on par with simple ensembles of only a few independently trained networks.
Contribution
It introduces the deep ensemble equivalent score (DEE) to better evaluate ensembling techniques and reveals that many sophisticated methods match small ensembles of independently trained networks in test performance.
Findings
Existing metrics for in-domain uncertainty have significant pitfalls.
Many advanced ensembling techniques are equivalent to small ensembles in test performance.
The DEE score provides new insights into ensembling effectiveness.
Abstract
Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point out pitfalls of existing metrics. Avoiding these pitfalls, we perform a broad study of different ensembling techniques. To provide more insight into this study, we introduce the deep ensemble equivalent score (DEE) and show that many sophisticated ensembling techniques are equivalent to an ensemble of only a few independently trained networks in terms of test performance.
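The abstract does not give a formula for DEE, so the following is only a minimal sketch of one plausible reading: DEE is the smallest size of an ensemble of independently trained networks whose test score reaches that of the method being evaluated, with linear interpolation between integer sizes. The function name `dee_score`, its arguments, and the choice of a "higher is better" score (for example, test log-likelihood) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def dee_score(method_score, ensemble_scores):
    """Hypothetical DEE-style score (illustrative sketch, not the paper's code).

    method_score: test score of the ensembling method under evaluation
        (assumed: higher is better, e.g. test log-likelihood).
    ensemble_scores: ensemble_scores[i] is the test score of an ensemble of
        i + 1 independently trained networks, on the same scale.
    Returns the (possibly fractional) ensemble size that matches method_score.
    """
    scores = np.asarray(ensemble_scores, dtype=float)
    # Assumption: force the curve to be non-decreasing so that the
    # crossing point with method_score is well defined.
    scores = np.maximum.accumulate(scores)

    if method_score <= scores[0]:
        return 1.0  # a single network already matches the method
    if method_score > scores[-1]:
        return float("inf")  # no measured ensemble size reaches the method

    # First ensemble size whose score reaches the method's score.
    hi = int(np.argmax(scores >= method_score))
    lo = hi - 1
    # Linear interpolation between sizes lo + 1 and hi + 1.
    frac = (method_score - scores[lo]) / (scores[hi] - scores[lo])
    return float((lo + 1) + frac)


# Made-up scores for ensembles of 1, 2, 3 and 4 networks.
curve = [-1.0, -0.5, -0.25, -0.125]
print(dee_score(-0.375, curve))  # 2.5
```

Under this reading, a score of 2.5 would mean the method performs like an ensemble of between two and three independently trained networks. The numbers above are invented to show the interpolation and are not results from the paper.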
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI
Methods: Test
