What Does It Take to Build a Performant Selective Classifier?
Stephan Rabanser, Nicolas Papernot

TL;DR
This paper analyzes why selective classifiers fall short of an ideal oracle, decomposes the resulting performance gap into its sources, and provides practical insights for designing more effective models.
Contribution
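It introduces a finite-sample decomposition of the selective-classification gap into five sources (Bayes noise, approximation error, ranking error, statistical noise, and implementation- or shift-induced slack) and offers actionable guidelines for improving selective classifier performance.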
Findings
Bayes noise and model capacity significantly impact the gap.
Rich, feature-aware calibrators improve score ordering.
Data shift introduces a separate source of slack that calls for robust training.
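The finding that feature-aware calibrators can improve score ordering, while monotone post-hoc calibration cannot, can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the paper's method: selective accuracy at coverage k/n is taken as the accuracy on the k highest-scoring inputs, `platt` is a Platt-style monotone rescaling, and the "feature-aware" rescoring is a hand-written, hypothetical stand-in for a learned calibrator whose toy feature was chosen to correlate with correctness.

```python
import math
from typing import Callable, List, Sequence


def selective_accuracies(scores: Sequence[float], correct: Sequence[bool]) -> List[float]:
    """Selective accuracy when accepting the k highest-scoring inputs, for k = 1..n."""
    # Rank inputs by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    accs: List[float] = []
    hits = 0
    for k, i in enumerate(order, start=1):
        hits += bool(correct[i])
        accs.append(hits / k)
    return accs


def platt(a: float, b: float) -> Callable[[float], float]:
    """Platt-style rescaling of a logit; strictly increasing (monotone) when a > 0."""
    return lambda z: 1.0 / (1.0 + math.exp(-(a * z + b)))


# Toy data (hypothetical): raw logits, one per-input feature, and correctness labels.
logits = [2.0, 1.5, 1.0, 0.5]
feature = [1.0, 0.0, 1.0, 1.0]  # hypothetical feature, correlated with correctness here
correct = [True, False, True, True]

raw_curve = selective_accuracies(logits, correct)

# Monotone calibration rescales scores but cannot change their order.
calibrated_curve = selective_accuracies([platt(0.5, -0.2)(z) for z in logits], correct)

# Hypothetical feature-aware rescoring: not a function of the logit alone, so it can reorder.
rescored = [z + 2.0 * f for z, f in zip(logits, feature)]
rescored_curve = selective_accuracies(rescored, correct)

if __name__ == "__main__":
    print(raw_curve == calibrated_curve)  # True: same ranking, same curve
    print(rescored_curve)                 # the misclassified input is now accepted last
```

The calibrated curve is identical to the raw one because any strictly increasing map preserves the ranking; only the rescoring that uses extra information moves the misclassified input to the end of the acceptance order.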
Abstract
Selective classifiers improve model reliability by abstaining on inputs the model deems uncertain. However, few practical approaches achieve the gold-standard performance of a perfect-ordering oracle that accepts examples exactly in order of correctness. Our work formalizes this shortfall as the selective-classification gap and presents the first finite-sample decomposition of this gap into five distinct sources of looseness: Bayes noise, approximation error, ranking error, statistical noise, and implementation- or shift-induced slack. Crucially, our analysis reveals that monotone post-hoc calibration, often believed to strengthen selective classifiers, has limited impact on closing this gap, since it rarely alters the model's underlying score ranking. Bridging the gap therefore requires scoring mechanisms that can effectively reorder predictions rather than merely rescale them. We…
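To make the selective-classification gap concrete, the sketch below compares a model's accuracy-coverage curve with that of the perfect-ordering oracle, which accepts every correct example before any incorrect one. It is illustrative only and rests on an assumption: the gap is summarized here as the mean oracle-minus-model selective accuracy over the n coverage levels k/n, which may differ from the paper's exact finite-sample definition. The function names are hypothetical.

```python
from typing import List, Sequence


def selective_accuracies(scores: Sequence[float], correct: Sequence[bool]) -> List[float]:
    """Model's selective accuracy when accepting its k most confident inputs, k = 1..n."""
    if len(scores) != len(correct) or not scores:
        raise ValueError("scores and correct must be non-empty and the same length")
    # Rank inputs by descending confidence; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    accs: List[float] = []
    hits = 0
    for k, i in enumerate(order, start=1):
        hits += bool(correct[i])
        accs.append(hits / k)
    return accs


def oracle_accuracies(correct: Sequence[bool]) -> List[float]:
    """Perfect-ordering oracle: accepts all correct inputs before any incorrect one."""
    n_correct = sum(bool(c) for c in correct)
    return [min(k, n_correct) / k for k in range(1, len(correct) + 1)]


def selective_gap(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean oracle-minus-model selective accuracy over all coverage levels (assumed summary)."""
    model = selective_accuracies(scores, correct)
    oracle = oracle_accuracies(correct)
    return sum(o - m for o, m in zip(oracle, model)) / len(model)


if __name__ == "__main__":
    # The second-most-confident prediction is wrong, so the model accepts an error early.
    print(round(selective_gap([0.9, 0.8, 0.7, 0.6], [True, False, True, True]), 4))  # 0.2083
    # A score that already ranks every correct input first matches the oracle exactly.
    print(selective_gap([0.9, 0.8, 0.7, 0.6], [True, True, True, False]))  # 0.0
```

Because the model can never have more hits than min(k, n_correct) among its first k acceptances, each term of the sum is non-negative, so the gap is zero only when the score ordering already matches the oracle at every coverage level.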
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
