Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
Alvin Heng, Harold Soh

TL;DR
This paper introduces a likelihood ratio-based approach for optimal selective classification, especially effective under covariate shift, improving reliability by abstaining from uncertain predictions across vision and language tasks.
Contribution
It applies the Neyman-Pearson lemma to develop new selection functions for selective classification, unifying existing methods and enhancing performance under covariate shift.
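For orientation, the lemma in its textbook form can be written as below. The symbols $H_0$, $H_1$, $p_0$, $p_1$, $\tau$, and $\alpha$ are generic notation assumed here, not necessarily the paper's own. Reading $H_0$ as "the prediction is correct" and $H_1$ as "the prediction is incorrect" is one natural way to map the lemma onto abstention, and it is an assumption of this sketch.

```latex
% Neyman-Pearson lemma (simple hypotheses, ties ignored)
\[
\Lambda(x) = \frac{p_1(x)}{p_0(x)}, \qquad
\text{reject } H_0 \iff \Lambda(x) > \tau, \qquad
\Pr_{x \sim p_0}\bigl[\Lambda(x) > \tau\bigr] = \alpha .
\]
% Among all tests with false-rejection rate at most \alpha,
% this test maximizes the power \Pr_{x \sim p_1}[\Lambda(x) > \tau].
```

Under that reading, a selection function would abstain when $\Lambda(x) > \tau$, with $\tau$ controlling the trade-off between coverage and risk.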
Findings
Likelihood ratio-based methods outperform baselines
Effective under covariate shift in vision and language tasks
Provides a unified framework for selective classification
Abstract
Selective classification enhances the reliability of predictive models by allowing them to abstain from making uncertain predictions. In this work, we revisit the design of optimal selection functions through the lens of the Neyman-Pearson lemma, a classical result in statistics that characterizes the optimal rejection rule as a likelihood ratio test. We show that this perspective not only unifies the behavior of several post-hoc selection baselines, but also motivates new approaches to selective classification which we propose here. A central focus of our work is the setting of covariate shift, where the input distribution at test time differs from that at training. This realistic and challenging scenario remains relatively underexplored in the context of selective classification. We evaluate our proposed methods across a range of vision and language tasks, including both supervised…
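A minimal sketch of a likelihood-ratio selection function may help make the idea concrete. It is not the authors' implementation: it assumes the features of correctly and incorrectly classified validation examples are each modeled by a single Gaussian, uses synthetic 2-D features, and every function name (`fit_gaussian`, `gaussian_logpdf`, `log_likelihood_ratio`, `threshold_for_coverage`) is hypothetical.

```python
import numpy as np

def fit_gaussian(x):
    """Return the mean and a lightly regularized covariance of the rows of x."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
    return mu, cov

def gaussian_logpdf(x, mu, cov):
    """Log-density of a multivariate Gaussian, evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mu
    solved = np.linalg.solve(cov, diff.T).T      # cov^{-1} (x - mu)
    maha = np.sum(diff * solved, axis=1)         # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))

def log_likelihood_ratio(x, correct_params, incorrect_params):
    """log p(x | correct) - log p(x | incorrect); higher means safer to accept."""
    return gaussian_logpdf(x, *correct_params) - gaussian_logpdf(x, *incorrect_params)

def threshold_for_coverage(scores, coverage):
    """Threshold such that roughly `coverage` of the scores lie at or above it."""
    return np.quantile(scores, 1.0 - coverage)

# Synthetic stand-ins for features of correct / incorrect predictions (assumption).
rng = np.random.default_rng(0)
feats_correct = rng.normal(loc=1.0, scale=1.0, size=(500, 2))
feats_incorrect = rng.normal(loc=-1.0, scale=1.0, size=(200, 2))

correct_params = fit_gaussian(feats_correct)
incorrect_params = fit_gaussian(feats_incorrect)

val_feats = np.vstack([feats_correct, feats_incorrect])
is_correct = np.concatenate([np.ones(500, dtype=bool), np.zeros(200, dtype=bool)])

val_scores = log_likelihood_ratio(val_feats, correct_params, incorrect_params)
tau = threshold_for_coverage(val_scores, coverage=0.7)

accept = val_scores >= tau                       # predict on these, abstain on the rest
full_accuracy = is_correct.mean()
selective_accuracy = is_correct[accept].mean()
print(f"coverage={accept.mean():.2f} "
      f"full_acc={full_accuracy:.3f} selective_acc={selective_accuracy:.3f}")
```

In this toy setup the threshold is picked from a target coverage rather than a target error rate; either choice amounts to thresholding the same likelihood ratio. For simplicity the densities are fit and evaluated on the same data, which a real evaluation would avoid by using held-out examples.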
Peer Reviews
Decision: ICLR 2026 Poster
- The authors' use of the NP lemma to combine several existing baseline methods is simple and intuitive.
- The authors' proposed method, a linear combination of distance-based and logit-based methods, is simple and interesting.
- Theorem 2 relies on the strong assumption that the covariance distribution conditioned on the prediction is Gaussian. Theorem 3 relies on k tending to infinity, which is not practical.
- The authors do not provide an intuitive understanding of the cases in which their proposed method should perform well compared to the baseline.
This paper provides a unified framework based on the Neyman-Pearson lemma that captures existing methods (which are often treated as ad-hoc). The paper is fairly well-written and uses proper mathematical notation. The empirical results are strong.
I think the optimality of Neyman-Pearson is a bit overstated, since optimality depends crucially on the distributional assumptions being valid.
1. The problem of abstaining rather than making incorrect predictions is an important practical problem.
2. The authors offer a framework to unify previous and newly proposed confidence scoring functions. Relevance to the NP lemma is an insightful observation.
3. The paper provides formal arguments (i.e., proofs) on optimality of different scores.
4. Evaluation on different datasets shows usefulness of the proposed scores.
5. The paper is clearly presented. There are minor issues, but overall the pa
1. On several occasions, the justification of assumptions and theoretical constructs is not clear. First, it is not clear why p(y) should remain unchanged; it changes if the relative frequencies of classes change. Nor is it clear why exactly this assumption is required. Second, the practical implications of Lemma 2 are not clear. Third, Theorem 1 uses the symbol "<<", which informally means "much smaller" but has no formal meaning.
2. The newly introduced scores are not fundamentally new, s
Taxonomy
Topics: Water resources management and optimization
Methods: Focus
