COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu

TL;DR
COIN is a statistically grounded framework for selective question answering with foundation models. It keeps the false discovery rate (FDR) under a user-specified level while retaining as many answers as possible, and it applies across a range of tasks.
Contribution
The paper presents COIN, a novel uncertainty-guarding selection method that calibrates thresholds for FDR control, improving the reliability and efficiency of model-generated answers.
Findings
COIN keeps the FDR among selected answers below the user-specified level with high probability.
It retains significantly more samples than existing methods.
The approach remains robust across different tasks and under data limitations.
Abstract
Uncertainty quantification (UQ) for foundation models is essential to identify and mitigate potential hallucinations in automatically generated text. However, heuristic UQ approaches lack formal guarantees for key metrics such as the false discovery rate (FDR) in selective prediction. Previous work adopts the split conformal prediction (SCP) framework to ensure desired coverage of admissible answers by constructing prediction sets, but these sets often contain incorrect candidates, limiting their practical utility. To address this, we propose COIN, an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question under user-specified FDR constraints. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods such as Clopper-Pearson to establish a high-probability upper…
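The calibration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`binom_cdf`, `clopper_pearson_upper`, `calibrate_threshold`), the convention that lower uncertainty means higher confidence, and the simple scan over candidate thresholds are all assumptions made for illustration. The sketch also applies no correction for searching over several thresholds. The Clopper-Pearson bound is computed by bisection on the binomial CDF so that only the standard library is needed.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k: int, n: int, delta: float) -> float:
    """One-sided Clopper-Pearson upper bound on an error rate.

    Given k errors among n selected samples, returns the value p at which
    P(X <= k | n, p) drops to delta, so the true rate lies below p with
    probability at least 1 - delta.
    """
    if n == 0 or k >= n:
        return 1.0  # no information, or every selected answer was wrong
    lo, hi = 0.0, 1.0
    for _ in range(60):  # the binomial CDF is decreasing in p, so bisect
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_threshold(uncertainties, errors, alpha, delta):
    """Largest uncertainty threshold whose upper-bounded error rate is <= alpha.

    uncertainties: per-question uncertainty scores on the calibration set
                   (assumed: lower means more confident).
    errors:        1 if the generated answer was incorrect, else 0.
    Returns None when no candidate threshold satisfies the constraint.
    """
    best = None
    for t in sorted(set(uncertainties)):
        selected = [e for u, e in zip(uncertainties, errors) if u <= t]
        k, n = sum(selected), len(selected)
        if clopper_pearson_upper(k, n, delta) <= alpha:
            best = t  # keep scanning: a larger threshold retains more answers
    return best
```

At test time, under these assumptions, an answer would be kept only when its uncertainty score is at most the calibrated threshold, and the model would abstain otherwise. Choosing the largest admissible threshold is what maximizes retention subject to the FDR constraint.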
Taxonomy
Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Reservoir Engineering and Simulation Methods
Methods: Sparse Evolutionary Training
