COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees

Zhiyuan Wang; Jinhao Duan; Qingni Wang; Xiaofeng Zhu; Tianlong Chen; Xiaoshuang Shi; Kaidi Xu

arXiv:2506.20178 · cs.CL · June 26, 2025


TL;DR

COIN introduces a statistically grounded framework for selective question answering in foundation models, ensuring controlled false discovery rates while maximizing answer retention and applicability across various tasks.

Contribution

The paper presents COIN, a novel uncertainty-guarding selection method that calibrates thresholds for FDR control, improving the reliability and efficiency of model-generated answers.

Findings

01. COIN effectively controls FDR with high probability.

02. It significantly increases sample retention compared to existing methods.

03. The approach is robust across different tasks and data limitations.

Abstract

Uncertainty quantification (UQ) for foundation models is essential to identify and mitigate potential hallucinations in automatically generated text. However, heuristic UQ approaches lack formal guarantees for key metrics such as the false discovery rate (FDR) in selective prediction. Previous work adopts the split conformal prediction (SCP) framework to ensure desired coverage of admissible answers by constructing prediction sets, but these sets often contain incorrect candidates, limiting their practical utility. To address this, we propose COIN, an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question under user-specified FDR constraints. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods such as Clopper-Pearson to establish a high-probability upper…
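The calibration step described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so everything below is an assumption rather than the authors' implementation: the function names (`binom_cdf`, `clopper_pearson_upper`, `calibrate_threshold`), the parameters `alpha` (target FDR) and `delta` (allowed failure probability), and the rule of keeping the largest uncertainty threshold whose Clopper-Pearson upper bound stays at or below `alpha` are all illustrative. The sketch also checks many thresholds on the same calibration set without any correction, which is a simplification.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k, n, delta):
    """One-sided (1 - delta) Clopper-Pearson upper bound for k errors in n trials."""
    if n == 0 or k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    # binom_cdf is decreasing in p, so bisect for the p where it equals delta.
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_threshold(uncertainties, is_wrong, alpha, delta):
    """Largest uncertainty threshold whose error-rate upper bound is <= alpha.

    Hypothetical helper: answers with uncertainty <= threshold are kept,
    and is_wrong[i] marks whether calibration answer i is incorrect.
    Returns None when no threshold satisfies the constraint.
    """
    pairs = sorted(zip(uncertainties, is_wrong))
    best, errors = None, 0
    for i, (u, wrong) in enumerate(pairs):
        errors += int(wrong)
        # Evaluate a tied uncertainty value only once, after its last sample.
        if i + 1 < len(pairs) and pairs[i + 1][0] == u:
            continue
        if clopper_pearson_upper(errors, i + 1, delta) <= alpha:
            best = u
    return best


if __name__ == "__main__":
    # Toy calibration data: 20 correct low-uncertainty answers, 20 wrong high-uncertainty ones.
    unc = [i / 100 for i in range(20)] + [0.5 + i / 100 for i in range(20)]
    wrong = [False] * 20 + [True] * 20
    print(calibrate_threshold(unc, wrong, alpha=0.12, delta=0.1))
```

At test time, under these assumptions, an answer would be returned only when its uncertainty score does not exceed the calibrated threshold, and the question would be abstained on otherwise.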


Taxonomy

Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Reservoir Engineering and Simulation Methods

Methods: Sparse Evolutionary Training