TL;DR
This paper introduces I-CALM, a prompt-based framework that incentivizes confidence-aware abstention in LLMs. It aims to reduce hallucinations without retraining the model by eliciting verbal confidence, partially rewarding abstention, and applying humility-oriented normative principles.
Contribution
The paper presents a novel prompt-only approach that improves factual answer reliability by encouraging abstention based on confidence, without modifying the underlying model.
Findings
Confidence-eliciting prompts reduce the false-answer rate.
Normative principles further improve abstention accuracy.
The framework trades coverage for increased answer reliability.
Abstract
Large language models (LLMs) frequently produce confident but incorrect answers, partly because common binary scoring conventions reward answering over honestly expressing uncertainty. We study whether prompt-only interventions -- explicitly announcing reward schemes for answer-versus-abstain decisions plus humility-oriented normative principles -- can reduce hallucination risk without modifying the model. Our focus is epistemic abstention on factual questions with a verifiable answer, where current LLMs often fail to abstain despite being uncertain about their answers. We first assess self-reported verbal confidence as a usable uncertainty signal, showing stability under prompt paraphrasing and reasonable calibration against a token-probability baseline. We then study I-CALM, a prompt-based framework that (i) elicits verbal confidence, (ii) partially rewards abstention through explicit…
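The incentive argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the reward values (`r_correct`, `r_wrong`, `r_abstain`) are hypothetical and are not taken from the paper. It assumes an agent that maximizes expected reward and treats its stated confidence as the probability of being correct. Under those assumptions, answering is worthwhile only when confidence exceeds a threshold set by the announced rewards.

```python
def confidence_threshold(r_correct=1.0, r_wrong=-1.0, r_abstain=0.0):
    """Confidence above which answering beats abstaining in expectation.

    Derived from c * r_correct + (1 - c) * r_wrong > r_abstain.
    All reward values are hypothetical placeholders, not the paper's.
    """
    if r_correct <= r_wrong:
        raise ValueError("r_correct must exceed r_wrong")
    return (r_abstain - r_wrong) / (r_correct - r_wrong)


def should_answer(confidence, r_correct=1.0, r_wrong=-1.0, r_abstain=0.0):
    """Return True if the expected reward of answering exceeds abstaining."""
    expected_answer = confidence * r_correct + (1.0 - confidence) * r_wrong
    return expected_answer > r_abstain


if __name__ == "__main__":
    # Binary scoring with no penalty and no abstention reward: threshold is 0,
    # so any nonzero confidence favors answering.
    print(confidence_threshold(r_correct=1.0, r_wrong=0.0, r_abstain=0.0))
    # Penalizing wrong answers and partially rewarding abstention raises it.
    print(confidence_threshold(r_correct=1.0, r_wrong=-1.0, r_abstain=0.5))
```

With binary scoring (no penalty for a wrong answer, no credit for abstaining) the threshold collapses to zero, which matches the abstract's point that such conventions reward answering over expressing uncertainty. Raising `r_abstain` or lowering `r_wrong` moves the threshold up, so fewer questions are answered but the ones that are should be more reliable.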
