Retrieval-Augmented Linguistic Calibration
Yi-Fan Yeh, Linwei Tao, Minjing Dong, Tao Huang, Jialin Yu, Philip Torr, Chang Xu

TL;DR
This paper introduces a distributional framework for linguistic confidence calibration, proposes a new metric called Faithfulness Divergence, and presents RALC, a retrieval-augmented rewriting pipeline that improves faithfulness and calibration in language models.
Contribution
The paper develops a distributional approach to linguistic confidence, introduces Faithfulness Divergence (FD) as an evaluation metric, and presents RALC, a post-hoc calibration method that improves both faithfulness and calibration.
Findings
RALC improves faithfulness by up to 66%
RALC enhances calibration accuracy by up to 58%
The distributional framework captures interpretation variability effectively
Abstract
Linguistic cues such as "I believe" and "probably" offer an intuitive interface for communicating confidence, yet a generalisable, principled calibration framework for linguistic confidence expressions remains underexplored. In particular, co-occurring linguistic cues, contextual variation, and subjective audience interpretation pose unique challenges. We therefore model linguistic confidence as a distribution over plausible perceived probability values that a statement is correct, capturing interpretation variability that scalar representations discard. Within this distributional framework, we introduce faithfulness as a complementary evaluation dimension and present Faithfulness Divergence (FD), an information-theoretic metric quantifying the surprise induced in audience beliefs upon truth revelation. Building on these foundations, we present Retrieval-Augmented Linguistic Calibration…
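The distributional view in the abstract can be illustrated with a small sketch. It is not the paper's definition of FD, which the truncated abstract does not give. It assumes a discrete set of audience interpretations of a cue (the probabilities and weights below are hypothetical) and uses expected surprisal, the average negative log-probability that each interpretation assigned to the revealed truth, as a stand-in for "surprise upon truth revelation". The function name `expected_surprise` is likewise an assumption made for illustration.

```python
import math


def expected_surprise(perceived_probs, weights, is_correct, eps=1e-12):
    """Average surprisal (in nats) of the revealed truth.

    perceived_probs: probabilities that the statement is correct, one per
                     plausible audience interpretation (hypothetical values).
    weights:         relative weight of each interpretation.
    is_correct:      the revealed truth of the statement.
    """
    total = sum(weights)
    surprise = 0.0
    for p, w in zip(perceived_probs, weights):
        # Probability this interpretation assigned to what actually happened.
        p_truth = p if is_correct else 1.0 - p
        surprise += (w / total) * -math.log(max(p_truth, eps))
    return surprise


# Hypothetical readings of "probably": most of the audience hears about 0.7.
probs = [0.6, 0.7, 0.8]
weights = [0.25, 0.5, 0.25]

s_correct = expected_surprise(probs, weights, is_correct=True)
s_wrong = expected_surprise(probs, weights, is_correct=False)
print(f"statement true:  {s_correct:.3f} nats")
print(f"statement false: {s_wrong:.3f} nats")
```

Under these assumed interpretations, a hedged statement that turns out false surprises the audience far more than one that turns out true. A single scalar confidence value would hide how that surprise is spread across different readers.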
