When to Trust LLMs: Aligning Confidence with Response Quality
Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

TL;DR
This paper introduces CONQORD, a reinforcement learning method that aligns LLM confidence with response quality, improving trustworthiness and guiding when to rely on LLM outputs in critical applications.
Contribution
It proposes a novel reinforcement learning approach with a dual reward function to align LLM confidence with response quality, enhancing reliability and decision-making.
Findings
CONQORD improves confidence-quality alignment significantly.
Aligned confidence helps determine when to trust LLM responses.
The method prevents over-cautious behavior in LLMs.
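As a rough illustration of the second finding, a verbalized confidence score can gate whether a response is used directly or handed to a fallback. This is a minimal sketch and not the authors' procedure: the function names, the 0.7 threshold, and the fallback callable are all hypothetical.

```python
from typing import Callable


def should_trust(confidence: float, threshold: float = 0.7) -> bool:
    """Return True when the verbalized confidence meets the threshold.

    The threshold value is an illustrative assumption, not a number
    taken from the paper.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence >= threshold


def answer_or_fallback(
    response: str,
    confidence: float,
    fallback: Callable[[], str],
    threshold: float = 0.7,
) -> str:
    """Use the LLM response if trusted; otherwise call a fallback.

    The fallback (hypothetical) might retrieve documents, ask a human,
    or abstain.
    """
    if should_trust(confidence, threshold):
        return response
    return fallback()
```

Such a gate is only useful if confidence actually tracks quality, which is the property the alignment method targets.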
Abstract
Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality to align the order of confidence and quality. Experiments demonstrate that CONQORD…
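The abstract gives no formula for the dual-component reward, so the following is only a sketch of one way an order-preserving term could be combined with a quality term. The pairwise product form, the averaging over the batch, the weight `alpha`, and all function names are assumptions made for illustration, not the paper's definition.

```python
from typing import List, Sequence


def order_preserving_rewards(
    quality: Sequence[float], confidence: Sequence[float]
) -> List[float]:
    """Per-sample alignment reward over a batch (assumed form).

    For sample i, average (c_i - c_j) * (q_i - q_j) over all other
    samples j. The term is positive when confidence and quality are
    ordered the same way and negative when they are ordered oppositely.
    """
    if len(quality) != len(confidence):
        raise ValueError("quality and confidence must have equal length")
    n = len(quality)
    rewards = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                total += (confidence[i] - confidence[j]) * (quality[i] - quality[j])
        rewards.append(total / max(n - 1, 1))
    return rewards


def dual_reward(
    quality: Sequence[float],
    confidence: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Quality reward plus a weighted alignment reward (alpha is hypothetical)."""
    align = order_preserving_rewards(quality, confidence)
    return [q + alpha * a for q, a in zip(quality, align)]
```

In this sketch, a batch where the better response also states higher confidence receives a positive alignment term, while swapping the confidences flips its sign. Keeping the quality term in the sum means the model cannot score well simply by stating low confidence everywhere.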
Taxonomy
Topics: Artificial Intelligence in Law · Privacy-Preserving Technologies in Data · Law, AI, and Intellectual Property
Methods: ALIGN
