When to Trust LLMs: Aligning Confidence with Response Quality

Shuchang Tao; Liuyi Yao; Hanxing Ding; Yuexiang Xie; Qi Cao; Fei Sun; Jinyang Gao; Huawei Shen; Bolin Ding

arXiv:2404.17287 · cs.CL · October 1, 2024 · 1 citation

Open Access · 1 Repo · 1 Video

TL;DR

This paper introduces CONQORD, a reinforcement learning method that aligns LLM confidence with response quality, improving trustworthiness and guiding when to rely on LLM outputs in critical applications.

Contribution

It proposes a novel reinforcement learning approach with a dual reward function to align LLM confidence with response quality, enhancing reliability and decision-making.

Findings

01. CONQORD significantly improves the alignment between confidence and response quality.

02. Aligned confidence helps determine when to trust LLM responses.

03. The method prevents over-cautious behavior in LLMs.
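The second finding can be illustrated with a minimal sketch of confidence-based gating. This is not taken from the paper: the `Response` class, the `split_by_trust` helper, and the 0.7 threshold are all hypothetical names and values chosen for illustration, assuming the model verbalizes a confidence in [0, 1] alongside each answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Response:
    """Hypothetical container: an answer plus its verbalized confidence."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def split_by_trust(
    responses: List[Response], threshold: float = 0.7
) -> Tuple[List[Response], List[Response]]:
    """Split responses into (trusted, deferred) by a confidence threshold.

    The threshold is an illustrative assumption; in practice it would be
    tuned on held-out data for the application at hand.
    """
    trusted = [r for r in responses if r.confidence >= threshold]
    deferred = [r for r in responses if r.confidence < threshold]
    return trusted, deferred


# Example: one confident answer is accepted, one uncertain answer is deferred
# (for instance, to a human reviewer or an external knowledge source).
answers = [
    Response("Paris is the capital of France.", 0.95),
    Response("The treaty was signed in 1742.", 0.30),
]
trusted, deferred = split_by_trust(answers)
```

Gating like this is only useful if confidence actually tracks quality, which is the alignment property the paper targets.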

Abstract

Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality, aligning the order of confidence and quality. Experiments demonstrate that CONQORD…
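The abstract does not give the reward formula, so the following is only a sketch of what a dual-component reward with an order-preserving term could look like. The pairwise product form, the function names `order_preserving_rewards` and `combined_rewards`, and the weight `alpha` are assumptions made for illustration, not the paper's definitions.

```python
from typing import List, Sequence


def order_preserving_rewards(
    confidences: Sequence[float], qualities: Sequence[float]
) -> List[float]:
    """Hypothetical order-preserving reward for each sample in a batch.

    For sample i, average (c_i - c_j) * (q_i - q_j) over all other samples j.
    The term is positive when confidence and quality are ordered the same way
    relative to j, and negative when the order is reversed.
    """
    if len(confidences) != len(qualities):
        raise ValueError("confidences and qualities must have equal length")
    n = len(confidences)
    rewards = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                total += (confidences[i] - confidences[j]) * (
                    qualities[i] - qualities[j]
                )
        rewards.append(total / max(n - 1, 1))
    return rewards


def combined_rewards(
    confidences: Sequence[float],
    qualities: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Quality reward plus alpha times the order-preserving term (assumed form)."""
    align = order_preserving_rewards(confidences, qualities)
    return [q + alpha * a for q, a in zip(qualities, align)]


# High confidence on the better response is rewarded ...
aligned = order_preserving_rewards([0.9, 0.2], [1.0, 0.0])
# ... while high confidence on the worse response is penalized.
misordered = order_preserving_rewards([0.2, 0.9], [1.0, 0.0])
```

Under this assumed form, a policy cannot raise its reward by reporting uniformly low confidence, since the alignment term depends only on the relative order of confidence and quality within the batch.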

Code & Models

Repositories

taoshuchang/conqord (PyTorch, official)

Videos

When to Trust LLMs: Aligning Confidence with Response Quality

Taxonomy

Topics: Artificial Intelligence in Law · Privacy-Preserving Technologies in Data · Law, AI, and Intellectual Property

Methods: ALIGN