LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
Elias Stengel-Eskin, Peter Hase, Mohit Bansal

TL;DR
LACIE is a finetuning method for large language models that improves their confidence calibration by modeling listener acceptance, leading to more trustworthy answers and better human-aligned confidence signals.
Contribution
The paper introduces LACIE, a listener-aware finetuning approach that calibrates the confidence LLMs express by modeling whether a listener will accept an answer, not only whether the answer is correct, with the aim of improving trustworthiness and truthfulness.
Findings
Listeners accept fewer incorrect answers from models trained with LACIE.
LACIE-trained models better signal confidence and hedge when unsure.
Human evaluation shows improved trustworthiness and calibration.
Abstract
When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g. giving a numeric score) as well as implicit markers, like an authoritative tone or elaborating with additional knowledge. For LLMs to be trustworthy knowledge sources, the confidence they convey should match their actual expertise; however, most current models tend towards overconfidence. To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. We cast calibration as preference optimization, creating data via a two-agent game, where a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B,…
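The data-creation step described in the abstract (a speaker's outputs judged by a simulated listener, then used for preference optimization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Judged` record, the `is_calibrated` rule (a response counts as good when the listener's accept/reject decision matches the answer's correctness), and the pairing scheme are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Judged:
    """One speaker response plus two labels (hypothetical record type)."""
    text: str
    correct: bool   # does the answer match the gold answer?
    accepted: bool  # did the simulated listener accept the answer?


def is_calibrated(r: Judged) -> bool:
    # Assumed rule: a response is well calibrated when the listener
    # accepts a correct answer or rejects an incorrect one.
    return r.correct == r.accepted


def build_preference_pairs(responses):
    """Return (chosen, rejected) text pairs for preference optimization."""
    pairs = []
    for a, b in combinations(responses, 2):
        if is_calibrated(a) and not is_calibrated(b):
            pairs.append((a.text, b.text))
        elif is_calibrated(b) and not is_calibrated(a):
            pairs.append((b.text, a.text))
    return pairs


# Toy responses to one question; labels are made up for the example.
responses = [
    Judged("It is definitely Paris.", correct=True, accepted=True),
    Judged("It is definitely Lyon.", correct=False, accepted=True),
    Judged("I'm not sure, maybe Lyon?", correct=False, accepted=False),
]
pairs = build_preference_pairs(responses)
for chosen, rejected in pairs:
    print(f"chosen: {chosen!r}  rejected: {rejected!r}")
```

In this toy case the overconfident wrong answer is dispreferred against both the confident correct answer and the hedged wrong answer, which is the kind of signal a listener-aware objective would need. Pairs like these could then be passed to any preference-optimization trainer.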
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
