Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences
Vaishnavi Shrivastava, Ananya Kumar, Percy Liang

TL;DR
This paper introduces a relative confidence estimation method for language models that compares questions against each other to produce more reliable confidence scores, outperforming traditional absolute methods.
Contribution
The paper proposes relative confidence estimation: the model's pairwise confidence preferences are combined with rank aggregation techniques into a confidence score for each question, giving more reliable confidence estimates than existing absolute methods.
Findings
Relative confidence estimation outperforms absolute methods in reliability.
Achieves, on average, 3.5% higher selective classification AUC than absolute confidence estimation.
Effective across multiple state-of-the-art language models and diverse tasks.
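For readers unfamiliar with the metric in the findings above, here is a minimal sketch of selective classification AUC, read as the area under the accuracy-coverage curve. The function name and the simple averaging over coverage levels are assumptions for illustration; the paper's exact evaluation code may differ (for example, in how ties or integration are handled).

```python
def selective_classification_auc(confidences, correct):
    """Area under the accuracy-coverage curve (illustrative definition).

    confidences: one score per question (higher = more confident).
    correct: 1 if the model's answer was right, else 0.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    # Answer the most confident questions first.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    hits = 0
    accuracies = []
    for k, i in enumerate(order, start=1):
        hits += correct[i]
        accuracies.append(hits / k)  # accuracy when answering only the top-k questions
    # Average accuracy over coverage levels 1/n, 2/n, ..., 1.
    return sum(accuracies) / len(accuracies)


# Confidence that ranks correct answers first scores higher.
well_ranked = selective_classification_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
badly_ranked = selective_classification_auc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```

Only the ordering of the confidence scores matters under this definition, which is why a method that produces a good ranking of questions can do well even if its raw scores are not calibrated probabilities.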
Abstract
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. Asking a language model to assess its confidence ("Score your confidence from 0-1.") is a natural way of evaluating its uncertainty. However, models struggle to provide absolute assessments of confidence (i.e., judging confidence in answering a question independent of other questions), and the coarse-grained scores they produce are not useful for evaluating the correctness of their answers. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence ("Which question are you more confident in answering correctly?"). Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we…
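The "players and matchups" framing can be sketched with an Elo-style rating update, used here only as one familiar way to aggregate pairwise outcomes into per-question scores. The truncated abstract above does not say which aggregation method the authors use, so the function name `elo_confidence_scores`, the K-factor, the starting rating, and the number of passes are all assumptions for illustration.

```python
import random


def elo_confidence_scores(num_questions, matchups, k=32.0,
                          initial=1000.0, epochs=10, seed=0):
    """Turn pairwise confidence preferences into one score per question.

    matchups: list of (winner, loser) index pairs, where the winner is the
    question the model said it was more confident in answering correctly.
    All hyperparameters are illustrative assumptions.
    """
    ratings = [initial] * num_questions
    rng = random.Random(seed)
    games = list(matchups)
    for _ in range(epochs):
        rng.shuffle(games)  # reduce dependence on the order of matchups
        for winner, loser in games:
            # Standard Elo expected score of the winner before the game.
            expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
            delta = k * (1.0 - expected_win)
            ratings[winner] += delta
            ratings[loser] -= delta
    return ratings


# Hypothetical preferences: question 0 always preferred, question 2 never.
scores = elo_confidence_scores(3, [(0, 1), (0, 2), (1, 2), (0, 1)])
```

The resulting ratings can be used directly as confidence scores: a question that wins most of its matchups ends with a high rating, and one that loses most ends with a low rating.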
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
