Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Matéo Mahaut, Laura Aina, Paula Czarnowska, Momchil Hardalov, Thomas Müller, Lluís Màrquez

TL;DR
This paper systematically compares factual confidence estimators for LLMs. It finds that trained hidden-state probes are the most reliable but require access to model weights and training data, and it highlights that LLM confidence is unstable across semantically equivalent inputs.
Contribution
It provides a comprehensive empirical comparison of confidence estimators and introduces an experimental framework for fair evaluation of LLM factual confidence methods.
Findings
Trained hidden-state probes are the most reliable confidence estimators.
LLM confidence is often unstable across semantically equivalent inputs.
The study offers a systematic comparison framework for confidence estimators.
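The instability finding can be made concrete with a small sketch. It assumes you already have one confidence score in [0, 1] per paraphrase of the same fact, produced by any estimator. The function name `paraphrase_consistency` and the example scores are hypothetical, and the spread measure (population standard deviation) is an illustrative choice, not necessarily the metric used in the paper.

```python
from statistics import mean, pstdev


def paraphrase_consistency(scores):
    """Summarize confidence scores over paraphrases of one fact.

    Returns (mean confidence, spread), where spread is the population
    standard deviation. A larger spread means the estimator's confidence
    changes more across meaning-preserving rewordings.
    """
    if not scores:
        raise ValueError("need at least one confidence score")
    return mean(scores), pstdev(scores)


# Hypothetical scores for three paraphrases of the same fact.
stable_scores = [0.90, 0.88, 0.91]
unstable_scores = [0.90, 0.35, 0.60]

stable_mean, stable_spread = paraphrase_consistency(stable_scores)
unstable_mean, unstable_spread = paraphrase_consistency(unstable_scores)
```

Comparing `stable_spread` with `unstable_spread` shows how two sets of scores can share a similar top confidence yet differ sharply in how much that confidence can be trusted under rewording.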
Abstract
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLMs' confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a survey and empirical comparison of estimators of factual confidence. We define an experimental framework allowing for fair comparison, covering both fact-verification and question answering. Our experiments across a series of LLMs indicate that trained hidden-state probes provide the most reliable confidence estimates, albeit at the expense of requiring access to weights and training data. We also conduct a deeper assessment of factual confidence by measuring the consistency of model behavior under meaning-preserving variations in the…
Taxonomy
Topics: Advanced Research in Systems and Signal Processing · Law, AI, and Intellectual Property
