Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators

Matéo Mahaut; Laura Aina; Paula Czarnowska; Momchil Hardalov; Thomas Müller; Lluís Màrquez

arXiv:2406.13415·cs.CL·November 28, 2024

TL;DR

This paper systematically compares factual confidence estimators for LLMs. It finds that trained hidden-state probes are the most reliable but require access to model weights and training data, and it shows that LLM confidence is often unstable across semantically equivalent inputs.

Contribution

It provides a comprehensive empirical comparison of confidence estimators and introduces an experimental framework for fair evaluation of LLM factual confidence methods.

Findings

01. Hidden-state probes are the most reliable confidence estimators.

02. LLM confidence is often unstable across semantically equivalent inputs.

03. The study offers a systematic comparison framework for confidence estimators.

Abstract

Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate an LLM's confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a survey and empirical comparison of estimators of factual confidence. We define an experimental framework allowing for fair comparison, covering both fact-verification and question answering. Our experiments across a series of LLMs indicate that trained hidden-state probes provide the most reliable confidence estimates, albeit at the expense of requiring access to weights and training data. We also conduct a deeper assessment of factual confidence by measuring the consistency of model behavior under meaning-preserving variations in the…
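The consistency check mentioned in the abstract can be illustrated with a minimal sketch. It assumes confidence scores in [0, 1] have already been obtained from some estimator for several meaning-preserving rewordings of the same input. The function name `paraphrase_consistency`, the fact identifiers, and the scores are all hypothetical, and the spread measure (population standard deviation) is one simple choice, not necessarily the metric used in the paper.

```python
from statistics import mean, pstdev


def paraphrase_consistency(scores_by_fact):
    """Summarize how stable confidence is across rewordings of each fact.

    scores_by_fact maps a fact id to a list of confidence scores in [0, 1],
    one score per meaning-preserving rewording of the same input.
    """
    report = {}
    for fact_id, scores in scores_by_fact.items():
        if not scores:
            raise ValueError(f"no scores for fact {fact_id!r}")
        report[fact_id] = {
            "mean": mean(scores),
            # Population standard deviation: 0.0 means the estimator gave
            # the same confidence for every rewording.
            "spread": pstdev(scores),
        }
    return report


# Hypothetical scores, made up purely for illustration.
toy_scores = {
    "stable_fact": [0.9, 0.9, 0.9],
    "unstable_fact": [0.2, 0.8, 0.5],
}
report = paraphrase_consistency(toy_scores)
for fact_id, stats in report.items():
    print(fact_id, round(stats["mean"], 3), round(stats["spread"], 3))
```

A larger spread for a fact indicates that the estimated confidence depends on surface wording rather than on the underlying fact, which is the kind of instability the paper reports.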

Code & Models

Repositories

amazon-science/factual-confidence-of-llms (PyTorch, official)

Videos

Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators (Underline)

Taxonomy

Topics: Advanced Research in Systems and Signal Processing · Law, AI, and Intellectual Property