Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Miao Xiong; Zhiyuan Hu; Xinyang Lu; Yifei Li; Jie Fu; Junxian He; Bryan Hooi

arXiv:2306.13063·cs.CL·March 19, 2024·51 cites

TL;DR

This paper systematically evaluates black-box confidence elicitation methods for large language models, revealing their tendencies, improvements with scale, and potential strategies to enhance uncertainty estimation without internal model access.

Contribution

It introduces a comprehensive framework for black-box confidence elicitation in LLMs and benchmarks various prompting, sampling, and aggregation strategies across multiple models and tasks.

Findings

01

LLMs tend to be overconfident when verbalizing confidence.

02

Scaling up models improves calibration and failure prediction.

03

Proposed strategies can mitigate overconfidence and narrow the gap with white-box methods.
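Overconfidence of the kind described in these findings is usually quantified with a calibration metric. The following is a minimal sketch of expected calibration error (ECE) with equal-width bins, assuming each answer comes with a confidence in [0, 1] and a correctness flag. The function name, the bin count, and the binning scheme are illustrative assumptions, not necessarily the paper's exact evaluation setup.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted average of |mean confidence - accuracy| per bin.

    confidences: floats in [0, 1], one per answer.
    correct: booleans, True where the answer was right.
    n_bins: number of equal-width bins (an assumed default, not from the source).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group (confidence, correctness) pairs into equal-width bins.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# A model that always says 90% but is right only half the time is overconfident.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, True, False]
)
```

A value near zero means stated confidence tracks accuracy; in this toy input the gap between 0.9 stated confidence and 0.5 accuracy shows up directly in the score.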

Abstract

Empowering large language models to accurately express confidence in their answers is essential for trustworthy decision-making. Previous confidence elicitation methods, which primarily rely on white-box access to internal model information or model fine-tuning, have become less suitable for LLMs, especially closed-source commercial APIs. This leads to a growing need to explore the untapped area of black-box approaches for LLM uncertainty estimation. To better break down the problem, we define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency. We then benchmark these methods on two key tasks, confidence calibration and failure prediction, across five types of datasets (e.g., commonsense and arithmetic reasoning) and five widely-used…
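Two of the framework's components can be sketched in a few lines: reading a verbalized confidence out of a response, and aggregating several sampled answers into a consistency score. This is a minimal illustration only. The `Confidence: NN%` response format, the function names, and the majority-vote agreement rule are all assumptions made here for the example; the paper's own prompts and aggregation techniques may differ.

```python
import re
from collections import Counter


def parse_verbalized_confidence(response):
    """Return the stated confidence as a float in [0, 1], or None if absent.

    Assumes (hypothetically) that the model was prompted to end its answer
    with a line such as "Confidence: 85%".
    """
    match = re.search(r"confidence\s*:\s*(\d{1,3})\s*%", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return min(int(match.group(1)), 100) / 100.0


def consistency_confidence(answers):
    """Return (majority answer, fraction of samples that agree with it).

    `answers` holds the final answers extracted from several sampled responses
    to the same question. Agreement rate is one simple aggregation choice.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


stated = parse_verbalized_confidence("Answer: 42. Confidence: 85%")
majority, agreement = consistency_confidence(["A", "B", "A", "A"])
```

The stated value and the agreement rate are two separate confidence signals for the same question; how they are combined, if at all, is left open in this sketch.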

Code & Models

Repositories

miaoxiong2320/llm-uncertainty
Official

Videos

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Law · Financial Distress and Bankruptcy Prediction

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Softmax · Dropout · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer