Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

Katherine Tian; Eric Mitchell; Allan Zhou; Archit Sharma; Rafael Rafailov; Huaxiu Yao; Chelsea Finn; Christopher D. Manning

arXiv:2305.14975 · cs.CL · October 25, 2023 · 5 cites

TL;DR

This paper evaluates methods for obtaining well-calibrated confidence scores from language models fine-tuned with RLHF. It finds that confidences the model states in its output (verbalized confidences) are typically better calibrated than its conditional probabilities on TriviaQA, SciQ, and TruthfulQA.

Contribution

It provides a broad evaluation of confidence elicitation strategies for RLHF-LMs and shows that verbalized confidences are typically better calibrated than the models' conditional probabilities.

Findings

01

Verbalized confidences are typically better calibrated than the model's conditional probabilities.

02

Verbalized confidences often reduce expected calibration error by a relative 50%.

03

Results are demonstrated on TriviaQA, SciQ, and TruthfulQA benchmarks.
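
Expected calibration error (ECE), the metric named in the findings above, is commonly computed by grouping predictions into confidence bins and averaging the gap between each bin's mean confidence and its accuracy, weighted by bin size. The following is a minimal sketch of that common binned definition. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions for illustration; this page does not state the exact binning used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |mean confidence - accuracy| per bin.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    n_bins: number of equal-width bins (an assumption, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece


# Toy data: the model says 0.9 four times but is right only three times.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A lower value means stated confidence tracks accuracy more closely. A "relative 50%" reduction would correspond to, for example, an ECE of 0.30 dropping to 0.15.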

Abstract

A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are remarkably well-calibrated. However, the most widely-used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. For RLHF-LMs such as ChatGPT, GPT-4, and Claude, we find that verbalized confidences emitted as output tokens are typically…
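
To make the idea of a verbalized confidence concrete, here is a small sketch of reading a stated probability out of a model's text response and deferring to an expert when it is low, as the abstract describes. The response format (`Probability: 0.85` or `Confidence: 85%`), the helper names, and the 0.5 threshold are all hypothetical; they are not the prompts or thresholds used in the paper.

```python
import re

# Hypothetical response format: a line such as "Probability: 0.85" or "Confidence: 85%".
_CONFIDENCE_PATTERN = re.compile(
    r"(?:probability|confidence)\s*:\s*([0-9]*\.?[0-9]+)\s*(%?)",
    re.IGNORECASE,
)


def parse_verbalized_confidence(response):
    """Return the stated confidence as a float in [0, 1], or None if absent or invalid."""
    match = _CONFIDENCE_PATTERN.search(response)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2) == "%":
        value /= 100.0  # convert a percentage to a probability
    return value if 0.0 <= value <= 1.0 else None


def should_defer(confidence, threshold=0.5):
    """Defer to an expert when no usable confidence was stated or it falls below threshold."""
    return confidence is None or confidence < threshold


stated = parse_verbalized_confidence("Guess: Paris\nProbability: 0.85")
defer = should_defer(stated)
```

Treating an unparseable response as a reason to defer is a conservative design choice for this sketch; a real system might instead re-prompt the model.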

Code & Models

Repositories

iinemo/lm-polygraph
pytorch

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Residual Connection