Calibrating Verbalized Probabilities for Large Language Models

Cheng Wang, Gyuri Szarvas, Georges Balazs, Pavel Danchenko, Patrick Ernst

arXiv:2410.06707·cs.CL·October 10, 2024

TL;DR

This paper introduces a method for calibrating the verbalized probabilities of large language models on discriminative tasks. An invert softmax trick estimates logits from the verbalized probabilities, so that standard scaling techniques can be applied to them.

Contribution

It presents the invert softmax trick to approximate logits from verbalized probabilities, enabling effective calibration of LLM outputs for classification tasks.
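The idea can be illustrated with a minimal sketch, assuming the trick amounts to taking the logarithm of each verbalized probability (which recovers the logits up to an additive constant) before temperature scaling. The function names, the `eps` clamp, and the example distribution are hypothetical and not taken from the paper.

```python
import math

def invert_softmax(probs, eps=1e-12):
    """Approximate logits as log-probabilities (assumption: the unknown
    additive constant is dropped, since softmax is shift-invariant)."""
    # eps is a hypothetical clamp to avoid log(0) for zero probabilities.
    return [math.log(max(p, eps)) for p in probs]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A made-up verbalized distribution over three labels.
verbalized = [0.7, 0.2, 0.1]

# Re-softmax: feeding probabilities to softmax as if they were logits
# flattens the distribution even at temperature 1.
resoftmax = softmax(verbalized, temperature=1.0)

# Inverting first: temperature 1 reproduces the original distribution,
# and other temperatures sharpen or soften it as intended.
recovered = softmax(invert_softmax(verbalized), temperature=1.0)
sharpened = softmax(invert_softmax(verbalized), temperature=0.5)
```

In this sketch the temperature is fixed by hand; in practice it would be fitted on held-out labeled data, as with ordinary temperature scaling.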

Findings

01 LLMs can generate class probability distributions reliably.

02 The invert softmax trick improves calibration accuracy.

03 Calibration enhances LLMs' performance in discriminative tasks.

Abstract

Calibrating verbalized probabilities presents a novel approach for reliably assessing and leveraging outputs from black-box Large Language Models (LLMs). Recent methods have demonstrated improved calibration by applying techniques like Platt scaling or temperature scaling to the confidence scores generated by LLMs. In this paper, we explore the calibration of verbalized probability distributions for discriminative tasks. First, we investigate the capability of LLMs to generate probability distributions over categorical labels. We theoretically and empirically identify the issue of re-softmax arising from the scaling of verbalized probabilities, and propose using the invert softmax trick to approximate the "logit" by inverting verbalized probabilities. Through extensive evaluation on three public datasets, we demonstrate: (1) the robust capability of LLMs in generating class…
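The re-softmax issue named in the abstract can be made concrete with a short derivation. This is a sketch under assumed notation, not the paper's own: $p_i$ is the verbalized probability of class $i$, $z_i$ the unobserved logit, $T$ a temperature, and $c$ an arbitrary constant.

```latex
% Assumed notation: p = verbalized probabilities, z = latent logits,
% T = temperature, c = constant independent of the class index i.
\begin{align}
  p_i &= \frac{\exp(z_i)}{\sum_j \exp(z_j)}
  \quad\Longrightarrow\quad
  z_i = \log p_i + c, \qquad c = \log \sum_j \exp(z_j), \\
  % Scaling the probabilities directly applies softmax a second time:
  q_i^{\text{re-softmax}} &= \frac{\exp(p_i / T)}{\sum_j \exp(p_j / T)}, \\
  % Scaling the inverted logits instead (c cancels in the ratio):
  q_i^{\text{inverted}} &= \frac{\exp\big((\log p_i + c)/T\big)}{\sum_j \exp\big((\log p_j + c)/T\big)}
  = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}.
\end{align}
```

Because every $p_i$ lies in $[0, 1]$, the re-softmax form compresses the differences between classes, whereas the inverted form returns $q_i = p_i$ at $T = 1$.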

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Softmax