Calibrated Large Language Models for Binary Question Answering

Patrizio Giovannotti; Alexander Gammerman

arXiv:2407.01122 · cs.CL · July 2, 2024

Open Access

TL;DR

This paper introduces a calibration method for large language models in binary question answering based on the inductive Venn–Abers predictor (IVAP). The method yields better-calibrated label probabilities than temperature scaling while preserving predictive quality.

Contribution

It applies IVAP to the probabilities of the output tokens that correspond to the binary labels, and shows that this outperforms temperature scaling on BoolQ with Llama 2 across several label token choices.
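
For context, temperature scaling rescales a model's logits by a single scalar T fitted on held-out data. The following is a minimal sketch for the binary case, assuming the model's score is the logit gap between the two label tokens. The function names, the grid search, and the grid range are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stable_sigmoid(x):
    # tanh form avoids overflow for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(x, dtype=float)))

def nll(logit_gap, labels, T):
    # Mean binary negative log-likelihood after dividing the logits by T
    p = np.clip(stable_sigmoid(logit_gap / T), 1e-12, 1.0 - 1e-12)
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def fit_temperature(logit_gap, labels, grid=None):
    # Pick the temperature that minimises NLL on a held-out calibration set.
    # A grid search is used here for simplicity (an assumption of this sketch).
    logit_gap = np.asarray(logit_gap, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if grid is None:
        grid = np.logspace(-2, 2, 401)
    losses = [nll(logit_gap, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def calibrate(logit_gap, T):
    # Calibrated probability of the positive label
    return stable_sigmoid(np.asarray(logit_gap, dtype=float) / T)
```

Because T is a single scalar, this baseline can only stretch or compress confidence uniformly; it cannot reorder or locally reshape the scores.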

Findings

01. IVAP achieves better calibration than temperature scaling.

02. The method maintains high predictive quality.

03. Results are demonstrated on the BoolQ dataset with Llama 2.
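
One common way to quantify how well calibrated a set of binary probabilities is, is the expected calibration error (ECE). This summary does not say which metric the paper reports, so the sketch below is only an illustration of the general idea; the function name and the equal-width binning are assumptions.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # probs: predicted probability of the positive label, in [0, 1]
    # labels: true labels, 0 or 1
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Gap between observed positive rate and mean predicted probability
            gap = abs(labels[mask].mean() - probs[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)
```

A value near zero means that, within each bin, the predicted probabilities match the observed frequency of the positive label.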

Abstract

Quantifying the uncertainty of predictions made by large language models (LLMs) in binary text classification tasks remains a challenge. Calibration, in the context of LLMs, refers to the alignment between the model's predicted probabilities and the actual correctness of its predictions. A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct. We propose a novel approach that utilizes the inductive Venn–Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels. Our experiments on the BoolQ dataset using the Llama 2 model demonstrate that IVAP consistently outperforms the commonly used temperature scaling method for various label token choices, achieving well-calibrated probabilities while maintaining high predictive quality. Our findings contribute to…
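
To make the IVAP idea concrete, here is a minimal sketch of the general technique, assuming each example has already been reduced to a single scalar score (for instance, the model's probability for the positive label token). For a test score, isotonic regression is fitted twice on a held-out calibration set, once with the test point labelled 0 and once labelled 1, giving a probability pair (p0, p1). The helper names and the merge into a single probability p = p1 / (1 - p0 + p1) are choices made for this sketch and are not confirmed by the abstract.

```python
import numpy as np

def pava(y, w):
    # Weighted pool-adjacent-violators: non-decreasing least-squares fit
    blocks = []  # each block holds [mean, weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def isotonic_at(x, y, x0):
    # Isotonic regression of y on x, evaluated at x0 (x0 must occur in x)
    xs, inv = np.unique(x, return_inverse=True)   # tied scores are pooled
    w = np.bincount(inv).astype(float)
    means = np.bincount(inv, weights=y) / w
    fit = pava(means, w)
    return float(fit[np.searchsorted(xs, x0)])

def ivap(cal_scores, cal_labels, test_score):
    # Returns (p0, p1, p): the Venn-Abers pair and one merged probability
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=float)
    test_score = float(test_score)
    x = np.append(cal_scores, test_score)
    p0 = isotonic_at(x, np.append(cal_labels, 0.0), test_score)
    p1 = isotonic_at(x, np.append(cal_labels, 1.0), test_score)
    return p0, p1, p1 / (1.0 - p0 + p1)

# Toy usage with made-up scores: p0 is about 2/3, p1 is 1, merged p is 0.75
p0, p1, p = ivap([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1], 0.85)
```

The width of the interval between p0 and p1 gives a rough indication of how uncertain the calibrated estimate is. Refitting for every test point is slow on large calibration sets; this sketch favours clarity over speed.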

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Speech and dialogue systems

Methods: LLaMA