A Study on the Calibration of In-context Learning

Hanlin Zhang; Yi-Fan Zhang; Yaodong Yu; Dhruv Madeka; Dean Foster; Eric Xing; Himabindu Lakkaraju; Sham Kakade

arXiv:2312.04021 · cs.CL · March 29, 2024 · 1 citation

TL;DR

This paper investigates how in-context learning (ICL) affects the calibration of language models. It finds that calibration varies with the number of in-context examples and with the prompting method, and it explores recalibration techniques to improve reliability.

Contribution

It provides a comprehensive analysis of calibration behavior in in-context learning and shows that a scaling-binning calibrator can reduce the calibration error of language models.

Findings

01 Calibration initially worsens as in-context examples are added, then improves

02 Fine-tuning and chain-of-thought prompting can cause miscalibration

03 A scaling-binning calibrator reduces calibration errors
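
The third finding names a scaling-binning calibrator without describing it. As a rough illustration of the general idea (fit a parametric scaling function on one split, then discretize its outputs into equal-mass bins and emit each bin's mean scaled value), here is a minimal sketch. It is not taken from the hlzhang109/icl-calibration repository. The Platt-style scaling on the logit, the gradient-descent settings, and all function names are assumptions made for this example.

```python
import math
from bisect import bisect_right


def _logit(p, eps=1e-6):
    # Clip to avoid infinities at p = 0 or p = 1.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_platt(confs, labels, lr=0.1, steps=2000):
    """Fit g(p) = sigmoid(a * logit(p) + b) by gradient descent on log loss.

    Assumption: Platt-style scaling stands in for the scaling family.
    """
    xs = [_logit(p) for p in confs]
    n = len(xs)
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = _sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda p: _sigmoid(a * _logit(p) + b)


def fit_scaling_binning(scale_split, bin_confs, n_bins=10):
    """Return a calibrator: scale first, then map to the bin's mean scaled value.

    scale_split: (confidences, 0/1 labels) used to fit the scaling function.
    bin_confs:   confidences from a second split, used for bin edges and means.
    """
    if not bin_confs:
        raise ValueError("bin_confs must be non-empty")
    scale = fit_platt(*scale_split)
    scaled = sorted(scale(p) for p in bin_confs)
    n_bins = min(n_bins, len(scaled))
    size = len(scaled) / n_bins
    # Equal-mass bin edges: roughly the same number of points per bin.
    edges = [scaled[int(i * size)] for i in range(1, n_bins)]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for s in scaled:
        k = bisect_right(edges, s)
        sums[k] += s
        counts[k] += 1
    means = [sums[k] / counts[k] if counts[k] else None for k in range(n_bins)]

    def calibrate(p):
        s = scale(p)
        k = bisect_right(edges, s)
        # Fall back to the scaled value if a bin is empty (possible with ties).
        return means[k] if means[k] is not None else s

    return calibrate
```

The calibrator returns at most `n_bins` distinct values, which is what makes its calibration error straightforward to estimate from held-out data. In practice the two splits should be disjoint from each other and from the test set.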

Abstract

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a prevalent method for adapting static LMs through tailored prompts, and examine the balance between performance and calibration across a broad spectrum of natural language understanding and reasoning tasks. Through comprehensive experiments, we observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration and miscalibration tends to arise in low-shot settings. Moreover, we find that methods aimed at improving usability, such as fine-tuning and chain-of-thought (CoT) prompting, can lead to miscalibration and unreliable natural language explanations. Furthermore, we explore…
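
To make "miscalibration" concrete: one common way to quantify it is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below is illustrative only. The visible part of the abstract does not state which metric or binning scheme the authors use, so the equal-width bins and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen answer, each in [0, 1].
    correct:     1/0 (or True/False) flags, one per prediction.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

For example, four predictions made at 0.9 confidence of which three are correct give an ECE of 0.15: the model claims 90% but achieves 75%.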

Code & Models

Repositories

hlzhang109/icl-calibration (PyTorch, official)

Videos

A Study on the Calibration of In-context Learning · Underline

Taxonomy

Topics: Education and Learning Interventions