Credence Calibration Game? Calibrating Large Language Models through Structured Play

Ke Fang; Tianyi Zhao; Lu Cheng

arXiv:2508.14390 · cs.CL · August 21, 2025

TL;DR

This paper introduces a novel prompt-based calibration framework for large language models that uses structured interactions and feedback to improve the alignment of confidence estimates with actual correctness, without requiring additional supervision.

Contribution

It proposes a game-inspired, prompt-based calibration method that dynamically enhances LLM confidence calibration through feedback and natural language summaries.
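The summary does not say how the feedback is scored. As a hedged illustration, the sketch below assumes a logarithmic scoring rule in the spirit of the original Credence Calibration Game: binary questions, a stated confidence p in [0.5, 1.0) for the chosen answer, and a score of 100·log2(2q), where q is the probability the player placed on the outcome that actually happened. The function name `credence_score` and the constants are assumptions, not the paper's implementation.

```python
import math


def credence_score(confidence: float, correct: bool) -> float:
    """Hypothetical log-score feedback for one round of a calibration game.

    Assumes a binary question where `confidence` is the stated probability
    (0.5 <= confidence < 1.0) that the chosen answer is correct.
    A 50% guess always scores 0; confident wrong answers are punished hard.
    """
    if not 0.5 <= confidence < 1.0:
        raise ValueError("confidence must be in [0.5, 1.0)")
    # Probability the player assigned to the outcome that actually occurred.
    p_outcome = confidence if correct else 1.0 - confidence
    return 100.0 * math.log2(2.0 * p_outcome)


if __name__ == "__main__":
    # Compare the reward for being right with the penalty for being wrong.
    for conf in (0.6, 0.8, 0.99):
        print(conf, round(credence_score(conf, True), 1), round(credence_score(conf, False), 1))
```

Because the log score is strictly proper, a player maximizes its expected score only by reporting its true belief, which is what makes a rule like this usable as a calibration signal.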

Findings

1. Consistent improvement in calibration metrics across models.
2. Effective game-based prompting strategy for LLM calibration.
3. No need for extra supervision or parameter updates.
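The findings refer to calibration metrics without naming them on this page. Expected Calibration Error (ECE) is one widely used choice; the sketch below is a generic binned estimator assuming equal-width confidence bins, not code from the paper, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean gap between confidence and accuracy per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        # Each bin contributes its |confidence - accuracy| gap, weighted by its size.
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A lower ECE means stated confidence tracks empirical accuracy more closely; an improvement in calibration would show up as this number falling.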

Abstract

As Large Language Models (LLMs) are increasingly deployed in decision-critical domains, it becomes essential to ensure that their confidence estimates faithfully correspond to their actual correctness. Existing calibration methods have primarily focused on post-hoc adjustments or auxiliary model training; however, many of these approaches necessitate additional supervision or parameter updates. In this work, we propose a novel prompt-based calibration framework inspired by the Credence Calibration Game. Our method establishes a structured interaction loop wherein LLMs receive feedback based on the alignment of their predicted confidence with correctness. Through feedback-driven prompting and natural language summaries of prior performance, our framework dynamically improves model calibration. Extensive experiments across models and game configurations demonstrate consistent improvements…
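The abstract does not spell out the prompt format or the internals of the interaction loop. The sketch below is a hypothetical illustration of the general idea: each round, the model sees a natural-language summary of how its past confidence compared with its actual correctness, then answers a new question with a stated confidence. The names `Round`, `summarize_history`, `build_prompt`, `play`, and `ask_model`, as well as the prompt wording, are assumptions rather than the authors' code; `ask_model` stands in for whatever LLM call is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Round:
    question: str
    confidence: float  # model's stated probability that its answer is correct
    correct: bool


def summarize_history(history: List[Round]) -> str:
    """Turn past rounds into a short natural-language performance summary."""
    if not history:
        return "No previous rounds yet."
    # Group rounds by stated confidence, rounded to the nearest 10%.
    buckets: Dict[float, List[bool]] = {}
    for r in history:
        buckets.setdefault(round(r.confidence, 1), []).append(r.correct)
    lines = []
    for key in sorted(buckets):
        outcomes = buckets[key]
        n = len(outcomes)
        hit_rate = sum(outcomes) / n
        lines.append(
            f"When you reported about {key:.0%} confidence, you were correct "
            f"{hit_rate:.0%} of the time ({n} round{'s' if n != 1 else ''})."
        )
    return " ".join(lines)


def build_prompt(question: str, history: List[Round]) -> str:
    """Hypothetical prompt: game framing, performance feedback, then the question."""
    return (
        "You are playing a calibration game. Answer the question and state your "
        "confidence (0-100%) that your answer is correct.\n"
        f"Feedback on your previous rounds: {summarize_history(history)}\n"
        f"Question: {question}"
    )


def play(
    questions: List[Tuple[str, str]],
    ask_model: Callable[[str], Tuple[str, float]],
) -> List[Round]:
    """Run the loop; `ask_model` returns (answer, confidence in [0, 1])."""
    history: List[Round] = []
    for question, gold in questions:
        answer, confidence = ask_model(build_prompt(question, history))
        is_correct = answer.strip().lower() == gold.strip().lower()
        history.append(Round(question, confidence, is_correct))
    return history


if __name__ == "__main__":
    # Stand-in model that always answers "Paris" with 90% confidence.
    fake_model = lambda prompt: ("Paris", 0.9)
    rounds = play([("Capital of France?", "Paris"), ("Capital of Italy?", "Rome")], fake_model)
    print(summarize_history(rounds))
    # prints: When you reported about 90% confidence, you were correct 50% of the time (2 rounds).
```

Because the feedback lives entirely in the prompt, this kind of loop needs no labels beyond per-round correctness and no parameter updates, which matches the framing in the abstract.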
