Confidence Estimation for LLMs in Multi-turn Interactions

Caiqi Zhang; Ruihan Yang; Xiaochen Zhu; Chengzu Li; Tiancheng Hu; Yijiang River Dong; Deqing Yang; Nigel Collier

arXiv:2601.02179 · cs.CL · May 15, 2026

TL;DR

This paper systematically studies confidence estimation in multi-turn LLM interactions. It introduces new metrics and a "Hinter-Guesser" paradigm for evaluating per-turn calibration and monotonicity, finds that widely used confidence techniques struggle in this setting, and proposes a logit-based probe.

Contribution

The paper presents the first systematic study of confidence estimation in multi-turn conversations, contributing a formal evaluation framework, novel metrics, and a new probe method aimed at better calibration and evidence tracking.

Findings

01 Widely used confidence techniques struggle with calibration in multi-turn dialogues.

02 The proposed P(Sufficient) probe effectively tracks evidence accumulation.

03 New metrics, including the length-normalized Expected Calibration Error (InfoECE), support evaluating confidence calibration across turns.

Abstract

While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research overwhelmingly focuses on single-turn settings. The dynamics of model confidence in multi-turn conversations, where context accumulates and ambiguity is progressively resolved, remain largely unexplored. This work presents the first systematic study of confidence estimation in multi-turn interactions, establishing a formal evaluation framework grounded in two key desiderata: per-turn calibration and monotonicity of confidence as more information becomes available. To facilitate this, we introduce novel metrics, including a length-normalized Expected Calibration Error (InfoECE), and a new "Hinter-Guesser" paradigm for generating controlled evaluation datasets. Our experiments reveal that widely-used confidence techniques struggle with calibration and…
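The two desiderata named in the abstract, per-turn calibration and monotonicity, can be illustrated with a minimal sketch. This is not the paper's InfoECE, whose exact definition is not given on this page. It uses the standard binned Expected Calibration Error computed separately at each turn, plus a simple monotonicity score (the fraction of adjacent turns where confidence does not drop). The function names, the input layout, and the monotonicity score are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - mean_conf)
    return ece


def per_turn_ece(dialogues, n_bins=10):
    """Compute ECE separately for each turn index.

    `dialogues` is a hypothetical layout: a list of dialogues, each a list of
    (confidence, is_correct) pairs, one pair per turn.
    """
    by_turn = defaultdict(list)
    for dialogue in dialogues:
        for turn, (conf, ok) in enumerate(dialogue, start=1):
            by_turn[turn].append((conf, ok))
    return {
        turn: expected_calibration_error(
            [c for c, _ in pairs], [ok for _, ok in pairs], n_bins
        )
        for turn, pairs in sorted(by_turn.items())
    }


def monotonic_fraction(confidences):
    """Fraction of adjacent turn pairs where confidence is non-decreasing."""
    pairs = list(zip(confidences, confidences[1:]))
    if not pairs:
        return 1.0  # a single turn is trivially monotone
    return sum(1 for prev, nxt in pairs if nxt >= prev) / len(pairs)
```

Reporting ECE per turn instead of pooling all turns keeps early turns, where little evidence is available, from being averaged together with late ones. A length-normalized variant such as InfoECE would presumably align dialogues of different lengths before binning, but that detail is not specified here.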

Code & Models

Repositories

caiqizh/multi-turn-conf (GitHub)
