Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

Shuo Liu; Ding Liu; Shi-Ju Ran

arXiv:2605.16824 · cs.LG · May 19, 2026

TL;DR

This paper demonstrates that token-level confidence trajectories in large language models encode meaningful signals about reasoning correctness, enabling improved evaluation and aggregation without external tools.

Contribution

It introduces a geometric analysis of confidence trajectories and NeuralConf, a new method for correctness estimation based solely on confidence data.

Findings

01

Confidence trajectories separate correct from incorrect reasoning traces.

02

Stronger clustering of correct and incorrect traces, measured by the Davies-Bouldin index, is linked to better downstream predictability of correctness.

03

Tail confidence signals carry key information for correctness evaluation.

Abstract

Large language models (LLMs) generate not only reasoning text, but also token-level confidence trajectories that record how uncertainty evolves during inference. Whether these trajectories are relevant to reasoning correctness remains unclear. Here we show that confidence trajectories encode a content-agnostic confidence geometry associated with trace-level final-answer correctness. Using only token-level confidence values, without access to the input question, reasoning text, hidden states, or external verifiers, we find that low-dimensional representations of confidence trajectories separate correct from incorrect reasoning traces. Across GSM8K, MATH, and MMLU, this geometric separation is quantitatively linked to downstream predictability: stronger clustering of correct and incorrect traces, measured by the Davies-Bouldin index, consistently corresponds to higher…
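To make the Davies-Bouldin measurement concrete, here is a minimal sketch in plain Python. It is not the paper's pipeline. The two-dimensional features (mean confidence and mean confidence over the final tokens), the `tail_frac` parameter, the function names, and the toy trajectories are all assumptions made up for illustration. The index itself follows the standard definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid distance, then average over clusters. Lower values mean tighter, better-separated clusters.

```python
import math

def tail_features(conf, tail_frac=0.2):
    """Hypothetical 2-D summary of one trajectory:
    (mean confidence, mean confidence over the last tail_frac of tokens)."""
    n_tail = max(1, int(len(conf) * tail_frac))
    tail = conf[-n_tail:]
    return (sum(conf) / len(conf), sum(tail) / len(tail))

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def scatter(points, center):
    """Average Euclidean distance from the points to their centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)

def davies_bouldin(clusters):
    """Standard Davies-Bouldin index for a list of clusters (lists of points)."""
    centers = [centroid(c) for c in clusters]
    scatters = [scatter(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k

# Toy confidence trajectories, invented for illustration only.
correct = [
    [0.90, 0.92, 0.95, 0.97, 0.98],
    [0.85, 0.90, 0.93, 0.96, 0.97],
    [0.88, 0.90, 0.94, 0.95, 0.99],
]
incorrect = [
    [0.90, 0.80, 0.70, 0.60, 0.50],
    [0.85, 0.80, 0.65, 0.60, 0.55],
    [0.80, 0.75, 0.70, 0.55, 0.50],
]

good = [tail_features(t) for t in correct]
bad = [tail_features(t) for t in incorrect]

# Index when clusters follow the correctness labels.
db_separated = davies_bouldin([good, bad])

# Index when the same points are grouped without regard to correctness.
db_shuffled = davies_bouldin([[good[0], bad[0], good[1]],
                              [bad[1], good[2], bad[2]]])

print(f"label-aligned clusters: {db_separated:.3f}")
print(f"shuffled clusters:      {db_shuffled:.3f}")
```

On this toy data the label-aligned grouping gives a much lower index than the shuffled grouping, which is the sense in which "stronger clustering" is read off the Davies-Bouldin index.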
