Tracing Uncertainty in Language Model "Reasoning"

Nils Grünefeld; Bertram Højer; Philipp Mondorf; Barbara Plank; Anna Rogers; Christian Hardmeier; Stefan Heinrich; Jes Frellsen

arXiv:2605.07776·cs.LG·May 11, 2026

TL;DR

This paper investigates how uncertainty quantification can reveal the dynamics of language model reasoning, enabling early detection of correct or incorrect outputs through uncertainty profile analysis.

Contribution

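It introduces the uncertainty trace profile, a small set of features describing the shape of a trace's uncertainty signal (such as its slope and linearity), and shows that these profiles predict answer correctness across five LMs on GSM8K and ProntoQA.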

Findings

01. Uncertainty trace profiles predict whether a trace yields a correct final answer with AUROC up to 0.807.

02. The first few hundred tokens of a trace suffice to predict correctness with AUROC 0.801.

03. Correct traces show a steeper, less linear decline in uncertainty.
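
For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen correct trace receives a higher score than a randomly chosen incorrect one, with ties counted as half. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy scores are illustrative assumptions, not the paper's evaluation code or data.

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    scores: predicted correctness scores, one per trace (hypothetical values).
    labels: 1 if the trace's final answer was correct, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four (positive, negative) pairs are ranked correctly.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen here for clarity; a rank-based formulation gives the same value more efficiently on large datasets.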

Abstract

Language model (LM) "reasoning", commonly described as Chain-of-Thought or test-time scaling, often improves benchmark performance, but the dynamics underlying this process remain poorly understood. We study these dynamics through the lens of uncertainty quantification by treating the "reasoning" traces, the intermediate token sequences generated by LMs, as evolving model states. We summarize each trace by an uncertainty trace profile: a small set of features describing the shape of the uncertainty signal over its trace, such as its slope and linearity. We find that across five LMs evaluated on GSM8K and ProntoQA, these profiles predict whether a trace yields a correct final answer with AUROC up to 0.807, improving markedly on recent related work. We reach AUROC 0.801 using only the first few hundred tokens of full traces, suggesting that errors can be detected early in the generation.…
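
To make the idea of a trace profile concrete, here is a minimal sketch of how slope and linearity could be extracted from a per-token uncertainty signal. Several things are assumptions rather than details from the abstract: the uncertainty signal is taken as given (for example, per-token predictive entropy), slope comes from an ordinary least-squares line fit over normalized token position, linearity is measured as the R² of that fit, and the names `trace_profile` and `max_tokens` are hypothetical. The paper's actual feature definitions may differ.

```python
import numpy as np


def trace_profile(uncertainty, max_tokens=None):
    """Summarize a per-token uncertainty signal by its slope and linearity.

    uncertainty: sequence of per-token uncertainty values (assumed input,
                 e.g. predictive entropy at each generated token).
    max_tokens:  if set, use only this many leading tokens, mimicking
                 early detection from a trace prefix (hypothetical knob).
    """
    u = np.asarray(uncertainty, dtype=float)
    if max_tokens is not None:
        u = u[:max_tokens]
    if len(u) < 2:
        raise ValueError("need at least two tokens to fit a line")

    # Normalized position in [0, 1] so slopes are comparable across trace lengths.
    t = np.linspace(0.0, 1.0, len(u))

    # Least-squares line fit: u is approximated by slope * t + intercept.
    slope, intercept = np.polyfit(t, u, 1)

    # Linearity as R^2 of the line fit (assumed definition).
    residuals = u - (slope * t + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((u - u.mean()) ** 2))
    # Convention: a constant signal is treated as perfectly linear.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    return {"slope": float(slope), "linearity": r2}


# A linearly declining signal versus a steep, curved (exponential) decline.
linear_profile = trace_profile([3.0, 2.0, 1.0, 0.0])
curved_profile = trace_profile(np.exp(-5.0 * np.linspace(0.0, 1.0, 50)))
```

Under these assumed definitions, both example signals have a negative slope, but the exponentially decaying one has lower linearity. Features like these could then be fed to any standard classifier to predict correctness; which classifier the authors use is not stated in the abstract.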
