Probing the Trajectories of Reasoning Traces in Large Language Models
Marthe Ballon, Brecht Verbeken, Vincent Ginis, Andres Algaba

TL;DR
This paper introduces a systematic protocol for probing how accuracy and decision commitment evolve along the reasoning traces of large language models. It finds that accuracy improves as more reasoning tokens are provided and that models can backtrack from incorrect partial traces, results that can inform safer deployment strategies.
Contribution
The study presents a novel trajectory probing protocol to evaluate reasoning trace evolution in LLMs, providing insights into accuracy, decision-making, and reliability during reasoning processes.
Findings
Accuracy increases with more reasoning tokens.
Models can backtrack from incorrect partial traces.
Content relevance, not length, drives decision improvements.
Abstract
Large language models (LLMs) increasingly solve difficult problems by producing "reasoning traces" before emitting a final response. However, it remains unclear how accuracy and decision commitment evolve along a reasoning trajectory, and whether intermediate trace segments provide answer-relevant information beyond generic length or stylistic effects. Here, we propose a protocol to systematically probe the trajectories of reasoning traces in LLMs by 1) generating a model's reasoning trace, 2) truncating it at fixed token-percentiles, and 3) injecting each partial trace back into the model (or a different model) to measure the induced distribution over answer choices via next-token probabilities. We apply this protocol to the open-source Qwen3-4B/-8B/-14B and gpt-oss-20b/-120b models across the multiple-choice GPQA Diamond and MMLU-Pro benchmarks. We find that accuracy and decision…
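The three protocol steps in the abstract (generate a trace, truncate it at token percentiles, re-inject each prefix and read off the answer distribution) can be illustrated with a minimal sketch. This is not the authors' implementation. The percentile grid, the helper names (`truncate_at_percentiles`, `choice_distribution`, `probe_trajectory`), the renormalization over answer letters only, and the toy scoring function that stands in for a real model call are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumed percentile grid; the abstract only says "fixed token-percentiles".
DEFAULT_PERCENTILES = (0, 25, 50, 75, 100)


def truncate_at_percentiles(
    tokens: Sequence[str], percentiles: Sequence[int] = DEFAULT_PERCENTILES
) -> Dict[int, List[str]]:
    """Map each percentile p to the first floor(len(tokens) * p / 100) tokens."""
    n = len(tokens)
    return {p: list(tokens[: (n * p) // 100]) for p in percentiles}


def choice_distribution(logits: Dict[str, float]) -> Dict[str, float]:
    """Softmax over answer-choice logits only (assumed renormalization)."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def probe_trajectory(
    tokens: Sequence[str],
    score_fn: Callable[[List[str]], Dict[str, float]],
    percentiles: Sequence[int] = DEFAULT_PERCENTILES,
) -> Dict[int, Dict[str, float]]:
    """For each truncated prefix, return the induced answer distribution.

    score_fn is a hypothetical stand-in for injecting the partial trace into
    a model and reading next-token logits for the answer letters.
    """
    return {
        p: choice_distribution(score_fn(prefix))
        for p, prefix in truncate_at_percentiles(tokens, percentiles).items()
    }


def toy_score_fn(prefix: List[str]) -> Dict[str, float]:
    """Toy scorer: evidence for choice "B" grows with the prefix length."""
    return {"A": 0.0, "B": 0.1 * len(prefix), "C": 0.0, "D": 0.0}


trace = [f"tok{i}" for i in range(20)]  # placeholder reasoning trace
trajectory = probe_trajectory(trace, toy_score_fn)
for p, dist in trajectory.items():
    print(p, round(dist["B"], 3))
```

In a real setting, `score_fn` would wrap a model forward pass, and the prefix could be scored by the model that generated it or by a different one, as the abstract describes.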
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
