Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu

arXiv:2603.10384 · cs.AI · May 5, 2026

TL;DR

TRACED introduces a geometric framework to evaluate LLM reasoning by analyzing reasoning traces through progress and stability, revealing structural insights beyond scalar probabilities.

Contribution

TRACED is a novel, theoretically grounded method that decomposes reasoning traces into two geometric components, progress and stability, improving the robustness and interpretability of LLM evaluation.

Findings

01

High progress and stability correlate with correct reasoning.

02

Hallucinations show low progress and high curvature fluctuations.

03

TRACED achieves competitive performance and superior robustness across diverse benchmarks.

Abstract

Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-progress, stable trajectories, whereas hallucinations are characterized by low-progress, unstable patterns (stalled displacement with high curvature fluctuations). Leveraging these signatures, our probabilistic framework achieves competitive performance and superior robustness across diverse benchmarks. Crucially, TRACED bridges geometry and cognition by mapping high curvature to "Hesitation Loops" and displacement to "Certainty Accumulation", offering a physical lens to decode the internal…
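The abstract does not state TRACED's exact formulas, so the following is only a minimal sketch of the two geometric quantities it names, under assumptions of our own. We assume a reasoning trace is a sequence of state vectors (for example, one hidden-state embedding per reasoning step). Progress is measured as net displacement divided by total path length, and curvature is measured as the turning angle between consecutive step vectors, with "fluctuation" taken as the population standard deviation of those angles. The function names `progress`, `turning_angles`, and `signature` are hypothetical and are not the authors' API.

```python
import math
import statistics
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _steps(traj: Sequence[Vector]) -> List[List[float]]:
    """Step vectors h_{t+1} - h_t between consecutive reasoning states."""
    return [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def progress(traj: Sequence[Vector]) -> float:
    """Net displacement over path length: 1.0 for a straight path, near 0 when stalled."""
    if len(traj) < 2:
        raise ValueError("need at least two states")
    path_length = sum(_norm(s) for s in _steps(traj))
    if path_length == 0.0:
        return 0.0
    displacement = [b - a for a, b in zip(traj[0], traj[-1])]
    return _norm(displacement) / path_length


def turning_angles(traj: Sequence[Vector]) -> List[float]:
    """Angle in radians between consecutive steps (assumed discrete curvature proxy)."""
    steps = _steps(traj)
    angles = []
    for u, v in zip(steps, steps[1:]):
        nu, nv = _norm(u), _norm(v)
        if nu == 0.0 or nv == 0.0:
            continue  # angle is undefined for a zero-length step
        cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error
    return angles


def signature(traj: Sequence[Vector]) -> Dict[str, float]:
    """Progress plus mean and fluctuation (population std) of curvature."""
    angles = turning_angles(traj)
    return {
        "progress": progress(traj),
        "mean_curvature": statistics.fmean(angles) if angles else 0.0,
        "curvature_fluctuation": statistics.pstdev(angles) if angles else 0.0,
    }


# Toy 2-D trajectories (illustrative only, not data from the paper).
straight = [[0, 0], [1, 0], [2, 0], [3, 0]]                   # steady advance
hesitant = [[0, 0], [1, 0], [0, 0], [0.5, 0.5], [0.4, 0.4]]   # back-and-forth loop
print(signature(straight))
print(signature(hesitant))
```

On these toy inputs the straight path has progress 1 and zero curvature, while the back-and-forth path has low progress, large turning angles, and nonzero curvature fluctuation, mirroring the qualitative contrast the abstract draws between stable trajectories and "Hesitation Loops". How TRACED turns such signatures into a probability is not specified here, so the sketch stops at the features.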
