Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu

TL;DR
TRACED introduces a geometric framework to evaluate LLM reasoning by analyzing reasoning traces through progress and stability, revealing structural insights beyond scalar probabilities.
Contribution
It presents a novel, theoretically grounded method that decomposes reasoning traces into geometric components (Progress and Stability), improving the robustness and interpretability of LLM evaluation.
Findings
High progress and stability correlate with correct reasoning.
Hallucinations show low progress and high curvature fluctuations.
TRACED achieves competitive performance and superior robustness across diverse benchmarks.
Abstract
Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-progress, stable trajectories, whereas hallucinations are characterized by low-progress, unstable patterns (stalled displacement with high curvature fluctuations). Leveraging these signatures, our probabilistic framework achieves competitive performance and superior robustness across diverse benchmarks. Crucially, TRACED bridges geometry and cognition by mapping high curvature to "Hesitation Loops" and displacement to "Certainty Accumulation", offering a physical lens to decode the internal…
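The abstract does not give TRACED's exact formulas, so the following is only a rough sketch of the two quantities it names, under loudly stated assumptions: a reasoning trace is taken to be a sequence of per-step state vectors (e.g. hidden states), Progress is approximated by net displacement from the first to the last state, and Stability by the discrete turning angle between consecutive step vectors, with the spread of those angles standing in for "curvature fluctuations". The function names `progress`, `turning_angles`, and `curvature_stats` are hypothetical; this is not the authors' implementation.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# ASSUMPTION: a trace is a list of equal-length state vectors, one per reasoning step.
Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def progress(trace: List[Vector]) -> float:
    """Hypothetical Progress proxy: Euclidean distance from first to last state."""
    return _norm(_sub(trace[-1], trace[0]))


def turning_angles(trace: List[Vector]) -> List[float]:
    """Angle (radians) between consecutive step vectors, a discrete curvature proxy."""
    steps = [_sub(b, a) for a, b in zip(trace, trace[1:])]
    steps = [s for s in steps if _norm(s) > 0.0]  # drop zero-length steps (angle undefined)
    angles = []
    for u, v in zip(steps, steps[1:]):
        cos = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
        angles.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp for float safety
    return angles


def curvature_stats(trace: List[Vector]) -> Tuple[float, float]:
    """Hypothetical Stability proxy: mean turning angle and its population std dev."""
    angles = turning_angles(trace)
    if not angles:
        return 0.0, 0.0
    return statistics.mean(angles), statistics.pstdev(angles)


if __name__ == "__main__":
    straight = [[0, 0], [1, 0], [2, 0], [3, 0]]      # steady, directed trajectory
    loop = [[0, 0], [1, 0], [0, 0], [1, 0], [0, 0]]  # back-and-forth "hesitation"
    print(progress(straight), curvature_stats(straight))  # prints 3.0 (0.0, 0.0)
    print(progress(loop), curvature_stats(loop))
```

In this toy setup the straight trace has high displacement and zero turning, while the back-and-forth trace ends where it started (zero displacement) and reverses direction at every step, mirroring the high-progress/stable versus stalled/unstable contrast the abstract describes.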
