Reasoning Models Don't Just Think Longer, They Move Differently
Anders Gjølbye, Lars Kai Hansen, Sanmi Koyejo

TL;DR
This paper investigates whether reasoning-trained language models follow different internal trajectories when solving harder problems, and finds domain-dependent differences that persist after adjusting for generation length.
Contribution
It introduces a length correction method for trajectory analysis and demonstrates that reasoning training influences internal model dynamics in a domain-specific manner.
Findings
Corrected trajectory geometry correlates with problem difficulty across domains.
In code tasks, harder problems show more direct and less heterogeneous length-corrected trajectories for reasoning-trained models.
Length correction is essential for meaningful trajectory analysis, because raw path statistics change mechanically with generation length.
Abstract
Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local…
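The length correction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `directness` statistic (net displacement divided by total path length), the use of log length as the regressor, and both function names are assumptions chosen for illustration. The sketch only shows the general idea of regressing a trajectory statistic on generation length and keeping the residual.

```python
import numpy as np

def directness(hidden_states: np.ndarray) -> float:
    """Hypothetical path statistic: net displacement / total path length.

    hidden_states has shape (num_tokens, hidden_dim). A value of 1.0 means
    the trajectory is a straight line; smaller values mean more wandering.
    """
    steps = np.diff(hidden_states, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()
    net = np.linalg.norm(hidden_states[-1] - hidden_states[0])
    return float(net / path_length) if path_length > 0 else 0.0

def residualize(stat: np.ndarray, length: np.ndarray) -> np.ndarray:
    """Remove the part of `stat` that is linearly predictable from log length.

    Assumption: a single log-length regressor plus intercept, fit by
    ordinary least squares. The paper may use a different model.
    """
    X = np.column_stack([np.ones(len(length)), np.log(length)])
    beta, *_ = np.linalg.lstsq(X, stat, rcond=None)
    return stat - X @ beta

# Usage on synthetic data: random-walk trajectories of varying length.
rng = np.random.default_rng(0)
lengths = rng.integers(20, 200, size=50)
stats = np.array([
    directness(np.cumsum(rng.normal(size=(n, 8)), axis=0)) for n in lengths
])
corrected = residualize(stats, lengths)
```

By construction, `corrected` is uncorrelated with log length in the fitted sample, so any remaining association with problem difficulty cannot be explained by that linear length effect alone.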
