Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning
Hamed Damirchi, Ignacio Meza De la Jara, Ehsan Abbasnejad, Afshar Shamsi, Zhen Zhang, Javen Shi

TL;DR
This paper introduces Truth as a Trajectory (TaT), a method that analyzes layer-wise geometric displacement in transformer models to distinguish valid reasoning from spurious behavior in large language models.
Contribution
TaT shifts the focus from static activations to dynamic layer-wise trajectories, revealing geometric invariants that improve explainability of LLM reasoning processes.
Findings
TaT outperforms traditional probing methods in identifying valid reasoning.
Trajectory analysis reduces reliance on surface-level lexical cues.
TaT is effective across dense and Mixture-of-Experts (MoE) architectures and across multiple benchmarks.
Abstract
Existing explainability methods for Large Language Models (LLMs) typically treat hidden states as static points in activation space, assuming that correct and incorrect inferences can be separated using representations from an individual layer. However, these activations are saturated with polysemantic features, leading linear probes to learn surface-level lexical patterns rather than underlying reasoning structures. We introduce Truth as a Trajectory (TaT), which models transformer inference as an unfolded trajectory of iterative refinements, shifting analysis from static activations to layer-wise geometric displacement. By analyzing the displacement of representations across layers, TaT uncovers geometric invariants that distinguish valid reasoning from spurious behavior. We evaluate TaT across dense and Mixture-of-Experts (MoE) architectures on benchmarks spanning commonsense…
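To make "layer-wise geometric displacement" concrete, here is a minimal sketch, not the authors' implementation. It assumes one hidden-state vector per layer for a single token, and it uses two illustrative features (per-layer step size and the cosine between consecutive steps) that this summary does not confirm TaT actually uses. The function names are hypothetical.

```python
import math


def layer_displacements(hidden_states):
    """Displacement vectors h[l+1] - h[l] between consecutive layers."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(hidden_states, hidden_states[1:])
    ]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def trajectory_features(hidden_states):
    """Assumed illustrative features: step sizes and turn cosines.

    A static probe would look at one entry of hidden_states; here only
    the differences between consecutive layers are used.
    """
    deltas = layer_displacements(hidden_states)
    step_norms = [norm(d) for d in deltas]
    turn_cosines = [cosine(d1, d2) for d1, d2 in zip(deltas, deltas[1:])]
    return step_norms, turn_cosines


# Toy 2-D "hidden states" for four layers (made-up numbers, not model output).
states = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [6.0, 9.0]]
step_norms, turn_cosines = trajectory_features(states)
print(step_norms)    # how far the representation moves at each layer
print(turn_cosines)  # how much the direction of movement changes
```

In practice the per-layer vectors would come from a model's hidden states rather than hand-written lists, and a classifier would be trained on the resulting trajectory features instead of on a single layer's activations.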
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Computational and Text Analysis Methods
