Unveiling the Reasoning Process of Large Language Models
Junjie Zhang, Zhen Shen, Xisong Dong, Gang Xiong

TL;DR
This paper investigates how large language models transform token-level information into abstract relational structures during reasoning, revealing a middle-layer stage crucial for this process.
Contribution
The paper provides a layer-wise analysis of reasoning in language models and identifies a middle-layer stage that converts token-level information into relational representations.
Findings
Middle layers reorganize features into rule-level representations.
Middle-layer states occupy lower-dimensional manifolds.
Removing middle-layer components selected by an interaction-based criterion significantly impacts accuracy.
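One common way to put a number on a claim like "states occupy lower-dimensional manifolds" is the participation ratio of the activation covariance, PR = (tr C)^2 / tr(C^2). The sketch below is only an illustration under that assumption: the listing does not say which dimensionality estimator the authors use, the function name `participation_ratio` is hypothetical, and the inputs are synthetic stand-ins rather than real hidden states.

```python
import random

def participation_ratio(states):
    """Effective dimensionality of a list of feature vectors.

    Computes (tr C)^2 / tr(C^2) for the sample covariance C. Because C is
    symmetric, tr(C^2) equals the sum of its squared entries, so no
    eigendecomposition is needed.
    """
    n, d = len(states), len(states[0])
    means = [sum(row[j] for row in states) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in states]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    frob_sq = sum(c * c for row in cov for c in row)
    return trace ** 2 / frob_sq

# Synthetic data (an assumption for illustration, not model activations):
# one set confined to a 2-dimensional subspace of an 8-dimensional space,
# one set spread over all 8 dimensions.
random.seed(0)
n, d, rank = 200, 8, 2
latent = [[random.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
mixing = [[random.gauss(0, 1) for _ in range(d)] for _ in range(rank)]
low_dim = [[sum(z[k] * mixing[k][j] for k in range(rank)) for j in range(d)]
           for z in latent]
full_dim = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

pr_low = participation_ratio(low_dim)
pr_full = participation_ratio(full_dim)
print(f"low-dimensional set:  PR = {pr_low:.2f}")
print(f"full-dimensional set: PR = {pr_full:.2f}")
```

The participation ratio can never exceed the rank of the data, so the first set scores at most 2 while the second approaches 8. Applied per layer to real hidden states, a dip in this quantity in the middle layers would be consistent with the finding above, though it would not by itself establish it.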
Abstract
Large language models often reason beyond surface tokens, but the internal stage at which token-level information becomes abstract relational structure remains unclear. We investigate this question by analyzing how attention heads and layers transform information during autoregressive reasoning. Across mathematical and symbolic reasoning tasks, we observe a consistent layer-wise division of labor: outer layers mainly preserve and route input-related features, whereas middle layers reorganize them into more transferable rule-level representations. This interpretation is supported by representation geometry: middle-layer states occupy lower-dimensional manifolds and show stronger alignment across disjoint vocabularies that instantiate the same symbolic rules. It is further supported by causal interventions: removing middle-layer components identified by our interaction-based criterion…
