Layer Specialization Underlying Compositional Reasoning in Transformers

Jing Liu

arXiv:2510.17469 · cs.LG · October 21, 2025

TL;DR

This paper investigates how transformers develop specialized layers that support compositional reasoning. It trains models on sequences from the Random Hierarchy Model (RHM), a probabilistic context-free grammar, and analyzes both their training dynamics and their internal representations.

Contribution

The paper shows that layer specialization emerges progressively during training and that this emergence correlates with the model's ability to generalize on compositional tasks.

Findings

01

Layer specialization emerges gradually during training.

02

Transformers develop hierarchical, structured internal representations.

03

Performance improves with task complexity and the number of in-context examples; out-of-distribution tasks require substantially more examples than in-distribution ones.

Abstract

Transformers exhibit compositional reasoning on sequences not observed during training, a capability often attributed to in-context learning (ICL) and skill composition. We investigate this phenomenon using the Random Hierarchy Model (RHM), a probabilistic context-free grammar that generates sequences through recursive rule application. Models are trained on subsets of sequences and evaluated across four generalization conditions: memorization, in-distribution generalization, out-of-distribution generalization with the same rules, and cross-layer transfer. Behaviorally, performance improves systematically with task complexity and the number of in-context examples, with out-of-distribution tasks requiring substantially more examples than in-distribution scenarios. Mechanistically, we identify a progressive emergence of layer specialization during training that correlates with…
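To make "a probabilistic context-free grammar that generates sequences through recursive rule application" concrete, here is a minimal toy sketch of a hierarchical generator. It is not the paper's code: the names `make_rules` and `generate`, and the parameters `vocab_size`, `num_rules`, `branching`, and `depth`, are all assumptions chosen for illustration. The sketch also does not check that distinct symbols receive distinct expansions, a constraint a real grammar of this kind may impose.

```python
import random


def make_rules(vocab_size, num_rules, branching, depth, rng):
    """Sample a toy rule table (hypothetical layout, not the paper's).

    For each of `depth` levels, every symbol gets `num_rules` random
    expansions, each a tuple of `branching` lower-level symbols.
    """
    rules = []
    for _ in range(depth):
        level = {
            sym: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(num_rules)
            ]
            for sym in range(vocab_size)
        }
        rules.append(level)
    return rules


def generate(symbol, rules, rng, level=0):
    """Recursively expand `symbol` and return the resulting leaf sequence."""
    if level == len(rules):
        # No rules left to apply: this symbol is an observed token.
        return [symbol]
    # Pick one of the symbol's expansions uniformly at random.
    expansion = rng.choice(rules[level][symbol])
    leaves = []
    for child in expansion:
        leaves.extend(generate(child, rules, rng, level + 1))
    return leaves


rng = random.Random(0)
rules = make_rules(vocab_size=4, num_rules=2, branching=2, depth=3, rng=rng)
sequence = generate(symbol=0, rules=rules, rng=rng)
print(len(sequence))  # branching ** depth = 2 ** 3 = 8 tokens
```

Because every expansion has exactly `branching` children and the recursion runs for `depth` levels, every generated sequence has `branching ** depth` tokens. Under these assumptions, holding out a subset of such sequences, or a subset of the expansions, would be one way to set up the train and evaluation splits the abstract describes.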

Taxonomy

Topics: Constraint Satisfaction and Optimization · Child and Animal Learning Development · Domain Adaptation and Few-Shot Learning