Emergent effects of scaling on the functional hierarchies within large language models
Paul C. Bogdan

TL;DR
This paper investigates how large language models encode hierarchical information across layers, revealing both supporting evidence and surprising deviations from traditional hierarchical views, especially as models scale up.
Contribution
The study shows how scaling changes the functional hierarchy within large language models, highlighting fluctuations in abstraction level across layers and scale-dependent coordination between adjacent layers.
Findings
The hierarchical view of semantic encoding is partly supported in a small model (Llama-3.2-3b; 28 layers).
Large models show fluctuations in abstraction levels across layers.
Adjacent layers coordinate differently depending on the scale and context.
Abstract
Large language model (LLM) architectures are often described as functionally hierarchical: Early layers process syntax, middle layers begin to parse semantics, and late layers integrate information. The present work revisits these ideas. This research submits simple texts to an LLM (e.g., "A church and organ") and extracts the resulting activations. Then, for each layer, support vector machines and ridge regressions are fit to predict a text's label and thus examine whether a given layer encodes some information. Analyses using a small model (Llama-3.2-3b; 28 layers) partly bolster the common hierarchical perspective: Item-level semantics are most strongly represented early (layers 2-7), then two-item relations (layers 8-12), and then four-item analogies (layers 10-15). Afterward, the representation of items and simple relations gradually decreases in deeper layers that focus on more…
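The layer-wise probing idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the activations are synthetic (random noise plus a label-dependent signal whose strength varies by layer), the probe is a closed-form ridge regression written in NumPy rather than the support vector machines the abstract also mentions, and all names (`ridge_probe_accuracy`, `signal_strength`, `layer_acts`) are hypothetical. With a real model, each layer's matrix would instead hold the hidden states extracted for each text.

```python
import numpy as np

def ridge_probe_accuracy(X, y, alpha=1.0, n_folds=5, seed=0):
    """Cross-validated accuracy of a ridge-regression probe.

    X: (n_texts, dim) activations for one layer.
    y: (n_texts,) binary labels coded as +1 / -1.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Center features and targets using training data only.
        mu = X[train].mean(axis=0)
        y_bar = y[train].mean()
        X_tr = X[train] - mu
        # Closed-form ridge solution: w = (X^T X + alpha I)^{-1} X^T (y - y_bar)
        A = X_tr.T @ X_tr + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(A, X_tr.T @ (y[train] - y_bar))
        pred = (X[test] - mu) @ w + y_bar
        correct += np.sum(np.where(pred >= 0, 1.0, -1.0) == y[test])
    return correct / len(y)

# Synthetic stand-in for per-layer activations (assumption, not real model data).
rng = np.random.default_rng(42)
n_texts, dim = 200, 16
labels = np.where(rng.random(n_texts) < 0.5, 1.0, -1.0)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

# Hypothetical signal strength per layer: the label is most decodable at layer 2.
signal_strength = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

accuracies = []
for strength in signal_strength:
    noise = rng.normal(size=(n_texts, dim))
    layer_acts = noise + strength * labels[:, None] * direction[None, :]
    accuracies.append(ridge_probe_accuracy(layer_acts, labels))

for layer, acc in enumerate(accuracies):
    print(f"layer {layer}: probe accuracy = {acc:.2f}")
```

Plotting probe accuracy against layer index gives the kind of curve the abstract summarizes: a layer range where a given type of information is most strongly decodable, followed by a decline in deeper layers.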
Taxonomy
Methods: Softmax · Attention Is All You Need · Focus
