Emergent effects of scaling on the functional hierarchies within large language models

Paul C. Bogdan

arXiv:2501.07359·cs.CL·January 14, 2025

TL;DR

This paper investigates how large language models encode hierarchical information across layers, revealing both supporting evidence and surprising deviations from traditional hierarchical views, especially as models scale up.

Contribution

The study characterizes how scaling changes the functional hierarchy within large language models, pointing to fluctuations in abstraction level across layers and to coordination between adjacent layers.

Findings

01

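In a small model (Llama-3.2-3b; 28 layers), probing partly supports the hierarchical view: item-level semantics are most strongly represented in layers 2-7, two-item relations in layers 8-12, and four-item analogies in layers 10-15.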

02

Large models show fluctuations in abstraction levels across layers.

03

Adjacent layers coordinate differently depending on the scale and context.

Abstract

Large language model (LLM) architectures are often described as functionally hierarchical: Early layers process syntax, middle layers begin to parse semantics, and late layers integrate information. The present work revisits these ideas. This research submits simple texts to an LLM (e.g., "A church and organ") and extracts the resulting activations. Then, for each layer, support vector machines and ridge regressions are fit to predict a text's label and thus examine whether a given layer encodes some information. Analyses using a small model (Llama-3.2-3b; 28 layers) partly bolster the common hierarchical perspective: Item-level semantics are most strongly represented early (layers 2-7), then two-item relations (layers 8-12), and then four-item analogies (layers 10-15). Afterward, the representation of items and simple relations gradually decreases in deeper layers that focus on more…
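The layer-wise probing idea described in the abstract can be illustrated with a minimal sketch. This is not the author's code: it uses synthetic "activations" in place of real LLM hidden states, a dependency-free closed-form ridge probe in place of a library implementation (in practice one would more likely use scikit-learn's `Ridge` and `SVC`), and hypothetical names such as `run_probes` and `layer_signals`. Each synthetic layer encodes the label with a different strength, and the probe's held-out accuracy is read as a rough measure of how well that layer represents the label.

```python
import random


def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination (no intercept)."""
    d = len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    M = []
    for i in range(d):
        row = [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
               for j in range(d)]
        row.append(sum(r[i] * t for r, t in zip(X, y)))
        M.append(row)
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (M[i][d] - tail) / M[i][i]
    return w


def probe_accuracy(X_train, y_train, X_test, y_test, lam=1.0):
    """Fit a ridge probe on the training split; score sign agreement on the test split."""
    w = ridge_fit(X_train, y_train, lam)
    correct = 0
    for row, t in zip(X_test, y_test):
        pred = sum(wi * xi for wi, xi in zip(w, row))
        correct += (pred >= 0) == (t > 0)
    return correct / len(y_test)


def synthetic_layer(rng, labels, signal, dim=8):
    """Stand-in for one layer's activations: feature 0 is shifted by
    signal * label, and every feature carries unit Gaussian noise."""
    return [[rng.gauss(0, 1) + (signal * t if k == 0 else 0.0)
             for k in range(dim)] for t in labels]


def run_probes(signals, n_train=200, n_test=100, seed=0):
    """Return one held-out probe accuracy per synthetic layer."""
    rng = random.Random(seed)
    labels = [rng.choice([-1.0, 1.0]) for _ in range(n_train + n_test)]
    accuracies = []
    for s in signals:
        X = synthetic_layer(rng, labels, s)
        accuracies.append(probe_accuracy(X[:n_train], labels[:n_train],
                                         X[n_train:], labels[n_train:]))
    return accuracies


# Hypothetical per-layer signal strengths: the label is encoded most
# strongly in the middle layer and not at all in the first and last.
layer_signals = [0.0, 1.0, 3.0, 1.0, 0.0]
accuracies = run_probes(layer_signals)
best_layer = max(range(len(accuracies)), key=accuracies.__getitem__)
```

With real data, each synthetic layer would be replaced by the activations extracted at that layer for each input text, and the labels by the property being probed (for example, an item category or relation type). Plotting `accuracies` against layer index then gives the kind of layer-by-layer profile the abstract describes.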
