Anatomical Heterogeneity in Transformer Language Models

Tomasz Wietrzykowski

arXiv:2603.19348·cs.LG·March 23, 2026

Open Access

TL;DR

This paper shows that the layers of a transformer language model differ sharply in importance and training needs, and that this heterogeneity can be exploited to reduce training costs.

Contribution

It provides empirical evidence of layer heterogeneity in a transformer language model (SmolLM2-135M) and introduces Growth Transformer Training, a method that allocates training resources according to layer importance.

Findings

01. Layer weights follow a regular oscillatory pattern and are highly predictable (R^2 = 0.91).

02. Layer importance spans a 10^7 range, from a critical core (L8-11) to anti-layers (L14, L17) whose removal improves performance.

03. Growth Transformer Training reduces training costs by ~54% while maintaining quality.

Abstract

Current transformer language models are trained with uniform computational budgets across all layers, implicitly assuming layer homogeneity. We challenge this assumption through empirical analysis of SmolLM2-135M, a 30-layer, 135M-parameter causal language model, using five diagnostic metrics: weight predictability (R^2), ablation degradation, recovery speed, weight manipulation robustness, and structural analysis. We find profound anatomical heterogeneity: (1) Layer weights follow strong mathematical regularity (R^2 = 0.91) with a universal oscillatory delta pattern (correlation ≈ -0.50), yet predicted weights cause catastrophic failure due to nonlinear error accumulation. (2) Layer importance spans a 10^7 range, from a critical core (L8-11, up to +63,419% PPL degradation) to anti-layers (L14, L17) whose removal improves performance. (3) Recovery speed correlates with layer importance,…
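The ablation degradation metric in the abstract can be illustrated with a minimal sketch. It assumes "PPL degradation" means the relative change in perplexity after removing a single layer, expressed as a percentage; the abstract does not spell out the formula, so this reading is an assumption. The function names and the toy perplexity values are hypothetical and are not taken from the paper.

```python
def ppl_degradation_pct(baseline_ppl: float, ablated_ppl: float) -> float:
    """Relative perplexity change, in percent, after removing one layer.

    Assumed definition: (ablated - baseline) / baseline * 100.
    """
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (ablated_ppl - baseline_ppl) / baseline_ppl * 100.0


def split_layers(baseline_ppl, ablated_ppl_by_layer):
    """Return per-layer degradation and the layers whose removal lowers perplexity."""
    degradation = {
        layer: ppl_degradation_pct(baseline_ppl, ppl)
        for layer, ppl in ablated_ppl_by_layer.items()
    }
    # A negative degradation means the model got better without the layer,
    # which matches the abstract's description of an "anti-layer".
    anti_layers = sorted(layer for layer, pct in degradation.items() if pct < 0)
    return degradation, anti_layers


# Hypothetical toy numbers for illustration only, not results from the paper.
baseline = 20.0
ablated = {0: 25.0, 1: 19.0, 2: 220.0}
degradation, anti_layers = split_layers(baseline, ablated)
```

Under this reading, a layer whose removal raises perplexity from 20 to 220 has a degradation of +1000%, while a layer whose removal lowers perplexity has a negative value and would be flagged as an anti-layer.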

Taxonomy

Topics: Machine Learning in Materials Science · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices