Inverse Depth Scaling From Most Layers Being Similar

Yizhou Liu; Sara Kangaslahti; Ziming Liu; Jeff Gore

arXiv:2602.05970 · cs.LG · February 6, 2026

TL;DR

This paper investigates how depth affects loss in large language models (LLMs). It finds that loss is inversely proportional to depth, a pattern consistent with most layers being functionally similar, and argues that more efficient models may need architectures that use depth compositionally.

Contribution

It provides a detailed analysis of how depth affects loss in LLMs and toy residual networks, showing that layers behave similarly and suggesting that better efficiency may require architectural innovations.

Findings

01. Loss scales inversely with depth in LLMs
02. Layers tend to be functionally similar, reducing error via ensemble effects
03. Current residual architectures may limit the potential of depth

Abstract

Neural scaling laws relate loss to model size in large language models (LLMs), yet depth and width may contribute to performance differently, requiring more detailed studies. Here, we quantify how depth affects loss via analysis of LLMs and toy residual networks. We find that loss is inversely proportional to depth in LLMs, probably due to functionally similar layers reducing error through ensemble averaging rather than compositional learning or discretizing smooth dynamics. This regime is inefficient yet robust and may arise from the architectural bias of residual networks and target functions incompatible with smooth dynamics. The findings suggest that improving LLM efficiency may require architectural innovations to encourage compositional use of depth.
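
The ensemble-averaging intuition in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the paper's model or experiments: it assumes each "layer" contributes an independent, zero-mean noisy estimate of the same target, and that the network output is the plain average of those estimates. The function name `ensemble_mse` and all parameters are hypothetical. Under these assumptions the mean squared error falls as 1/depth, so the product depth × MSE stays roughly constant.

```python
import random


def ensemble_mse(depth, trials=20000, noise_std=1.0, seed=0):
    """Mean squared error of the average of `depth` independent noisy estimates.

    Toy assumption (not from the paper): every layer estimates the same
    target with independent Gaussian noise of standard deviation `noise_std`.
    """
    rng = random.Random(seed)
    target = 0.0
    total_sq_err = 0.0
    for _ in range(trials):
        # Average the noisy estimates produced by `depth` similar layers.
        estimate = sum(target + rng.gauss(0.0, noise_std) for _ in range(depth)) / depth
        total_sq_err += (estimate - target) ** 2
    return total_sq_err / trials


# If error is inversely proportional to depth, depth * mse should stay near noise_std**2.
for depth in (1, 2, 4, 8, 16):
    mse = ensemble_mse(depth)
    print(f"depth={depth:2d}  mse={mse:.4f}  depth*mse={depth * mse:.3f}")
```

The independence assumption is what produces the 1/depth behavior here; correlated layer errors would average out more slowly, and layers that composed distinct computations would not be described by this picture at all.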

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques