Superposition Yields Robust Neural Scaling

Yizhou Liu; Ziming Liu; Jeff Gore

arXiv:2505.10465 · cs.LG · May 5, 2026

TL;DR

This paper investigates how representation superposition in large language models shapes neural scaling laws. It shows that under strong superposition the loss scales inversely with model dimension, and it confirms this behavior in open-source LLMs.

Contribution

The paper proposes representation superposition as a key factor behind neural scaling laws and shows how it affects loss scaling in both the weak and strong superposition regimes.

Findings

01 Loss scales inversely with model dimension under strong superposition.

02 Open-source LLMs operate in the strong superposition regime.

03 Chinchilla scaling laws are consistent with superposition-driven scaling.

Abstract

The success of today's large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law, that loss decreases as a power law with model size, remains unclear. We propose that representation superposition, meaning that LLMs represent more features than they have dimensions, can be a key contributor to loss and cause neural scaling. Based on Anthropic's toy model, we use weight decay to control the degree of superposition, allowing us to systematically study how loss scales with model size. When superposition is weak, the loss follows a power law only if data feature frequencies are power-law distributed. In contrast, under strong superposition, the loss generically scales inversely with model dimension across a broad class of frequency distributions, due to geometric overlaps between representation vectors. We…
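The geometric argument in the abstract can be illustrated with a small numerical sketch. It is not the paper's experiment: it assumes random unit vectors stand in for feature representations (trained representations may be arranged differently), and the function name `mean_squared_overlap` and the feature count of 512 are hypothetical choices made for illustration. For random unit vectors in m dimensions the expected squared dot product is 1/m, so packing many more features than dimensions leaves an interference term that shrinks inversely with m.

```python
import numpy as np


def mean_squared_overlap(n_features: int, m: int, seed: int = 0) -> float:
    """Mean squared dot product between distinct random unit vectors in R^m.

    Assumption: random directions are a stand-in for feature representation
    vectors; this is an illustration, not the paper's trained toy model.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, m))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalize each row to unit length
    G = W @ W.T                                    # Gram matrix of pairwise overlaps
    off_diag = G[~np.eye(n_features, dtype=bool)]  # drop self-overlaps on the diagonal
    return float(np.mean(off_diag ** 2))


# More features (512) than dimensions in every case, i.e. superposition.
dims = [16, 32, 64, 128]
overlaps = {m: mean_squared_overlap(512, m) for m in dims}

for m, v in overlaps.items():
    # If overlap scales as 1/m, the product m * overlap stays close to 1.
    print(f"m={m:4d}  mean squared overlap={v:.5f}  m * overlap={m * v:.3f}")
```

In this sketch the product of m and the mean squared overlap stays roughly constant as m grows, which is the 1/m behavior the abstract attributes to overlaps between representation vectors. How that overlap translates into loss depends on the model and data, which the sketch does not address.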

Code & Models

Repositories

liuyz0/superpositionscaling
github

Videos

Superposition Yields Robust Neural Scaling · slideslive