Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

Minh-Toan Nguyen; Jean Barbier

arXiv:2605.10395·stat.ML·May 13, 2026

TL;DR

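This paper analyzes the information-theoretic limits of learning hierarchical features in extensive-width neural networks, where the hidden width scales linearly with the input dimension. It reveals sharp phase transitions in feature recoverability and derives scaling laws for the Bayes-optimal generalization error.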

Contribution

Using a heuristic leave-one-out decoupling argument, it derives a closed system of fixed-point equations that characterizes the Bayes-optimal generalization error and the individual feature overlaps in extensive-width networks.

Findings

01 Feature learnability occurs via sharp phase transitions as data increases.

02 Effective width $k_c$ unifies different scaling regimes of generalization error.

03 Empirical results show Adam-trained models near $k_c$ achieve optimal scaling laws.

Abstract

We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We work in the high-dimensional regime where the teacher width $k$ scales linearly with the input dimension $d$ -- a setting that captures large-but-finite-width networks and has only recently become analytically tractable. Using a heuristic leave-one-out decoupling argument, validated numerically throughout, we derive asymptotically sharp characterizations of the Bayes-optimal generalization error and individual feature overlaps via a system of closed fixed-point equations. These equations reveal that feature learnability is governed by a sequence of sharp phase transitions: as data grows, teacher features become recoverable sequentially, each through a discontinuous jump in overlap. This…
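To make the setting in the abstract concrete, the following is a minimal sketch of how noisy queries to a one-hidden-layer teacher in the extensive-width regime could be simulated. Everything specific here is an assumption for illustration and is not taken from the paper: the function names `make_teacher` and `noisy_queries`, the Gaussian inputs and weights, the `tanh` activation, the ratio `gamma = k / d`, and the power-law readout weights used as a stand-in for "hierarchical features".

```python
import numpy as np

def make_teacher(d, gamma=0.5, decay=1.0, seed=0):
    """Draw a hypothetical teacher whose width k scales linearly with d."""
    rng = np.random.default_rng(seed)
    k = int(gamma * d)                       # extensive width: k = gamma * d
    W = rng.standard_normal((k, d))          # one hidden feature per row (assumed Gaussian)
    # Assumed hierarchy: readout weights decay as a power law in the feature index.
    a = np.arange(1, k + 1, dtype=float) ** (-decay)
    return W, a

def noisy_queries(W, a, n, noise_std=0.1, seed=1):
    """Query the teacher on n random inputs and add Gaussian label noise."""
    rng = np.random.default_rng(seed)
    k, d = W.shape
    X = rng.standard_normal((n, d))          # assumed i.i.d. Gaussian inputs
    H = np.tanh(X @ W.T / np.sqrt(d))        # hidden activations (tanh is an assumption)
    y = H @ a + noise_std * rng.standard_normal(n)
    return X, y

if __name__ == "__main__":
    W, a = make_teacher(d=200, gamma=0.5)
    X, y = noisy_queries(W, a, n=1000)
    print(W.shape, X.shape, y.shape)
```

In a sketch like this, features with larger readout weights contribute more to the labels, which gives one intuition for why they might become recoverable from fewer samples than weaker ones. The paper's actual model and its fixed-point equations are not reproduced here.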
