Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
Arie Wortsman-Zurich, Hugo Tabanelli, Yatin Dandi, Florent Krzakala, Bruno Loureiro

TL;DR
This paper introduces a solvable hierarchical model showing how scaling laws emerge from feature learning in multi-layer networks: latent features are recovered sequentially, and the prediction error decays as an explicit power law.
Contribution
It presents a simple mechanism and spectral algorithm that achieve improved scaling and recover features sequentially in a hierarchical setting.
Findings
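A layer-wise spectral algorithm achieves improved scaling relative to shallow, non-adaptive methods.
Strong features become detectable at small sample sizes, while weaker features require more data.
Aggregating the sharp feature-wise recovery thresholds yields an explicit power-law decay of the prediction error.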
Abstract
We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and…
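The aggregation step described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model or algorithm. It assumes, purely for illustration, that feature k has weight a_k = k^(-alpha), that it is recovered perfectly once the sample size n exceeds a hypothetical threshold n_k = c / a_k^2, and that each unrecovered feature contributes a_k^2 to the error. The names `toy_excess_error`, `fitted_exponent`, `alpha`, `c`, and `num_features` are all made up for this example.

```python
import numpy as np


def toy_excess_error(n, alpha=1.0, c=1.0, num_features=1_000_000):
    """Error left by features not yet recovered at sample size n (toy model)."""
    k = np.arange(1, num_features + 1, dtype=float)
    weights = k ** (-alpha)            # assumed power-law feature weights a_k
    thresholds = c / weights**2        # hypothetical recovery threshold n_k
    unrecovered = thresholds > n       # sharp transition: all-or-nothing recovery
    return float(np.sum(weights[unrecovered] ** 2))


def fitted_exponent(alpha=1.0):
    """Least-squares slope of log(error) against log(n) in the toy model."""
    ns = np.logspace(2, 6, 9)
    errs = np.array([toy_excess_error(n, alpha=alpha) for n in ns])
    slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)
    return slope


if __name__ == "__main__":
    # Heuristic estimate under these toy assumptions: -(1 - 1/(2*alpha)).
    print(fitted_exponent(alpha=1.0))
```

Under these assumptions (and alpha > 1/2, so the tail sum converges), summing the step-like transitions gives an error that falls off roughly as n^(-(1 - 1/(2*alpha))). The thresholds and exponents proved in the paper may differ from this toy choice.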
