Optimal scaling laws in learning hierarchical multi-index models
Leonardo Defilippis, Florent Krzakala, Bruno Loureiro, Antoine Maillard

TL;DR
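This paper develops a sharp theoretical framework for how two-layer neural networks learn hierarchical multi-index targets. It shows that the target's features are learned sequentially through a cascade of phase transitions, and that a simple spectral estimator attains the optimal rates, which together account for the scaling laws, plateau phenomena, and spectral structure observed in shallow networks.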
Contribution
The paper derives exact information-theoretic scaling laws for subspace recovery and prediction error in hierarchical multi-index learning, and shows that a simple, target-agnostic spectral estimator attains these optimal rates in a representation-limited regime.
Findings
A simple, target-agnostic spectral estimator achieves the optimal subspace recovery and prediction error rates.
Hierarchical features are learned sequentially through phase transitions.
The theory explains plateau phenomena and spectral structures in shallow networks.
Abstract
In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally, using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on…
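To make the idea of a target-agnostic spectral estimator concrete, here is a minimal sketch in Python. It is not the estimator analyzed in the paper, whose exact form is not given in this summary. It assumes standard Gaussian inputs and uses a generic second-moment construction: build the matrix M = (1/n) Σ y_i (x_i x_iᵀ − I) and take its leading eigenvectors as the estimated feature subspace. The toy target, the feature weights (1 and 0.5), the function names, and the overlap metric are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_subspace_estimate(X, y, k):
    """Estimate a k-dimensional feature subspace from (X, y).

    Generic second-moment spectral method (an illustrative assumption,
    not the paper's estimator): for standard Gaussian x,
    E[y (x x^T - I)] is supported on the span of the hidden directions.
    """
    n, d = X.shape
    # Empirical version of E[y (x x^T - I)].
    M = (X.T * y) @ X / n - np.mean(y) * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    # Keep the k eigenvectors with the largest absolute eigenvalues.
    order = np.argsort(-np.abs(eigvals))[:k]
    return eigvecs[:, order]


def subspace_overlap(U_hat, U_true):
    """Normalized overlap in [0, 1] between two orthonormal bases."""
    k = U_true.shape[1]
    return np.linalg.norm(U_hat.T @ U_true, "fro") ** 2 / k


# Hypothetical toy problem: two hidden directions with unequal strength,
# loosely mimicking a hierarchy of features.
rng = np.random.default_rng(0)
d, k, n = 20, 2, 50_000
W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal directions
X = rng.standard_normal((n, d))                   # Gaussian inputs
Z = X @ W                                         # hidden indices
y = Z[:, 0] ** 2 + 0.5 * Z[:, 1] ** 2             # toy multi-index target

U_hat = spectral_subspace_estimate(X, y, k)
overlap = subspace_overlap(U_hat, W)
print(f"subspace overlap: {overlap:.3f}")
```

In this toy setting the weaker direction carries a smaller eigenvalue, so it needs more samples to separate from the noise than the stronger one. This gives a rough intuition for sequential feature learning, though it says nothing about the precise rates or transitions derived in the paper.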
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
