Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery
Yifei Wang, Yixuan Hua, Emmanuel Candès, Mert Pilanci

TL;DR
This paper shows that overparameterized two-layer ReLU neural networks trained with weight decay tend to learn simple, sparse models, and that under certain conditions on the data they exactly recover a planted model. This helps explain their good generalization, even with noisy labels.
Contribution
The paper analyzes ReLU networks from a convex optimization perspective, establishes isometry conditions that guarantee exact recovery, and identifies phase transitions in model recovery.
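For readers new to the convex perspective, the block below sketches the kind of program it relies on: weight-decay training of a bias-free two-layer ReLU network with m neurons, and the group-lasso problem over the finitely many ReLU activation patterns D_i that this line of work shows to be equivalent once m is large enough. The notation (u_j, α_j, D_i, P, β) is ours, and the paper should be consulted for the exact statement and conditions.

```latex
% Non-convex training problem with weight decay (m neurons, no bias):
\min_{\{u_j,\alpha_j\}_{j=1}^m}\ \frac12\Big\|\sum_{j=1}^m (Xu_j)_+\,\alpha_j - y\Big\|_2^2
  + \frac{\beta}{2}\sum_{j=1}^m\big(\|u_j\|_2^2+\alpha_j^2\big)

% Convex group-lasso program over the activation patterns
% D_i = \mathrm{diag}\big(\mathbb{1}[Xh_i \ge 0]\big),\ i = 1,\dots,P:
\min_{\{v_i,w_i\}_{i=1}^P}\ \frac12\Big\|\sum_{i=1}^P D_iX(v_i-w_i) - y\Big\|_2^2
  + \beta\sum_{i=1}^P\big(\|v_i\|_2+\|w_i\|_2\big)
\quad\text{s.t.}\quad (2D_i-I_n)Xv_i \ge 0,\ \ (2D_i-I_n)Xw_i \ge 0 .
```

Letting β → 0 turns the second problem into minimum-norm interpolation, where the data-fit term becomes the constraint Σ_i D_i X(v_i − w_i) = y. This is a group-ℓ1 analogue of basis pursuit, and "simple" then means few nonzero groups, i.e. few active neurons.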
Findings
ReLU networks learn sparse, simple models that generalize well.
Exact recovery of planted neurons is possible under isometry conditions.
Recovery success exhibits a phase transition that depends on the ratio of sample size to dimension.
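The notion of a planted neuron can be made concrete with a toy check. This is a minimal sketch, assuming a single planted neuron with labels y = (Xw)_+ and no bias; all helper names are hypothetical and not taken from the paper. The planted weight vector w induces an activation pattern D = diag(1[Xw ≥ 0]), satisfies y = DXw, and lies in the cone (2D − I)Xw ≥ 0. It is therefore a feasible explanation of the data that uses only one of the finitely many activation patterns, the maximally sparse kind of solution that exact recovery is about.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def activation_pattern(X, w):
    """Diagonal of D = diag(1[X w >= 0]) as a 0/1 list."""
    return [1 if dot(x, w) >= 0.0 else 0 for x in X]

def planted_labels(X, w, alpha=1.0):
    """Labels from one planted neuron: y_k = alpha * relu(x_k . w)."""
    return [alpha * relu(dot(x, w)) for x in X]

def is_feasible_group(X, D, v, tol=1e-12):
    """Cone constraint (2D - I) X v >= 0, i.e. v induces pattern D on the data."""
    return all((2 * d - 1) * dot(x, v) >= -tol for x, d in zip(X, D))

def sampled_patterns(X, num_samples, rng):
    """Distinct patterns hit by random directions (a subset of all P patterns)."""
    d = len(X[0])
    seen = set()
    for _ in range(num_samples):
        h = [rng.gauss(0.0, 1.0) for _ in range(d)]
        seen.add(tuple(activation_pattern(X, h)))
    return seen

rng = random.Random(0)
n, d = 8, 3
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
w_planted = [rng.gauss(0.0, 1.0) for _ in range(d)]

y = planted_labels(X, w_planted)
D = activation_pattern(X, w_planted)

# The planted neuron is a one-pattern solution: y = D X w, with w in the cone of D.
reconstructed = [di * dot(x, w_planted) for x, di in zip(X, D)]
print(max(abs(a - b) for a, b in zip(y, reconstructed)))  # → 0.0
print(is_feasible_group(X, D, w_planted))                  # → True
print(len(sampled_patterns(X, 2000, rng)), "distinct patterns sampled")
```

The sketch only checks feasibility. Whether a norm-minimizing convex solver actually returns this one-pattern solution is what the paper's isometry conditions address, and testing that would require a convex solver, which is not included here. The printed pattern count depends on the random draw and is only a lower bound on the total number of patterns.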
Abstract
The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact…
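The connection the abstract draws between weight decay and sparse recovery rests on a standard rescaling argument that can be checked numerically. This is a minimal sketch, assuming a bias-free two-layer network f(x) = Σ_j α_j (u_j · x)_+; the function names are hypothetical. Because the ReLU is positively homogeneous, rescaling a neuron as u_j → c u_j, α_j → α_j / c (c > 0) leaves f unchanged, and minimizing the weight-decay term over c yields β Σ_j ||u_j||_2 |α_j|.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, U, alpha):
    """Bias-free two-layer ReLU net: f(x) = sum_j alpha_j * relu(u_j . x)."""
    return sum(a * relu(sum(ui * xi for ui, xi in zip(u, x)))
               for u, a in zip(U, alpha))

def weight_decay(U, alpha, beta):
    """Weight decay penalty: (beta / 2) * sum_j (||u_j||^2 + alpha_j^2)."""
    return 0.5 * beta * sum(sum(ui * ui for ui in u) + a * a
                            for u, a in zip(U, alpha))

def product_penalty(U, alpha):
    """sum_j ||u_j||_2 * |alpha_j|, invariant under the rescaling below."""
    return sum(math.sqrt(sum(ui * ui for ui in u)) * abs(a)
               for u, a in zip(U, alpha))

def rebalance(U, alpha):
    """Rescale each neuron (u_j -> c_j u_j, alpha_j -> alpha_j / c_j, c_j > 0)
    so that ||u_j|| == |alpha_j|; ReLU's positive homogeneity keeps f unchanged."""
    U_new, alpha_new = [], []
    for u, a in zip(U, alpha):
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        if norm_u == 0.0 or a == 0.0:
            # An inactive neuron contributes nothing to f, so zero it out.
            U_new.append([0.0] * len(u))
            alpha_new.append(0.0)
            continue
        c = math.sqrt(abs(a) / norm_u)  # minimizer of c^2 ||u||^2 + a^2 / c^2
        U_new.append([c * ui for ui in u])
        alpha_new.append(a / c)
    return U_new, alpha_new

rng = random.Random(0)
d, m, beta = 4, 6, 0.1
U = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
alpha = [rng.gauss(0.0, 1.0) for _ in range(m)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

U_bal, alpha_bal = rebalance(U, alpha)
print(abs(forward(x, U, alpha) - forward(x, U_bal, alpha_bal)) < 1e-12)  # → True
print(weight_decay(U_bal, alpha_bal, beta) <= weight_decay(U, alpha, beta))  # → True
print(abs(weight_decay(U_bal, alpha_bal, beta)
          - beta * product_penalty(U, alpha)) < 1e-12)  # → True
```

The balanced penalty β Σ_j ||u_j||_2 |α_j| is a sum of one norm per neuron, so it acts like a group-ℓ1 penalty that tends to switch off entire neurons. This is one way to read the abstract's analogy with recovering the sparsest linear model in compressed sensing.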
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference · Geophysical and Geoelectrical Methods
Methods: Weight Decay
