On the Power and Limitations of Random Features for Understanding Neural Networks
Gilad Yehudai, Ohad Shamir

TL;DR
This paper reviews how recent theory for over-parameterized neural networks reduces to learning with random features, and shows both what that reduction can explain and where it fundamentally falls short.
Contribution
It gives a simple, self-contained analysis of random-feature techniques for one-hidden-layer networks, and shows that random features cannot learn even a single ReLU neuron with standard Gaussian inputs unless the network is exponentially large.
Findings
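Random features cannot learn even a single ReLU neuron with standard Gaussian inputs unless the network size is exponentially large.
With sufficient over-parameterization, gradient-based training leaves some network components nearly unchanged, so the optimization behaves as if those components were fixed at their random initial values.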
Random features are inherently limited in explaining the success of neural networks in practice.
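To make the findings above concrete, the setting can be written down as follows. The notation here ($r$, $u_i$, $w_i$, $w^*$, $\sigma$) is assumed for illustration and may differ from the paper's own; it is a sketch of the setup, not a statement of the theorem.

```latex
% Random-features predictor: hidden weights w_i are drawn at random and frozen;
% only the output weights u_i are learned.
N(x) \;=\; \sum_{i=1}^{r} u_i \,\sigma\!\left(\langle w_i, x \rangle\right),
\qquad w_1,\dots,w_r \ \text{fixed at random initialization}.

% Target: a single ReLU neuron on standard Gaussian inputs.
f^*(x) \;=\; \max\!\left(0,\ \langle w^*, x \rangle\right),
\qquad x \sim \mathcal{N}(0, I_d).
```

In this notation, the first finding says that no choice of the output weights $u_i$ makes $N$ a good approximation of $f^*$ unless $r$ is exponentially large.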
Abstract
Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as if those components are essentially fixed at their initial random values. In fact, fixing these explicitly leads to the well-known approach of learning with random features. In other words, these techniques imply that we can successfully learn with neural networks, whenever we can successfully learn with random features. In this paper, we first review these techniques, providing a simple and self-contained analysis for one-hidden-layer networks. We then argue that despite the impressive…
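The idea of "fixing these explicitly" can be illustrated with a minimal sketch: sample the hidden layer of a one-hidden-layer ReLU network once, freeze it, and fit only the output weights by least squares. Everything here is an assumption made for illustration (the function names `fit_random_features`, `predict`, and `demo`, the Gaussian initialization scale, the ridge term, and the problem sizes); it is not the authors' code or experimental setup, and a small run like this cannot confirm or refute the exponential lower bound.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_random_features(X, y, num_features, rng, ridge=1e-6):
    """Fit only the output weights on top of frozen random ReLU features."""
    d = X.shape[1]
    # Hidden weights are sampled once and never trained (assumed Gaussian init).
    W = rng.standard_normal((d, num_features)) / np.sqrt(d)
    Phi = relu(X @ W)  # feature matrix, shape (n, num_features)
    # Ridge-regularised least squares for the output layer.
    A = Phi.T @ Phi + ridge * np.eye(num_features)
    u = np.linalg.solve(A, Phi.T @ y)
    return W, u


def predict(X, W, u):
    return relu(X @ W) @ u


def demo(d=20, n_train=2000, n_test=2000, num_features=200, seed=0):
    """Target is a single ReLU neuron with standard Gaussian inputs."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)  # unit-norm target direction
    X_train = rng.standard_normal((n_train, d))
    X_test = rng.standard_normal((n_test, d))
    y_train = relu(X_train @ w_star)
    y_test = relu(X_test @ w_star)
    W, u = fit_random_features(X_train, y_train, num_features, rng)
    train_mse = float(np.mean((predict(X_train, W, u) - y_train) ** 2))
    test_mse = float(np.mean((predict(X_test, W, u) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    train_mse, test_mse = demo()
    print(f"train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
```

Varying `d` and `num_features` in `demo` gives a rough feel for how the error of the frozen-feature model depends on dimension and width, which is the trade-off the paper analyzes formally.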
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
