On the Power and Limitations of Random Features for Understanding Neural Networks

Gilad Yehudai; Ohad Shamir

arXiv:1904.00687·cs.LG·March 1, 2022·26 cites

TL;DR

This paper reviews the theoretical understanding of over-parameterized neural networks and random features, highlighting their capabilities and fundamental limitations in learning simple and complex functions.

Contribution

The paper gives a simple, self-contained analysis of random features in one-hidden-layer networks and demonstrates their limitations in learning even a single ReLU neuron with standard Gaussian inputs.

Findings

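01 Random features cannot learn even a single ReLU neuron with standard Gaussian inputs unless the network size (or the magnitude of its weights) is exponentially large in the input dimension.

02 With sufficient over-parameterization, gradient-based training leaves some components of the network close to their random initial values, so the dynamics behave as if those components were fixed.

03 Random features are inherently limited in explaining the success of neural networks in practice.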

Abstract

Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as if those components are essentially fixed at their initial random values. In fact, fixing these explicitly leads to the well-known approach of learning with random features. In other words, these techniques imply that we can successfully learn with neural networks, whenever we can successfully learn with random features. In this paper, we first review these techniques, providing a simple and self-contained analysis for one-hidden-layer networks. We then argue that despite the impressive…
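To make the phrase "learning with random features" concrete, here is a minimal NumPy sketch: the hidden layer of a one-hidden-layer ReLU network is drawn at random and frozen, and only the output weights are fitted, here by ridge-regularized least squares. The target is a single ReLU neuron with standard Gaussian inputs, the setting the abstract mentions. The function names, the initialization scale, the width, the ridge value, and the small dimension `d = 5` are illustrative assumptions, not taken from the paper; in such a low dimension the fit can be good, so this sketch shows the method only and does not reproduce the paper's high-dimensional lower bound.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_random_features(X, y, width, rng, ridge=1e-6):
    """Freeze a random hidden layer and solve only for the output weights.

    The scaling of W and the use of Gaussian biases are assumptions made
    for this sketch, not the paper's construction.
    """
    d = X.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)  # frozen at initialization
    b = rng.standard_normal(width)                    # frozen at initialization
    H = relu(X @ W + b)                               # random features, shape (n, width)
    # Ridge-regularized least squares over the output layer only.
    A = H.T @ H + ridge * np.eye(width)
    u = np.linalg.solve(A, H.T @ y)
    return W, b, u

def predict(X, W, b, u):
    return relu(X @ W + b) @ u

rng = np.random.default_rng(0)
d, n = 5, 2000                       # small dimension, chosen only for illustration
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)     # unit-norm target direction

X_train = rng.standard_normal((n, d))   # standard Gaussian inputs
X_test = rng.standard_normal((n, d))
y_train = relu(X_train @ w_star)        # target: a single ReLU neuron
y_test = relu(X_test @ w_star)

W, b, u = fit_random_features(X_train, y_train, width=200, rng=rng)
train_mse = np.mean((predict(X_train, W, b, u) - y_train) ** 2)
test_mse = np.mean((predict(X_test, W, b, u) - y_test) ** 2)
print(f"train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
```

Only `u` is trained, so the optimization is a convex least-squares problem; this is what makes random-feature analyses tractable, and it is also the restriction whose limits the paper studies.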

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
