Nearly Minimal Over-Parametrization of Shallow Neural Networks
Armin Eftekhari, ChaeHwan Song, Volkan Cevher

TL;DR
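This paper proves that a shallow neural network can be trained to fit its training data with only linear over-parametrization in the number of training samples, using a simple variant of (stochastic) gradient descent. This improves on the quadratic over-parametrization required by existing theory, and the framework extends to other learning problems.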
Contribution
It establishes that linear over-parametrization suffices for training shallow networks with a simple variant of (stochastic) gradient descent. The training is not limited to the lazy regime, and the framework applies to a variety of learning problems.
Findings
Linear over-parametrization, as a function of the number of training samples, is sufficient to fit the training data.
A simple variant of (stochastic) gradient descent can train over-parametrized shallow networks.
The over-parametrization framework applies to a variety of learning problems beyond shallow networks.
Abstract
A recent line of work has shown that an over-parametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario, the existing theory requires quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear over-parametrization is sufficient to fit the training data, using a simple variant of (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.
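To make the setting concrete, here is a minimal sketch of fitting n training samples with a one-hidden-layer network whose width grows linearly in n. It is an illustration only and not the paper's algorithm: the paper analyzes a simple variant of (stochastic) gradient descent that this summary does not specify, while the sketch uses plain full-batch gradient descent. The function name `train_shallow_net`, the tanh activation, the random Gaussian data, the width factor of 2, and the step size are all assumptions chosen for the example.

```python
import math
import random


def train_shallow_net(n=6, d=2, width_factor=2, steps=1000, lr=0.02, seed=0):
    """Fit n random samples with a one-hidden-layer tanh network.

    The hidden width is width_factor * n, i.e. linear in the sample count.
    Returns the training loss before training and after every step.
    """
    rng = random.Random(seed)
    k = width_factor * n  # hidden width, linear in n (assumed factor)

    # Hypothetical training set: Gaussian inputs and Gaussian targets.
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]

    # Random initialization of hidden weights W (k x d) and output weights v (k).
    W = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(k)]
    v = [rng.gauss(0, 1) / math.sqrt(k) for _ in range(k)]

    def forward(x):
        # Hidden activations and scalar network output for one input.
        h = [math.tanh(sum(wj * xj for wj, xj in zip(w, x))) for w in W]
        return h, sum(vj * hj for vj, hj in zip(v, h))

    def loss():
        # Mean squared error with the conventional factor 1/2.
        return 0.5 * sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / n

    history = [loss()]
    for _ in range(steps):
        grad_W = [[0.0] * d for _ in range(k)]
        grad_v = [0.0] * k
        for x, y in zip(xs, ys):
            h, out = forward(x)
            r = (out - y) / n  # residual, scaled by 1/n
            for j in range(k):
                grad_v[j] += r * h[j]
                g = r * v[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                for t in range(d):
                    grad_W[j][t] += g * x[t]
        # Full-batch gradient step on both layers.
        for j in range(k):
            v[j] -= lr * grad_v[j]
            for t in range(d):
                W[j][t] -= lr * grad_W[j][t]
        history.append(loss())
    return history


if __name__ == "__main__":
    losses = train_shallow_net()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

Running the script prints the training loss before and after optimization. Changing `width_factor` or `n` gives a rough feel for how width relative to sample count affects the fit, though a toy run like this carries none of the guarantees proved in the paper.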
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Neural Network Applications
