Nearly Minimal Over-Parametrization of Shallow Neural Networks

Armin Eftekhari; ChaeHwan Song; Volkan Cevher

arXiv:1910.03948 · cs.LG · October 30, 2019

TL;DR

This paper proves that shallow neural networks can be trained to fit the training data with only linear over-parametrization, using a simple variant of (stochastic) gradient descent. This improves on the quadratic over-parametrization required by earlier results, and the framework extends to a broader class of learning problems.

Contribution

It establishes that linear over-parametrization suffices to train shallow networks with gradient descent, in a regime not limited to lazy training, and that the analysis applies to a variety of learning tasks.

Findings

01

Linear over-parametrization is sufficient for fitting training data.

02

A simple variant of (stochastic) gradient descent can train over-parametrized shallow networks to fit the training data.

03

The framework extends to learning problems beyond shallow networks.

Abstract

A recent line of work has shown that an over-parametrized neural network can perfectly fit the training data, even though the underlying nonconvex optimization problem is otherwise often intractable. For (fully-connected) shallow networks, the existing theory requires, in the best-case scenario, quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear over-parametrization is sufficient to fit the training data, using a simple variant of (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.
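The abstract does not spell out the training algorithm, so the following is only a toy illustration of the setting and not the authors' method. It runs plain full-batch gradient descent on the hidden-layer weights of a one-hidden-layer ReLU network, keeping the output weights fixed at ±1/√m, and fits random ±1 labels on n synthetic points. The hidden width is set to m = width_factor · n, so it grows linearly with n. The helper names (make_data, init_net, gd_step, train, width_factor), the synthetic data and the hyperparameters are hypothetical choices made for this sketch. It does not reproduce the paper's variant of gradient descent or its guarantees.

```python
import math
import random


def make_data(n, d, seed=0):
    """Synthetic unit-norm inputs with random +/-1 labels (hypothetical toy data)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        xs.append([t / norm for t in v])
    ys = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    return xs, ys


def init_net(m, d, seed=1):
    """Gaussian hidden weights W (m x d); fixed output weights a_k = +/-1/sqrt(m)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) / math.sqrt(m) for _ in range(m)]
    return W, a


def preacts(W, x):
    """Pre-activations <w_k, x> for every hidden unit k."""
    return [sum(w * xi for w, xi in zip(w_k, x)) for w_k in W]


def forward(W, a, x):
    """Shallow ReLU network: f(x) = sum_k a_k * relu(<w_k, x>)."""
    return sum(a_k * max(p, 0.0) for a_k, p in zip(a, preacts(W, x)))


def loss(W, a, xs, ys):
    """Squared training loss 0.5 * sum_i (f(x_i) - y_i)^2."""
    return 0.5 * sum((forward(W, a, x) - y) ** 2 for x, y in zip(xs, ys))


def gd_step(W, a, xs, ys, lr):
    """One full-batch gradient step on W; the output weights a stay fixed."""
    m, d = len(W), len(W[0])
    grad = [[0.0] * d for _ in range(m)]
    for x, y in zip(xs, ys):
        pre = preacts(W, x)
        r = sum(a_k * max(p, 0.0) for a_k, p in zip(a, pre)) - y  # residual
        for k in range(m):
            if pre[k] > 0.0:  # ReLU derivative is 1 on the active side
                coef = r * a[k]
                for j in range(d):
                    grad[k][j] += coef * x[j]
    return [[w - lr * g for w, g in zip(w_k, g_k)] for w_k, g_k in zip(W, grad)]


def train(n=20, d=5, width_factor=4, steps=150, lr=0.1):
    """Train the toy network; width_factor is a hypothetical knob, not from the paper."""
    xs, ys = make_data(n, d)
    m = width_factor * n  # hidden width grows linearly with n (illustrative choice)
    W, a = init_net(m, d)
    history = [loss(W, a, xs, ys)]
    for _ in range(steps):
        W = gd_step(W, a, xs, ys, lr)
        history.append(loss(W, a, xs, ys))
    return m, history


if __name__ == "__main__":
    m, history = train()
    print(f"width m = {m}, training loss {history[0]:.3f} -> {history[-1]:.3f}")
```

Running the script prints the initial and final training loss. You can vary width_factor or steps to see how width affects fitting in this toy setup. These runs say nothing quantitative about the bounds proved in the paper.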

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Neural Network Applications