Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds

Santosh Vempala; John Wilmes

arXiv:1805.02677 · cs.LG · May 28, 2019 · 6 cites

TL;DR

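This paper analyzes gradient descent for training one-hidden-layer neural networks. It proves convergence to the error of the best degree-$k$ polynomial approximation of the target, explains why lower-frequency Fourier components are learned before higher-frequency ones, and complements these results with nearly matching statistical query (SQ) lower bounds.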

Contribution

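It gives an agnostic convergence guarantee for gradient descent on one-hidden-layer networks, with network size and iteration count polynomial in the input dimension $n$ for any fixed degree $k$, and rigorously explains the order in which frequencies are learned. Nearly matching SQ lower bounds support the guarantee.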

Findings

01

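Starting from a randomly initialized network, GD converges in mean squared loss to the error of the best approximation of the target function by a polynomial of degree at most $k$. The network size and number of iterations are both bounded by $n^{O(k)} \log(1/\epsilon)$.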

02

Gradient descent learns lower-frequency Fourier components before higher-frequency ones, which rigorously explains an empirical observation.

03

Statistical query (SQ) lower bounds nearly match the convergence guarantee.

Abstract

We study the complexity of training neural network models with one hidden nonlinear activation layer and an output weighted sum layer. We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs. We give an agnostic learning guarantee for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error (in $2$-norm) of the best approximation of the target function using a polynomial of degree at most $k$. Moreover, for any $k$, the size of the network and number of iterations needed are both bounded by $n^{O(k)} \log(1/\epsilon)$. In particular, this applies to training networks of unbiased sigmoids and ReLUs. We also rigorously explain the empirical finding that gradient descent discovers lower frequency Fourier components before higher frequency components. We complement this result with nearly matching…
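The setting in the abstract can be illustrated with a small sketch: a network with one hidden layer of unbiased ReLU units and a weighted-sum output, trained by full-batch gradient descent on mean squared loss. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the target `x[0] * x[1]`, inputs drawn from the unit sphere, the width, the step size, and the choice to update only the output weights while the hidden weights stay at their random initialization.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def random_unit_vector(n, rng):
    # Normalized Gaussian vector, i.e. a uniform point on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def hidden_features(x, hidden_weights):
    # Unbiased ReLU units: relu(<u, x>) with no bias term.
    return [relu(sum(ui * xi for ui, xi in zip(u, x))) for u in hidden_weights]


def mse(output_weights, features, targets):
    total = 0.0
    for phi, y in zip(features, targets):
        pred = sum(b * p for b, p in zip(output_weights, phi))
        total += (pred - y) ** 2
    return total / len(targets)


def train(n=3, width=40, samples=150, steps=200, lr=0.02, seed=0):
    rng = random.Random(seed)
    # Hidden weights are random and stay fixed (an assumption of this sketch).
    hidden = [random_unit_vector(n, rng) for _ in range(width)]
    xs = [random_unit_vector(n, rng) for _ in range(samples)]
    # Hypothetical bounded target: a degree-2 polynomial of the inputs.
    ys = [x[0] * x[1] for x in xs]
    feats = [hidden_features(x, hidden) for x in xs]

    b = [0.0] * width  # output-layer weights, the only trained parameters
    losses = [mse(b, feats, ys)]
    for _ in range(steps):
        # Full-batch gradient of the mean squared loss with respect to b.
        grad = [0.0] * width
        for phi, y in zip(feats, ys):
            err = sum(bj * pj for bj, pj in zip(b, phi)) - y
            for j, pj in enumerate(phi):
                grad[j] += 2.0 * err * pj / samples
        b = [bj - lr * gj for bj, gj in zip(b, grad)]
        losses.append(mse(b, feats, ys))
    return losses


if __name__ == "__main__":
    history = train()
    print("initial loss:", history[0])
    print("final loss:", history[-1])
```

With the hidden weights held fixed the loss is a convex quadratic in the output weights, so a sufficiently small step size gives a non-increasing loss curve. This toy run does not reproduce the paper's bounds on network size or iteration count.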

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Neural Networks and Applications