Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds
Santosh Vempala, John Wilmes

TL;DR
This paper analyzes gradient descent for training one-hidden-layer neural networks: it proves convergence to the error of the best low-degree polynomial approximation of the target, explains why lower-frequency components are learned first, and gives nearly matching statistical query (SQ) lower bounds.
Contribution
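It gives an agnostic convergence guarantee for gradient descent on one-hidden-layer networks, with network size and iteration count bounded in terms of the approximating polynomial's degree, explains the order in which frequencies are learned, and supports these bounds with nearly matching SQ lower bounds.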
Findings
Starting from a random initialization, GD converges in mean squared loss to the error of the best approximation of the target function by a polynomial of bounded degree.
Gradient descent learns lower frequency Fourier components before higher ones.
Statistical query (SQ) lower bounds nearly match the upper bound on the cost of training.
Abstract
We study the complexity of training neural network models with one hidden nonlinear activation layer and an output weighted sum layer. We analyze Gradient Descent applied to learning a bounded target function on n real-valued inputs. We give an agnostic learning guarantee for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error (in 2-norm) of the best approximation of the target function using a polynomial of degree at most k. Moreover, for any k, the size of the network and number of iterations needed are both bounded by n^{O(k)} log(1/ε). In particular, this applies to training networks of unbiased sigmoids and ReLUs. We also rigorously explain the empirical finding that gradient descent discovers lower frequency Fourier components before higher frequency components. We complement this result with nearly matching…
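The setup described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's algorithm or experimental code. It assumes inputs drawn uniformly from the unit sphere, bias-free ReLU hidden units with fixed random directions, and gradient descent on the output weights only. The function names, the width, the step count, and the step-size rule are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_sphere(num_points, dim, rng):
    """Draw points uniformly from the unit sphere by normalizing Gaussians."""
    x = rng.standard_normal((num_points, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def train_one_hidden_layer(x, y, width=200, steps=300, seed=0):
    """Gradient descent on the output weights of a bias-free ReLU network.

    Assumption for this sketch: hidden directions are random and stay fixed,
    so the mean squared loss is a convex quadratic in the output weights.
    """
    rng = np.random.default_rng(seed)
    hidden = sample_sphere(width, x.shape[1], rng)  # hidden-layer directions
    features = np.maximum(x @ hidden.T, 0.0)        # ReLU activations, no bias
    out = np.zeros(width)                           # output weights

    # Step size 1/L, where L = 2 * sigma_max(features)^2 / N is the largest
    # eigenvalue of the Hessian of the mean squared loss.
    sigma_max = np.linalg.norm(features, 2)
    lr = len(y) / (2.0 * sigma_max**2)

    losses = []
    for _ in range(steps):
        residual = features @ out - y
        losses.append(float(np.mean(residual**2)))
        grad = 2.0 * features.T @ residual / len(y)
        out -= lr * grad
    return hidden, out, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = sample_sphere(2000, 5, rng)
    y = x[:, 0] * x[:, 1]  # a bounded degree-2 polynomial target
    _, _, losses = train_one_hidden_layer(x, y)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

With the step size set to 1/L, the recorded loss does not increase from one step to the next, so the loss curve can be used to watch how quickly a target of a given degree is fitted.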
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Neural Networks and Applications
