Optimization-Based Separations for Neural Networks
Itay Safran, Jason D. Lee

TL;DR
This paper shows that gradient descent can efficiently train a deeper neural network to learn certain radial functions, giving what the authors describe as the first depth separation in which the deeper architecture's approximation advantage comes with a provable optimization guarantee.
Contribution
It proves that, under radially symmetric data distributions satisfying mild assumptions, gradient descent can efficiently learn ball indicator functions with a depth-2 neural network that has two layers of sigmoidal activations, establishing a concrete optimization-based depth separation.
Findings
Gradient descent efficiently learns ball indicator functions (a class of radial functions) with a depth-2 network whose hidden layer is held fixed throughout training.
Deeper architectures can have provable optimization advantages over shallower ones.
Certain functions cannot be learned efficiently by shallow networks or standard algorithms.
Abstract
Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry which satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations, and where the hidden layer is held fixed throughout training. By building on and refining existing techniques for approximation lower bounds of neural networks with a single layer of non-linearities, we show that there are d-dimensional radial distributions on the data such that ball…
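The setup described in the abstract can be illustrated with a small numerical sketch. The NumPy code below is not the authors' construction: the data distribution (uniform direction, radius uniform on [0, 2]), the randomly drawn fixed hidden layer, the width, the squared loss, and the step size are all assumptions chosen for illustration, and the names `sample_radial`, `ball_indicator`, and `train_output_layer` are hypothetical. It only shows the general shape of the problem: a ball indicator target, radially symmetric inputs, and gradient descent on the output layer of a network with two sigmoidal layers whose hidden layer stays fixed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ball_indicator(x, radius):
    """Target function: 1.0 inside the ball of the given radius, 0.0 outside."""
    return (np.linalg.norm(x, axis=1) <= radius).astype(float)


def sample_radial(n, d, rng):
    """Radially symmetric samples: uniform direction, independent radius.

    The radius law (uniform on [0, 2]) is an illustrative assumption.
    """
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.uniform(0.0, 2.0, size=(n, 1))
    return directions * radii


def train_output_layer(n=1000, d=5, width=50, steps=300, lr=0.05, seed=0):
    """Gradient descent on the output layer only; returns the loss history."""
    rng = np.random.default_rng(seed)
    x = sample_radial(n, d, rng)
    y = ball_indicator(x, radius=1.0)

    # Hidden sigmoidal layer: drawn once and never updated (assumed random here).
    W = rng.standard_normal((d, width))
    b = rng.standard_normal(width)
    h = sigmoid(x @ W + b)  # fixed features, so they are computed only once

    # Trainable output layer, followed by a second sigmoid.
    v = np.zeros(width)
    c = 0.0
    losses = []
    for _ in range(steps):
        p = sigmoid(h @ v + c)
        losses.append(float(np.mean((p - y) ** 2)))
        # Gradient of the mean squared loss through the output sigmoid.
        g = 2.0 * (p - y) * p * (1.0 - p) / n
        v -= lr * (h.T @ g)
        c -= lr * g.sum()
    return losses


if __name__ == "__main__":
    history = train_output_layer()
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Because the hidden layer is fixed, the features `h` are computed once and each step only updates `v` and `c`. The sketch makes no claim about the sample complexity or accuracy guarantees proved in the paper.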
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
