SGD Learns the Conjugate Kernel Class of the Network

Amit Daniely

arXiv:1702.08503 · cs.LG · May 23, 2017 · 81 cites


TL;DR

This paper proves that standard stochastic gradient descent (SGD) learns, in polynomial time, a function competitive with the best function in the conjugate kernel space of the network, for log-depth networks from a rich family of architectures. To the author's knowledge, this is the first polynomial-time guarantee for networks of depth greater than two.

Contribution

It establishes what the author believes to be the first polynomial-time learning guarantee for standard SGD on networks of depth greater than two, connecting neural network training to kernel methods.

Findings

01

SGD learns functions in the conjugate kernel space of the network.

02

For networks of any depth between 2 and log(n), SGD learns constant-degree polynomials with polynomially bounded coefficients in polynomial time.

03

SGD on large enough networks can learn any continuous function, though not in polynomial time.

Abstract

We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth more than two. As corollaries, it follows that for neural networks of any depth between $2$ and $\log(n)$, SGD is guaranteed to learn, in polynomial time, constant degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on large enough networks can learn any continuous function (not in polynomial time), complementing classical expressivity results.
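The abstract refers to the conjugate kernel without defining it. As a rough illustration only, the sketch below assumes the simplest setting: a single fully connected hidden layer with Gaussian random weights and a ReLU activation scaled by sqrt(2), where the kernel is the expected product of hidden activations for two inputs. For unit-norm inputs this expectation has a well-known closed form (the degree-1 arc-cosine kernel). The function names, the choice of activation, and the layer shape are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def relu_conjugate_kernel(x, y):
    """Closed form of E_w[s(<w,x>) s(<w,y>)] for unit vectors x, y,
    with w ~ N(0, I) and s(z) = sqrt(2) * max(0, z) (assumed activation)."""
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
    theta = math.acos(cos_theta)
    return (math.sin(theta) + (math.pi - theta) * cos_theta) / math.pi


def empirical_kernel(x, y, width, seed=0):
    """Monte Carlo estimate of the same expectation: average the product of
    hidden activations over `width` randomly initialized hidden units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        hx = math.sqrt(2.0) * max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        hy = math.sqrt(2.0) * max(0.0, sum(wi * yi for wi, yi in zip(w, y)))
        total += hx * hy
    return total / width


if __name__ == "__main__":
    x = normalize([1.0, 2.0, 0.0])
    y = normalize([0.0, 1.0, 1.0])
    print("closed form:", relu_conjugate_kernel(x, y))
    print("random layer estimate:", empirical_kernel(x, y, width=20000))
```

Under these assumptions, the estimate from a wide random layer should approach the closed form as the width grows, which is the sense in which the kernel is tied to the network at initialization.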


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications

Methods: Stochastic Gradient Descent