Global Convergence of SGD On Two Layer Neural Nets
Pulkit Gopalani, Anirbit Mukherjee

TL;DR
This paper proves global convergence of stochastic gradient descent on appropriately regularized two-layer neural networks with smooth, bounded activations, for a special class of initializations, using an amount of regularization that is independent of network size.
Contribution
It introduces a novel analysis framework using Frobenius norm regularized loss functions that are 'Villani functions', enabling size-independent convergence guarantees for SGD on two-layer nets.
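As a rough illustration of the kind of objective involved, a Frobenius norm regularized squared loss for a depth-2 net can be written as below. The notation is assumed for this sketch and does not appear on this page: n data points (x_i, y_i), trained first-layer weights W, outer weights a held fixed, activation σ, and regularization strength λ > 0.

\[
\tilde{L}(W) \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\bigl(y_i - a^\top \sigma(W x_i)\bigr)^2 \;+\; \frac{\lambda}{2}\,\lVert W \rVert_F^2
\]

In this notation, the size-independence claim says that the threshold on λ needed for the analysis does not grow with the number of rows of W.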
Findings
SGD converges globally for certain initializations.
Continuous-time SGD converges exponentially fast, including for smooth unbounded activations such as SoftPlus.
The amount of regularization required is independent of network size.
Abstract
In this note, we consider appropriately regularized empirical risk of depth-2 nets with any number of gates and show bounds on how the empirical loss evolves for SGD iterates on it -- for arbitrary data and if the activation is adequately smooth and bounded like sigmoid and tanh. This in turn leads to a proof of global convergence of SGD for a special class of initializations. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives. Most critically the amount of regularization required for our analysis is independent of the size of the net.
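The setting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names (sigmoid, regularized_loss, sgd), the fixed outer layer a, the Gaussian initialization, and every hyperparameter value are hypothetical choices made for this example. It runs mini-batch SGD on a squared loss with a Frobenius norm penalty on the first-layer weights of a depth-2 sigmoid net.

```python
import numpy as np

def sigmoid(z):
    # Smooth, bounded activation of the kind the abstract refers to.
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(W, a, X, y, lam):
    """Half mean squared error of a depth-2 net plus (lam / 2) * ||W||_F^2."""
    preds = sigmoid(X @ W.T) @ a                 # shape (n,)
    return 0.5 * np.mean((preds - y) ** 2) + 0.5 * lam * np.sum(W ** 2)

def sgd(X, y, width=8, lam=0.1, lr=0.05, steps=2000, batch=16, seed=0):
    """Mini-batch SGD on the regularized loss; all defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.5, size=(width, d))   # trained first-layer weights
    a = np.ones(width) / width                   # outer layer held fixed (assumption)
    history = [regularized_loss(W, a, X, y, lam)]
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        H = sigmoid(Xb @ W.T)                    # hidden activations, (batch, width)
        resid = H @ a - yb                       # prediction errors, (batch,)
        # Chain rule: d/dW of the mini-batch squared loss, using sigmoid' = H * (1 - H).
        G = ((resid[:, None] * a[None, :]) * H * (1.0 - H)).T @ Xb / batch
        W -= lr * (G + lam * W)                  # lam * W is the Frobenius penalty gradient
        history.append(regularized_loss(W, a, X, y, lam))
    return W, a, history
```

Calling sgd(X, y) on an array X of shape (n, d) and targets y of shape (n,) returns the final weights together with the full-data loss after each step, so the loss trajectory can be inspected directly. The sketch says nothing about whether these particular values of lam or the step size satisfy the conditions of the paper's theorems.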
Taxonomy
Topics: Neural Networks and Applications
Methods: Stochastic Gradient Descent
