Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
Pulkit Gopalani, Samyak Jha, Anirbit Mukherjee

TL;DR
This paper proves global convergence of stochastic gradient descent (SGD) on the appropriately regularized logistic loss of two-layer neural networks, for arbitrary data and smooth bounded activations such as sigmoid and tanh. A continuous-time result extends to smooth unbounded activations such as SoftPlus.
Contribution
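The paper gives what the authors describe as a first-of-its-kind provable guarantee that SGD reaches the global minima of the appropriately regularized logistic empirical risk of two-layer networks, for arbitrary data, any number of gates, and adequately smooth activations.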
Findings
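SGD converges to the global minima of the appropriately regularized logistic empirical risk of two-layer nets with smooth bounded activations such as sigmoid and tanh.
Continuous-time SGD converges exponentially fast, and this rate also holds for smooth unbounded activations such as SoftPlus.
The analysis rests on showing that certain Frobenius-norm-regularized logistic losses on constant-sized nets are "Villani functions", which lets the authors build on recent results for SGD on such objectives.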
Abstract
In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth-2 nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.
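To make the objective in the abstract concrete, here is a minimal NumPy sketch of mini-batch SGD on a Frobenius-norm-regularized logistic loss for a two-layer sigmoid net. It is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed outer weights `a`, and the values of `lam`, `step`, `batch`, and `iters` are all hypothetical, and they are not the regularization or step-size thresholds that the paper's theorems require.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def regularized_logistic_loss(W, a, X, y, lam):
    """Mean logistic loss of f(x) = a . sigmoid(W x) plus (lam/2) * ||W||_F^2.

    X: (n, d) inputs, y: (n,) labels in {-1, +1},
    W: (p, d) trainable first-layer weights, a: (p,) fixed outer weights (assumption).
    """
    margins = y * (sigmoid(X @ W.T) @ a)
    # log(1 + exp(-m)) computed stably via logaddexp
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.sum(W ** 2)


def minibatch_gradient(W, a, X, y, lam):
    """Gradient of the regularized loss with respect to W on the batch (X, y)."""
    H = sigmoid(X @ W.T)                # (b, p) hidden activations
    margins = y * (H @ a)               # (b,)
    coeff = -y * sigmoid(-margins)      # derivative of log(1 + exp(-m)) times y
    # d f / d W_j = a_j * sigmoid'(w_j . x) * x, with sigmoid' = H * (1 - H)
    per_gate = coeff[:, None] * (H * (1.0 - H)) * a[None, :]   # (b, p)
    return per_gate.T @ X / X.shape[0] + lam * W


def run_sgd(X, y, p=8, lam=0.1, step=0.05, batch=16, iters=2000, seed=0):
    """Plain constant-step mini-batch SGD on W; the outer layer stays fixed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(p, d))
    a = np.ones(p) / p                  # illustrative choice of fixed outer weights
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        W -= step * minibatch_gradient(W, a, X[idx], y[idx], lam)
    return W, a
```

Calling `run_sgd(X, y)` on any `(n, d)` array `X` with labels `y` in {-1, +1} returns the trained first-layer weights and the fixed outer weights. The regularizer `0.5 * lam * np.sum(W ** 2)` is the Frobenius-norm term that the abstract refers to.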
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
Methods: Stochastic Gradient Descent
