Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

Pulkit Gopalani; Samyak Jha; Anirbit Mukherjee

arXiv:2309.09258·cs.LG·March 19, 2024

Open Access

TL;DR

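This paper proves that stochastic gradient descent (SGD) converges to the global minima of an appropriately regularized logistic loss on two-layer neural networks, for arbitrary data and any number of gates. The discrete-time result covers smooth, bounded activations such as sigmoid and tanh; an exponentially fast rate for continuous-time SGD also covers smooth unbounded activations such as SoftPlus.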

Contribution

The paper gives what its authors describe as the first provable guarantee that SGD converges to the global minima of a regularized logistic empirical risk on two-layer neural networks, for arbitrary data and adequately smooth activations.

Findings

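01. SGD converges to the global minima of the appropriately regularized logistic empirical risk of two-layer nets with smooth, bounded activations such as sigmoid and tanh.

02. Continuous-time SGD converges exponentially fast, and this rate also holds for smooth unbounded activations such as SoftPlus.

03. Certain Frobenius-norm-regularized logistic losses on constant-sized nets are shown to be Villani functions, which lets the analysis build on recent results for SGD on such objectives.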
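
For orientation, the conditions usually meant by "Villani function" in the literature this line of work builds on can be sketched as follows. The symbols $f$, $d$ and $s$ are notation assumed here, not taken from this page, and the paper's own definition should be treated as authoritative. A function $f : \mathbb{R}^d \to \mathbb{R}$ is typically called a Villani function if, for every $s > 0$,

$$
\begin{aligned}
&\text{(i)}   && f \in C^{\infty}(\mathbb{R}^d), \\
&\text{(ii)}  && \lim_{\|x\| \to \infty} f(x) = +\infty, \\
&\text{(iii)} && \int_{\mathbb{R}^d} \exp\!\left(-\frac{2 f(x)}{s}\right) dx < \infty, \\
&\text{(iv)}  && \lim_{\|x\| \to \infty} \left( -\Delta f(x) + \frac{1}{s}\,\|\nabla f(x)\|^2 \right) = +\infty .
\end{aligned}
$$

Read this way, the Frobenius norm penalty is what supplies the growth at infinity: a logistic loss with a bounded activation is itself bounded, so conditions (ii) to (iv) could not hold without a regularizer.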

Abstract

In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth $2$ nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.
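
The objective described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code. It assumes a network of the form f(x) = a^T tanh(Wx) with a fixed outer layer `a` and only `W` trained, labels in {-1, +1}, and a penalty (lam / 2) * ||W||_F^2. The names `net`, `reg_logistic_loss`, `grad_W` and `sgd`, the toy data, and every hyperparameter value are hypothetical choices made for illustration.

```python
import numpy as np

def net(W, a, X):
    # Assumed architecture: f(x) = a^T tanh(W x), evaluated on each row of X.
    return np.tanh(X @ W.T) @ a

def reg_logistic_loss(W, a, X, y, lam):
    # Mean logistic loss log(1 + exp(-y f(x))) plus (lam / 2) * ||W||_F^2.
    margins = y * net(W, a, X)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.sum(W ** 2)

def grad_W(W, a, X, y, lam):
    # Gradient of reg_logistic_loss with respect to the inner weights W.
    H = np.tanh(X @ W.T)                      # hidden activations, shape (n, p)
    margins = y * (H @ a)                     # shape (n,)
    coeff = -y / (1.0 + np.exp(margins))      # derivative of the loss in f(x)
    G = ((coeff[:, None] * (1.0 - H ** 2)) * a[None, :]).T @ X / X.shape[0]
    return G + lam * W

def sgd(W, a, X, y, lam, lr, steps, batch, rng):
    # Plain mini-batch SGD with a constant step size (a hypothetical choice).
    for _ in range(steps):
        idx = rng.choice(X.shape[0], size=batch, replace=False)
        W = W - lr * grad_W(W, a, X[idx], y[idx], lam)
    return W

# Toy data: the label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
a = np.array([0.5, -0.5, 0.5, -0.5])          # fixed outer layer (assumption)
W0 = 0.1 * rng.normal(size=(4, 3))
lam = 0.01

W = sgd(W0, a, X, y, lam, lr=0.1, steps=500, batch=16, rng=rng)
loss_before = reg_logistic_loss(W0, a, X, y, lam)
loss_after = reg_logistic_loss(W, a, X, y, lam)
print(loss_before, loss_after)
```

The run only shows the regularized loss decreasing on a toy problem. It does not demonstrate the global convergence the paper proves, and the step size and regularization strength here are not the values the theorem would require.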

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent