Exponential escape efficiency of SGD from sharp minima in non-stationary regime

Hikaru Ibayashi; Masaaki Imaizumi

arXiv:2111.04004·cs.LG·March 22, 2022

TL;DR

This paper develops a new theoretical framework using Large Deviation Theory to show that SGD escapes sharp minima exponentially fast even before reaching stationarity, explaining its effectiveness in training neural networks.

Contribution

It introduces a novel theory for SGD escape efficiency in non-stationary regimes, extending understanding beyond stationary distribution assumptions.

Findings

01. SGD escapes sharp minima exponentially fast in non-stationary regimes

02. The theory applies to both continuous and discrete SGD

03. Experimental results support the theoretical predictions

Abstract

We show that stochastic gradient descent (SGD) escapes from sharp minima exponentially fast even before it reaches a stationary distribution. SGD has been the de facto standard training algorithm for various machine learning tasks. However, it remains an open question why SGD finds highly generalizable parameters for non-convex target functions, such as the loss functions of neural networks. "Escape efficiency", which measures how efficiently SGD escapes from sharp minima with potentially low generalization performance, has been an attractive notion for tackling this question. Despite its importance, the notion has the limitation that it applies only when SGD has reached a stationary distribution after sufficient updates. In this paper, we develop a new theory to investigate the escape efficiency of SGD with Gaussian noise by introducing Large Deviation Theory for dynamical…
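To make the idea of escaping from a minimum concrete, here is a minimal toy sketch. It is not the paper's experiment or its definition of escape efficiency. It assumes a one-dimensional quadratic loss, discrete gradient steps with additive Gaussian noise, and uses the mean first-exit time from a fixed basin as a stand-in for how hard escape is. Every trajectory starts exactly at the minimum rather than from a stationary distribution. The function name `mean_exit_steps` and all parameter values are hypothetical.

```python
import numpy as np


def mean_exit_steps(curvature, noise_std, lr=0.1, radius=1.0,
                    n_trials=500, max_steps=50_000, seed=0):
    """Mean number of noisy gradient steps until |theta| >= radius.

    Toy model (an assumption, not the paper's setup):
    loss L(theta) = 0.5 * curvature * theta**2, and each update is
    theta <- theta - lr * L'(theta) + lr * noise_std * xi, xi ~ N(0, 1).
    All trials start at the minimum theta = 0.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_trials)
    # Trials that never leave the basin are recorded as max_steps.
    exit_step = np.full(n_trials, float(max_steps))
    alive = np.ones(n_trials, dtype=bool)

    for step in range(1, max_steps + 1):
        grad = curvature * theta
        noise = rng.standard_normal(n_trials)
        theta = theta - lr * grad + lr * noise_std * noise

        # Record the first step at which each trial leaves the basin.
        escaped = alive & (np.abs(theta) >= radius)
        exit_step[escaped] = step
        alive &= ~escaped
        if not alive.any():
            break

    return exit_step.mean()


if __name__ == "__main__":
    # Shrinking the noise scale makes escape from the same basin much slower.
    for noise_std in (2.0, 1.5, 1.2):
        steps = mean_exit_steps(curvature=1.0, noise_std=noise_std)
        print(f"noise_std={noise_std:.1f}  mean exit steps={steps:.1f}")
```

In this toy setting the mean exit time grows steeply as the noise scale shrinks, which is the kind of exponential sensitivity that large-deviation arguments describe. The sketch fixes the basin and varies only the noise, so it says nothing about how sharpness itself enters the paper's result.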

Code & Models

Repositories

ibayashi-hikaru/msml_experiments (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Stochastic Gradient Descent