Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis

Stephan Wojtowytsch

arXiv:2106.02588 · cs.LG · September 16, 2021 · 6 cites

Open Access

TL;DR

This paper analyzes stochastic gradient descent with machine learning-specific noise in a continuous time framework, revealing how the noise influences the preference for flat minima in neural network training.

Contribution

It introduces a continuous time model for SGD with machine learning noise and demonstrates how this noise regime affects the selection of flat minima differently from traditional models.

Findings

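01

In a certain noise regime, SGD with noise of machine learning type prefers flat minima, but in a sense different from the flat minimum selection of continuous time SGD with homogeneous noise.

02

The preference for flat minima depends on the noise regime: it is shown for a certain regime, not for every noise level.

03

The results are obtained in a continuous time model of SGD in which the noise follows the 'machine learning scaling'.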

Abstract

The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters of these are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
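To make the contrast between the two noise models concrete, here is a minimal one-dimensional sketch, not the paper's construction. It assumes that 'machine learning scaling' means the noise variance is proportional to the value of the objective, so the noise vanishes wherever the loss is zero; the abstract does not state this formula. The function `simulate_sgd_sde`, the parameters `eta`, `sigma`, `dt`, and the quadratic test loss are all hypothetical names chosen for illustration.

```python
import math
import random


def simulate_sgd_sde(grad, loss, theta0, eta=0.1, sigma=1.0,
                     dt=1e-3, n_steps=1000, seed=0):
    """Euler-Maruyama path of
        d(theta) = -grad(theta) dt + sqrt(eta * sigma * loss(theta)) dW.

    ASSUMPTION: the noise variance is proportional to the loss value.
    This is one reading of 'machine learning scaling', not a formula
    quoted from the abstract.
    """
    rng = random.Random(seed)
    theta = float(theta0)
    path = [theta]
    for _ in range(n_steps):
        # Noise intensity shrinks to zero as the loss approaches zero.
        noise_scale = math.sqrt(eta * sigma * max(loss(theta), 0.0))
        # Brownian increment over one time step of length dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        theta = theta - grad(theta) * dt + noise_scale * dw
        path.append(theta)
    return path


def quadratic_loss(theta):
    """Toy objective with a global minimum of value zero at theta = 0."""
    return 0.5 * theta * theta


def quadratic_grad(theta):
    """Gradient of quadratic_loss."""
    return theta
```

Under this assumption a path started at a zero-loss minimum never leaves it, because drift and noise both vanish there. Replacing `noise_scale` with a constant would give a homogeneous noise model of the kind the abstract contrasts against, in which the path keeps fluctuating even at the minimum.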

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent