On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective

Xiaowu Dai; Yuhua Zhu

arXiv:2112.00987·cs.LG·December 3, 2021

TL;DR

This paper approximates mini-batch SGD and momentum SGD by stochastic differential equations to analyze how batch size affects convergence to sharp or flat minima. Asymptotically, the solution tends toward flatter minima regardless of batch size, while the convergence rate depends on the batch size.

Contribution

The paper applies the theory of Fokker-Planck equations to the SDE approximation of SGD, yielding results on the escaping phenomenon and on how batch size relates to the sharpness of minima and to the convergence rate.
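
For orientation, the textbook link between an SDE and its Fokker-Planck equation can be written out in the simplest case. This is only a sketch under an assumption not taken from the paper: the noise is isotropic and constant, with a hypothetical inverse temperature $\beta$ standing in for the noise level set by learning rate and batch size. The paper's own SDEs may use state-dependent or anisotropic noise.

```latex
% Assumed model: overdamped Langevin dynamics on a loss f(\theta)
d\theta_t = -\nabla f(\theta_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t

% Fokker-Planck equation for the density p(\theta, t) of \theta_t
\partial_t p = \nabla \cdot \bigl(p\,\nabla f\bigr) + \beta^{-1}\,\Delta p

% Stationary solution (Gibbs density), when it is normalizable
p_\infty(\theta) \propto \exp\bigl(-\beta f(\theta)\bigr)

% Laplace approximation of the mass near a local minimum \theta^*
\int_{\text{near } \theta^*} p_\infty \, d\theta
  \;\approx\; C\, e^{-\beta f(\theta^*)}\,
  \det\bigl(\nabla^2 f(\theta^*)\bigr)^{-1/2}
```

Under these assumptions, two minima of equal depth receive stationary mass in inverse proportion to the square root of their Hessian determinants, so the lower-curvature (flatter) minimum holds more mass at any fixed $\beta$.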

Findings

01. SGD tends to converge to flatter minima over time

02. Convergence rate depends on batch size

03. Empirical validation across datasets and models

Abstract

We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate the mini-batch SGD and the momentum SGD as stochastic differential equations (SDEs). We exploit the continuous formulation of SDE and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and the relationship with large batch and sharp minima. In particular, we find that the stochastic process solution tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically with various datasets and models.
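
The batch-size effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment: it assumes a hypothetical 1D loss with one sharp and one flat minimum of equal depth, and it assumes Gaussian gradient noise whose variance scales as learning rate over batch size, which is a common modelling choice rather than a confirmed detail of the paper. All names and constants (`grad_loss`, `flat_fraction`, `grad_noise_std`, the curvatures) are illustrative.

```python
import numpy as np

K_SHARP, K_FLAT = 50.0, 5.0  # toy curvatures: sharp minimum at -1, flat minimum at +1


def grad_loss(theta):
    """Gradient of the toy loss
    f(theta) = -log(exp(-K_SHARP*(theta+1)^2/2) + exp(-K_FLAT*(theta-1)^2/2)).
    Both minima have depth close to zero, so only their curvature differs."""
    a = -0.5 * K_SHARP * (theta + 1.0) ** 2
    b = -0.5 * K_FLAT * (theta - 1.0) ** 2
    m = np.maximum(a, b)  # subtract the max before exponentiating for stability
    w_sharp, w_flat = np.exp(a - m), np.exp(b - m)
    num = K_SHARP * (theta + 1.0) * w_sharp + K_FLAT * (theta - 1.0) * w_flat
    return num / (w_sharp + w_flat)


def flat_fraction(batch_size, lr=0.01, grad_noise_std=28.0,
                  n_steps=2000, n_particles=2000, seed=0):
    """Euler-Maruyama simulation of noisy gradient descent.

    Assumption: each step adds Gaussian noise with standard deviation
    lr * grad_noise_std / sqrt(batch_size), i.e. the SDE time step equals lr
    and the diffusion strength is proportional to lr / batch_size.
    Returns the fraction of particles on the flat side (theta > 0)."""
    rng = np.random.default_rng(seed)
    theta = np.full(n_particles, -1.0)  # every particle starts in the sharp minimum
    step_noise_std = lr * grad_noise_std / np.sqrt(batch_size)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        theta = theta - lr * grad_loss(theta) + step_noise_std * noise
    return float(np.mean(theta > 0.0))


if __name__ == "__main__":
    for b in (4, 64):
        print(f"batch size {b}: fraction near flat minimum = {flat_fraction(b):.3f}")
```

In this toy setting, the small-batch run should move more particles out of the sharp basin within the same step budget, which is the finite-time face of a batch-size-dependent convergence rate. The example says nothing about the asymptotic regime, where a much longer horizon would be needed for the large-batch run.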

Taxonomy

Topics: Advanced Thermodynamics and Statistical Mechanics · Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent