Is Stochastic Gradient Descent Effective? A PDE Perspective on Machine Learning processes

Davide Barbieri; Matteo Bonforte; Peio Ibarrondo

arXiv:2501.08425 · cs.LG · May 13, 2025

Open Access

TL;DR

This paper models stochastic gradient descent (SGD) in machine learning using PDEs, revealing how it concentrates around local minima and escapes suboptimal points, especially in non-convex and degenerate cases.

Contribution

It provides a PDE-based analysis of SGD's dynamics in non-convex settings, including new bounds on escape times and convergence behavior under degenerate diffusion.

Findings

01 SGD concentrates near local minima in the drift regime.

02 Stochastic fluctuations enable escape from suboptimal minima.

03 New bounds on Mean Exit Time for non-convex, degenerate cases.
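The first two findings can be illustrated with a toy simulation. The sketch below is not the paper's model: it assumes a hypothetical one-dimensional double-well loss f(x) = (x² - 1)², a constant isotropic noise level (whereas the paper treats degenerate diffusion matrices), and an Euler-Maruyama discretization of dX = -f'(X) dt + noise dB. The names `grad_loss`, `simulate`, the step size, the noise levels, and the barrier at x = 0 are all illustrative assumptions.

```python
import numpy as np


def grad_loss(x):
    """Gradient of the hypothetical double-well loss f(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x**2 - 1.0)


def simulate(x0, step, noise, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dX = -f'(X) dt + noise dB, all started at x0.

    Returns the final positions and, per path, the first step at which the
    path crossed the barrier top at x = 0 (-1 if it never did).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    exit_step = np.full(n_paths, -1, dtype=int)
    for k in range(n_steps):
        kick = noise * np.sqrt(step) * rng.standard_normal(n_paths)
        x = x - step * grad_loss(x) + kick
        first_cross = (exit_step < 0) & (x > 0.0)
        exit_step[first_cross] = k + 1
    return x, exit_step


STEP = 0.01  # assumed step size, chosen small enough for a stable scheme

# No noise: plain gradient descent settles in the nearest minimum x = -1.
x_gd, exit_gd = simulate(-1.5, STEP, noise=0.0, n_steps=2000, n_paths=1)

# Small noise: paths concentrate around x = -1 and stay in the left well.
x_lo, exit_lo = simulate(-1.5, STEP, noise=0.1, n_steps=2000, n_paths=200)

# Larger noise: fluctuations push some paths over the barrier at x = 0.
x_hi, exit_hi = simulate(-1.5, STEP, noise=0.7, n_steps=5000, n_paths=200)

escaped = exit_hi > 0
print("final GD position:", x_gd[0])
print("spread around -1 (small noise):", np.abs(x_lo + 1.0).mean())
print("fraction escaped (larger noise):", escaped.mean())
if escaped.any():
    # Average over escaped paths only, so this underestimates the true mean.
    print("mean exit time among escaped paths:", exit_hi[escaped].mean() * STEP)
```

The empirical exit time printed at the end is only a crude stand-in for the Mean Exit Time that the paper bounds analytically; paths that have not escaped by the end of the run are excluded from the average.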

Abstract

In this paper we analyze the behaviour of stochastic gradient descent (SGD), a widely used method in supervised learning for optimizing neural network weights via the minimization of non-convex loss functions. Since the pioneering work of E, Li and Tai (2017), the underlying structure of such processes can be understood via parabolic PDEs of Fokker-Planck type, which are at the core of our analysis. Even though Fokker-Planck equations have a long history and an extensive literature, almost nothing is known when the potential is non-convex or when the diffusion matrix is degenerate, and this is the main difficulty that we face in our analysis. We identify two different regimes: in the initial phase of SGD, the loss function drives the weights to concentrate around the nearest local minimum. We refer to this phase as the drift regime and we provide quantitative estimates on this…
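For orientation, the Fokker-Planck structure mentioned in the abstract can be written out in a commonly used form. The notation below is an assumption and is not taken from the paper: f is the loss, η the learning rate, Σ(w) the covariance of the gradient noise, B_t a Brownian motion, and ρ(t, w) the density of the weights W_t. The first line is the first-order stochastic differential equation often used as a continuous-time model of SGD, and the second is the associated Fokker-Planck equation under the Itô convention.

```latex
% Assumed notation: f loss, \eta learning rate, \Sigma gradient-noise covariance,
% B_t Brownian motion, \rho(t, w) density of the weights W_t.
\begin{align*}
  dW_t &= -\nabla f(W_t)\,dt + \sqrt{\eta\,\Sigma(W_t)}\,dB_t, \\
  \partial_t \rho &= \nabla \cdot \big(\rho\,\nabla f\big)
    + \frac{\eta}{2} \sum_{i,j} \partial_{w_i}\partial_{w_j}\big(\Sigma_{ij}\,\rho\big).
\end{align*}
```

In this form the two difficulties named in the abstract are visible directly: a non-convex f gives a drift term with several wells, and a matrix Σ with zero eigenvalues makes the second-order term degenerate.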

Taxonomy

Topics: Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent · Diffusion