A geometric interpretation of stochastic gradient descent using diffusion metrics
R. Fioresi, P. Chaudhari, S. Soatto

TL;DR
This paper offers a geometric perspective on stochastic gradient descent: in a deterministic model, SGD trajectories are described as geodesics of a family of metrics derived from the diffusion matrix, and the authors draw a parallel with General Relativity models.
Contribution
It introduces a geometric framework in which SGD trajectories are studied through metrics built from the diffusion matrix, connecting optimization dynamics with differential geometry and physics.
Findings
SGD trajectories can be modeled as geodesics in a diffusion metric space.
The diffusion metrics encode the anisotropic noise in SGD.
An explicit example is provided for a two-layer neural network.
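To make the notion of anisotropic gradient noise concrete, the following is a minimal sketch, not the paper's construction. It assumes the diffusion matrix is the covariance of per-sample gradients, a common convention that this summary does not itself confirm, and it uses a toy least-squares problem rather than the paper's two-layer network. The names per_sample_gradients, diffusion_matrix, and the synthetic data are hypothetical.

```python
import numpy as np

def per_sample_gradients(w, X, y):
    # Per-sample loss: 0.5 * (x_i . w - y_i)^2, so grad_i = (x_i . w - y_i) * x_i.
    residuals = X @ w - y
    return residuals[:, None] * X

def diffusion_matrix(w, X, y):
    # Assumed definition: covariance of the per-sample gradients at w.
    g = per_sample_gradients(w, X, y)
    centered = g - g.mean(axis=0)
    return centered.T @ centered / g.shape[0]

# Synthetic data with very different feature scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=500)

w = np.zeros(3)
D = diffusion_matrix(w, X, y)

# Eigenvalues in ascending order; a large ratio means the noise is far from isotropic.
eigvals = np.linalg.eigvalsh(D)
anisotropy = eigvals[-1] / eigvals[0]
print("eigenvalues:", eigvals)
print("largest / smallest:", anisotropy)
```

If the noise were isotropic, the eigenvalues of D would be roughly equal. Here the mismatched feature scales make the ratio large, which is the kind of direction-dependent noise the diffusion metrics are said to encode.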
Abstract
Stochastic gradient descent (SGD) is a key ingredient in the training of deep neural networks and yet its geometrical significance appears elusive. We study a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from the diffusion matrix. These metrics encode information about the highly non-isotropic gradient noise in SGD. We establish a parallel with General Relativity models, where the role of the electromagnetic field is played by the gradient of the loss function. We compute an example of a two layer network.
Taxonomy
Methods: Stochastic Gradient Descent
