SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training

Ildus Sadrtdinov; Ivan Klimov; Ekaterina Lobacheva; Dmitry Vetrov

arXiv:2505.23489 · cs.LG · May 30, 2025

TL;DR

This paper offers a thermodynamic interpretation of stochastic gradient descent (SGD) in neural network training, framing it as free energy minimization influenced by learning rate and model parameterization.

Contribution

It introduces a novel thermodynamic perspective on SGD, linking learning rate to temperature and explaining convergence behavior in underparameterized and overparameterized models.
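To make the link between learning rate and temperature concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: a one-dimensional quadratic loss U(w) = a w²/2, additive Gaussian gradient noise with fixed variance, and "temperature" defined as the T for which the stationary weight distribution of SGD matches the Gibbs density exp(-U/T). The function name `effective_temperature` is hypothetical.

```python
def effective_temperature(lr: float, curvature: float, noise_var: float) -> float:
    """Temperature T such that the stationary SGD distribution on
    U(w) = curvature * w**2 / 2 has the same variance as the Gibbs density exp(-U / T).

    Toy assumption: w <- w - lr * (curvature * w + xi), xi ~ N(0, noise_var),
    which is stable only for 0 < lr * curvature < 2.
    """
    if not 0.0 < lr * curvature < 2.0:
        raise ValueError("SGD diverges unless 0 < lr * curvature < 2")
    # Stationary variance of the linear recursion w' = (1 - lr*a) w - lr*xi:
    # Var = lr^2 s^2 / (1 - (1 - lr*a)^2) = lr s^2 / (a (2 - lr*a)).
    stationary_var = lr * noise_var / (curvature * (2.0 - lr * curvature))
    # A Gibbs density exp(-a w^2 / (2T)) has variance T / a, hence T = a * Var.
    return curvature * stationary_var


if __name__ == "__main__":
    for lr in (0.1, 0.5, 1.0):
        print(lr, effective_temperature(lr, curvature=1.0, noise_var=1.0))
```

With a = 1 and unit noise variance this reduces to T = lr / (2 - lr), which grows monotonically with the learning rate and is exactly zero when the gradient noise vanishes.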

Findings

1. Underparameterized (UP) models follow free energy minimization, with temperature increasing at higher learning rates (LRs).

2. For overparameterized (OP) models, the temperature effectively drops to zero at low LRs, so SGD minimizes the loss directly.

3. The difference between the two regimes is attributed to the signal-to-noise ratio of stochastic gradients near optima.
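The signal-to-noise argument can be illustrated with a toy one-dimensional sketch; it is not the paper's experiment, and both noise models are assumptions chosen to mimic the two regimes. In the UP-like model, gradient noise stays finite at the optimum where the mean gradient vanishes, so the SNR drops to zero there; in the OP-like model, noise is proportional to the distance from a shared optimum (as when every per-sample gradient vanishes at an interpolating solution), so the SNR stays constant. The function name `average_loss` is hypothetical.

```python
import random


def average_loss(lr: float, noise: float, multiplicative: bool,
                 steps: int = 20_000, burn_in: int = 10_000, seed: int = 0) -> float:
    """Run SGD on the toy loss U(w) = w**2 / 2 and return the mean loss after burn-in.

    The true gradient is w. Two hypothetical noise models, xi ~ N(0, 1) per step:
      additive:       g = w + noise * xi        (noise persists at w = 0, SNR -> 0)
      multiplicative: g = w * (1 + noise * xi)  (noise vanishes at w = 0, SNR constant)
    """
    rng = random.Random(seed)
    w = 1.0
    total = 0.0
    for step in range(steps):
        xi = rng.gauss(0.0, 1.0)
        if multiplicative:
            grad = w * (1.0 + noise * xi)
        else:
            grad = w + noise * xi
        w -= lr * grad
        if step >= burn_in:  # average only over the (approximately) stationary phase
            total += 0.5 * w * w
    return total / (steps - burn_in)


if __name__ == "__main__":
    for lr in (0.05, 0.1, 0.2):
        up_like = average_loss(lr, noise=1.0, multiplicative=False)
        op_like = average_loss(lr, noise=1.0, multiplicative=True)
        print(f"lr={lr}: additive-noise loss={up_like:.4f}, vanishing-noise loss={op_like:.2e}")
```

With additive noise the loss settles near lr / (2(2 - lr)), the stationary value of this linear recursion, so it stays positive and rises with the LR; with vanishing noise the iterate contracts to the optimum and the loss goes to zero. The multiplicative model diverges in mean square once lr > 2 / (1 + noise²), so this sketch speaks only to the low-LR regime.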

Abstract

We present a thermodynamic interpretation of the stationary behavior of stochastic gradient descent (SGD) under fixed learning rates (LRs) in neural network training. We show that SGD implicitly minimizes a free energy function $F = U - TS$, balancing training loss $U$ and the entropy of the weights distribution $S$, with temperature $T$ determined by the LR. This perspective offers a new lens on why high LRs prevent training from converging to the loss minima and how different LRs lead to stabilization at different loss levels. We empirically validate the free energy framework on both underparameterized (UP) and overparameterized (OP) models. UP models consistently follow free energy minimization, with temperature increasing monotonically with LR, while for OP models, the temperature effectively drops to zero at low LRs, causing SGD to minimize the loss directly and converge to an optimum.…
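The trade-off in $F = U - TS$ can be seen on a toy problem. A minimal sketch, assuming a finite set of weight configurations with hypothetical losses and Shannon entropy in nats: for a fixed $T$, the distribution minimizing $F$ is the Gibbs distribution $p_i \propto \exp(-U_i/T)$, with minimum value $-T \log Z$. This is a standard statistical-mechanics result, not the paper's procedure for estimating entropy of network weights, and the helper names `free_energy` and `gibbs` are hypothetical.

```python
import math


def free_energy(probs, losses, temperature):
    """F = U - T S with U = E_p[loss] and S = -sum p log p (Shannon entropy, nats)."""
    energy = sum(p * u for p, u in zip(probs, losses))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return energy - temperature * entropy


def gibbs(losses, temperature):
    """Minimizer of F over distributions on a finite set: p_i proportional to exp(-U_i / T)."""
    u_min = min(losses)  # shift by the minimum loss for numerical stability
    weights = [math.exp(-(u - u_min) / temperature) for u in losses]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    losses = [0.0, 0.5, 1.0, 2.0]  # hypothetical losses of four weight configurations
    for temperature in (0.05, 0.5, 2.0):
        p = gibbs(losses, temperature)
        expected_loss = sum(pi * u for pi, u in zip(p, losses))
        print(temperature, [round(pi, 3) for pi in p], round(expected_loss, 3))
```

As $T \to 0$ the minimizer puts essentially all mass on the lowest-loss configuration, which mirrors direct loss minimization, while a larger $T$ trades loss for entropy and stabilizes at a higher expected loss.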

Code & Models

isadrtdinov/sgd-free-energy (official implementation, PyTorch)

Taxonomy

Methods: Stochastic Gradient Descent