Machine learning in and out of equilibrium
Shishir Adhikari, Alkan Kabakçıoğlu, Alexander Strang, Deniz Yuret, Michael Hinczewski

TL;DR
This paper applies a statistical physics framework to analyze the nonequilibrium dynamics of neural network training, revealing universal fluctuation theorems and proposing a new sampling algorithm, SGWORLD, that accelerates Bayesian posterior sampling.
Contribution
It introduces a Fokker-Planck approach to neural network training, demonstrating nonequilibrium fluctuation theorems and developing a novel stochastic gradient Langevin dynamics variant.
Findings
Fluctuation theorems hold in neural network training dynamics.
Stationary state properties depend on training details like minibatch sampling.
The proposed SGWORLD algorithm accelerates Bayesian posterior sampling.
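The last two findings turn on how minibatches are drawn. As a minimal sketch, and not the authors' SGWORLD implementation (whose details are not given here), the Python code below contrasts minibatching with and without replacement inside a plain stochastic gradient Langevin dynamics (SGLD) loop. The toy problem (the posterior of a Gaussian mean with unit variance and a flat prior), the function names `minibatches` and `sgld_sample`, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np


def minibatches(n, batch_size, rng, replace):
    """Yield index arrays for n // batch_size steps (one nominal epoch)."""
    steps = n // batch_size
    if replace:
        # Each batch is drawn independently; indices may repeat across batches.
        for _ in range(steps):
            yield rng.integers(0, n, size=batch_size)
    else:
        # One shuffle per epoch; every index is visited exactly once
        # when batch_size divides n.
        perm = rng.permutation(n)
        for k in range(steps):
            yield perm[k * batch_size:(k + 1) * batch_size]


def sgld_sample(x, batch_size, lr, epochs, replace, seed=0):
    """Generic SGLD chain for the mean of unit-variance Gaussian data.

    Assumed toy setup: U(theta) = sum_i (theta - x_i)^2 / 2, so the exact
    posterior is Normal(mean(x), 1 / len(x)).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    theta = 0.0
    samples = []
    for _ in range(epochs):
        for idx in minibatches(n, batch_size, rng, replace):
            # Minibatch estimate of dU/dtheta, rescaled to the full data set.
            grad = (n / batch_size) * np.sum(theta - x[idx])
            # Langevin step: gradient drift plus injected Gaussian noise.
            theta = theta - lr * grad + np.sqrt(2.0 * lr) * rng.normal()
            samples.append(theta)
    return np.array(samples)


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(1.0, 1.0, size=100)
    for replace in (True, False):
        chain = sgld_sample(data, batch_size=10, lr=1e-3, epochs=200,
                            replace=replace)
        kept = chain[200:]  # drop an initial burn-in segment
        print(f"replace={replace}: mean={kept.mean():.3f}, "
              f"var={kept.var():.4f}, exact var={1 / len(data):.4f}")
```

Comparing the sample variance of each chain with the exact posterior variance `1 / len(data)` is one simple way to see how the minibatching choice changes the stationary distribution at a fixed learning rate.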
Abstract
The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space -- for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium, exhibiting persistent currents in the space of network parameters. As in its physical analogues, the current is associated with an entropy production rate for any given training trajectory. The stationary distribution of these rates obeys the integral and detailed fluctuation theorems -- nonequilibrium generalizations of the second law of thermodynamics. We validate these relations in two numerical…
Taxonomy
Topics: Statistical Mechanics and Entropy · Advanced Thermodynamics and Statistical Mechanics · Gaussian Processes and Bayesian Inference
Methods: Diffusion · Stochastic Gradient Descent · Focus
