Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
Wei Bu, Uri Kol, Ziming Liu

TL;DR
This paper models the evolution of neural network weights during training with the Fokker-Planck equation, derives related PDEs from it, and compares the theoretical predictions with empirical results in a simple auto-encoder.
Contribution
It introduces a novel application of Fokker-Planck equations to model neural network training dynamics and derives related PDEs like Callan-Symanzik and Kardar-Parisi-Zhang.
Findings
Fokker-Planck effectively models weight distribution evolution.
Derived PDEs provide new insights into training dynamics.
Empirical data aligns with theoretical predictions.
Abstract
The dynamical evolution of a neural network during training has been an incredibly fascinating subject of study. First-principles derivations of the generic evolution of variables in statistical physics systems have proved useful for describing training dynamics conceptually, which in practice means numerically solving equations such as the Fokker-Planck equation. Simulating entire networks inevitably runs into the curse of dimensionality. In this paper, we use the Fokker-Planck equation to simulate the probability density evolution of individual weight matrices in the bottleneck layers of a simple 2-bottleneck-layered auto-encoder and compare the theoretical evolutions against the empirical ones by examining the output data distributions. We also derive physically relevant partial differential equations, such as the Callan-Symanzik and Kardar-Parisi-Zhang equations, from the dynamical equation we have.
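To make "numerically solving the Fokker-Planck equation" concrete, here is a minimal one-dimensional sketch. It is not the paper's model: the linear drift -theta * x (an Ornstein-Uhlenbeck process), the constant diffusion coefficient, the grid, and the function name `solve_fokker_planck_1d` are all assumptions chosen for illustration. The scheme is a plain explicit finite-volume update with zero-flux boundaries, so total probability is conserved and the density should relax toward a Gaussian with variance diffusion / theta.

```python
import numpy as np

def solve_fokker_planck_1d(theta=1.0, diffusion=0.5, x_max=6.0, n_cells=241,
                           t_final=5.0, init_mean=2.0, init_std=0.3):
    """Evolve p(x, t) under dp/dt = d/dx[theta * x * p] + diffusion * d2p/dx2.

    Toy setup (assumed, not taken from the paper): drift mu(x) = -theta * x
    and a constant diffusion coefficient.
    """
    x = np.linspace(-x_max, x_max, n_cells)   # cell centres
    dx = x[1] - x[0]
    # Explicit time stepping: keep dt well below the diffusive limit dx^2 / (2 D).
    dt = 0.2 * dx**2 / diffusion
    n_steps = int(np.ceil(t_final / dt))

    # Initial density: a narrow Gaussian away from the origin, normalised on the grid.
    p = np.exp(-0.5 * ((x - init_mean) / init_std) ** 2)
    p /= p.sum() * dx

    x_face = 0.5 * (x[:-1] + x[1:])           # interior cell faces
    for _ in range(n_steps):
        # Probability flux J = mu(x) * p - D * dp/dx at each interior face.
        p_face = 0.5 * (p[:-1] + p[1:])
        flux = -theta * x_face * p_face - diffusion * (p[1:] - p[:-1]) / dx
        # Zero flux at both ends of the domain keeps total probability fixed.
        flux = np.concatenate(([0.0], flux, [0.0]))
        p = p - dt * (flux[1:] - flux[:-1]) / dx
    return x, p, dx

if __name__ == "__main__":
    x, p, dx = solve_fokker_planck_1d()
    mean = (x * p).sum() * dx
    var = ((x - mean) ** 2 * p).sum() * dx
    print(f"mass={p.sum() * dx:.6f} mean={mean:.4f} variance={var:.4f}")
```

A grid-based solver like this scales exponentially with the number of variables, which is the curse of dimensionality the abstract refers to and the reason for restricting attention to individual weight matrices rather than the whole network.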
Taxonomy
Topics: Statistical Mechanics and Entropy · Advanced Thermodynamics and Statistical Mechanics
