Stochastic weight matrix dynamics during learning and Dyson Brownian motion
Gert Aarts, Biagio Lucini, Chanju Park

TL;DR
This paper models the evolution of weight matrices in learning algorithms as Dyson Brownian motion, ties the level of stochasticity to the ratio of the learning rate to the mini-batch size, and identifies universal spectral features using random matrix theory.
Contribution
It introduces a framework that describes weight matrix updates in learning algorithms as Dyson Brownian motion, so that the spectral properties of the weights during learning can be analyzed with random matrix theory.
Findings
Weight matrix updates follow Dyson Brownian motion dynamics.
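The level of stochasticity is set by the ratio of the learning rate to the mini-batch size.
Universal spectral features, namely the Wigner surmise and the Wigner semicircle, are identified in a teacher-student model and in the Gaussian restricted Boltzmann machine.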
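As a rough illustration of the findings above, the following sketch builds a symmetric matrix from small Gaussian increments, so that its eigenvalues perform Dyson Brownian motion, and compares the resulting spectrum with the Wigner semicircle. It is not the paper's setup: the function names, the matrix size, the step count, and the normalization (off-diagonal variance t/n, giving a semicircle of radius 2*sqrt(t)) are all assumptions chosen for illustration, and no learning rate or mini-batch size enters here.

```python
import numpy as np


def matrix_brownian_motion(n, t_final=1.0, n_steps=50, seed=0):
    """Sum symmetric Gaussian increments up to time t_final.

    Assumed normalization (illustrative, not taken from the paper):
    off-diagonal entries have variance t/n and diagonal entries 2t/n,
    so the spectrum approaches a semicircle of radius 2*sqrt(t) at large n.
    """
    rng = np.random.default_rng(seed)
    dt = t_final / n_steps
    m = np.zeros((n, n))
    for _ in range(n_steps):
        g = rng.standard_normal((n, n))
        # (g + g.T) / sqrt(2) is symmetric with unit off-diagonal variance.
        m += np.sqrt(dt / n) * (g + g.T) / np.sqrt(2.0)
    return m


def semicircle_density(x, radius):
    """Wigner semicircle density with support [-radius, radius]."""
    x = np.asarray(x, dtype=float)
    inside = np.clip(radius**2 - x**2, 0.0, None)
    return 2.0 * np.sqrt(inside) / (np.pi * radius**2)


# Eigenvalues of the matrix at t = 1; the matching semicircle has radius 2.
eigs = np.linalg.eigvalsh(matrix_brownian_motion(200))
second_moment = np.mean(eigs**2)      # semicircle value is radius**2 / 4
spectral_edge = np.max(np.abs(eigs))  # should sit close to the radius
```

A histogram of `eigs` can be overlaid with `semicircle_density(x, 2.0)` to inspect the agreement visually; at finite matrix size, small deviations near the spectral edge are expected.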
Abstract
We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence for a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.
