Stochastic weight matrix dynamics during learning and Dyson Brownian motion

Gert Aarts; Biagio Lucini; Chanju Park

arXiv:2407.16427·cond-mat.dis-nn·January 10, 2025


TL;DR

This paper models the evolution of weight matrices in learning algorithms as Dyson Brownian motion, relating the level of stochasticity to the ratio of the learning rate to the mini-batch size and identifying universal spectral features using random matrix theory.

Contribution

It describes weight matrix updates in the framework of Dyson Brownian motion, so that the spectral properties of the weights during learning can be analysed with the tools of random matrix theory.

Findings

01

Weight matrix updates follow Dyson Brownian motion dynamics.

02

The level of stochasticity is set by the ratio of the learning rate to the mini-batch size, supporting a previously conjectured scaling relationship.

03

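Universal spectral features, namely the Wigner surmise and the Wigner semicircle, are identified in a teacher-student model and in the Gaussian restricted Boltzmann machine.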
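
For orientation, the textbook forms of the objects named in these findings are collected below. This is a reference sketch in one common normalisation, with N the matrix size, β the Dyson index (β = 1 for real symmetric matrices), and no drift from a loss function. The paper's own conventions and scalings may differ.

```latex
% Dyson Brownian motion for eigenvalues \lambda_i(t), i = 1, ..., N (no external drift)
d\lambda_i = \sqrt{\frac{2}{\beta N}}\, dB_i
           + \frac{1}{N} \sum_{j \neq i} \frac{dt}{\lambda_i - \lambda_j}

% Wigner semicircle at time t (large-N spectral density, start from zero)
\rho_t(\lambda) = \frac{1}{2\pi t} \sqrt{4t - \lambda^2}, \qquad |\lambda| \le 2\sqrt{t}

% Wigner surmise for the normalised level spacing s (\beta = 1)
P(s) = \frac{\pi}{2}\, s\, \exp\!\left(-\frac{\pi s^2}{4}\right)
```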

Abstract

We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence to a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.
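
The link between stochastic matrix updates and the Wigner semicircle can be illustrated with a minimal numerical sketch. It assumes the simplest possible setting: a real symmetric matrix driven only by Gaussian noise, with no gradient drift, no learning rate, and no mini-batches, so it is not the paper's teacher-student or restricted Boltzmann machine setup. The function names and parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def symmetric_noise(n, dt, rng):
    """Symmetric Gaussian increment: off-diagonal variance dt/n, diagonal 2*dt/n."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2.0) * np.sqrt(dt / n)


def simulate_eigenvalues(n=200, steps=100, dt=0.01, seed=0):
    """Accumulate noisy symmetric updates and return the final eigenvalues.

    Assumption: pure noise, starting from the zero matrix. The eigenvalues
    of the accumulated matrix then perform Dyson Brownian motion (beta = 1).
    """
    rng = np.random.default_rng(seed)
    m = np.zeros((n, n))
    for _ in range(steps):
        m += symmetric_noise(n, dt, rng)
    return np.linalg.eigvalsh(m)


eigs = simulate_eigenvalues()
total_time = 100 * 0.01
# For large n the spectrum should approach a semicircle of radius 2*sqrt(total_time).
print("largest |eigenvalue|:", np.abs(eigs).max())
print("mean squared eigenvalue:", np.mean(eigs**2))
```

With these settings the total time is 1, so the semicircle radius is expected to be close to 2 and the mean squared eigenvalue close to 1, up to finite-size corrections. A histogram of `eigs` can be compared against the semicircle density directly.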
