Models of Heavy-Tailed Mechanistic Universality
Liam Hodgkinson, Zhichao Wang, and Michael W. Mahoney

TL;DR
This paper introduces a new random matrix model to explain the heavy-tailed spectral behaviors observed in neural networks, linking these phenomena to training dynamics and model structure, and suggesting they are fundamental to deep learning success.
Contribution
The paper proposes the high-temperature Marchenko-Pastur (HTMP) ensemble as a theoretical framework for understanding heavy-tailed mechanistic universality (HT-MU) in neural networks, connecting spectral properties to training and data factors.
Findings
Heavy-tailed spectral densities arise from data correlations, training temperature, and eigenvector entropy.
The model explains neural scaling laws and optimizer trajectories.
Heavy tails are linked to phases of neural network training.
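For readers unfamiliar with the baseline these findings depart from, the following is a minimal sketch of the classical Marchenko-Pastur law, not the paper's HTMP ensemble, whose definition is not given in this summary. It assumes an i.i.d. Gaussian data matrix, checks that the sample covariance eigenvalues stay inside the bounded Marchenko-Pastur support (so there is no heavy tail), and contrasts this with a Hill tail-index estimate on synthetic Pareto data. The function names, matrix sizes, and the choice of the Hill estimator are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def mp_edges(ratio, sigma2=1.0):
    """Support edges of the classical Marchenko-Pastur law (aspect ratio p/n <= 1)."""
    lo = sigma2 * (1.0 - np.sqrt(ratio)) ** 2
    hi = sigma2 * (1.0 + np.sqrt(ratio)) ** 2
    return lo, hi


def sample_covariance_eigs(n, p, rng):
    """Eigenvalues of S = X^T X / n for an n x p matrix X with i.i.d. N(0, 1) entries."""
    X = rng.standard_normal((n, p))
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)


def hill_estimator(values, k):
    """Hill estimate of the tail index alpha from the k largest positive values."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    top, threshold = x[:k], x[k]
    return 1.0 / np.mean(np.log(top / threshold))


rng = np.random.default_rng(0)

# Light-tailed baseline: the spectrum is confined to a bounded interval.
n, p = 2000, 500  # illustrative sizes, ratio p/n = 0.25
eigs = sample_covariance_eigs(n, p, rng)
lo, hi = mp_edges(p / n)
print(f"MP support: [{lo:.3f}, {hi:.3f}]")
print(f"empirical eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")

# Heavy-tailed contrast: synthetic Pareto samples with a known tail index.
true_alpha = 2.0
pareto_samples = rng.pareto(true_alpha, size=20000) + 1.0  # classical Pareto, x_min = 1
alpha_hat = hill_estimator(pareto_samples, k=500)
print(f"Hill tail-index estimate: {alpha_hat:.2f} (true value {true_alpha})")
```

In this sketch the Gaussian spectrum has hard edges, so a power-law tail fit to it would not be meaningful; the spectra described in the findings above are the case where such a tail index is finite and informative.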
Abstract
Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians, Hessians, and weight matrices has led to the introduction of the concept of heavy-tailed mechanistic universality (HT-MU). Multiple lines of empirical evidence suggest a robust correlation between heavy-tailed metrics and model performance, indicating that HT-MU may be a fundamental aspect of deep learning efficacy. Here, we propose a general family of random matrix models -- the high-temperature Marchenko-Pastur (HTMP) ensemble -- to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power laws on…
Taxonomy
Topics: Theoretical and Computational Physics · Spectral Theory in Mathematical Physics · Magnetism in coordination complexes
