From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang

TL;DR
This paper models the emergence of heavy-tailed spectral densities in neural network weights, revealing how training conditions like learning rates influence spectral shape and potentially improve generalization.
Contribution
It introduces a theoretical framework for understanding heavy tails in neural network spectra without gradient noise, incorporating optimizer-dependent learning rates.
Findings
Heavy tails correlate with early training phases and larger learning rates.
Learning rates influence the spectral shape, affecting generalization.
First analysis of heavy tails in a noise-free neural network setting.
Abstract
Training strategies for modern deep neural networks (NNs) tend to induce a heavy-tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. Specifically, understanding the conditions that lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple, rich setting to model the emergence of HT ESD. In particular, we present a theory-informed setup for 'crafting' heavy tails in the ESD of two-layer NNs and present a systematic analysis of the HT ESD emergence without any gradient noise. This is the first work to analyze a noise-free setting, and we also incorporate optimizer (GD/Adam) dependent (large) learning rates into the…
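To make the objects in the abstract concrete, the following is a minimal sketch of how one might compute the ESD of a weight matrix and take a single noise-free (full-batch) gradient descent step on the first layer of a two-layer NN. It is not the authors' code: the function names (`esd`, `loss`, `full_batch_gd_step`), the tanh activation, the squared loss, the 1/sqrt(h) output scaling, and all sizes and the learning rate are assumptions chosen for illustration.

```python
import numpy as np


def esd(W):
    """Nonzero spectrum of W^T W: the squared singular values of W, ascending."""
    s = np.linalg.svd(W, compute_uv=False)
    return np.sort(s ** 2)


def loss(W, a, X, y):
    """Squared loss of the assumed model f(x) = a^T tanh(W x) / sqrt(h)."""
    h = W.shape[0]
    pred = np.tanh(X @ W.T) @ a / np.sqrt(h)
    return 0.5 * np.mean((pred - y) ** 2)


def full_batch_gd_step(W, a, X, y, lr):
    """One GD step on W using the whole dataset, so there is no gradient noise."""
    n, h = X.shape[0], W.shape[0]
    H = np.tanh(X @ W.T)                       # hidden activations, shape (n, h)
    resid = H @ a / np.sqrt(h) - y             # prediction error, shape (n,)
    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    D = resid[:, None] * a[None, :] / np.sqrt(h) * (1.0 - H ** 2)
    grad = D.T @ X / n                         # dL/dW, shape (h, d)
    return W - lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, h = 200, 50, 40                      # hypothetical sizes
    X = rng.standard_normal((n, d))
    y = np.tanh(X @ rng.standard_normal(d) / np.sqrt(d))
    W0 = rng.standard_normal((h, d)) / np.sqrt(d)
    a = rng.standard_normal(h)
    W1 = full_batch_gd_step(W0, a, X, y, lr=10.0)   # assumed "large" step size
    print("largest eigenvalues before:", esd(W0)[-3:])
    print("largest eigenvalues after: ", esd(W1)[-3:])
```

Comparing `esd(W0)` with `esd(W1)` for different values of `lr` is one simple way to inspect how the step size changes the spectrum in this toy setup. Because every step uses the full dataset, any change in spectral shape here cannot be attributed to mini-batch noise.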
Taxonomy
Topics: Quantum chaos and dynamical systems
Methods: Adam
