Spectral Evolution and Invariance in Linear-width Neural Networks
Zhichao Wang, Andrew Engel, Anand Sarwate, Ioana Dumitriu, Tony Chiang

TL;DR
This paper studies the spectral properties of linear-width neural networks, revealing invariance in spectra during training and linking spectral features to learning dynamics and generalization, with implications for understanding neural network training.
Contribution
It provides a theoretical and empirical analysis of spectral invariance and evolution in linear-width neural networks, connecting spectral properties to training dynamics and feature learning.
Findings
The bulk spectra of weight and kernel matrices are invariant during training with small learning rates.
Large learning rates lead to outlier eigenvalues aligned with data structure.
Heavy-tailed spectral behavior emerges after adaptive gradient training.
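To make the outlier finding concrete, here is a toy sketch that plants a rank-one spike in a Gaussian weight matrix and checks whether one eigenvalue separates from the Marchenko-Pastur bulk and whether its eigenvector aligns with a chosen direction. Treating a large-learning-rate update as a rank-one perturbation along a fixed "data direction" `u` is an assumption made purely for illustration, not the paper's training procedure. The names `top_eig_and_overlap`, `u`, `a`, and `theta` are hypothetical.

```python
import numpy as np

def top_eig_and_overlap(W: np.ndarray, u: np.ndarray) -> tuple[float, float]:
    """Largest eigenvalue of W^T W / N and squared overlap of its eigenvector with unit vector u."""
    N = W.shape[0]
    vals, vecs = np.linalg.eigh(W.T @ W / N)  # eigenvalues in ascending order
    return float(vals[-1]), float((vecs[:, -1] @ u) ** 2)

rng = np.random.default_rng(1)
N, d = 2000, 1000
gamma = d / N
mp_upper = (1 + np.sqrt(gamma)) ** 2  # right edge of the Marchenko-Pastur bulk

W0 = rng.standard_normal((N, d))  # i.i.d. Gaussian "initial" weights

# Hypothetical data direction u and output-side direction a, both unit norm.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
a = rng.standard_normal(N)
a /= np.linalg.norm(a)

# ASSUMPTION: model a large-step update as the rank-one spike theta * sqrt(N) * a u^T.
theta = 3.0
W1 = W0 + theta * np.sqrt(N) * np.outer(a, u)

lam0, ov0 = top_eig_and_overlap(W0, u)
lam1, ov1 = top_eig_and_overlap(W1, u)
print(f"before: top eig {lam0:.2f} (bulk edge {mp_upper:.2f}), overlap {ov0:.3f}")
print(f"after:  top eig {lam1:.2f}, overlap {ov1:.3f}")
```

In this toy spiked model, a spike that is too weak (below roughly `gamma ** 0.25` for this normalization) stays hidden inside the bulk; increasing `theta` past that point produces an isolated eigenvalue whose eigenvector has a large overlap with `u`.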
Abstract
We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the spectra of the weights in this high-dimensional regime are invariant when trained by gradient descent with small constant learning rates; we provide a theoretical justification for this observation and prove the invariance of the bulk spectra for both the conjugate and neural tangent kernels. We demonstrate similar characteristics when training with stochastic gradient descent with small learning rates. When the learning rate is large, we exhibit the emergence of an outlier whose corresponding eigenvector is aligned with the training data structure. We also show that after adaptive gradient training, where a lower test error and feature learning emerge, both weight and kernel matrices exhibit heavy-tailed behavior.…
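As a minimal sketch of the objects the abstract refers to, the code below computes, at initialization only, the eigenvalues of the weight Gram matrix W^T W / N (compared against the Marchenko-Pastur support edges) and the conjugate kernel of a two-layer network. The Gaussian initialization, the tanh activation, the sizes, and the helper names `mp_edges`, `weight_spectrum`, and `conjugate_kernel` are assumptions for illustration. No training is performed, so this shows the starting spectrum whose invariance the paper studies, not the invariance result itself.

```python
import numpy as np

def mp_edges(gamma: float) -> tuple[float, float]:
    """Support edges of the unit-variance Marchenko-Pastur law with ratio gamma <= 1."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2

def weight_spectrum(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of W^T W / N for a weight matrix W of shape (N, d)."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)

def conjugate_kernel(W: np.ndarray, X: np.ndarray, act=np.tanh) -> np.ndarray:
    """Conjugate kernel (1/N) sigma(W X / sqrt(d))^T sigma(W X / sqrt(d)) for data X of shape (d, n)."""
    N, d = W.shape
    F = act(W @ X / np.sqrt(d))  # hidden-layer features, shape (N, n)
    return F.T @ F / N           # n x n positive semidefinite kernel matrix

rng = np.random.default_rng(0)
N, d, n = 2000, 1000, 1500       # width N, input dimension d, sample size n, all proportional
W = rng.standard_normal((N, d))  # ASSUMPTION: i.i.d. standard Gaussian initialization
X = rng.standard_normal((d, n))  # ASSUMPTION: isotropic Gaussian inputs

eigs = weight_spectrum(W)
lo, hi = mp_edges(d / N)
print(f"empirical range [{eigs.min():.3f}, {eigs.max():.3f}], MP support [{lo:.3f}, {hi:.3f}]")

ck_eigs = np.linalg.eigvalsh(conjugate_kernel(W, X))
print(f"conjugate kernel: {ck_eigs.size} eigenvalues, largest {ck_eigs.max():.3f}")
```

Re-running `weight_spectrum` and `conjugate_kernel` on weights saved during training would give the spectra whose evolution the paper tracks.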
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
