Universal characteristics of deep neural network loss surfaces from random matrix theory
Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol

TL;DR
This paper uses random matrix theory to study universal properties of deep neural network loss surfaces and Hessian spectra, with results on spectral outliers and on the role of random matrix local laws in preconditioned gradient descent algorithms.
Contribution
It applies random matrix universality to a realistic model of neural network Hessians to derive practical implications for loss surface characteristics and optimization dynamics.
Findings
Universal spectral outliers in neural network Hessians
Random matrix local laws influence gradient descent preconditioning
Insights into loss surface geometry from statistical physics
Abstract
This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms. We also present insights into deep neural network loss surfaces from quite general arguments based on tools from statistical physics and random matrix theory.
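As a rough illustration of what a spectral outlier is, the sketch below uses a textbook toy model, not the Hessian model of the paper: a symmetric Gaussian (Wigner) matrix plus a rank-one "spike". The function names, the matrix size, and the spike strength `theta` are all assumptions chosen for illustration. For a spike strength above 1, the largest eigenvalue is known to separate from the bulk (which fills roughly [-2, 2]) and to sit near theta + 1/theta.

```python
import numpy as np


def wigner(n, rng):
    """Symmetric Gaussian matrix scaled so the bulk spectrum fills about [-2, 2]."""
    a = rng.standard_normal((n, n))
    # Off-diagonal entries of (a + a.T) have variance 2, so dividing by
    # sqrt(2 * n) gives off-diagonal variance 1 / n.
    return (a + a.T) / np.sqrt(2 * n)


def spiked_spectrum(n=500, theta=3.0, seed=0):
    """Eigenvalues (ascending) of a Wigner matrix plus a rank-one spike.

    Toy model only: `theta` is a hypothetical spike strength and the spike
    direction is a random unit vector.
    """
    rng = np.random.default_rng(seed)
    noise = wigner(n, rng)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    spiked = noise + theta * np.outer(v, v)
    return np.linalg.eigvalsh(spiked)


if __name__ == "__main__":
    theta = 3.0
    eigs = spiked_spectrum(theta=theta)
    print("largest eigenvalue:       ", eigs[-1])
    print("second largest eigenvalue:", eigs[-2])
    print("theta + 1/theta:          ", theta + 1 / theta)
```

Running the script prints the top two eigenvalues next to the predicted outlier location; the second-largest eigenvalue should stay close to the bulk edge at 2 while the largest sits well outside it.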
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Statistical Mechanics and Entropy
