Universal characteristics of deep neural network loss surfaces from random matrix theory

Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol

arXiv:2205.08601 · math-ph · April 18, 2025



Open Access

TL;DR

This paper explores universal properties of deep neural network loss surfaces and Hessian spectra using random matrix theory, deriving universal features of spectral outliers and showing how random matrix local laws bear on preconditioned gradient descent algorithms.

Contribution

It introduces a framework that applies random matrix universality, in particular universal local statistics, to a realistic model of neural network Hessians, giving new insight into loss surface characteristics and optimization dynamics.

Findings

01 Universal spectral outliers in neural network Hessians

02 Random matrix local laws influence gradient descent preconditioning

03 Insights into loss surface geometry from statistical physics
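Finding 01 concerns outliers: isolated eigenvalues that sit apart from the bulk of the Hessian spectrum. The classic random matrix mechanism behind such outliers can be seen in a toy model, sketched here in Python with NumPy. This is an illustrative assumption, not the paper's Hessian model: the bulk is a GOE matrix normalized so its spectrum fills [-2, 2], and a rank-one "signal" of strength theta is added along a random unit vector. Standard theory (the BBP transition) predicts that once theta > 1 the top eigenvalue detaches from the edge at 2 and sits near theta + 1/theta, while for theta <= 1 it stays at the edge. The helper names `goe`, `top_eigenvalue_with_spike` and `predicted_top` are hypothetical.

```python
import numpy as np


def goe(n, rng):
    """GOE matrix scaled so its eigenvalues fill [-2, 2] as n grows (toy bulk, an assumption)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def top_eigenvalue_with_spike(n, theta, seed=0):
    """Largest eigenvalue of W + theta * v v^T for a random unit vector v (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = goe(n, rng)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    h = w + theta * np.outer(v, v)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(h)[-1]


def predicted_top(theta):
    """Large-n prediction: an outlier at theta + 1/theta above the threshold theta = 1, else the edge 2."""
    return theta + 1.0 / theta if theta > 1 else 2.0


if __name__ == "__main__":
    for theta in (0.5, 1.5, 3.0):
        observed = top_eigenvalue_with_spike(1000, theta)
        print(f"theta={theta}: observed {observed:.3f}, predicted {predicted_top(theta):.3f}")
```

The dimension n = 1000 and the seed are arbitrary; at finite n the observed outlier fluctuates around the prediction by roughly n^(-1/2), and agreement is weakest close to the threshold theta = 1.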

Abstract

This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms. We also present insights into deep neural network loss surfaces from quite general arguments based on tools from statistical physics and random matrix theory.
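The abstract's second point concerns local laws in preconditioned gradient descent. A damped preconditioner applies (H + δI)^(-1) to a gradient g. For a random matrix W, isotropic laws state that for fixed unit vectors u and g, the quantity u·(W + δI)^(-1)g is close to m(-δ)(u·g), where m is the Stieltjes transform of the limiting spectral density. Informally, along any fixed test direction the damped inverse acts on the random bulk like multiplication by the deterministic scalar m(-δ). The Python sketch below checks this numerically under loudly stated assumptions: the Hessian is replaced by a GOE matrix with semicircle law on [-2, 2], for which m(-δ) = (δ - sqrt(δ² - 4))/2, and δ > 2 is taken so that -δ lies outside the spectrum, the easiest regime. Local laws proper extend such estimates to spectral parameters much closer to the spectrum, and the paper's own model and analysis are more general. The function names are hypothetical.

```python
import numpy as np


def goe(n, rng):
    """GOE matrix scaled so its eigenvalues fill [-2, 2] as n grows (toy bulk, an assumption)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def semicircle_stieltjes_at_neg(delta):
    """m(-delta) = integral of rho(x) / (x + delta) for the semicircle density on [-2, 2].

    Valid only for delta > 2, i.e. a damping that pushes -delta outside the spectrum.
    """
    if delta <= 2:
        raise ValueError("delta must exceed the spectral edge 2")
    return (delta - np.sqrt(delta**2 - 4)) / 2


def damped_projections(n, delta, seed=0):
    """Return (u . P g, g . P g, u . g) with P = (W + delta I)^{-1} (hypothetical helper).

    u and g are random unit vectors drawn independently of W, standing in
    for a fixed test direction and a fixed gradient direction.
    """
    rng = np.random.default_rng(seed)
    w = goe(n, rng)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    g = rng.standard_normal(n)
    g /= np.linalg.norm(g)
    # Apply the damped preconditioner by solving a linear system instead of forming the inverse.
    pg = np.linalg.solve(w + delta * np.eye(n), g)
    return u @ pg, g @ pg, u @ g


if __name__ == "__main__":
    delta = 3.0
    m = semicircle_stieltjes_at_neg(delta)
    u_pg, g_pg, u_g = damped_projections(1500, delta)
    # Isotropic law: g . P g should be close to m, and u . P g close to m * (u . g).
    print(f"m(-delta) = {m:.4f}")
    print(f"g.Pg = {g_pg:.4f}")
    print(f"u.Pg = {u_pg:.4f}  vs  m * u.g = {m * u_g:.4f}")
```

Note that the law controls projections onto fixed directions, not the whole vector: the norm of (W + δI)^(-1)g is not simply m(-δ) times the norm of g.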


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Statistical Mechanics and Entropy