The asymptotic spectrum of the Hessian of DNN throughout training
Arthur Jacot, Franck Gabriel, Clément Hongler

TL;DR
This paper analyzes the spectral properties of the Hessian matrix in deep neural networks throughout training, using the Neural Tangent Kernel to provide precise asymptotic characterizations at initialization and during training.
Contribution
It offers a comprehensive spectral analysis of the Hessian in DNNs, connecting NTK dynamics with Hessian asymptotics at different training stages.
Findings
Full spectral characterization of the Hessian at initialization and during training when the NTK is fixed.
First two moments of the Hessian at initialization in the mean-field limit, where the NTK evolves during training.
Insights into the Hessian's behavior that can inform optimization and generalization in DNNs.
Abstract
The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-field limit, where the NTK is not fixed during training, we describe the first two moments of the Hessian at initialization.
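To make the link between the NTK and the Hessian concrete, here is a minimal sketch that is not taken from the paper. It assumes a mean-squared-error cost, for which the chain rule splits the Hessian into a Gauss-Newton term proportional to J^T J plus a residual-weighted term, where J is the Jacobian of the network outputs with respect to the parameters. The empirical NTK Gram matrix on the same inputs is J J^T, so the two matrices share their nonzero eigenvalues and therefore the traces of their powers. The toy one-hidden-layer network, its 1/sqrt(n) scaling, and all function names below are illustrative assumptions.

```python
import math
import random


def jacobian(xs, a, w):
    """Rows are d f(x_i) / d(a, w) for the toy network
    f(x) = sum_j a_j * tanh(w_j * x) / sqrt(n)  (hypothetical example model)."""
    n = len(a)
    scale = 1.0 / math.sqrt(n)
    rows = []
    for x in xs:
        t = [math.tanh(wj * x) for wj in w]
        d_a = [scale * tj for tj in t]  # derivative w.r.t. a_j
        d_w = [scale * aj * (1.0 - tj * tj) * x for aj, tj in zip(a, t)]  # w.r.t. w_j
        rows.append(d_a + d_w)
    return rows


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(p * q for p, q in zip(row, col)) for col in Bt] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def moments(xs, n_hidden=20, seed=0):
    """Return (trace, trace of square) for the NTK Gram matrix J J^T
    and for the Gauss-Newton matrix J^T J at a random initialization."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    J = jacobian(xs, a, w)
    Jt = transpose(J)
    gram = matmul(J, Jt)  # N x N, empirical NTK Gram matrix
    gn = matmul(Jt, J)    # P x P, Gauss-Newton part of the Hessian (up to 1/N)
    return {
        "gram": (trace(gram), trace(matmul(gram, gram))),
        "gauss_newton": (trace(gn), trace(matmul(gn, gn))),
    }


if __name__ == "__main__":
    m = moments([-1.0, 0.5, 2.0])
    print("NTK Gram     :", m["gram"])
    print("Gauss-Newton :", m["gauss_newton"])
```

The two printed pairs should agree up to floating-point error. The sketch covers only the Gauss-Newton term; the residual-weighted term of the Hessian is not computed here.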
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
Methods: Neural Tangent Kernel
