Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Leon Bottou, Yann LeCun

TL;DR
This paper investigates the eigenvalues of the Hessian of the loss in deep learning models. It reveals a two-part distribution: a bulk near zero that reflects over-parameterization, and edges that depend on the data. Together these provide insight into the loss landscape.
Contribution
An empirical analysis of the Hessian eigenvalues of deep learning models before and after training, characterizing the structure of the eigenvalue distribution.
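To make "before and after training" concrete, here is a minimal sketch. It uses an assumed toy setting rather than the paper's networks: logistic regression on random Gaussian inputs with random labels, and more parameters than examples. For this model the Hessian of the mean loss has the closed form X^T diag(s(1-s)) X / n, where s = sigmoid(Xw). The code compares its spectrum at initialization (w = 0) and after plain gradient descent. All function names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hessian(w, X):
    # Hessian of the mean logistic loss: X^T diag(s * (1 - s)) X / n, with s = sigmoid(X w).
    # It does not depend on the labels, only on the inputs and the current weights.
    s = sigmoid(X @ w)
    d = s * (1.0 - s)
    return X.T @ (d[:, None] * X) / X.shape[0]

def train(X, y, lr=0.5, steps=200):
    # Plain full-batch gradient descent on the mean logistic loss (toy choice).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / X.shape[0]
        w -= lr * grad
    return w

def spectrum(H, rel_tol=1e-8):
    # Returns (largest eigenvalue, count of eigenvalues numerically equal to zero).
    eigs = np.linalg.eigvalsh(H)
    return eigs.max(), int(np.sum(np.abs(eigs) < rel_tol * eigs.max()))

rng = np.random.default_rng(0)
n, p = 20, 100                        # hypothetical sizes: more parameters than examples
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n).astype(float)

top_before, zeros_before = spectrum(logistic_hessian(np.zeros(p), X))
top_after, zeros_after = spectrum(logistic_hessian(train(X, y), X))
print("before:", top_before, zeros_before)
print("after: ", top_after, zeros_after)
```

In this toy, training shrinks the largest eigenvalue because every factor s(1-s) falls below its initial value of 1/4. The number of zero eigenvalues stays at p - n, because the Hessian can never have rank above the number of examples. Real networks are far less tidy, so treat this only as an illustration of the bookkeeping.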
Findings
The eigenvalue distribution consists of a bulk concentrated around zero and edges scattered away from zero.
The bulk indicates how over-parameterized the model is.
The edges depend on the input data.
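Part of why a bulk at zero tracks over-parameterization can be seen in a deliberately simple surrogate, which is an assumption here and not the paper's experiment. Consider linear least squares, L(w) = ||Xw - y||^2 / (2n). Its Hessian is X^T X / n, whose rank is at most n. With n examples fixed, every parameter beyond n adds another exactly-zero eigenvalue. The helper names below are hypothetical.

```python
import numpy as np

def least_squares_hessian(X):
    # For L(w) = ||X w - y||^2 / (2 n) the Hessian is X^T X / n, for every w and y.
    return X.T @ X / X.shape[0]

def count_zero_eigenvalues(H, rel_tol=1e-8):
    # Eigenvalues below rel_tol * (largest eigenvalue) are treated as numerical zeros.
    eigs = np.linalg.eigvalsh(H)
    return int(np.sum(np.abs(eigs) < rel_tol * eigs.max()))

def zero_count(n, p, seed=0):
    # Random Gaussian inputs (toy assumption): n examples, p parameters.
    X = np.random.default_rng(seed).standard_normal((n, p))
    return count_zero_eigenvalues(least_squares_hessian(X))

n = 20
for p in (10, 40, 160):
    print(p, zero_count(n, p), max(p - n, 0))   # zero eigenvalues vs. p - n
```

In a trained deep network the bulk is concentrated near zero rather than sitting exactly at zero. The linear case only shows the rank argument in its cleanest form.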
Abstract
We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.
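The claim that the edges depend on the input data can also be illustrated in the linear least-squares surrogate, again an assumption rather than the paper's setup. There the Hessian X^T X / n is simply the second-moment matrix of the inputs. If the inputs form k well-separated clusters, k eigenvalues separate from the rest. The cluster construction, mean scale, and threshold below are hypothetical choices made for the illustration.

```python
import numpy as np

def num_outlier_eigenvalues(k, n=300, p=50, mean_scale=4.0, threshold=10.0, seed=0):
    # Hypothetical toy data: k clusters with random means plus unit Gaussian noise.
    rng = np.random.default_rng(seed)
    means = mean_scale * rng.standard_normal((k, p))
    labels = np.arange(n) % k                  # balanced cluster assignment
    X = means[labels] + rng.standard_normal((n, p))
    H = X.T @ X / n                            # least-squares Hessian = input second moment
    eigs = np.linalg.eigvalsh(H)
    return int(np.sum(eigs > threshold))       # eigenvalues far from the noise bulk

for k in (2, 3, 5):
    print(k, num_outlier_eigenvalues(k))
```

Here the noise alone yields eigenvalues of order 1, while each cluster mean contributes an eigenvalue an order of magnitude larger. The number of edge eigenvalues therefore follows the structure of the data, not the parameter count.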
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
