Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

Levent Sagun; Leon Bottou; Yann LeCun

arXiv:1611.07476 · cs.LG · October 6, 2017 · 120 cites

TL;DR

This paper examines the eigenvalues of the Hessian of the loss function in deep learning models before and after training. The spectrum splits into two parts: a bulk concentrated around zero, which reflects over-parameterization, and edges scattered away from zero, which depend on the input data.

Contribution

It offers empirical analysis of Hessian eigenvalues before and after training, highlighting the structure of the eigenvalue distribution in deep learning models.

Findings

01

Eigenvalue distribution consists of a bulk around zero and scattered edges.

02

The bulk indicates how over-parameterized the model is.

03

Edges depend on input data characteristics.

Abstract

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.
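
The bulk-and-edges picture can be illustrated with a toy computation. The sketch below is not the paper's experiment: it assumes a linear least-squares model with more parameters than samples, for which the Hessian is exactly X^T X / n. The function name `hessian_spectrum`, the sizes, and the tolerance are all hypothetical choices made for illustration. Because the Hessian's rank cannot exceed the number of samples, the remaining eigenvalues sit at zero (the bulk), while the nonzero ones (the edges) are determined by the data matrix X.

```python
import numpy as np

def hessian_spectrum(X):
    """Eigenvalues of the Hessian of L(w) = 1/(2n) * ||X w - y||^2.

    For this toy loss the Hessian is X^T X / n; it depends on the
    inputs X but not on the weights w or the targets y.
    """
    n = X.shape[0]
    H = X.T @ X / n
    # eigvalsh is for symmetric matrices and returns ascending eigenvalues
    return np.linalg.eigvalsh(H)

# Assumed toy sizes: 10 samples, 50 parameters (over-parameterized)
rng = np.random.default_rng(0)
n_samples, n_params = 10, 50
X = rng.normal(size=(n_samples, n_params))

eigs = hessian_spectrum(X)

tol = 1e-8  # numerical threshold for "zero", an illustrative choice
n_bulk = int(np.sum(np.abs(eigs) < tol))  # eigenvalues at (numerical) zero
n_edge = int(np.sum(eigs >= tol))         # eigenvalues away from zero

print("bulk (near zero):", n_bulk)
print("edges (away from zero):", n_edge)
```

In this setup the number of near-zero eigenvalues equals the number of parameters minus the number of samples, so adding parameters grows the bulk, while changing X moves the edges. Deep networks are nonlinear, so this should be read only as intuition for the abstract's claim, not as a reproduction of the paper's results.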

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference