Negative eigenvalues of the Hessian in deep neural networks

Guillaume Alain; Nicolas Le Roux; Pierre-Antoine Manzagol

arXiv:1902.02366 · cs.LG · February 8, 2019 · 23 cites

Open Access

TL;DR

This paper investigates the non-convex loss landscape of deep neural networks by analyzing Hessian eigenvalues, emphasizing the significance of negative eigenvalues and their impact on training dynamics.

Contribution

The paper provides a detailed analysis of Hessian eigenvalues in deep networks and explores how negative eigenvalues affect optimization.

Findings

01. Negative eigenvalues are prevalent in deep network loss landscapes.

02. Handling negative eigenvalues can improve training stability.

03. The Hessian's eigendecomposition reveals insights into non-convexity.

Abstract

The loss function of deep networks is known to be non-convex, but the precise nature of this non-convexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.
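
As a rough illustration of what a negative Hessian eigenvalue looks like, here is a minimal sketch on a toy problem. It is not the paper's experimental setup: the two-parameter loss, the finite-difference step `h`, and the helper names `loss`, `hessian_2x2`, and `eigenvalues_sym_2x2` are all assumptions made for this example. The loss is that of a two-layer scalar linear model fit to a single point, which is non-convex even though the model is linear in its input.

```python
import math


def loss(w1, w2):
    # Hypothetical toy model (not from the paper): a two-layer scalar linear
    # "network" w2 * w1 * x fit to one data point (x = 1, y = 1), squared error.
    return (w1 * w2 - 1.0) ** 2


def hessian_2x2(f, w1, w2, h=1e-4):
    # Central finite differences for the second derivatives of f at (w1, w2).
    f0 = f(w1, w2)
    h11 = (f(w1 + h, w2) - 2.0 * f0 + f(w1 - h, w2)) / h**2
    h22 = (f(w1, w2 + h) - 2.0 * f0 + f(w1, w2 - h)) / h**2
    h12 = (f(w1 + h, w2 + h) - f(w1 + h, w2 - h)
           - f(w1 - h, w2 + h) + f(w1 - h, w2 - h)) / (4.0 * h**2)
    return h11, h12, h22


def eigenvalues_sym_2x2(h11, h12, h22):
    # Closed-form eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]].
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


# At the origin, a saddle point of this toy loss, one eigenvalue is negative.
lam_min, lam_max = eigenvalues_sym_2x2(*hessian_2x2(loss, 0.0, 0.0))
print(lam_min, lam_max)  # approximately -2 and 2
```

Analytically, the Hessian of this toy loss at the origin is [[0, -2], [-2, 0]], with eigenvalues -2 and 2, so the direction w1 = w2 is one of negative curvature. At the minimum (1, 1) the same computation gives eigenvalues of approximately 0 and 4, with no negative curvature. For a real network the Hessian is far too large to form this way, and one would typically rely on Hessian-vector products instead.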

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms