An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

TL;DR
This paper introduces a new spectral analysis tool to study Hessian eigenvalues during neural network training, revealing how spectral features influence optimization and differ across normalization methods.
Contribution
It develops scalable methods for Hessian spectrum estimation and provides new insights into the spectral dynamics and their impact on neural network optimization.
Findings
Large isolated eigenvalues appear rapidly in non-batch normalized networks.
Gradient concentration occurs in eigenspaces associated with large eigenvalues.
Batch normalization suppresses the emergence of these spectral features.
Abstract
To understand the dynamics of optimization in deep neural networks, we develop a tool to study the evolution of the entire Hessian spectrum throughout the optimization process. Using this, we study a number of hypotheses concerning smoothness, curvature, and sharpness in the deep learning literature. We then thoroughly analyze a crucial structural feature of the spectra: in non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In batch normalized networks, these two effects are almost absent. We characterize these effects, and explain how they affect optimization speed through both theory and experiments. As part of this work, we adapt advanced tools from numerical linear algebra that allow scalable and accurate estimation of the entire Hessian…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Model Reduction and Neural Networks · Image and Signal Denoising Methods
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
