PyHessian: Neural Networks Through the Lens of the Hessian
Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

TL;DR
PyHessian is a scalable framework that enables fast computation of second-order information in neural networks, providing insights into the loss landscape and factors affecting trainability.
Contribution
The paper introduces a scalable framework for computing Hessian information in neural networks. The framework supports distributed-memory execution and enables detailed analysis of the loss landscape.
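As background on why such computations can scale: Hessian information is commonly obtained without ever forming the full Hessian matrix, by using Hessian-vector products. The identity below is the standard one. The notation is an assumption introduced here for illustration and does not come from this summary.

```latex
% Assumed notation (not from the summary above):
%   L(\theta) : training loss,   g = \nabla_\theta L(\theta) : gradient,
%   H = \nabla^2_\theta L(\theta) : Hessian,
%   v : any vector that does not depend on \theta.
\[
  H v \;=\; \nabla_\theta \left( g^\top v \right),
  \qquad g = \nabla_\theta L(\theta).
\]
% Because v is constant, differentiating the scalar g^\top v gives
% (\partial g / \partial \theta)^\top v = H^\top v = H v, since H is symmetric.
```

Each product therefore costs roughly one additional backward pass, and memory stays linear in the number of parameters.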
Findings
Batch Normalization does not always smooth the loss landscape
Residual connections impact the curvature of the loss landscape
The framework allows detailed spectral analysis of neural network Hessians
Abstract
We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus…
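The quantities named in the abstract (top eigenvalue, trace) can be estimated from Hessian-vector products alone. The following is a minimal sketch in plain Python, not PyHessian's API: the toy loss, the function names, and the use of finite differences in place of automatic differentiation are all assumptions made for illustration. It uses power iteration for the top eigenvalue and Hutchinson's randomized estimator for the trace.

```python
import math
import random

# Hypothetical toy loss f(w) = 0.5 * w^T A w with A = [[4, 1], [1, 3]].
# Its Hessian is exactly A: eigenvalues (7 +/- sqrt(5)) / 2, trace 7.
def grad(w):
    return [4.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 3.0 * w[1]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hvp(grad_fn, w, v, eps=1e-4):
    """Approximate H v by central differences of the gradient.

    Assumption: finite differences stand in for the exact autograd
    Hessian-vector product; the Hessian matrix is never formed.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def top_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Power iteration: converges to the eigenvalue of largest magnitude."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        eig = dot(v, hv)  # Rayleigh quotient for the current unit vector
        norm = math.sqrt(dot(hv, hv))
        v = [x / norm for x in hv]
    return eig

def hutchinson_trace(grad_fn, w, samples=2000, seed=0):
    """Hutchinson estimator: Tr(H) = E[v^T H v] for Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        total += dot(v, hvp(grad_fn, w, v))
    return total / samples

if __name__ == "__main__":
    w = [1.0, -2.0]
    print("top eigenvalue:", top_eigenvalue(grad, w))   # exact value: (7 + sqrt(5)) / 2
    print("trace estimate:", hutchinson_trace(grad, w))  # exact value: 7
```

For a real network, the gradient function would come from backpropagation on a mini-batch, and the Hessian-vector product would be computed exactly with automatic differentiation rather than by finite differences.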
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Model Reduction and Neural Networks
Methods: Batch Normalization
