Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan

TL;DR
This paper reveals a hierarchical structure in the derivatives of the logits of a deep neural network with respect to its parameters. The structure explains the outliers in the Hessian spectrum and offers a way to approximate the Hessian's principal subspace without eigenanalysis.
Contribution
It identifies a two-way additive structure in gradient means that causes Hessian outliers and proposes an efficient approximation method for the Hessian's principal subspace.
Findings
Outliers in the Hessian spectrum arise from an additive two-way structure in the means of the gradients.
Averaging operations on the gradients approximate the Hessian's principal subspace without high-dimensional eigenanalysis.
The structure and the approximation are corroborated across different datasets, architectures and sample sizes.
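The second finding can be illustrated with a toy numerical sketch. This is not the paper's experiment or code: it assumes synthetic "gradients" drawn as a class mean plus isotropic noise, and it uses a single level of class means rather than the full structure described in the paper. All function names and parameters (`simulate`, `mean_scale`, and so on) are hypothetical. The sketch compares the subspace obtained by averaging per class with the top eigenvectors of the second moment matrix.

```python
import numpy as np


def simulate(num_classes=5, n_per_class=200, dim=50, mean_scale=5.0, seed=0):
    """Draw synthetic 'gradients': class mean plus unit-variance noise (an assumption)."""
    rng = np.random.default_rng(seed)
    means = mean_scale * rng.standard_normal((num_classes, dim))
    labels = np.repeat(np.arange(num_classes), n_per_class)
    grads = means[labels] + rng.standard_normal((labels.size, dim))
    return grads, labels


def second_moment(grads):
    """Uncentered second moment matrix (1/n) * sum_i g_i g_i^T."""
    return grads.T @ grads / grads.shape[0]


def subspace_from_averaging(grads, labels):
    """Orthonormal basis for the span of the per-class averages (no eigenanalysis)."""
    num_classes = int(labels.max()) + 1
    class_means = np.stack([grads[labels == c].mean(axis=0) for c in range(num_classes)])
    basis, _ = np.linalg.qr(class_means.T)  # shape (dim, num_classes)
    return basis


def top_eigvecs(matrix, k):
    """Eigenvectors of a symmetric matrix for its k largest eigenvalues."""
    _, vecs = np.linalg.eigh(matrix)  # eigenvalues come back in ascending order
    return vecs[:, -k:]


def subspace_overlap(a, b):
    """Mean squared cosine of the principal angles between two orthonormal bases."""
    return np.linalg.norm(a.T @ b) ** 2 / a.shape[1]


if __name__ == "__main__":
    grads, labels = simulate()
    m = second_moment(grads)
    eigvals = np.linalg.eigvalsh(m)
    # Count eigenvalues far above the bulk; the factor 10 is an arbitrary threshold.
    print("outliers:", int(np.sum(eigvals > 10 * np.median(eigvals))))
    overlap = subspace_overlap(top_eigvecs(m, 5), subspace_from_averaging(grads, labels))
    print("overlap:", overlap)
```

In this toy setting the number of outlying eigenvalues matches the number of classes, and the averaged basis nearly coincides with the top eigenspace. The eigendecomposition here serves only as a check; the averaging route itself needs only class means and a QR factorization of a thin matrix.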
Abstract
We consider deep classifying neural networks. We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
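The distinction between a covariance and a second moment matrix can be made explicit with a standard identity. The notation below is an assumption for illustration and is not taken from the paper: $g_1, \dots, g_n$ are gradient vectors and $\bar{g}$ is their average.

```latex
\[
\underbrace{\frac{1}{n}\sum_{i=1}^{n} g_i g_i^{\top}}_{\text{second moment}}
=
\underbrace{\frac{1}{n}\sum_{i=1}^{n} (g_i - \bar{g})(g_i - \bar{g})^{\top}}_{\text{covariance}}
+ \bar{g}\,\bar{g}^{\top},
\qquad
\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g_i .
\]
```

The extra term $\bar{g}\,\bar{g}^{\top}$ is low rank, so a large mean contributes an eigenvalue that separates from the rest of the spectrum. This is the sense in which means of gradients, rather than their covariance, can produce outliers.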
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
