Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

Vardan Papyan

arXiv:1901.08244 · cs.LG · January 25, 2019 · 26 citations

TL;DR

This paper reveals a hierarchical structure in the derivatives of deep neural network logits that explains outliers in the Hessian spectrum, offering a new way to approximate its principal subspace without eigenanalysis.

Contribution

It identifies a two-way additive structure in gradient means that causes Hessian outliers and proposes an efficient approximation method for the Hessian's principal subspace.

Findings

01

Outliers in the Hessian spectrum are due to a hierarchical structure in the means of the gradients.

02

The proposed averaging method accurately approximates the Hessian's principal subspace.

03

The structure and approximation method are validated across multiple datasets and architectures.

Abstract

We consider deep classifying neural networks. We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
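
The abstract's distinction between a covariance and a second moment matrix rests on a standard identity: the second moment of a set of vectors equals their covariance plus the outer product of their mean with itself. The sketch below checks that identity on synthetic vectors in pure Python. It is only an illustration, not the paper's code: the data, the dimension, the mean vector `mu_true`, and all helper names are assumptions made for this example, and the vectors merely stand in for per-sample gradients.

```python
import random

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def mat_add(A, B):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mean(vectors):
    """Coordinate-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def second_moment(vectors):
    """Average of v v^T over all vectors (no centering)."""
    d, n = len(vectors[0]), len(vectors)
    M = [[0.0] * d for _ in range(d)]
    for v in vectors:
        M = mat_add(M, outer(v, v))
    return [[x / n for x in row] for row in M]

def covariance(vectors):
    """Second moment of the mean-centered vectors (1/n normalization)."""
    mu = mean(vectors)
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    return second_moment(centered)

def rayleigh(M, u):
    """Rayleigh quotient u^T M u / u^T u."""
    Mu = [sum(m * x for m, x in zip(row, u)) for row in M]
    return sum(a * b for a, b in zip(u, Mu)) / sum(x * x for x in u)

random.seed(0)
dim, n = 5, 2000
mu_true = [3.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical nonzero mean (assumption)
# Synthetic stand-ins for gradients: a large mean plus small isotropic noise.
grads = [[m + random.gauss(0.0, 0.1) for m in mu_true] for _ in range(n)]

M = second_moment(grads)
C = covariance(grads)
mu = mean(grads)

# Identity: second moment = covariance + mean mean^T.
recon = mat_add(C, outer(mu, mu))
max_err = max(abs(a - b) for ra, rb in zip(M, recon) for a, b in zip(ra, rb))

# Along the mean direction the second moment is large, the covariance is not.
along_mean_second_moment = rayleigh(M, mu)
along_mean_covariance = rayleigh(C, mu)
```

In this toy setting the Rayleigh quotient along the mean direction is large for the second moment matrix and small for the covariance, which is the sense in which a term influenced by the mean can produce an outlying eigenvalue that a true covariance would not.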

Code & Models

Repositories

deep-lab/DeepnetHessian (PyTorch)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis