ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

Felix Dangel; Lukas Tatzel; Philipp Hennig

arXiv:2106.02624 · cs.LG · February 11, 2022 · 1 citation

TL;DR

ViViT introduces a novel curvature model leveraging the low-rank structure of the generalized Gauss-Newton matrix, enabling efficient eigenvalue computations and detailed analysis of noise effects in neural network training.

Contribution

The paper presents ViViT, a method that exploits the GGN's low-rank structure without further approximations, enabling scalable curvature analysis.

Findings

01. Efficient computation of eigenvalues and eigenvectors of the GGN.

02. ViViT scales well with network size and complexity.

03. Noise impacts the structural properties of the GGN during training.

Abstract

Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model of the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations. It allows for efficient computation of eigenvalues, eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. We demonstrate this by conducting performance benchmarks and substantiate ViViT's usefulness by studying the impact of…
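The low-rank idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a mini-batch GGN of the form G = (1/N) Σ_n J_nᵀ H_n J_n, with hypothetical random per-sample Jacobians `J` and softmax cross-entropy output Hessians `H`. Factoring each H_n = S_n S_nᵀ gives G = V Vᵀ with a tall matrix `V`, so the nonzero eigenvalues of the D×D matrix G can be read off the much smaller NC×NC Gram matrix Vᵀ V. All names and sizes (`N`, `C`, `D`, `V`, `gram`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, D = 4, 3, 50  # toy batch size, number of outputs, number of parameters

# Hypothetical per-sample Jacobians of the model output w.r.t. parameters.
J = rng.standard_normal((N, C, D))

# Per-sample loss Hessians w.r.t. the outputs; here softmax cross-entropy,
# H_n = diag(p_n) - p_n p_n^T (symmetric positive semi-definite).
logits = rng.standard_normal((N, C))
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
H = np.einsum("nc,cd->ncd", p, np.eye(C)) - np.einsum("nc,nd->ncd", p, p)

# Symmetric factorization H_n = S_n S_n^T via a batched eigendecomposition.
evals_H, evecs_H = np.linalg.eigh(H)
S = evecs_H * np.sqrt(np.clip(evals_H, 0.0, None))[:, None, :]

# Low-rank factor V (D x NC) with G = V V^T; block n is J_n^T S_n / sqrt(N).
V = np.einsum("ncd,nck->dnk", J, S).reshape(D, N * C) / np.sqrt(N)

# Reference: the explicit D x D GGN (only feasible for toy sizes).
G = np.einsum("ncd,nce,nef->df", J, H, J) / N

# Eigen-decompose the small Gram matrix instead of G.
gram = V.T @ V  # (NC x NC)
evals_gram, evecs_gram = np.linalg.eigh(gram)

# Map Gram eigenvectors to GGN eigenvectors for the nonzero eigenvalues:
# if gram @ u = lam * u with lam > 0, then e = V @ u / sqrt(lam).
keep = evals_gram > 1e-10 * evals_gram.max()
evals = evals_gram[keep]
evecs = V @ evecs_gram[:, keep] / np.sqrt(evals)

# Per-sample second-order directional derivatives e_k^T G_n e_k along each
# eigenvector, computed from the blocks of V without forming any G_n.
proj = np.einsum("dnc,dk->nck", V.reshape(D, N, C), evecs)
lam_nk = N * (proj ** 2).sum(axis=1)  # shape (N, K); batch mean equals evals
```

The spread of `lam_nk` across samples for a fixed direction gives a per-direction view of mini-batch noise in the curvature, which is the kind of quantity the abstract refers to as per-sample directional derivatives.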

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Tensor decomposition and applications · Gaussian Processes and Bayesian Inference