ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Felix Dangel, Lukas Tatzel, Philipp Hennig

TL;DR
ViViT is a curvature model that exploits the low-rank structure of the generalized Gauss-Newton (GGN) matrix to compute eigenvalues and eigenvectors efficiently and to analyze how mini-batch noise affects curvature during neural network training.
Contribution
The paper presents ViViT, a method that exploits the GGN's low-rank structure without further approximations and computes its representation in parallel with gradients in a single backward pass.
Findings
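ViViT computes eigenvalues and eigenvectors of the GGN efficiently, along with per-sample first- and second-order directional derivatives.
ViViT offers a fine-grained cost-accuracy trade-off that allows it to scale, as shown in performance benchmarks.
Mini-batch noise affects the structural properties of the GGN during training.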
Abstract
Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation or Kronecker-factored block diagonal approximations do not consider noise in the mini-batch. We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations. It allows for efficient computation of eigenvalues, eigenvectors, as well as per-sample first- and second-order directional derivatives. The representation is computed in parallel with gradients in one backward pass and offers a fine-grained cost-accuracy trade-off, which allows it to scale. We demonstrate this by conducting performance benchmarks and substantiate ViViT's usefulness by studying the impact of…
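The low-rank idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the GGN is available in factored form G = V Vᵀ, with V of shape (D, K) and K much smaller than the number of parameters D, and it uses a random matrix as a stand-in for that factor. The function name `ggn_eigh_via_gram` and the sizes `D` and `K` are hypothetical. The sketch relies only on the linear-algebra fact that the nonzero eigenvalues of V Vᵀ equal those of the small Gram matrix Vᵀ V.

```python
import numpy as np


def ggn_eigh_via_gram(V, rel_tol=1e-10):
    """Nonzero eigenpairs of G = V @ V.T, computed from the K x K Gram matrix.

    V has shape (D, K). The D x D matrix G is never formed here.
    """
    gram = V.T @ V                                  # shape (K, K)
    evals, gram_evecs = np.linalg.eigh(gram)        # ascending eigenvalues
    keep = evals > rel_tol * evals.max()            # drop numerically zero modes
    evals, gram_evecs = evals[keep], gram_evecs[:, keep]
    # If (V.T V) u = lam * u, then V u / sqrt(lam) is a unit eigenvector of V V.T.
    evecs = (V @ gram_evecs) / np.sqrt(evals)
    return evals, evecs


# Hypothetical sizes: D parameters, K columns in the factor (stand-in data).
rng = np.random.default_rng(0)
D, K = 500, 12
V = rng.standard_normal((D, K)) / np.sqrt(K)

evals, evecs = ggn_eigh_via_gram(V)

# Reference check only: form the full D x D matrix, which the sketch avoids.
G = V @ V.T
residual = np.abs(G @ evecs - evecs * evals).max()
```

The eigendecomposition here costs on the order of K³ plus two products with V, instead of D³ for the full matrix. The second-order directional derivative along a returned eigenvector `e` is `e @ G @ e`, which equals the corresponding eigenvalue.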
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Tensor decomposition and applications · Gaussian Processes and Bayesian Inference
