Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra
Constantin Octavian Puiu

TL;DR
This paper accelerates K-FAC with randomized numerical linear algebra: because the eigen-spectrum of the Kronecker factors decays rapidly, only the leading eigen-modes need to be computed when inverting them, which cuts per-epoch training time by about 2.5x while maintaining performance.
Contribution
We propose a randomized eigen-mode truncation method for K-FAC that reduces the time complexity of inverting the Kronecker factors from cubic to quadratic in layer width, improving training speed.
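As a rough illustration of the idea (not necessarily the paper's exact algorithm), the sketch below uses a standard randomized range finder in NumPy to approximate the leading eigen-pairs of a symmetric positive semi-definite factor, then applies a damped inverse using only those modes. The function names, the oversampling and power-iteration parameters, and the damping term are assumptions made for illustration.

```python
import numpy as np


def randomized_eigh(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-`rank` eigen-pairs of a symmetric PSD matrix A.

    Hypothetical helper: a generic randomized range finder followed by a
    small dense eigen-decomposition (Rayleigh-Ritz step).
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    k = min(d, rank + oversample)

    # Sketch the range of A with a Gaussian test matrix: O(d^2 k).
    Q, _ = np.linalg.qr(A @ rng.standard_normal((d, k)))
    # A few power iterations sharpen the captured subspace.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A @ Q)

    # Project onto the subspace and solve a small k x k eigen-problem: O(k^3).
    B = Q.T @ A @ Q
    evals, evecs = np.linalg.eigh(B)

    # eigh returns ascending order; keep the `rank` largest modes.
    idx = np.argsort(evals)[::-1][:rank]
    return evals[idx], Q @ evecs[:, idx]


def apply_damped_inverse(evals, U, damping, V):
    """Apply (U diag(evals) U^T + damping * I)^{-1} to V.

    Uses the identity
      (U D U^T + c I)^{-1} = (1/c) I + U diag(1/(D + c) - 1/c) U^T,
    valid when U has orthonormal columns, so no d x d inverse is formed.
    """
    coeff = 1.0 / (evals + damping) - 1.0 / damping
    return V / damping + U @ (coeff[:, None] * (U.T @ V))
```

For a d x d factor and a fixed number of retained modes, the dominant cost is the product `A @ Q`, which is quadratic in d, whereas a full eigen-decomposition is cubic.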
Findings
2.5x reduction in per-epoch training time
3.3x reduction in time to target accuracy
Comparable performance to SENG on CIFAR10 with VGG16_bn
Abstract
K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by focusing only on the first few eigen-modes when inverting the Kronecker factors. Importantly, the spectrum decay happens over a constant number of modes irrespective of the layer width. This allows us to reduce the time complexity of K-FAC from cubic to quadratic in layer width,…
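One way to see why the exponential average forces decay, assuming the factor is updated as A <- rho*A + (1-rho)*M_j with each per-batch term M_j of rank at most the batch size n: the m most recent terms span at most m*n directions, and everything older carries a total weight of at most rho^m, so the (m*n+1)-th eigenvalue is at most rho^m times the largest spectral norm of the M_j (by Weyl's inequality). The bound depends on n and rho but not on the layer width. The sketch below checks this numerically on synthetic data; the update rule, names, and data are illustrative assumptions rather than the paper's code.

```python
import numpy as np


def ea_kronecker_factor(batches, rho):
    """Exponentially averaged second-moment matrix from a list of batches.

    Assumed update (illustrative): A <- rho * A + (1 - rho) * X^T X / n,
    starting from A = 0, where each batch X has shape (n, d).
    Returns the final factor and the largest spectral norm of any batch term.
    """
    d = batches[0].shape[1]
    A = np.zeros((d, d))
    max_norm = 0.0
    for X in batches:
        M = X.T @ X / X.shape[0]  # rank of M is at most the batch size n
        max_norm = max(max_norm, np.linalg.norm(M, 2))
        A = rho * A + (1.0 - rho) * M
    return A, max_norm


def decay_bound_holds(A, max_norm, rho, n, tol=1e-10):
    """Check lambda_{m*n+1}(A) <= rho^m * max_norm for every m that fits."""
    eigs = np.sort(np.linalg.eigvalsh(A))[::-1]  # descending order
    d = A.shape[0]
    m = 1
    while m * n < d:
        # Index m*n (0-based) is the (m*n + 1)-th largest eigenvalue.
        if eigs[m * n] > rho ** m * max_norm + tol:
            return False
        m += 1
    return True
```

Calling `decay_bound_holds` on a factor built by `ea_kronecker_factor` from random Gaussian batches should return `True` for any layer width d, since the argument above never uses d.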
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced SAR Imaging Techniques · Seismic Imaging and Inversion Techniques
