A Kronecker-factored approximate Fisher matrix for convolution layers
Roger Grosse, James Martens

TL;DR
This paper introduces KFC, a Kronecker-factored approximation of the Fisher matrix for convolutional neural networks, enabling approximate natural gradient optimization whose updates are comparably efficient to those of SGD.
Contribution
The paper proposes KFC, a novel structured approximation to the Fisher matrix for convolutional layers, improving natural gradient methods' efficiency and invariance properties.
Findings
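In the authors' experiments, approximate natural gradient descent with KFC trained convolutional networks several times faster than carefully tuned SGD.
Training with KFC required 10-20 times fewer iterations than SGD.
KFC captures important curvature information while keeping updates comparably efficient to those of SGD.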
Abstract
Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding…
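The abstract's remark that a Kronecker product of small matrices allows efficient inversion can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the factors `A` and `B` are generic random positive-definite matrices rather than KFC's actual factors, and the names `kron_solve` and `random_spd` and the sizes used are hypothetical. The sketch relies only on the identity (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, which lets a solve against the large product be replaced by two solves against the small factors.

```python
import numpy as np


def kron_solve(A, B, V):
    """Solve (A kron B) vec(X) = vec(V) for X, using row-major vec.

    With row-major flattening, (A kron B) vec(X) = vec(A X B^T),
    so X = A^{-1} V B^{-T}. Only the small factors are ever solved against.
    """
    left = np.linalg.solve(A, V)          # A^{-1} V
    return np.linalg.solve(B, left.T).T   # (A^{-1} V) B^{-T}


def random_spd(n, rng):
    """Hypothetical helper: a random symmetric positive-definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


rng = np.random.default_rng(0)
A = random_spd(4, rng)            # small factor, 4 x 4
B = random_spd(3, rng)            # small factor, 3 x 3
V = rng.standard_normal((4, 3))   # stand-in for a gradient block

# Factored solve: never forms the 12 x 12 matrix.
X_fast = kron_solve(A, B, V)

# Dense reference: forms A kron B explicitly and solves against it.
X_dense = np.linalg.solve(np.kron(A, B), V.reshape(-1)).reshape(4, 3)

max_abs_diff = float(np.max(np.abs(X_fast - X_dense)))
print("factored and dense solves agree:", np.allclose(X_fast, X_dense))
```

For factors of size m and n, the dense route solves against an mn × mn matrix, while the factored route only solves against an m × m and an n × n matrix, which is the source of the efficiency the abstract refers to.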
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent · Convolution
