Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent

TL;DR
This paper introduces a novel approximation for natural gradient descent that tracks diagonal variance in a Kronecker-factored eigenbasis, leading to faster optimization in deep networks compared to KFAC.
Contribution
The paper proposes a new approximation for natural gradient descent that tracks a diagonal variance in a Kronecker-factored eigenbasis rather than in parameter coordinates. The resulting approximation is provably better than KFAC and admits cheap partial updates.
Findings
Improved optimization speed over KFAC in deep networks
Effective approximation in a Kronecker-factored eigenbasis
Provably better approximation than the existing KFAC method
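The "provably better" finding can be illustrated numerically. This is a minimal sketch in Python with NumPy, not the authors' code, and it makes the following assumptions:

- It models a single dense layer whose per-example weight gradient is g_n a_n^T.
- It uses the empirical second moment of those gradients as a stand-in for the Fisher.

KFAC approximates this matrix by kron(G, A). The EKFAC-style estimate keeps the same Kronecker-factored eigenbasis U, but re-estimates the diagonal as the second moment of the gradients projected into U. The Frobenius norm is unchanged by an orthogonal change of basis. The re-estimated diagonal diag(U^T F U) is therefore the best diagonal in that basis, and its error cannot exceed KFAC's. The function name `fisher_approximation_errors` is hypothetical.

```python
import numpy as np

def fisher_approximation_errors(acts, grads):
    """Frobenius errors of KFAC and an EKFAC-style approximation for one dense layer.

    acts:  (N, d_in)  layer inputs a_n
    grads: (N, d_out) backpropagated gradients g_n w.r.t. pre-activations
    The per-example weight gradient is g_n a_n^T; flattened row-major it equals kron(g_n, a_n).
    """
    n = acts.shape[0]
    # Empirical second moment of flattened per-example gradients (stand-in for the Fisher).
    per_example = np.stack([np.kron(g, a) for g, a in zip(grads, acts)])
    fisher = per_example.T @ per_example / n

    # KFAC factors and their eigendecompositions.
    A = acts.T @ acts / n
    G = grads.T @ grads / n
    lam_A, U_A = np.linalg.eigh(A)
    lam_G, U_G = np.linalg.eigh(G)
    U = np.kron(U_G, U_A)  # Kronecker-factored eigenbasis (orthogonal)

    # KFAC: kron(G, A) = U diag(kron(lam_G, lam_A)) U^T
    kfac = U @ np.diag(np.kron(lam_G, lam_A)) @ U.T
    # EKFAC-style: same basis, diagonal = second moment of projected gradients = diag(U^T F U).
    s = np.mean((per_example @ U) ** 2, axis=0)
    ekfac = U @ np.diag(s) @ U.T

    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.linalg.norm(fisher - kfac), np.linalg.norm(fisher - ekfac)

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 4))
# Make g depend on a so that kron(G, A) is not exact.
grads = rng.normal(size=(256, 3)) * (1 + acts[:, :1] ** 2)
err_kfac, err_ekfac = fisher_approximation_errors(acts, grads)
print(err_ekfac <= err_kfac * (1 + 1e-9))  # True
```

The small relative tolerance only absorbs floating-point rounding. In exact arithmetic the inequality holds for any data.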
Abstract
Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements…
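The idea of tracking a diagonal variance in a Kronecker-factored eigenbasis, with cheap partial updates, can be sketched as a preconditioner for one dense layer W of shape (d_out, d_in). This is a hypothetical NumPy illustration, not the paper's implementation. The class name and the parameters `eigen_every`, `decay` and `damping` are assumptions. Two further assumptions are particular to this sketch:

- The running averages are exponential moving averages.
- The scaling estimate restarts whenever the basis is recomputed.

The expensive eigendecompositions run only every `eigen_every` steps. The diagonal scaling `S` in the eigenbasis is refreshed on every step using only matrix products.

```python
import numpy as np

class EKFACLayerSketch:
    """Hypothetical EKFAC-style preconditioner for a dense layer W of shape (d_out, d_in).

    eigen_every, decay and damping are illustrative names, not the paper's API.
    """

    def __init__(self, d_in, d_out, eigen_every=20, decay=0.95, damping=1e-3):
        self.eigen_every = eigen_every
        self.decay = decay
        self.damping = damping
        self.A = np.eye(d_in)             # running estimate of E[a a^T]
        self.G = np.eye(d_out)            # running estimate of E[g g^T]
        self.U_A = np.eye(d_in)
        self.U_G = np.eye(d_out)
        self.S = np.ones((d_out, d_in))   # diagonal scaling in the eigenbasis
        self.step_count = 0

    def update(self, acts, grads):
        """acts: (N, d_in) layer inputs; grads: (N, d_out) pre-activation gradients."""
        n, rho = acts.shape[0], self.decay
        self.A = rho * self.A + (1 - rho) * acts.T @ acts / n
        self.G = rho * self.G + (1 - rho) * grads.T @ grads / n
        recompute = self.step_count % self.eigen_every == 0
        if recompute:
            # Expensive, amortized: refresh the Kronecker-factored eigenbasis.
            _, self.U_A = np.linalg.eigh(self.A)
            _, self.U_G = np.linalg.eigh(self.G)
        # Cheap, every step: U_G^T (g_n a_n^T) U_A = (U_G^T g_n)(U_A^T a_n)^T, so the
        # mean squared projected gradient is an outer product of squared projections.
        pa = acts @ self.U_A    # (N, d_in)
        pg = grads @ self.U_G   # (N, d_out)
        second_moment = (pg ** 2).T @ (pa ** 2) / n  # (d_out, d_in)
        if recompute:
            # Assumption of this sketch: restart the scaling estimate in the new basis.
            self.S = second_moment
        else:
            self.S = rho * self.S + (1 - rho) * second_moment
        self.step_count += 1

    def precondition(self, grad_W):
        """Apply (U diag(S) U^T + damping * I)^{-1}, with U = kron(U_G, U_A), to grad_W."""
        projected = self.U_G.T @ grad_W @ self.U_A
        return self.U_G @ (projected / (self.S + self.damping)) @ self.U_A.T

rng = np.random.default_rng(0)
pre = EKFACLayerSketch(d_in=4, d_out=3)
for _ in range(10):
    pre.update(rng.normal(size=(32, 4)), rng.normal(size=(32, 3)))
step = pre.precondition(rng.normal(size=(3, 4)))
print(step.shape)  # (3, 4)
```

`precondition` never forms the full (d_out·d_in)-sized matrix, because U = kron(U_G, U_A) is orthogonal. Inverting U diag(S) U^T + damping·I therefore reduces to projecting into the eigenbasis, dividing elementwise by S + damping, and projecting back.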
