Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis

Thomas George; César Laurent; Xavier Bouthillier; Nicolas Ballas; Pascal Vincent

arXiv:1806.03884·cs.LG·July 27, 2021·37 cites

TL;DR

This paper introduces a novel approximation for natural gradient descent that tracks diagonal variance in a Kronecker-factored eigenbasis, leading to faster optimization in deep networks compared to KFAC.

Contribution

The paper proposes a new approximation method for natural gradient descent that outperforms KFAC by tracking variance in a specialized eigenbasis, enabling efficient partial updates.

Findings

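01 Improved optimization speed over KFAC in deep networks

02 A diagonal approximation is likely to be more effective in a Kronecker-factored eigenbasis than in parameter coordinates

03 Provably a better approximation than KFAC (closer to the full covariance matrix in Frobenius norm)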

Abstract

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements…
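The idea in the abstract can be illustrated for a single fully connected layer. The following is a minimal NumPy sketch, not the authors' implementation: it assumes per-example layer inputs `a` and back-propagated output gradients `g` are available, and the function names (`ekfac_factors`, `precondition`, `approximation_errors`), the `damping` parameter, and the synthetic data are all hypothetical choices made for illustration. It builds the two Kronecker factors, takes their eigenvectors as the basis, and estimates the diagonal variance of the per-example gradients in that basis. For comparison, KFAC corresponds to using the products of the factor eigenvalues as the diagonal instead.

```python
import numpy as np

def ekfac_factors(a, g):
    """a: (n, d_in) layer inputs, g: (n, d_out) gradients w.r.t. layer outputs.
    Returns the eigenbases of the two Kronecker factors and two candidate
    diagonal scalings, each of shape (d_out, d_in)."""
    n = a.shape[0]
    A = a.T @ a / n                      # input second-moment factor
    G = g.T @ g / n                      # output-gradient second-moment factor
    eval_A, U_A = np.linalg.eigh(A)
    eval_G, U_G = np.linalg.eigh(G)
    # Per-example weight gradient is g_i a_i^T; in the eigenbasis it becomes
    # (U_G^T g_i)(U_A^T a_i)^T, so its elementwise second moment is cheap.
    a_proj = a @ U_A
    g_proj = g @ U_G
    s_ekfac = (g_proj ** 2).T @ (a_proj ** 2) / n   # tracked diagonal variance
    s_kfac = np.outer(eval_G, eval_A)               # diagonal implied by KFAC
    return U_A, U_G, s_ekfac, s_kfac

def precondition(grad_W, U_A, U_G, scaling, damping=1e-3):
    """Rescale a (d_out, d_in) gradient in the Kronecker-factored eigenbasis."""
    proj = U_G.T @ grad_W @ U_A          # project into the eigenbasis
    proj = proj / (scaling + damping)    # diagonal rescaling
    return U_G @ proj @ U_A.T            # project back to parameter coordinates

def approximation_errors(a, g):
    """Frobenius error of each approximation against the dense second-moment
    matrix of per-example gradients (only feasible for tiny layers)."""
    n = a.shape[0]
    U_A, U_G, s_ekfac, s_kfac = ekfac_factors(a, g)
    per_example = np.einsum('ik,ij->ikj', g, a).reshape(n, -1)  # row-major vec
    F = per_example.T @ per_example / n
    U = np.kron(U_G, U_A)
    F_kfac = U @ np.diag(s_kfac.reshape(-1)) @ U.T   # equals kron(G, A)
    F_ekfac = U @ np.diag(s_ekfac.reshape(-1)) @ U.T
    return np.linalg.norm(F - F_kfac), np.linalg.norm(F - F_ekfac)

# Synthetic data (an assumption for the demo): g depends on a, so the
# Kronecker factorization is not exact.
rng = np.random.default_rng(0)
a = rng.normal(size=(256, 4))
g = rng.normal(size=(256, 3)) * (1.0 + np.abs(a[:, :1]))
err_kfac, err_ekfac = approximation_errors(a, g)
print("KFAC error:", err_kfac, "eigenbasis-diagonal error:", err_ekfac)
```

Within a fixed orthogonal basis, the diagonal of the projected matrix is the Frobenius-optimal diagonal, which is why the second error cannot exceed the first in this sketch. The scaling `s_ekfac` can also be re-estimated from new mini-batches without recomputing the eigendecompositions, which is one way to read the "cheap partial updates" mentioned in the abstract.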

Taxonomy

Topics: Advanced Neural Network Applications · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings