Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

Frederik Kunstner; Lukas Balles; Philipp Hennig

arXiv:1905.12558 · cs.LG · June 9, 2020 · 41 cites

TL;DR

This paper critically examines the empirical Fisher approximation in natural gradient descent, demonstrating that it often fails to capture true second-order information and can lead to undesirable optimization behaviors.

Contribution

The authors provide a theoretical and empirical analysis showing the limitations of the empirical Fisher as an approximation to the Fisher information matrix.
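For reference, the two matrices being compared can be written as follows for a conditional model $p_\theta(y \mid x)$ and training pairs $(x_n, y_n)$, $n = 1, \dots, N$. The notation, including the step size $\gamma$ and damping $\lambda \ge 0$ in the update, is chosen here for illustration. The Fisher averages over labels drawn from the model, while the empirical Fisher plugs in the observed labels $y_n$:

```latex
\mathbf{F}(\theta) = \sum_{n=1}^{N} \mathbb{E}_{y \sim p_\theta(y \mid x_n)}\!\left[\nabla_\theta \log p_\theta(y \mid x_n)\, \nabla_\theta \log p_\theta(y \mid x_n)^\top\right],
\qquad
\widetilde{\mathbf{F}}(\theta) = \sum_{n=1}^{N} \nabla_\theta \log p_\theta(y_n \mid x_n)\, \nabla_\theta \log p_\theta(y_n \mid x_n)^\top,

\theta_{t+1} = \theta_t - \gamma \left(\mathbf{F}(\theta_t) + \lambda \mathbf{I}\right)^{-1} \nabla \mathcal{L}(\theta_t).
```

Replacing $\mathbf{F}$ by $\widetilde{\mathbf{F}}$ in the update gives the empirical-Fisher variant that the paper analyzes.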

Findings

01

The empirical Fisher does not reliably approximate the Fisher or the Hessian.

02

The conditions under which the empirical Fisher approximates the Fisher are rarely met in practice.

03

Using the empirical Fisher can lead to undesirable optimization effects.

Abstract

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher, unlike the Fisher, does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
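The contrast can be made concrete on a toy problem. The sketch below is a minimal illustration, not the authors' experiments or the code in their repository. It assumes linear regression with unit-variance Gaussian noise, where the Fisher equals the Hessian of the squared loss. The helper names `fisher_linreg`, `empirical_fisher_linreg` and `preconditioned_step`, and the data-generating setup, are hypothetical choices for this example.

```python
import numpy as np

# Toy model (an assumption for illustration, not taken from the paper's code):
# y ~ N(x^T theta, 1), so the loss is 0.5 * sum_n (y_n - x_n^T theta)^2 + const.

def fisher_linreg(X):
    # Under the model E_y[(y - x^T theta)^2] = 1, so the Fisher is X^T X.
    # For this model it coincides with the Hessian and does not depend on y.
    return X.T @ X

def empirical_fisher_linreg(X, y, theta):
    # Outer products of per-sample gradients at the *observed* labels:
    # sum_n r_n^2 x_n x_n^T with residuals r_n = y_n - x_n^T theta.
    r = y - X @ theta
    return X.T @ (r[:, None] ** 2 * X)

def preconditioned_step(theta, grad, M, lr=1.0, damping=1e-8):
    # Generic preconditioned update: theta - lr * (M + damping * I)^{-1} grad.
    return theta - lr * np.linalg.solve(M + damping * np.eye(theta.size), grad)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + 0.1 * rng.normal(size=100)

theta0 = np.zeros(2)
grad = -X.T @ (y - X @ theta0)  # gradient of the squared loss at theta0
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # exact minimizer

theta_fisher = preconditioned_step(theta0, grad, fisher_linreg(X))
theta_ef = preconditioned_step(theta0, grad, empirical_fisher_linreg(X, y, theta0))

print("least squares:", theta_ls)
print("Fisher step:  ", theta_fisher)
print("emp. Fisher:  ", theta_ef)

# With noise-free labels the residuals vanish at theta_true, and so does the
# empirical Fisher, while the Fisher (= Hessian here) stays X^T X.
y_clean = X @ theta_true
print("||EF at exact fit|| =",
      np.linalg.norm(empirical_fisher_linreg(X, y_clean, theta_true)))
```

Because the loss is quadratic and the Fisher equals the Hessian in this model, one undamped Fisher step with `lr=1.0` lands on the least-squares solution, whereas the empirical-Fisher step from the same point does not. The last line shows the empirical Fisher collapsing to zero at a perfect fit even though the curvature of the loss there is unchanged.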

Code & Models

Repositories

fkunstner/limitations-empirical-fisher (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Gaussian Processes and Bayesian Inference

Methods: Adam