Rethinking Exponential Averaging of the Fisher

Constantin Octavian Puiu

arXiv:2204.04718·cs.LG·July 1, 2022

TL;DR

This paper critically examines the use of exponential averaging in curvature-matrix estimates for ML optimization, connects it to a "Wake of Quadratic regularized models", and proposes a new family of algorithms (KLD-WRM) whose practical instantiations numerically outperform K-FAC on MNIST.

Contribution

It establishes a theoretical connection between exponential averaging and quadratic regularized models, and introduces the KLD-WRM family of algorithms with practical instantiations.

Findings

01

KLD-WRM algorithms outperform K-FAC on MNIST.

02

Establishes a theoretical link between EA-CM algorithms and a "Wake of Quadratic regularized models".

03

Proposes the KLD-WRM family of algorithms, with three practical instantiations.

Abstract

In optimization for machine learning (ML), curvature-matrix (CM) estimates typically rely on an exponential average (EA) of local estimates (giving EA-CM algorithms). This approach has little principled justification, but is very often used in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic regularized models". The outlined connection allows us to understand what EA-CM algorithms are doing from an optimization perspective. Generalizing from the established connection, we propose a new family of algorithms, "KL-Divergence Wake-Regularized Models" (KLD-WRM). We give three different practical instantiations of KLD-WRM, and show numerically that these outperform K-FAC on MNIST.
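To make the EA-CM idea in the abstract concrete, the following is a minimal sketch of a generic exponentially averaged curvature estimate followed by a damped preconditioned step. It is an illustration only, not the paper's KLD-WRM algorithms or K-FAC: the function names, the decay `rho`, the damping value, and the use of a rank-one outer product of the gradient as the "local estimate" are all assumptions made for this example.

```python
import numpy as np

def ea_curvature_update(prev_estimate, grad, rho=0.95):
    """One exponential-average step: rho * previous + (1 - rho) * local.

    The local estimate here is the outer product g g^T, an assumed
    stand-in for whatever local curvature estimate a method uses.
    """
    local_estimate = np.outer(grad, grad)
    return rho * prev_estimate + (1.0 - rho) * local_estimate

def preconditioned_step(params, grad, curvature, lr=0.1, damping=1e-3):
    """Solve (curvature + damping * I) d = grad, then move against d."""
    dim = params.shape[0]
    direction = np.linalg.solve(curvature + damping * np.eye(dim), grad)
    return params - lr * direction

if __name__ == "__main__":
    # Toy quadratic loss 0.5 * x^T A x, used only to drive the updates.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    x = np.array([1.0, -2.0])
    curvature = np.zeros((2, 2))
    for _ in range(5):
        g = A @ x
        curvature = ea_curvature_update(curvature, g)
        x = preconditioned_step(x, g, curvature)
    print(x)
```

With `rho` close to 1 the estimate changes slowly and keeps a long memory of past local estimates; with `rho` close to 0 it tracks the most recent one. The damping term keeps the linear solve well posed while the averaged matrix is still low rank.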

Code & Models

Repositories

ConstantinPuiu/Rethinking-EA-of-the-Fisher
PyTorch · Official

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques · Gaussian Processes and Bayesian Inference