Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

YuXin Li; Felix Dangel; Derek Tam; Colin Raffel

arXiv:2507.18807 · cs.LG · July 28, 2025

Open Access

TL;DR

This paper proposes Squisher, a method that approximates the diagonal of the Fisher Information Matrix by reusing the squared gradient accumulator that adaptive optimizers such as Adam already maintain during training. It performs comparably to the Fisher diagonal while avoiding the cost of a separate estimation pass.

Contribution

The paper introduces Squisher, an approach that recycles the squared gradient accumulator of an adaptive optimizer to approximate the Fisher diagonal, avoiding the extra gradient computations that estimating the Fisher diagonal normally requires.

Findings

01 Squisher performs comparably to the true Fisher diagonal across multiple applications.

02 Squisher outperforms baseline approximation methods.

03 Empirical analysis clarifies the differences between Squisher and the Fisher diagonal.

Abstract

The diagonal of a model's Fisher Information Matrix (the "Fisher diagonal") has frequently been used as a way to measure parameter sensitivity. Typically, the Fisher diagonal is estimated via squared sampled gradients of the model's likelihood with respect to its parameters, averaged over a few hundred or thousand examples -- a process which incurs nontrivial computational costs. At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher diagonal can be obtained "for free" by recycling the squared gradient accumulator that has already been computed over the course of training. Through a comprehensive set of experiments covering five applications of the Fisher diagonal, we demonstrate that the "Squisher" (SQUared gradient…
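To make the two quantities in the abstract concrete, the sketch below computes both for a toy logistic regression model: a Monte Carlo estimate of the Fisher diagonal from squared per-example gradients with labels sampled from the model, and the bias-corrected squared gradient accumulator left behind by an Adam training run. This is an illustration only and not the paper's procedure; the model, the synthetic data, the hyperparameters, and all function names are assumptions made for this example.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_lik(theta, x, y):
    # Gradient of log p(y | x) for logistic regression: (y - p) * x.
    p = sigmoid(dot(theta, x))
    return [(y - p) * xi for xi in x]


def fisher_diagonal(theta, inputs, rng):
    # Monte Carlo estimate: sample y from the model's own predictive
    # distribution, square the per-example gradient, average over inputs.
    diag = [0.0] * len(theta)
    for x in inputs:
        y = 1.0 if rng.random() < sigmoid(dot(theta, x)) else 0.0
        for i, gi in enumerate(grad_log_lik(theta, x, y)):
            diag[i] += gi * gi
    return [d / len(inputs) for d in diag]


def train_adam(inputs, labels, rng, steps=200, batch_size=16,
               lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam on the negative log-likelihood. Returns the final
    # parameters and the bias-corrected squared gradient accumulator.
    dim = len(inputs[0])
    theta, m, v = [0.0] * dim, [0.0] * dim, [0.0] * dim
    for t in range(1, steps + 1):
        batch = rng.sample(range(len(inputs)), batch_size)
        g = [0.0] * dim
        for j in batch:
            for i, gi in enumerate(grad_log_lik(theta, inputs[j], labels[j])):
                g[i] -= gi / batch_size  # mini-batch mean gradient of the loss
        for i in range(dim):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, [vi / (1.0 - beta2 ** steps) for vi in v]


# Synthetic data (assumption for illustration only).
rng = random.Random(0)
true_theta = [1.5, -2.0, 0.5]
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(256)]
labels = [1.0 if rng.random() < sigmoid(dot(true_theta, x)) else 0.0
          for x in inputs]

theta, squared_grad_ema = train_adam(inputs, labels, rng)
fisher = fisher_diagonal(theta, inputs, rng)

print("Fisher diagonal estimate:", fisher)
print("Adam squared gradient accumulator:", squared_grad_ema)
```

The two vectors are not expected to match exactly. The accumulator squares mini-batch-averaged gradients computed with the training labels at parameter values that change during training, whereas the Fisher estimate squares per-example gradients with model-sampled labels at the final parameters. How Squisher handles such differences is described in the paper itself and is not covered by this summary.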

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Metaheuristic Optimization Algorithms Research