# Online Natural Gradient as a Kalman Filter

**Authors:** Yann Ollivier

arXiv: 1703.00209 · 2018-08-29

## TL;DR

This paper shows that online stochastic natural gradient descent on the log-likelihood is exactly equivalent to an extended Kalman filter estimating a fixed unknown parameter. The correspondence extends to recurrent models and yields interpretations of natural gradient hyperparameters.

## Contribution

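The paper proves an exact algebraic equivalence between the extended Kalman filter and online natural gradient descent. The result covers both the i.i.d. case and the recurrent (state space) case, where the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL).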

## Key findings

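- In the i.i.d. case, online natural gradient descent is equivalent to an extended Kalman filter on the parameter; this follows from the "information filter" form of the extended Kalman filter.
- In the recurrent (state space, non-i.i.d.) case, the joint Kalman filter over states and parameters is a natural gradient on top of RTRL.
- The correspondence gives interpretations of natural gradient hyperparameters: learning rates, and the initialization and regularization of the Fisher information matrix.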

## Abstract

We cast Amari's natural gradient in statistical learning as a specific case of Kalman filtering. Namely, applying an extended Kalman filter to estimate a fixed unknown parameter of a probabilistic model from a series of observations, is rigorously equivalent to estimating this parameter via an online stochastic natural gradient descent on the log-likelihood of the observations. In the i.i.d. case, this relation is a consequence of the "information filter" phrasing of the extended Kalman filter. In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models. This exact algebraic correspondence provides relevant interpretations for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix.
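
The i.i.d. equivalence can be checked numerically in the simplest setting. The sketch below is an illustration, not code from the paper: it assumes a linear model with Gaussian noise of known variance, where the extended Kalman filter reduces to the ordinary Kalman filter. The function names, the toy data, and the schedule (learning rate and Fisher averaging rate both equal to `1 / (t + 1)`, with the Fisher matrix initialized to the inverse prior covariance) are assumptions chosen for this toy case so that the two recursions coincide.

```python
import numpy as np

def kalman_static(xs, ys, theta0, P0, sigma2):
    """Kalman filter for a fixed parameter theta, with y = x @ theta + noise."""
    theta, P = theta0.copy(), P0.copy()
    for x, y in zip(xs, ys):
        # Information-filter form: the precision grows by the Fisher term x x^T / sigma2.
        P = np.linalg.inv(np.linalg.inv(P) + np.outer(x, x) / sigma2)
        # Posterior covariance times the gradient of the log-likelihood at the old theta.
        theta = theta + P @ x * (y - x @ theta) / sigma2
    return theta

def online_natural_gradient(xs, ys, theta0, J0, sigma2):
    """Online natural gradient ascent on the Gaussian log-likelihood."""
    theta, J = theta0.copy(), J0.copy()
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        rate = 1.0 / (t + 1)  # assumed schedule for both step size and Fisher averaging
        # Moving average of the Fisher information of the current observation.
        J = (1 - rate) * J + rate * np.outer(x, x) / sigma2
        grad = x * (y - x @ theta) / sigma2  # gradient of log p(y | x, theta)
        theta = theta + rate * np.linalg.solve(J, grad)
    return theta

# Hypothetical toy data for the comparison.
rng = np.random.default_rng(0)
dim, n, sigma2 = 3, 50, 0.25
true_theta = rng.normal(size=dim)
xs = rng.normal(size=(n, dim))
ys = xs @ true_theta + np.sqrt(sigma2) * rng.normal(size=n)

theta0 = np.zeros(dim)
P0 = np.eye(dim)  # prior covariance; its inverse initializes the Fisher matrix
theta_kf = kalman_static(xs, ys, theta0, P0, sigma2)
theta_ng = online_natural_gradient(xs, ys, theta0, np.linalg.inv(P0), sigma2)
max_gap = float(np.max(np.abs(theta_kf - theta_ng)))
```

Under these assumptions the averaged Fisher matrix `J` after step `t` equals the Kalman precision divided by `t + 1`, so the step `rate * J^{-1} @ grad` matches the Kalman update and `max_gap` should be at the level of floating-point rounding. This also illustrates the hyperparameter reading in the abstract: the Fisher matrix initialization plays the role of the inverse prior covariance.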

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.00209/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1703.00209/full.md

---
Source: https://tomesphere.com/paper/1703.00209