On the Dynamics of Inference and Learning
David S. Berman, Jonathan J. Heckman, Marc Klinger

TL;DR
This paper models Bayesian inference as a continuous dynamical system, analyzing its behavior and learning rates, and compares it to neural network training dynamics on benchmark datasets.
Contribution
It introduces a differential equation framework for Bayesian updating, linking inference dynamics to information geometry and neural network training.
Findings
The learning rate follows a 1/T power law, where T is a time-like variable measuring the quantity of data, when the Cramér-Rao bound is saturated.
Hidden variables add a driving term to the inference flow equation.
Neural networks that train to a small final loss exhibit similar power-law behavior.
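As a rough illustration of the 1/T finding, the sketch below uses a textbook conjugate example that is not taken from the paper: inferring the mean of a Gaussian with known noise variance under a Gaussian prior. The function names, the prior variance, and the noise variance are all hypothetical choices made for this example. The posterior variance is computed in closed form and also by applying the Bayes update one observation at a time, and a log-log fit estimates the power-law exponent.

```python
import numpy as np


def posterior_variance(num_obs, noise_var=1.0, prior_var=10.0):
    """Closed-form posterior variance of a Gaussian mean after num_obs observations."""
    return 1.0 / (1.0 / prior_var + num_obs / noise_var)


def sequential_posterior_variance(num_obs, noise_var=1.0, prior_var=10.0):
    """Same quantity, obtained by applying the Bayes update one observation at a time."""
    var = prior_var
    for _ in range(num_obs):
        # Each observation adds 1/noise_var to the posterior precision.
        var = 1.0 / (1.0 / var + 1.0 / noise_var)
    return var


# T plays the role of the time-like variable (number of observations).
T = np.arange(100, 10001, 100)
variances = posterior_variance(T)

# Slope of log(variance) against log(T) estimates the power-law exponent.
slope, _ = np.polyfit(np.log(T), np.log(variances), 1)
print(f"fitted log-log slope: {slope:.4f}")
```

In this toy model the fitted slope should come out close to -1 once T is large compared with noise_var / prior_var, since the prior's contribution to the precision then becomes negligible.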
Abstract
Statistical inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first-order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple power-law, with a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Statistical Mechanics and Entropy
