Laplace's rule of succession in information geometry
Yann Ollivier

TL;DR
This paper gives a geometric interpretation of Laplace's rule of succession and related Bayesian priors in exponential families, and shows that the corresponding Bayesian predictors can be approximated by averaging the maximum likelihood predictor and the sequential normalized maximum likelihood predictor.
Contribution
It establishes a connection between Bayesian predictors and information-theoretic predictors in exponential families, so that Bayesian prediction can be approximated without integrating or sampling in parameter space.
Findings
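In exponential families, Bayesian predictors can be approximated by averaging the maximum likelihood (ML) predictor and the sequential normalized maximum likelihood (SNML) predictor.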
The Jeffreys prior corresponds to the 'add-one-half' rule.
The approximation avoids the cost of integrating or sampling in parameter space.
Abstract
Laplace's "add-one" rule of succession modifies the observed frequencies in a sequence of heads and tails by adding one to the observed counts. This improves prediction by avoiding zero probabilities and corresponds to a uniform Bayesian prior on the parameter. The canonical Jeffreys prior corresponds to the "add-one-half" rule. We prove that, for exponential families of distributions, such Bayesian predictors can be approximated by taking the average of the maximum likelihood predictor and the sequential normalized maximum likelihood predictor from information theory. Thus in this case it is possible to approximate Bayesian predictors without the cost of integrating or sampling in parameter space.
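The coin-tossing case of the abstract can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own code: all function names are hypothetical, and the SNML predictor is assumed to be the variant that weights each candidate outcome by its probability under the maximum likelihood parameter refitted with that outcome appended (other variants of SNML exist in the literature). Under that assumption, SNML on a coin reduces to the "add-one" rule, and its average with the ML predictor stays close to the "add-one-half" rule.

```python
from fractions import Fraction


def ml_predictor(k, n):
    """Probability of heads under the maximum likelihood estimate (k heads in n tosses)."""
    return Fraction(k, n)


def snml_predictor(k, n):
    """Probability of heads under the assumed SNML variant.

    Each candidate outcome is weighted by its probability under the ML
    parameter refitted with that outcome appended, then the two weights
    are normalized. This choice of variant is an assumption of the sketch.
    """
    w_head = Fraction(k + 1, n + 1)      # ML parameter after appending a head
    w_tail = Fraction(n - k + 1, n + 1)  # ML probability of tails after appending a tail
    return w_head / (w_head + w_tail)


def add_constant_rule(k, n, c):
    """'Add-c' rule: c = 1 is Laplace's rule, c = 1/2 is the 'add-one-half' rule."""
    return Fraction(k + c) / Fraction(n + 2 * c)


def averaged_predictor(k, n):
    """Average of the ML and (assumed) SNML predictors."""
    return (ml_predictor(k, n) + snml_predictor(k, n)) / 2


if __name__ == "__main__":
    k, n = 7, 10  # illustrative counts, not data from the paper
    half = Fraction(1, 2)
    print("ML:           ", ml_predictor(k, n))
    print("SNML:         ", snml_predictor(k, n))
    print("add-one:      ", add_constant_rule(k, n, 1))
    print("average:      ", averaged_predictor(k, n))
    print("add-one-half: ", add_constant_rule(k, n, half))
    print("gap:          ", averaged_predictor(k, n) - add_constant_rule(k, n, half))
```

For this Bernoulli sketch the gap between the averaged predictor and the "add-one-half" rule works out to (k - n/2) / (n(n+1)(n+2)), which shrinks like 1/n² as the number of observations grows. Exact rational arithmetic via `fractions.Fraction` is used so the comparison is not blurred by floating-point rounding.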
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Statistical Mechanics and Entropy · Gaussian Processes and Bayesian Inference
