Laplace's rule of succession in information geometry

Yann Ollivier

arXiv:1503.04304 · cs.IT · March 17, 2015 · 1 citation

TL;DR

This paper examines Laplace's rule of succession and related Bayesian priors from the viewpoint of information geometry, and shows that for exponential families, Bayesian predictors can be approximated by averaging two simpler predictors from information theory.

Contribution

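It connects Bayesian predictors with information-theoretic predictors in exponential families: the average of the maximum likelihood (ML) and sequential normalized maximum likelihood (SNML) predictors approximates the Bayesian predictor, so no integration or sampling over parameter space is needed.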

Findings

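1. Bayesian predictors can be approximated by averaging the maximum likelihood (ML) and sequential normalized maximum likelihood (SNML) predictors.
2. For heads-and-tails sequences, the Jeffreys prior corresponds to the "add-one-half" rule, just as the uniform prior corresponds to Laplace's "add-one" rule.
3. This approximation makes Bayesian prediction in exponential families possible without integrating or sampling in parameter space.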
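The correspondence between priors and "add-k" rules can be checked for heads-and-tails data with h heads in n tosses using standard Beta-Bernoulli conjugacy (a textbook derivation, not reproduced from the paper). With a Beta(k, k) prior on the heads probability θ:

```latex
\pi(\theta) \propto \theta^{k-1}(1-\theta)^{k-1}
\;\Longrightarrow\;
P(x_{n+1}=\text{heads} \mid x_{1:n})
= \frac{\int_0^1 \theta \cdot \theta^{h+k-1}(1-\theta)^{n-h+k-1}\,d\theta}
       {\int_0^1 \theta^{h+k-1}(1-\theta)^{n-h+k-1}\,d\theta}
= \frac{B(h+k+1,\, n-h+k)}{B(h+k,\, n-h+k)}
= \frac{h+k}{n+2k}.
```

Setting k = 1 (uniform prior) gives Laplace's rule (h+1)/(n+2); setting k = 1/2, which is the Jeffreys prior π(θ) ∝ θ^{-1/2}(1-θ)^{-1/2} for the Bernoulli family, gives the add-one-half rule (h+1/2)/(n+1).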

Abstract

Laplace's "add-one" rule of succession modifies the observed frequencies in a sequence of heads and tails by adding one to the observed counts. This improves prediction by avoiding zero probabilities and corresponds to a uniform Bayesian prior on the parameter. The canonical Jeffreys prior corresponds to the "add-one-half" rule. We prove that, for exponential families of distributions, such Bayesian predictors can be approximated by taking the average of the maximum likelihood predictor and the sequential normalized maximum likelihood predictor from information theory. Thus in this case it is possible to approximate Bayesian predictors without the cost of integrating or sampling in parameter space.
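A minimal Python sketch of the Bernoulli (heads/tails) case only, not the paper's code and not its general exponential-family construction. It assumes one particular definition of the SNML predictor, which the text above does not spell out: each candidate outcome y is scored by its probability under the maximum likelihood parameter refit on the observed sequence with y appended, and the scores are normalized over y. Under that assumption SNML reduces to Laplace's add-one rule, and its average with the ML frequency h/n differs from the add-one-half (Jeffreys) rule by exactly (2h - n)/(2n(n+1)(n+2)), an O(1/n²) term. Other SNML variants (for instance normalizing the maximized joint likelihood) give different numbers. All function names are hypothetical, and exact `Fraction` arithmetic is used so the gap can be checked exactly.

```python
from fractions import Fraction


def ml_predictor(h: int, n: int) -> Fraction:
    """Maximum likelihood predictor: P(heads) = h / n (needs n >= 1)."""
    if n < 1:
        raise ValueError("the ML predictor needs at least one observation")
    return Fraction(h, n)


def snml_predictor(h: int, n: int) -> Fraction:
    """SNML predictor under an ASSUMED definition (see lead-in):
    score each outcome y by its probability under the ML parameter refit
    on the n observations plus y, then normalize over y in {heads, tails}."""
    score_heads = Fraction(h + 1, n + 1)      # ML P(heads) after appending heads
    score_tails = Fraction(n - h + 1, n + 1)  # ML P(tails) after appending tails
    return score_heads / (score_heads + score_tails)  # equals (h + 1) / (n + 2)


def add_k_predictor(h: int, n: int, k) -> Fraction:
    """'Add-k' rule, i.e. the Bayesian predictive under a Beta(k, k) prior.
    k = 1 is Laplace's rule (uniform prior), k = 1/2 the Jeffreys prior."""
    k = Fraction(k)
    return (h + k) / (n + 2 * k)


def averaged_predictor(h: int, n: int) -> Fraction:
    """Average of the ML and (assumed) SNML predictors."""
    return (ml_predictor(h, n) + snml_predictor(h, n)) / 2


if __name__ == "__main__":
    h, n = 9, 10  # 9 heads observed in 10 tosses
    half = Fraction(1, 2)
    print("ML      :", ml_predictor(h, n))           # 9/10
    print("SNML    :", snml_predictor(h, n))         # 5/6
    print("average :", averaged_predictor(h, n))     # 13/15
    print("add-1/2 :", add_k_predictor(h, n, half))  # 19/22
    print("add-one :", add_k_predictor(h, n, 1))     # 5/6
    # Gap to the Jeffreys (add-1/2) rule: (2h - n) / (2n(n+1)(n+2)).
    gap = averaged_predictor(h, n) - add_k_predictor(h, n, half)
    print("gap     :", gap)                          # 1/330
```

The ML predictor alone assigns zero probability to an unseen outcome (h = 0 gives 0), which is exactly what the add-k rules avoid; the average inherits a nonzero probability from the SNML term.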

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Statistical Mechanics and Entropy · Gaussian Processes and Bayesian Inference