Estimating the Missing Mass, Partition Function or Evidence for a Case of Sampling from a Discrete Set
Bastiaan J. Braams

TL;DR
This paper introduces a self-consistent, model-free estimator for the missing mass and partition function in discrete sampling, leveraging revealed probability masses and combining Bayesian, likelihood, and moment methods.
Contribution
It develops a self-consistent, Rao-Blackwellized estimator for sampling on a discrete set in which the (unnormalized) probability mass of each sampled point is revealed, and analyzes it using Bayesian, likelihood, and moment methods.
Findings
The estimator is self-consistent and Rao-Blackwellized.
The paper provides explicit expressions that combine Bayesian and likelihood methods.
The model is analyzed using Bayesian, profile likelihood, and moment matching techniques.
Abstract
We consider the problem of estimating the missing mass, partition function or evidence and its probability distribution in the case that for each sample point in the discrete sample space its (unnormalized) probability mass is revealed. Estimating the missing mass or partition function (evidence) is a well-studied problem for which, in different contexts, the harmonic mean estimator and the Good-Turing (and related) estimators are available. For sampling on a discrete set with revealed probability masses these estimators can be Rao-Blackwellized, leading to self-consistent estimators not involving an auxiliary distribution with known total mass. For the case of sampling from a mixture distribution this offers the perspective of anchoring the estimator at both ends: at the diffuse end (high temperature in statistical physics) via an explicit expression for the total probability mass and…
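As a rough illustration of the two classical estimators the abstract names, the sketch below applies them to a toy discrete set with revealed unnormalized masses. It is not the paper's Rao-Blackwellized estimator. The function names, the `weights` list, and the plug-in step Z ≈ (revealed mass of seen points) / (1 - missing mass) are assumptions made for this example. The harmonic mean version relies on the identity E[1/w] = K/Z for a set of K points, so it needs K, which plays the role of the auxiliary distribution with known total mass.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the unseen probability mass: the fraction
    of draws whose value occurs exactly once in the sample."""
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)


def plug_in_partition_function(samples, weights):
    """Illustrative plug-in (an assumption of this sketch): divide the
    revealed mass of the distinct seen points by the estimated seen fraction."""
    seen_mass = sum(weights[i] for i in set(samples))
    missing = good_turing_missing_mass(samples)
    if missing >= 1.0:
        raise ValueError("all draws are singletons; estimate is undefined")
    return seen_mass / (1.0 - missing)


def harmonic_mean_partition_function(samples, weights):
    """Harmonic mean estimate of Z, using E[1/w] = K / Z where K is the
    (assumed known) number of points in the discrete set."""
    k = len(weights)
    mean_inverse = sum(1.0 / weights[i] for i in samples) / len(samples)
    return k / mean_inverse


# Hypothetical toy data: four states with unnormalized masses, four draws.
toy_weights = [4.0, 2.0, 1.0, 1.0]   # true Z is 8.0
toy_samples = [0, 0, 1, 2]           # state 3 is never seen
```

On such a small sample the two estimates can differ substantially, which is part of the motivation for estimators that exploit the revealed masses more fully.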
Taxonomy
Topics: Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods · Statistical Methods and Bayesian Inference
