An identification problem in an urn and ball model with heavy tailed distributions
Christine Fricker (INRIA Rocquencourt), Fabrice Guillemin, Philippe Robert (INRIA Rocquencourt)

TL;DR
This paper investigates an urn and ball model with heavy-tailed distributions, focusing on inferring the total number of colors and their distributions from a small sample, using probabilistic bounds and tail behavior analysis.
Contribution
It provides new bounds and methods to estimate the total number of colors and their distributions in heavy-tailed urn models from limited samples.
Findings
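The number of balls drawn with a given color has the same tail as the original distribution, so tail behavior is preserved by the sampling process.
Bounds are established on the total variation distance between the distribution of the number of balls drawn with a given color and the Poisson distribution with the same mean.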
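For context, the standard form of Le Cam's inequality is recalled below. This is the textbook statement, not the paper's specific bound, and the symbols $X_i$, $p_i$, $S$, $\lambda$, $n$, $k$, $N$ are notation assumed for this note. For independent Bernoulli variables $X_1, \dots, X_n$ with $\mathbb{P}(X_i = 1) = p_i$:

```latex
\[
  d_{\mathrm{TV}}\bigl(\mathcal{L}(S), \mathrm{Poisson}(\lambda)\bigr)
  = \frac{1}{2}\sum_{j \ge 0} \Bigl| \mathbb{P}(S = j) - e^{-\lambda}\frac{\lambda^{j}}{j!} \Bigr|
  \le \sum_{i=1}^{n} p_i^{2},
  \qquad S = \sum_{i=1}^{n} X_i, \quad \lambda = \sum_{i=1}^{n} p_i .
\]
```

As an illustration under these assumptions: conditionally on a color having $k$ balls among $N$ in the urn, the number of balls of that color in $n$ draws with replacement is Binomial$(n, k/N)$, so the bound reads $n (k/N)^2 = \lambda \, k/N$ with $\lambda = nk/N$. The approximation is therefore good whenever a single color holds a small share of the urn.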
Abstract
We consider in this paper an urn and ball problem with replacement, where balls have different colors and are drawn uniformly from a single urn. The numbers of balls with a given color are i.i.d. random variables with a heavy-tailed probability distribution, for instance a Pareto or a Weibull distribution. We draw a small fraction of the total number of balls. The basic problem addressed in this paper is to determine to what extent we can infer the total number of colors and the distribution of the number of balls with a given color. By means of Le Cam's inequality and the Chen-Stein method, bounds are obtained for the total variation norm between the distribution of the number of balls drawn with a given color and the Poisson distribution with the same mean. We then show that the distribution of the number of balls drawn with a given color has the same tail as that of the original…
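The setup described in the abstract can be explored with a small simulation. The following is a minimal sketch, not the authors' method: the function names `simulate_urn` and `tail_fraction`, the choice of a Pareto law rounded up to integers, and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def simulate_urn(num_colors, alpha, fraction, seed=0):
    """Fill an urn with heavy-tailed color counts, then sample with replacement.

    Hypothetical helper: parameters and rounding convention are assumptions.
    """
    rng = random.Random(seed)
    # Balls per color: i.i.d. Pareto(alpha) variables rounded up to integers (>= 1).
    sizes = [math.ceil(rng.paretovariate(alpha)) for _ in range(num_colors)]
    total_balls = sum(sizes)
    # Draw only a small fraction of the total number of balls (at least one).
    num_draws = max(1, int(fraction * total_balls))
    # A uniform draw with replacement picks color i with probability
    # sizes[i] / total_balls, so weighted sampling of colors is equivalent.
    colors = rng.choices(range(num_colors), weights=sizes, k=num_draws)
    drawn = Counter(colors)  # color index -> number of balls drawn
    return sizes, drawn, num_draws


def tail_fraction(counts, threshold):
    """Fraction of the given counts that are >= threshold (empirical tail)."""
    counts = list(counts)
    return sum(1 for c in counts if c >= threshold) / len(counts)


if __name__ == "__main__":
    sizes, drawn, num_draws = simulate_urn(num_colors=10_000, alpha=1.5, fraction=0.05)
    print("colors in urn:", len(sizes), "colors seen in sample:", len(drawn))
    for threshold in (2, 4, 8):
        print(
            threshold,
            "urn tail:", tail_fraction(sizes, threshold),
            "sample tail (seen colors only):", tail_fraction(drawn.values(), threshold),
        )
```

Only colors that appear in the sample are observable, which is why the number of distinct colors seen is generally far below the true number of colors; this gap is the identification problem the paper studies.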
Taxonomy
Topics: Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics · Statistical Methods and Inference
