Guessing probability distributions from small samples
Thorsten Poeschel, Werner Ebeling, and Helge Rose

TL;DR
This paper introduces a method to estimate statistical properties, such as the entropy, of unknown probability distributions from small samples by fitting the parameters of a guessed distribution. The method improves accuracy when samples are too small for frequencies to approximate probabilities.
Contribution
It presents a new approach for approximating distributions and their properties from small samples. The approach goes beyond traditional frequency-based methods, which require every element to occur many times in the sample.
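A plug-in entropy estimate illustrates the frequency-based approach that the method goes beyond. The estimate treats the Zipf-ordered frequencies f(k) of a sample as if they were the probabilities p(k). The following is a minimal Python sketch and does not come from the paper. The uniform 1000-symbol population and the sample sizes are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter

def zipf_ordered_frequencies(sample):
    """Relative frequencies f(k), sorted in decreasing (Zipf/rank) order."""
    counts = Counter(sample)
    n = len(sample)
    return sorted((c / n for c in counts.values()), reverse=True)

def plugin_entropy(freqs):
    """Shannon entropy in nats, treating the frequencies as probabilities."""
    return -sum(f * math.log(f) for f in freqs if f > 0)

# Hypothetical population: 1000 equiprobable symbols, true entropy ln(1000).
random.seed(0)
population_size = 1000
true_entropy = math.log(population_size)

for n in (100, 1000, 100000):
    sample = [random.randrange(population_size) for _ in range(n)]
    estimate = plugin_entropy(zipf_ordered_frequencies(sample))
    print(f"n={n:>6}  plug-in={estimate:.3f}  true={true_entropy:.3f}")
```

A sample of size n contains at most n distinct symbols, so the plug-in estimate can never exceed ln n. For n = 100 it therefore necessarily falls short of the true value ln 1000. This small-sample bias is the situation the paper addresses.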
Findings
Method reliably estimates distributions from limited data
Optimizes guessed distributions through parameter adjustment
Effective in approximating entropy and other properties
Abstract
We propose a new method for calculating the statistical properties, such as the entropy, of unknown generators of symbolic sequences. The probability distribution p(k) of the elements k of a population can be approximated by the frequencies f(k) of a sample, provided the sample is long enough that each element k occurs many times. Our method yields an approximation when this precondition does not hold. For a given f(k), we recalculate the Zipf-ordered probability distribution by optimizing the parameters of a guessed distribution. We demonstrate that our method yields reliable results.
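The recalculation step in the abstract might look roughly like the following Python sketch. This is an illustration built on assumptions and is not the authors' algorithm. The following choices are all hypothetical:

* the guessed family, a truncated power law p(r) ∝ r^(-alpha) over a known number of symbols `size`
* the Monte Carlo estimate of the expected Zipf-ordered frequencies of a sample of the same size
* the squared-error objective
* the grid search over `alpha`

All function names are illustrative.

```python
import math
import random
from collections import Counter

def power_law(alpha, size):
    """Hypothetical guessed family: p(r) proportional to r**(-alpha), r = 1..size."""
    weights = [r ** -alpha for r in range(1, size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def ranked_freqs(sample, length):
    """Zipf-ordered relative frequencies, zero-padded/truncated to `length`."""
    n = len(sample)
    freqs = sorted((c / n for c in Counter(sample).values()), reverse=True)
    return (freqs + [0.0] * length)[:length]

def expected_ranked_freqs(probs, n, trials, rng):
    """Monte Carlo average of the ranked frequencies of samples of size n."""
    length = len(probs)
    acc = [0.0] * length
    symbols = range(length)
    for _ in range(trials):
        sample = rng.choices(symbols, weights=probs, k=n)
        for i, f in enumerate(ranked_freqs(sample, length)):
            acc[i] += f / trials
    return acc

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fit_alpha(observed, size, alphas, trials=200, seed=1):
    """Pick the alpha whose simulated ranked frequencies best match the observed ones."""
    rng = random.Random(seed)
    n = len(observed)
    target = ranked_freqs(observed, size)
    best_alpha, best_err = None, math.inf
    for alpha in alphas:
        guess = expected_ranked_freqs(power_law(alpha, size), n, trials, rng)
        err = sum((g - t) ** 2 for g, t in zip(guess, target))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha

# Hypothetical test case: 100 draws from a power law with alpha = 1 over 200 symbols.
rng = random.Random(0)
size = 200
true_p = power_law(1.0, size)
sample = rng.choices(range(size), weights=true_p, k=100)
alphas = [0.5 + 0.1 * i for i in range(11)]
alpha_hat = fit_alpha(sample, size, alphas)
print("fitted alpha:", alpha_hat)
print("entropy of fitted distribution:", entropy(power_law(alpha_hat, size)))
print("true entropy:", entropy(true_p))
```

In this sketch, the entropy of the fitted distribution serves as the estimate instead of the plug-in value from the raw frequencies. The sketch compares the observed ranked frequencies with the ranked frequencies expected for a sample of the same size, not with the guessed probabilities themselves. This keeps small-sample effects on both sides of the comparison. More trials, a finer grid, or a gradient-free optimizer would be natural refinements.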
