Entropy Estimates from Insufficient Samplings
P. Grassberger

TL;DR
This paper derives and analyzes entropy estimators for discrete distributions, focusing on their bias and accuracy in different sampling regimes, with explicit formulas for bias correction.
Contribution
It provides analytically derived entropy estimators with explicit bias formulas applicable to finite samples, improving accuracy especially in low sampling regimes.
Findings
In the high sampling regime (many points per box), the estimators have exponentially small biases.
In the low sampling regime, the errors grow but remain much smaller than those of most other estimators.
Analytical formulas enable explicit bias correction for entropy estimation.
Abstract
We present a detailed derivation of some estimators of Shannon entropy for discrete distributions. They hold for finite samples of N points distributed into M "boxes", with N and M → ∞, but N/M < ∞. In the high sampling regime (>> 1 points in each box) they have exponentially small biases. In the low sampling regime the errors increase but are still much smaller than for most other estimators. One advantage is that our main estimators are given analytically, with explicitly known analytical formulas for the biases.
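To make the bias problem concrete, here is a minimal Python sketch that compares three estimators on box counts: the naive plug-in estimate, the standard Miller-Madow correction, and an estimator of the form ln N - (1/N) Σ n_i G(n_i). The function names are hypothetical. The recursion used for G(n) is the form in which Grassberger's estimator is commonly quoted, not something stated in the abstract above, so treat it as an assumption and check it against the paper before relying on it.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def naive_entropy(counts):
    """Plug-in (maximum-likelihood) entropy estimate in nats."""
    n_total = sum(counts)
    return -sum((n / n_total) * math.log(n / n_total) for n in counts if n > 0)


def miller_madow_entropy(counts):
    """Plug-in estimate plus the first-order correction (K - 1) / (2N),
    where K is the number of occupied boxes."""
    n_total = sum(counts)
    occupied = sum(1 for n in counts if n > 0)
    return naive_entropy(counts) + (occupied - 1) / (2 * n_total)


def g_number(n):
    """G(n) from the recursion (assumed, see lead-in):
    G(1) = -gamma - ln 2, G(2) = 2 - gamma - ln 2,
    G(2k+1) = G(2k), G(2k+2) = G(2k) + 2 / (2k + 1)."""
    if n < 1:
        raise ValueError("g_number is defined for n >= 1")
    g = -EULER_GAMMA - math.log(2.0)  # G(1)
    if n == 1:
        return g
    g += 2.0  # G(2)
    for k in range(1, n // 2):
        g += 2.0 / (2 * k + 1)  # step from G(2k) to G(2k+2)
    return g


def grassberger_entropy(counts):
    """Estimate ln N - (1/N) * sum_i n_i * G(n_i), in nats (assumed form)."""
    n_total = sum(counts)
    weighted = sum(n * g_number(n) for n in counts if n > 0)
    return math.log(n_total) - weighted / n_total


if __name__ == "__main__":
    # Low sampling regime: N = 500 points in M = 1000 equiprobable boxes.
    rng = random.Random(0)
    num_boxes, num_points = 1000, 500
    counts = list(Counter(rng.randrange(num_boxes) for _ in range(num_points)).values())
    print("true entropy :", math.log(num_boxes))
    print("naive        :", naive_entropy(counts))
    print("Miller-Madow :", miller_madow_entropy(counts))
    print("G-based      :", grassberger_entropy(counts))
```

With N/M = 0.5 the naive estimate cannot exceed ln N and so falls well below the true value ln M. The corrected estimators narrow the gap but do not close it, which matches the abstract's statement that errors increase in the low sampling regime.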
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
