Entropy Estimates from Insufficient Samplings

P. Grassberger

arXiv:physics/0307138 · physics.data-an · November 9, 2011 · 75 cites

TL;DR

This paper derives and analyzes entropy estimators for discrete distributions, focusing on their bias and accuracy in different sampling regimes, with explicit formulas for bias correction.

Contribution

It provides analytically derived entropy estimators with explicit bias formulas applicable to finite samples, improving accuracy especially in low sampling regimes.

Findings

01

Estimators have exponentially small bias in high sampling regimes.

02

In the low sampling regime, errors increase but remain much smaller than those of most other estimators.

03

Analytical formulas enable explicit bias correction for entropy estimation.

Abstract

We present a detailed derivation of some estimators of Shannon entropy for discrete distributions. They hold for finite samples of N points distributed into M "boxes", with N and M → ∞, but N/M < ∞. In the high sampling regime (≫ 1 points in each box) they have exponentially small biases. In the low sampling regime the errors increase but are still much smaller than for most other estimators. One advantage is that our main estimators are given analytically, with explicitly known analytical formulas for the biases.
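To make the sampling regimes concrete, the sketch below measures the bias of the naive plug-in entropy estimate when N points fall into M equally likely boxes. It does not implement the estimators derived in the paper, whose formulas are not reproduced on this page. It uses the plug-in estimate and the textbook Miller-Madow first-order correction as stand-in baselines, and the function names, the uniform test distribution, and the values of M and N are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def plugin_entropy(counts):
    """Naive plug-in Shannon entropy (in nats) from a list of box counts."""
    n_total = sum(counts)
    return -sum((n / n_total) * math.log(n / n_total) for n in counts if n > 0)


def miller_madow_entropy(counts):
    """Plug-in estimate plus the Miller-Madow correction (occupied - 1) / (2N).

    This is a standard baseline correction, not the paper's estimator.
    """
    n_total = sum(counts)
    occupied = sum(1 for n in counts if n > 0)
    return plugin_entropy(counts) + (occupied - 1) / (2 * n_total)


def mean_estimates(num_boxes, num_points, trials=500, seed=0):
    """Average both estimators over repeated samples from a uniform distribution."""
    rng = random.Random(seed)
    sum_plugin = 0.0
    sum_mm = 0.0
    for _ in range(trials):
        sample = Counter(rng.randrange(num_boxes) for _ in range(num_points))
        counts = list(sample.values())
        sum_plugin += plugin_entropy(counts)
        sum_mm += miller_madow_entropy(counts)
    return sum_plugin / trials, sum_mm / trials


if __name__ == "__main__":
    M = 100  # hypothetical number of boxes
    true_entropy = math.log(M)
    for N in (50, 1000):  # low sampling (N/M < 1) and high sampling (N/M >> 1)
        plugin, mm = mean_estimates(M, N)
        print(f"N={N:5d}  plug-in bias={plugin - true_entropy:+.4f}  "
              f"Miller-Madow bias={mm - true_entropy:+.4f}")
```

With N = 50 the plug-in estimate cannot exceed ln 50, so it must fall below the true value ln 100 regardless of the random draw. This is the kind of low-sampling bias that corrected estimators aim to reduce.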

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Bayesian Methods and Mixture Models