To BEE or not to BEE: Estimating more than Entropy with Biased Entropy Estimators
Ilaria Pia la Torre, David A. Kelly, Hector D. Menendez, David Clark

TL;DR
This paper evaluates 18 biased entropy estimators across various measures and distributions, identifying the Chao-Shen and Chao-Wang-Jost estimators as the most accurate and efficient for software engineering applications.
Contribution
It provides a comprehensive empirical comparison of entropy estimators, highlighting the superior performance of Chao-Shen and Chao-Wang-Jost estimators across different conditions.
Findings
Chao-Shen and Chao-Wang-Jost estimators converge to the true entropy faster than the other estimators evaluated.
As sample size increases, these two estimators are more accurate than the others.
Performance is consistent regardless of domain size and entropy measure.
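To make the comparison concrete, here is a minimal Python sketch of the naive plug-in (maximum-likelihood) entropy estimate next to a Chao-Shen style estimate. The summary does not reproduce the paper's formulas or code, so this follows the commonly published form of the Chao-Shen estimator (coverage-adjusted frequencies with a Horvitz-Thompson correction). The function names are illustrative assumptions, and the result is in nats.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Naive plug-in estimate: entropy of the empirical frequencies (in nats)."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def chao_shen_entropy(samples):
    """Chao-Shen style estimate (in nats); a sketch, not the paper's code."""
    counts = Counter(samples)
    n = sum(counts.values())
    # f1 = number of symbols observed exactly once (singletons).
    f1 = sum(1 for c in counts.values() if c == 1)
    # Avoid a zero coverage estimate when every observation is a singleton.
    if f1 == n:
        f1 = n - 1
    coverage = 1.0 - f1 / n  # Good-Turing estimate of sample coverage
    h = 0.0
    for c in counts.values():
        p = coverage * c / n  # coverage-adjusted probability
        # Horvitz-Thompson weight: probability the symbol appears in n draws.
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)
    return h
```

On small samples the plug-in estimate tends to underestimate entropy because unseen symbols contribute nothing; the coverage and inclusion-probability corrections are the Chao-Shen approach to compensating for that.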
Abstract
Entropy estimation plays a significant role in biology, economics, physics, communication engineering and other disciplines. It is increasingly used in software engineering, e.g. in software confidentiality, software testing, predictive analysis, machine learning, and software improvement. However, accurate estimation is demonstrably expensive in many contexts, including software. Statisticians have consequently developed biased estimators that aim to accurately estimate entropy on the basis of a sample. In this paper we apply 18 widely employed entropy estimators to Shannon measures useful to the software engineer: entropy, mutual information and conditional mutual information. Moreover, we investigate how the estimators are affected by two main influential factors: sample size and domain size. Our experiments range over a large set of randomly generated joint probability distributions…
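One common way to apply an entropy estimator to the other two Shannon measures is through the standard identities I(X;Y) = H(X) + H(Y) - H(X,Y) and I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). The sketch below assumes that approach; the abstract does not say how the paper itself derives these measures. The plug-in estimator is used as a placeholder and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in entropy estimate in nats; any other estimator could be swapped in."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(xs, ys, estimator=entropy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    joint = list(zip(xs, ys))
    return estimator(xs) + estimator(ys) - estimator(joint)


def conditional_mutual_information(xs, ys, zs, estimator=entropy):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    xz = list(zip(xs, zs))
    yz = list(zip(ys, zs))
    xyz = list(zip(xs, ys, zs))
    return estimator(xz) + estimator(yz) - estimator(xyz) - estimator(zs)
```

Because each measure is a sum and difference of entropy estimates, the bias of the underlying estimator carries through, which is one reason the choice of estimator matters for mutual information as well as for entropy.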
Taxonomy
Topics: Statistical Methods and Inference
