To BEE or not to BEE: Estimating more than Entropy with Biased Entropy Estimators

Ilaria Pia la Torre; David A. Kelly; Hector D. Menendez; David Clark

arXiv:2501.11395·cs.IT·January 22, 2025

Open Access

TL;DR

This paper evaluates 18 biased entropy estimators across various measures and distributions, identifying the Chao-Shen and Chao-Wang-Jost estimators as the most accurate and efficient for software engineering applications.

Contribution

It provides a comprehensive empirical comparison of entropy estimators, highlighting the superior performance of Chao-Shen and Chao-Wang-Jost estimators across different conditions.

Findings

01. Chao-Shen and Chao-Wang-Jost estimators converge faster to true entropy.

02. These estimators outperform others in accuracy with increasing sample sizes.

03. Performance is consistent regardless of domain size and entropy measure.

Abstract

Entropy estimation plays a significant role in biology, economics, physics, communication engineering and other disciplines. It is increasingly used in software engineering, e.g., in software confidentiality, software testing, predictive analysis, machine learning, and software improvement. However, accurate estimation is demonstrably expensive in many contexts, including software. Statisticians have consequently developed biased estimators that aim to accurately estimate entropy on the basis of a sample. In this paper we apply 18 widely employed entropy estimators to Shannon measures useful to the software engineer: entropy, mutual information and conditional mutual information. Moreover, we investigate how the estimators are affected by two main influential factors: sample size and domain size. Our experiments range over a large set of randomly generated joint probability distributions…
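To make the idea of estimating entropy from a sample concrete, the following is a minimal sketch, not the paper's implementation. It compares the naive plug-in (maximum likelihood) estimator with the commonly cited form of the Chao-Shen estimator, which rescales the empirical probabilities by an estimated sample coverage and applies a Horvitz-Thompson correction for unseen symbols. The function names `plugin_entropy` and `chao_shen_entropy` are hypothetical, results are in nats, and the input is assumed to be a non-empty sequence of hashable symbols.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum likelihood) Shannon entropy in nats."""
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def chao_shen_entropy(samples):
    """Chao-Shen entropy estimate in nats (sketch of the commonly cited form)."""
    counts = Counter(samples)
    n = sum(counts.values())

    # f1 = number of symbols observed exactly once (singletons).
    f1 = sum(1 for c in counts.values() if c == 1)
    # Assumption: if every observation is a singleton, nudge f1 down by one
    # so the coverage estimate below does not collapse to zero.
    if f1 == n:
        f1 = n - 1

    # Good-Turing estimate of sample coverage.
    coverage = 1.0 - f1 / n

    total = 0.0
    for c in counts.values():
        p = coverage * c / n               # coverage-adjusted probability
        inclusion = 1.0 - (1.0 - p) ** n   # chance the symbol appears in n draws
        total -= p * math.log(p) / inclusion
    return total


if __name__ == "__main__":
    sample = list("aaaabbbbccdxyz")
    print("plug-in  :", plugin_entropy(sample))
    print("Chao-Shen:", chao_shen_entropy(sample))
```

On small samples the plug-in estimator tends to underestimate entropy because unseen symbols contribute nothing; the correction above is one way of compensating for that. Mutual information can be estimated from the same building block via I(X;Y) = H(X) + H(Y) - H(X,Y), applying an entropy estimator to the marginal and joint samples.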

Taxonomy

Topics: Statistical Methods and Inference