A simple sketching algorithm for entropy estimation
Peter Clifford (1), Ioana Ada Cosma (2) ((1) University of Oxford, (2) University of Ottawa)

TL;DR
This paper introduces a simple, efficient sketching algorithm for estimating Shannon entropy in high-frequency data streams, leveraging Renyi entropy and stable distributions to improve accuracy and computational efficiency.
Contribution
The authors develop a novel asymptotically unbiased log-mean estimator for Shannon entropy that works with a single pass and provides strong error bounds, improving practical entropy estimation.
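This summary does not give the estimator's formula, so the following is only a minimal sketch of what a log-mean estimator on a stable sketch could look like. It assumes (this is not confirmed by the text here) that each sketch coordinate is a count-weighted sum of maximally skewed stable variates X with alpha = 1, scaled so that E[exp(tX)] = t^t for t > 0, in which case the mean of exp(y_j / n) has expectation exp(-H). The names `skewed_stable`, `build_sketch`, and `log_mean_entropy` are hypothetical, and the variates are cached in a dictionary purely for illustration; a space-limited implementation would regenerate them pseudo-randomly instead.

```python
import math
import random


def skewed_stable(rng):
    # Chambers-Mallows-Stuck style draw for alpha = 1, beta = -1, shifted and
    # scaled (an assumption of this sketch) so that E[exp(t * X)] = t**t.
    v = math.pi * (rng.random() - 0.5)  # uniform on [-pi/2, pi/2)
    w = rng.expovariate(1.0)            # standard exponential
    u = math.pi / 2 - v
    return u * math.tan(v) + math.log(w * math.cos(v) / u)


def build_sketch(stream, k, seed=0):
    # Linear sketch: y[j] = sum over items of count(item) * x[item][j].
    # Updates may be negative as long as final counts stay non-negative.
    rng = random.Random(seed)
    variates = {}  # item -> its k fixed variates (cached for illustration only)
    y = [0.0] * k
    n = 0
    for item, delta in stream:
        if item not in variates:
            variates[item] = [skewed_stable(rng) for _ in range(k)]
        x = variates[item]
        for j in range(k):
            y[j] += delta * x[j]
        n += delta
    return y, n


def log_mean_entropy(y, n):
    # Log-mean estimate: minus the log of the average of exp(y_j / n).
    return -math.log(sum(math.exp(v / n) for v in y) / len(y))


def exact_entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


stream = [("a", 60), ("b", 30), ("c", 15), ("d", 5), ("a", -10)]
y, n = build_sketch(stream, k=4000, seed=1)
print("sketch estimate:", log_mean_entropy(y, n))
print("exact entropy:  ", exact_entropy([50, 30, 15, 5]))
```

Because the sketch is linear in the counts, the deletion `("a", -10)` is handled by the same update rule as an insertion, which is what makes a single pass over a turnstile stream sufficient in this toy setting.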
Findings
Estimator has exponentially decreasing tail bounds on error probability.
Achieves asymptotic relative efficiency of 0.932.
Computational complexity is near-optimal.
Abstract
We consider the problem of approximating the empirical Shannon entropy of a high-frequency data stream under the relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure of entropy is the Renyi entropy that depends on a constant alpha. This quantity can be estimated efficiently and unbiasedly from a low-dimensional synopsis called an alpha-stable data sketch via the method of compressed counting. An approximation to the Shannon entropy can be obtained from the Renyi entropy by taking alpha sufficiently close to 1. However, practical guidelines for parameter calibration with respect to alpha are lacking. We avoid this problem by showing that the random variables used in estimating the Renyi entropy can be transformed to have a proper distributional limit as alpha approaches 1: the maximally skewed, strictly stable distribution with…
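The statement that the Shannon entropy is recovered from the Renyi entropy as alpha approaches 1 can be checked numerically. The snippet below is a small illustration, not the paper's method: it evaluates the standard definition H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha) exactly on a made-up distribution `p` (an assumption for the example) and compares it with the Shannon entropy in nats.

```python
import math


def shannon_entropy(p):
    # Shannon entropy in nats.
    return -sum(q * math.log(q) for q in p if q > 0)


def renyi_entropy(p, alpha):
    # Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats.
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log(sum(q ** alpha for q in p if q > 0)) / (1 - alpha)


p = [0.5, 0.3, 0.15, 0.05]  # hypothetical empirical distribution
h = shannon_entropy(p)
for alpha in (2.0, 1.5, 1.1, 1.01, 1.001):
    gap = h - renyi_entropy(p, alpha)
    print(f"alpha={alpha}: Renyi={renyi_entropy(p, alpha):.6f}, gap to Shannon={gap:.6f}")
```

The gap shrinks as alpha approaches 1, but how close alpha must be for a given accuracy depends on the distribution, which is the calibration difficulty the abstract refers to.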
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
