A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting
Ping Li

TL;DR
This paper demonstrates that Compressed Counting (CC) significantly improves the estimation of entropy-related measures in data streams, especially near α = 1, but practical implementation remains challenging because of the large sample sizes required.
Contribution
The study empirically validates the effectiveness of Compressed Counting for entropy estimation, highlights its advantages over symmetric stable random projections near α = 1, and discusses practical limitations.
Findings
CC dramatically improves entropy estimation as α approaches 1.
Symmetric stable random projections require enormous sample sizes when α = 1 + δ with very small |δ|.
Practical entropy estimation near α = 1 remains challenging due to sample size constraints.
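One way to see why a very small deviation from 1 demands large samples: the Tsallis entropy divides by α − 1, so a relative error ε in an estimated frequency moment turns into an absolute entropy error of roughly ε/|δ|. The sketch below illustrates this amplification with exact arithmetic on a toy count vector. The function names, the counts, and the injected error level are illustrative assumptions, not the paper's experiment or estimator.

```python
def tsallis_from_moment(f_alpha, f_1, alpha, delta):
    # Tsallis entropy written in terms of the alpha-th and first moments,
    # with delta = alpha - 1 passed explicitly to avoid cancellation.
    return (1.0 - f_alpha / f_1 ** alpha) / delta

def entropy_error(counts, delta, rel_err):
    """Absolute change in Tsallis entropy when the alpha-th moment
    is off by a relative error rel_err (hypothetical noise level)."""
    alpha = 1.0 + delta
    f_1 = float(sum(counts))
    f_alpha = sum(c ** alpha for c in counts)
    exact = tsallis_from_moment(f_alpha, f_1, alpha, delta)
    noisy = tsallis_from_moment(f_alpha * (1.0 + rel_err), f_1, alpha, delta)
    return abs(noisy - exact)

# Toy counts (assumed); the same 0.1% moment error hurts more as delta shrinks.
counts = [5, 3, 2]
for delta in (1e-2, 1e-3, 1e-4):
    print(delta, entropy_error(counts, delta, rel_err=1e-3))
```

Under these assumptions, shrinking δ by a factor of ten inflates the entropy error by about the same factor, so the moment estimate must become correspondingly more accurate, which is what drives the sample size up.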
Abstract
Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measurement and often serves as a crucial "feature" for data mining. The Rényi entropy and the Tsallis entropy are functions of the αth frequency moments, and both approach the Shannon entropy as α → 1. A recent theoretical work suggested using the αth frequency moment to approximate the Shannon entropy with α = 1 + δ and very small |δ| (e.g., < 10^-4). In this study, we experiment using CC to estimate frequency moments, Rényi entropy,…
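To make the relationship between frequency moments and the three entropies concrete, here is a minimal sketch that computes them exactly from a small count vector, using the standard definitions with natural logarithms. It does not implement CC or any sketching algorithm; the function names and the idea of holding all counts in memory are assumptions for illustration only.

```python
import math

def frequency_moment(counts, alpha):
    # F_alpha = sum_i A_i^alpha over the nonzero counts
    return sum(c ** alpha for c in counts if c > 0)

def shannon_entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def renyi_entropy(counts, alpha):
    # H_alpha = log(F_alpha / F_1^alpha) / (1 - alpha), for alpha != 1
    f_1 = frequency_moment(counts, 1.0)
    f_alpha = frequency_moment(counts, alpha)
    return math.log(f_alpha / f_1 ** alpha) / (1.0 - alpha)

def tsallis_entropy(counts, alpha):
    # T_alpha = (1 - F_alpha / F_1^alpha) / (alpha - 1), for alpha != 1
    f_1 = frequency_moment(counts, 1.0)
    f_alpha = frequency_moment(counts, alpha)
    return (1.0 - f_alpha / f_1 ** alpha) / (alpha - 1.0)
```

Evaluating `renyi_entropy(counts, 1.001)` and `tsallis_entropy(counts, 1.001)` on a toy vector such as `[5, 3, 2]` gives values close to `shannon_entropy(counts)`, which is the limit behavior the abstract describes. In a streaming setting the exact moments would be replaced by estimates.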
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Data Management and Algorithms
