On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting
Ping Li

TL;DR
This paper introduces a practical algorithm for estimating the p-th frequency moment in data streams, significantly reducing sample complexity near p=1 and simplifying entropy estimation.
Contribution
The paper presents an algorithm for estimating the p-th frequency moment whose sample complexity is essentially O(1) near p=1, improving on the previously believed O(1/eps^2) bound and making entropy estimation practical.
Findings
Sample complexity is essentially O(1) near p=1
Algorithm improves on the previously believed O(1/eps^2) bound
Experiments in the appendix verify that entropy estimation becomes easy
Abstract
Estimating the p-th frequency moment of a data stream is a very heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p=1. This is a very large improvement over the previously believed O(1/eps^2) bound. The proposed algorithm makes the long-standing problem of entropy estimation an easy task, as verified by the experiments included in the appendix.
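To make the link between frequency moments near p=1 and entropy concrete, here is a minimal Python sketch. It is not the paper's algorithm: it computes everything exactly from a small, made-up stream rather than from a compressed sketch. It relies on two standard facts: under the strict Turnstile model (all counts stay non-negative) the first moment F_1 is just the running sum of the increments, and the Tsallis entropy (1 - F_p / F_1^p) / (p - 1) tends to the Shannon entropy as p approaches 1. The function names and the example stream are illustrative assumptions.

```python
import math

def frequency_moment(counts, p):
    """Exact p-th frequency moment F_p = sum_i counts[i]**p (non-negative counts)."""
    return sum(c ** p for c in counts if c > 0)

def shannon_entropy(counts):
    """Exact Shannon entropy (in nats) of the normalized counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def tsallis_entropy(counts, p):
    """Tsallis entropy (1 - F_p / F_1**p) / (p - 1); tends to Shannon entropy as p -> 1."""
    f1 = frequency_moment(counts, 1)
    return (1.0 - frequency_moment(counts, p) / f1 ** p) / (p - 1.0)

# Hypothetical strict Turnstile stream of (item, increment) updates:
# increments may be negative, but every count stays non-negative.
stream = [(0, 5), (1, 3), (2, 2), (0, -1), (3, 1)]

f1_counter = 0          # F_1 needs only this single running sum
counts = [0, 0, 0, 0]   # full counts kept here only for exact reference values
for item, delta in stream:
    f1_counter += delta
    counts[item] += delta

h = shannon_entropy(counts)
for p in (1.1, 1.01, 1.001):
    # The gap between the Tsallis and Shannon entropies shrinks as p -> 1.
    print(p, tsallis_entropy(counts, p), h)
```

Because the formula divides a small difference by the small quantity p - 1, F_p presumably has to be estimated very accurately when p is close to 1, which is why the sample complexity in that regime matters for entropy estimation.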
Taxonomy
Topics: Data Management and Algorithms · Time Series Analysis and Forecasting · Data Stream Mining Techniques
