On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting

Ping Li

arXiv:1004.3782 · cs.DS · March 14, 2015 · 1 citation

TL;DR

This paper introduces a practical algorithm for estimating the p-th frequency moment of a data stream. Near p = 1 it reduces the sample complexity to essentially O(1), which makes entropy estimation much simpler.

Contribution

The paper presents an improved algorithm for estimating the p-th frequency moment near p = 1 with essentially constant (O(1)) sample complexity. This improves on the previously believed O(1/ε^2) bound and makes entropy estimation practical.
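
A minimal sketch, assuming the standard route from frequency moments to entropy: the Rényi and Tsallis entropies can both be written in terms of F_p = Σ_i A_i^p and F_1, and both converge to the Shannon entropy as p → 1. Everything is computed exactly from a hypothetical frequency vector. There is no streaming and no random projection, and the paper's Compressed Counting estimator is not reproduced. The function names are illustrative only.

```python
import math
from typing import Sequence


def frequency_moment(freqs: Sequence[float], p: float) -> float:
    """F_p = sum_i A_i^p over the nonzero entries of a frequency vector."""
    return sum(a ** p for a in freqs if a > 0)


def shannon_entropy(freqs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the empirical distribution A_i / F_1."""
    f1 = frequency_moment(freqs, 1.0)
    return -sum((a / f1) * math.log(a / f1) for a in freqs if a > 0)


def renyi_entropy(freqs: Sequence[float], p: float) -> float:
    """Renyi entropy of order p (p != 1): log(F_p / F_1^p) / (1 - p)."""
    f1 = frequency_moment(freqs, 1.0)
    return math.log(frequency_moment(freqs, p) / f1 ** p) / (1.0 - p)


def tsallis_entropy(freqs: Sequence[float], p: float) -> float:
    """Tsallis entropy of order p (p != 1): (1 - F_p / F_1^p) / (p - 1)."""
    f1 = frequency_moment(freqs, 1.0)
    return (1.0 - frequency_moment(freqs, p) / f1 ** p) / (p - 1.0)


if __name__ == "__main__":
    # Hypothetical final frequency vector of a stream (all counts nonnegative).
    freqs = [50, 20, 20, 5, 5]
    print(f"Shannon: {shannon_entropy(freqs):.6f}")
    # Move p toward 1 and watch both approximations approach the Shannon value.
    for delta in (0.1, 0.01, 0.001):
        p = 1.0 + delta
        print(f"p={p}: Renyi={renyi_entropy(freqs, p):.6f}, "
              f"Tsallis={tsallis_entropy(freqs, p):.6f}")
```

Both approximations approach the Shannon value as p moves toward 1. So an accurate, cheap estimate of F_p for p close to 1 is what entropy estimation actually needs.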

Findings

1. Sample complexity is essentially O(1) near p = 1
2. The algorithm improves on the previously believed O(1/ε^2) bound
3. Experiments confirm that entropy estimation becomes easy

Abstract

Estimating the p-th frequency moment of a data stream is a heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p = 1. This is a very large improvement over the previously believed O(1/ε^2) bound. The proposed algorithm makes the long-standing problem of entropy estimation an easy task, as verified by the experiments included in the appendix.
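
In the Turnstile model, a stream is a sequence of updates (i, Δ) applied as A[i] += Δ, where Δ may be negative. "Strict" means no A[i] ever becomes negative. Under that assumption F_1 = Σ_i |A[i]| equals the sum of all increments, so a single running counter computes it exactly. That is why p = 1 is trivial. The sketch below is a minimal illustration under those assumptions, with hypothetical function names. It contrasts the single counter with materializing the full vector and is not the paper's algorithm for p near 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Update = Tuple[int, int]  # (item index i, increment Delta; Delta may be negative)


def f1_single_counter(stream: Iterable[Update]) -> int:
    """F_1 under the strict Turnstile model: just sum every increment.

    Valid because every A[i] stays >= 0, so sum_i |A[i]| == sum_i A[i] == sum_t Delta_t.
    """
    total = 0
    for _, delta in stream:
        total += delta
    return total


def exact_moment(stream: Iterable[Update], p: float) -> float:
    """Exact F_p = sum_i |A[i]|^p, materializing the whole vector (for checking only)."""
    counts: Dict[int, int] = defaultdict(int)
    for i, delta in stream:
        counts[i] += delta
        # Strict Turnstile assumption: no coordinate may ever go negative.
        if counts[i] < 0:
            raise ValueError(f"strict Turnstile violated at item {i}")
    return sum(abs(a) ** p for a in counts.values() if a != 0)


if __name__ == "__main__":
    # Hypothetical stream of insertions and deletions; counts never go negative.
    stream = [(1, 5), (2, 3), (1, -2), (3, 4), (2, -3), (3, 1)]
    print(f1_single_counter(stream))   # -> 8    (one counter, O(1) memory)
    print(exact_moment(stream, 1.0))   # -> 8.0  (same value, full vector)
    print(exact_moment(stream, 2.0))   # -> 34.0 (no single-counter shortcut)
```

For p ≠ 1 the quantity Σ_i |A[i]|^p is not a linear function of the updates, so no single counter suffices. That is the regime the paper targets with Compressed Counting.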

Taxonomy

Topics: Data Management and Algorithms · Time Series Analysis and Forecasting · Data Stream Mining Techniques