Unbiased Insights: Optimal Streaming Algorithms for $\ell_p$ Sampling, the Forget Model, and Beyond
Honghao Lin, Hoai-An Nguyen, William Swartworth, David P. Woodruff

TL;DR
This paper introduces space-efficient streaming algorithms for $\ell_p$ sampling and frequency moment ($F_p$) estimation, including continuous sampling and models with deletions. It resolves previously open problems and extends the techniques to entropy estimation and arbitrary functions.
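For readers new to the terminology, the standard definitions behind these terms in the streaming literature are given below. The notation ($f$, $n$, $\varepsilon$, $\delta$, $c$) is assumed here and may differ from the paper's. A stream of updates defines a frequency vector $f \in \mathbb{Z}^n$, and an $\ell_p$ sampler returns an index with probability roughly proportional to $|f_i|^p$:

$$
F_p = \sum_{i=1}^{n} |f_i|^p, \qquad
\Pr[\text{sampler outputs } i] = (1 \pm \varepsilon)\,\frac{|f_i|^p}{F_p} \pm O(n^{-c}),
$$

where the sampler may instead report failure with probability at most $\delta$. A perfect sampler has $\varepsilon = 0$; an approximate sampler allows $\varepsilon > 0$.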
Contribution
It gives nearly space-optimal algorithms for $\ell_p$ sampling and $F_p$ estimation, including in streaming models with deletions such as the forget model and the suffix-prefix deletion model, and the techniques apply more broadly.
Findings
Achieves nearly space-optimal $\ell_p$ samplers for $p \neq 2$
Develops nearly unbiased estimators for $F_p$ in models with deletions
Extends techniques to entropy estimation and arbitrary functions
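One standard ingredient behind many $\ell_p$ samplers is exponential scaling (a form of precision sampling): divide each $|f_i|^p$ by an independent $e_i \sim \mathrm{Exp}(1)$ and return the largest scaled value. Since $e_i / |f_i|^p$ is exponential with rate $|f_i|^p$, the arg-max equals $i$ with probability exactly $|f_i|^p / F_p$. The sketch below is an offline illustration under that assumption: it reads exact frequencies from a dictionary instead of a small-space stream sketch, the function names are hypothetical, and it is not claimed to be the paper's construction.

```python
import random
from collections import Counter


def lp_distribution(freqs, p):
    """Target l_p distribution: item i with probability |f_i|^p / F_p."""
    f_p = sum(abs(f) ** p for f in freqs.values())
    return {i: abs(f) ** p / f_p for i, f in freqs.items() if f != 0}


def exp_scaling_sample(freqs, p, rng):
    """Return argmax_i |f_i|^p / e_i with independent e_i ~ Exp(1).

    Hypothetical offline helper: uses exact frequencies, not a stream sketch.
    """
    best_item, best_val = None, -1.0
    for item, f in freqs.items():
        if f == 0:
            continue  # zero-frequency items are never sampled
        e = rng.expovariate(1.0)  # e_i ~ Exp(1); exactly 0.0 only with tiny probability
        val = abs(f) ** p / e if e > 0 else float("inf")
        if val > best_val:
            best_item, best_val = item, val
    return best_item


if __name__ == "__main__":
    rng = random.Random(0)
    freqs = {"a": 1, "b": 2, "c": 3}
    trials = 20000
    counts = Counter(exp_scaling_sample(freqs, 2.0, rng) for _ in range(trials))
    for item, target in lp_distribution(freqs, 2.0).items():
        # Empirical frequency should be close to target = |f_i|^2 / F_2.
        print(item, round(target, 3), round(counts[item] / trials, 3))
```

In a small-space streaming setting the maximum must be recovered from a sketch of the scaled vector rather than from exact counts; that step is omitted here.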
Abstract
We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For , we present a nearly space-optimal approximate $\ell_p$ sampler that uses bits of space and for , we present a sampler with space complexity . This space complexity is optimal for and improves upon prior work by a factor. We further extend our construction to a continuous $\ell_p$ sampler, which outputs a valid sample index at every point during the stream. Leveraging these samplers, we design nearly unbiased estimators for $F_p$ in data streams that include forget operations, which reset individual element frequencies and introduce significant non-linear challenges. As a result, we obtain near-optimal algorithms for estimating $F_p$ for all $p$ in this…
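To make the forget operation concrete, here is an exact, linear-space reference (not a streaming algorithm) for the semantics described in the abstract, assuming unit insertions and that `forget(i)` sets $f_i$ to zero. The class name `ForgetStreamReference` is hypothetical. The comment in `forget` shows where the non-linearity comes from: the amount removed is the current value of $f_i$, which depends on the whole history, so it cannot be written as a fixed additive update to a linear sketch.

```python
from collections import defaultdict


class ForgetStreamReference:
    """Exact reference for a stream with insertions and forget operations.

    Hypothetical helper for illustration: it stores every frequency, so it uses
    linear space and is NOT a small-space streaming algorithm.
    """

    def __init__(self):
        self.freq = defaultdict(int)  # item -> current frequency f_i

    def insert(self, item):
        # Assumed unit insertion: f_i <- f_i + 1.
        self.freq[item] += 1

    def forget(self, item):
        # Reset f_i to 0. The amount removed equals the current f_i, which
        # depends on the whole history, so this is not a fixed linear update.
        self.freq[item] = 0

    def moment(self, p):
        # F_p = sum_i |f_i|^p over items with nonzero frequency.
        return sum(abs(f) ** p for f in self.freq.values() if f != 0)


if __name__ == "__main__":
    s = ForgetStreamReference()
    for op, item in [("insert", "a"), ("insert", "a"), ("insert", "b"),
                     ("forget", "a"), ("insert", "a")]:
        getattr(s, op)(item)
    print(s.moment(2))  # 2: f_a = 1 after the reset, f_b = 1
```

A small-space algorithm in this model has to approximate `moment(p)` without storing `freq`, which is exactly what the reset makes hard for linear sketches.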
