Succinct Sampling on Streams
Vladimir Braverman, Rafail Ostrovsky, Carlo Zaniolo

TL;DR
This paper introduces succinct algorithms for streaming data sampling that use optimal worst-case memory, enabling efficient solutions for various sliding window problems like frequency moments and triangle counting.
Contribution
It demonstrates the feasibility of succinct sampling algorithms for all variants of streaming models, providing the first worst-case memory guarantees for several problems.
Findings
Succinct sampling algorithms are possible for all streaming variants.
First solutions with provable worst-case memory guarantees for multiple sliding window problems.
The algorithms support sampling both with and without replacement, and handle both one-at-a-time and bursty arrivals.
Abstract
A streaming model is one where data items arrive over a long period of time, either one item at a time or in bursts. Typical tasks include computing various statistics over a sliding window of some fixed time horizon. What makes the streaming model interesting is that as time progresses, old items expire and new ones arrive. One of the simplest and most central tasks in this model is sampling: maintaining up to k uniformly distributed items from the current time window as old items expire and new ones arrive. We call sampling algorithms succinct if they use provably optimal (up to constant factors) worst-case memory to maintain k items (either with or without replacement). We stress that in many applications, structures that have an expected succinct representation as time progresses are not sufficient, as small-probability events eventually happen…
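To make the distinction between expected and worst-case memory concrete, here is a minimal Python sketch of a well-known baseline, priority sampling over a sequence-based sliding window. It is not the succinct algorithm of this paper. The class name `PrioritySampler` and its methods are hypothetical, chosen for illustration. The sketch keeps a single uniform sample from the last `window` items. Its candidate list is short in expectation but can grow to the full window size on an unlucky run of priorities, which is the kind of low-probability event the abstract warns about.

```python
import random
from collections import deque


class PrioritySampler:
    """Baseline sketch: one uniform sample over the last `window` items.

    Illustrative only; memory is small in expectation, not in the worst case.
    """

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng or random.Random()
        self.count = 0  # number of items seen so far
        # Entries are (index, priority, item); priorities decrease front to back.
        self.candidates = deque()

    def add(self, item):
        priority = self.rng.random()
        # An older item with a lower priority can never again be the
        # highest-priority item in the window, so it is discarded.
        while self.candidates and self.candidates[-1][1] < priority:
            self.candidates.pop()
        self.candidates.append((self.count, priority, item))
        self.count += 1
        # Expire candidates that have left the window. The deque is never
        # empty here because the newest item is always inside the window.
        while self.candidates[0][0] < self.count - self.window:
            self.candidates.popleft()

    def sample(self):
        # The front holds the highest-priority item still in the window.
        # Priorities are i.i.d., so each window position is equally likely.
        return self.candidates[0][2]


if __name__ == "__main__":
    sampler = PrioritySampler(window=10)
    for value in range(100):
        sampler.add(value)
    print("sample:", sampler.sample())
    print("candidates stored:", len(sampler.candidates))
```

Feeding the sampler a stream and printing `len(sampler.candidates)` over time shows the stored-candidate count fluctuating. A succinct algorithm in the paper's sense would instead bound that storage in the worst case.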
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Stream Mining Techniques
