Truly Perfect Samplers for Data Streams and Sliding Windows
Rajesh Jayaram, David P. Woodruff, Samson Zhou

TL;DR
This paper introduces the concept of truly perfect data stream samplers that produce exact probability distributions, addressing limitations of previous approximate methods and exploring their complexity in streaming and sliding window models.
Contribution
It initiates the study of truly perfect samplers with exact distributions in data streams and analyzes their complexity in various streaming models.
Findings
First truly perfect samplers for data streams are proposed.
Analysis of the space complexity for perfect sampling algorithms.
Comparison with approximate sampling methods shows advantages in privacy and accuracy.
Abstract
In the $G$-sampling problem, the goal is to output an index $i$ of a vector $f \in \mathbb{R}^n$, such that for all coordinates $j \in [n]$, \[\textbf{Pr}[i=j] = (1 \pm \epsilon) \frac{G(f_j)}{\sum_{k\in[n]} G(f_k)} + \gamma,\] where $G:\mathbb{R} \to \mathbb{R}_{\geq 0}$ is some non-negative function. If $\epsilon = 0$ and $\gamma = 1/\text{poly}(n)$, the sampler is called perfect. In the data stream model, $f$ is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect $L_p$ samplers in turnstile streams, where $G(x)=|x|^p$, using $\text{polylog}(n)$ space for $p\in(0,2]$. However, to date all known sampling algorithms are not truly perfect, since their output distribution is only point-wise $\gamma = 1/\text{poly}(n)$ close to the true distribution. This small error…
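To make the definition concrete, here is a minimal Python sketch of the case $\epsilon = 0$, $\gamma = 0$ for $G(x) = |x|$ on an insertion-only stream. It uses textbook reservoir sampling, which is offered as an illustration of an exact output distribution and not as the construction from the paper. The helper names `g_distribution` and `reservoir_l1_sample` and the toy stream are assumptions made for this example, and the sketch assumes an ideal source of uniform random integers.

```python
import random
from collections import Counter
from fractions import Fraction


def g_distribution(f, G):
    """Exact target distribution: Pr[i = j] = G(f_j) / sum_k G(f_k)."""
    weights = {j: Fraction(G(fj)) for j, fj in f.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def reservoir_l1_sample(stream, rng):
    """One pass, one stored index: keep the t-th update with probability 1/t.

    After m updates every position of the stream is retained with
    probability exactly 1/m, so index j is returned with probability f_j / m.
    """
    sample = None
    for t, index in enumerate(stream, start=1):
        if rng.randrange(t) == 0:  # true with probability exactly 1/t
            sample = index
    return sample


# Hypothetical insertion-only stream: each item is a +1 update to that index.
stream = ["a", "b", "a", "c", "a", "b"]
f = Counter(stream)                      # frequency vector f
target = g_distribution(f, abs)          # G(x) = |x|, i.e. p = 1
drawn = reservoir_l1_sample(stream, random.Random(0))
```

Here `target` is the distribution that a sampler with no error must match for this toy stream, and `drawn` is a single draw from it. This approach does not carry over to turnstile streams, where updates can also decrease coordinates.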
Taxonomy
Topics: Machine Learning and Algorithms · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data
