Weighted Sampling Without Replacement from Data Streams
Vladimir Braverman, Rafail Ostrovsky, and Gregory Vorsanger

TL;DR
This paper introduces Cascade Sampling, a precise reduction from k-sampling without replacement to k-sampling with replacement, so that weighted sampling from data streams does not depend on exact real-number arithmetic.
Contribution
It presents a novel reduction technique called Cascade Sampling that addresses finite precision issues in weighted sampling without replacement from data streams.
Findings
Cascade Sampling ensures accurate weighted sampling without replacement with finite precision arithmetic.
The method provides an alternative to earlier algorithms, such as that of Efraimidis and Spirakis, which assume precise computations over the interval [0,1].
It extends the applicability of weighted sampling algorithms in practical, finite-precision computing environments.
Abstract
Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0,1]. Cohen and Kaplan (VLDB 2008) used similar methods for their bottom-k sketches. Efraimidis and Spirakis ask as an open question whether using finite precision arithmetic impacts the accuracy of their algorithm. In this paper we show a method to avoid this problem by providing a precise reduction from k-sampling without replacement to k-sampling with replacement. We call the resulting method Cascade Sampling.
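The reduction described in the abstract can be illustrated with a minimal sketch. This is one reading of the idea, not the authors' pseudocode: k single-item weighted samplers are chained, the first sees the whole stream, and whatever a sampler does not keep (the rejected newcomer or the evicted old item) is passed to the next one. The names SingleWeightedSampler and cascade_sample are hypothetical, and the sketch assumes positive integer weights so that each accept/reject decision is made with exact integer arithmetic rather than floating-point values in [0,1].

```python
import random


class SingleWeightedSampler:
    """Holds one item, kept with probability proportional to its weight.

    Assumption for this sketch: weights are positive integers.
    """

    def __init__(self, rng):
        self.rng = rng
        self.total = 0      # sum of weights offered to this sampler so far
        self.item = None    # currently held (value, weight) pair, or None

    def offer(self, item):
        """Offer a (value, weight) pair; return the pair that is not kept.

        Returns None only when the sampler was empty and kept the item.
        """
        _, weight = item
        self.total += weight
        # randrange over integers accepts with probability exactly weight/total.
        if self.rng.randrange(self.total) < weight:
            evicted, self.item = self.item, item
            return evicted
        return item


def cascade_sample(stream, k, seed=None):
    """Return up to k distinct values, sampled by weight without replacement."""
    rng = random.Random(seed)
    samplers = [SingleWeightedSampler(rng) for _ in range(k)]
    for item in stream:
        carry = item
        for sampler in samplers:
            carry = sampler.offer(carry)
            if carry is None:   # nothing left to pass down the cascade
                break
    return [s.item[0] for s in samplers if s.item is not None]


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
    print(cascade_sample(data, k=2, seed=7))
```

In this sketch each sampler holds a with-replacement-style single sample of the items it has been offered, and because an item held by an earlier sampler is never offered to a later one at the same time, the k outputs are distinct. The first sampler's output plays the role of the first draw, the second sampler's output the second draw, and so on.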
Taxonomy
Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Data Management and Algorithms
