Subset Sampling and Its Extensions
Jinchao Huang, Sibo Wang

TL;DR
This paper introduces efficient data structures for subset sampling problems, including dynamic, I/O-efficient, and range sampling variants, with optimal or near-optimal query and update times.
Contribution
It presents novel dynamic and I/O-efficient data structures for subset sampling, extending to range sampling with improved performance guarantees.
Findings
Dynamic data structure achieves O(1+μ_S) expected query time.
I/O-efficient algorithm handles large datasets with optimal I/O complexity.
Range subset sampling extension supports range queries with efficient updates.
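To give a feel for why a query time proportional to 1 plus the expected sample size is plausible, the sketch below covers only the special case where every record has the same probability. It uses the standard geometric-skip trick and is not the paper's data structure; the function name `uniform_subset_sample` and its parameters are hypothetical, and it assumes 0 ≤ p ≤ 1.

```python
import math
import random


def uniform_subset_sample(n, p, rng=random):
    """Sample indices from range(n), each independently with probability p.

    Instead of flipping a coin per index, jump directly to the next
    selected index. The gap between selections is geometrically
    distributed, so the loop runs once per selected index plus once
    to step past the end.
    """
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(n))

    log_q = math.log1p(-p)  # log(1 - p), strictly negative here
    picked = []
    i = -1
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], keeps log(u) finite
        # Number of rejected indices before the next accepted one.
        i += 1 + int(math.log(u) / log_q)
        if i >= n:
            return picked
        picked.append(i)


rng = random.Random(0)
sizes = [len(uniform_subset_sample(1000, 0.01, rng)) for _ in range(2000)]
mean_size = sum(sizes) / len(sizes)  # should be close to 1000 * 0.01
```

Handling arbitrary, changing per-record probabilities within the same bound is the harder problem the paper addresses; this sketch does not attempt it.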
Abstract
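This paper studies the \emph{subset sampling} problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\mathbf{p}$ that assigns each record $v \in \mathcal{S}$ a probability $\mathbf{p}(v)$. A query returns a random subset $X$ of $\mathcal{S}$, where each record $v \in \mathcal{S}$ is sampled into $X$ independently with probability $\mathbf{p}(v)$. The goal is to store $\mathcal{S}$ in a data structure to answer queries efficiently. If $\mathcal{S}$ fits in memory, the problem is interesting when $\mathcal{S}$ is dynamic. We develop a dynamic data structure with $\mathcal{O}(1+\mu_{\mathcal{S}})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(1)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time, where $\mu_{\mathcal{S}} = \sum_{v \in \mathcal{S}} \mathbf{p}(v)$. The query time and space are optimal. If $\mathcal{S}$ does not…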
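The problem statement can be made concrete with a naive baseline, sketched here under the assumption that records and their probabilities live in an in-memory dictionary. The names `naive_subset_sample`, `expected_sample_size`, and `prob` are illustrative and do not come from the paper. This baseline scans every record on each query, which is the linear cost that a dedicated data structure aims to avoid.

```python
import random


def expected_sample_size(prob):
    """Sum of all sampling probabilities: the expected size of one sample."""
    return sum(prob.values())


def naive_subset_sample(prob, rng=random):
    """Return a random subset of the keys of `prob`.

    Each record v is included independently with probability prob[v].
    Every record is examined, so a query takes time linear in the
    number of records no matter how small the sample turns out to be.
    """
    return [v for v, p in prob.items() if rng.random() < p]


# Hypothetical input: four records with their sampling probabilities.
prob = {"a": 0.5, "b": 0.1, "c": 1.0, "d": 0.0}
sample = naive_subset_sample(prob, random.Random(7))
```

A record with probability 1 is always returned and a record with probability 0 never is, because `rng.random()` draws from the half-open interval [0, 1).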
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques
