Sampling in Space Restricted Settings
Anup Bhattacharya, Davis Issac, Ragesh Jaiswal, Amit Kumar

TL;DR
This paper investigates space-efficient algorithms for sampling in large data settings, providing new methods for maintaining random samples in streaming and query models with tight bounds on space and accuracy.
Contribution
It introduces space-efficient algorithms for sampling in streaming and query models, achieving near-optimal bounds and extending to weighted elements with approximation guarantees.
Findings
Maintains a random sample in the streaming model using O(log n) bits of space, where n is the number of elements seen so far.
Provides approximate weighted sampling with tight bounds.
Extends sampling techniques to weighted data with error tolerance.
Abstract
Space-efficient algorithms play a central role in dealing with large amounts of data. In such settings, one would like to analyse the data using a small amount of "working space". One of the key steps in many algorithms for analysing large data is to maintain a random sample (or a small number of random samples) from the data points. In this paper, we consider two space-restricted settings: (i) the streaming model, where data arrives over time and one can use only a small amount of storage, and (ii) the query model, where we can structure the data in low space and answer sampling queries. We prove the following results in these two settings. In the streaming setting, we would like to maintain a random sample from the elements seen so far. We prove that one can maintain a random sample using O(log n) random bits and O(log n) space, where n is the number of elements seen so far. We…
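To make the streaming setting concrete, here is a minimal Python sketch of classical size-one reservoir sampling. This is the textbook technique, not the algorithm from the paper: it keeps only the current sample and a counter (about log n bits for the counter), but it draws fresh randomness at every step, so it should not be expected to match the random-bit bound stated in the abstract. The function name `reservoir_sample` and the optional `rng` parameter are illustrative assumptions.

```python
import random


def reservoir_sample(stream, rng=None):
    """Return one uniformly random element of an iterable of unknown length.

    Classical reservoir sampling (illustrative only, not the paper's method).
    Returns None for an empty stream.
    """
    rng = rng or random.Random()
    sample = None
    count = 0  # number of elements seen so far
    for item in stream:
        count += 1
        # Keep the new item with probability 1/count; by induction every
        # element seen so far is then the sample with probability 1/count.
        if rng.randrange(count) == 0:
            sample = item
    return sample
```

For example, `reservoir_sample(open_log_lines())` would return one line chosen uniformly at random without ever holding more than one line in memory, where `open_log_lines` is a hypothetical generator over a large file.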
Taxonomy
Topics: Algorithms and Data Compression · Data Management and Algorithms · Machine Learning and Algorithms
