An asymptotically optimal, online algorithm for weighted random sampling with replacement
Michał Startek

TL;DR
This paper introduces an online, memory-efficient algorithm for weighted random sampling with replacement, capable of handling large or streaming data efficiently and supporting mass-sampling from arbitrary discrete distributions.
Contribution
It presents a novel, asymptotically optimal algorithm for weighted sampling with replacement that operates online, using constant additional memory and O(n) time.
Findings
Operates in O(n) time even when the sample size exceeds the population size
Requires constant additional memory
Supports mass-sampling from any discrete distribution
Abstract
This paper presents a novel algorithm solving the classic problem of generating a random sample of size s from a population of size n with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant additional memory and works in O(n) time (even when s >> n, in which case the algorithm produces a list containing, for every population member, the number of times it has been selected for the sample). The algorithm works online, and as such is well-suited to processing streams. In addition, a novel method of mass-sampling from any discrete distribution using the algorithm is presented.
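The abstract does not spell out the procedure, so the following is only an illustrative sketch of one standard way to obtain per-member selection counts in a single pass: drawing each count from a binomial distribution conditioned on the draws and weight that remain (the sequential decomposition of the multinomial). It is not claimed to be the author's exact method. The names `stream_sample_counts` and `binomial` are hypothetical, the total weight is assumed to be known before streaming begins, and the Bernoulli-sum binomial sampler is a simple stand-in that costs O(s) per call; reaching the O(n) bound stated above would need a binomial generator with constant expected time.

```python
import random
from typing import Iterable, Iterator, Tuple


def binomial(rng: random.Random, trials: int, p: float) -> int:
    """Draw from Binomial(trials, p) by summing Bernoulli trials.

    Stand-in only: this costs O(trials) per call.
    """
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return trials
    return sum(1 for _ in range(trials) if rng.random() < p)


def stream_sample_counts(
    weights: Iterable[float],
    total_weight: float,  # assumption: known before streaming starts
    s: int,
    rng: random.Random,
) -> Iterator[Tuple[int, int]]:
    """Yield (index, count) for each population member, in stream order.

    The counts sum to s and are jointly multinomial with probabilities
    proportional to the weights. Only two running totals are kept, so the
    extra memory does not grow with n or s.
    """
    remaining_draws = s
    remaining_weight = total_weight
    for index, w in enumerate(weights):
        if remaining_draws == 0 or w <= 0.0:
            count = 0
        else:
            # Probability that a still-unassigned draw lands on this member,
            # given that it did not land on any earlier member.
            p = 1.0 if remaining_weight <= 0.0 else min(1.0, w / remaining_weight)
            count = binomial(rng, remaining_draws, p)
        yield index, count
        remaining_draws -= count
        remaining_weight -= w


if __name__ == "__main__":
    rng = random.Random(2024)
    for index, count in stream_sample_counts([1.0, 2.0, 3.0, 4.0], 10.0, 20, rng):
        print(index, count)
```

Because each member is emitted with its count as soon as it is read, the output stays at n entries even when s is far larger than n, which matches the s >> n case described in the abstract. With floating-point weights that are not exactly representable, the final conditional probability can fall slightly below 1, so a production version would need to assign any leftover draws to the last member with positive weight.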
Taxonomy
Topics: Machine Learning and Algorithms · Optimization and Search Problems · Mobile Crowdsensing and Crowdsourcing
