# Parallel Streaming Random Sampling

**Authors:** Kanat Tangwongsan, Srikanta Tirthapura

arXiv: 1906.04120 · 2019-06-11

## TL;DR

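This paper introduces efficient parallel algorithms for random sampling from data streams that arrive as minibatches, covering both the sliding-window and infinite-window models. The algorithms' work matches the fastest sequential algorithms, their parallel depth is polylogarithmic, and their memory usage matches the best known.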

## Contribution

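Whereas prior parallel random-sampling algorithms addressed only the static batch model, it develops parallel algorithms for minibatch-stream sampling in the sliding-window and infinite-window settings. These algorithms match the work of the best sequential counterparts while achieving low parallel depth.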
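To make the infinite-window setting concrete, here is a minimal sequential sketch that consumes a stream one minibatch at a time. It assumes the standard random-key (bottom-s) technique for uniform sampling without replacement: every element receives an independent uniform key, and the sample is the s elements with the smallest keys seen so far. The class `InfiniteWindowSampler` and its methods are hypothetical names; the sketch illustrates the sampling semantics, not the paper's parallel algorithm.

```python
import heapq
import itertools
import random
from typing import Iterable, List, Optional, Tuple


class InfiniteWindowSampler:
    """Hypothetical sketch: uniform s-sample (no replacement) of every element seen."""

    def __init__(self, s: int, seed: Optional[int] = None) -> None:
        if s < 1:
            raise ValueError("sample size s must be at least 1")
        self.s = s
        self.rng = random.Random(seed)
        self._order = itertools.count()  # tiebreaker so elements are never compared
        # Max-heap via negated keys: (-key, arrival order, element).
        # The root holds the largest key currently in the sample.
        self._heap: List[Tuple[float, int, object]] = []

    def add_minibatch(self, batch: Iterable[object]) -> None:
        for x in batch:
            key = self.rng.random()  # independent Uniform[0, 1) key
            entry = (-key, next(self._order), x)
            if len(self._heap) < self.s:
                heapq.heappush(self._heap, entry)
            elif key < -self._heap[0][0]:
                # New key beats the largest kept key: swap it in.
                heapq.heapreplace(self._heap, entry)

    def sample(self) -> list:
        return [x for _, _, x in self._heap]
```

Because each key is drawn independently of the others, the keys of one minibatch could in principle be generated and filtered in parallel; this sketch simply does it in a loop. The heap never holds more than s entries, so memory stays at O(s) regardless of stream length.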

## Key findings

- Total work matches the fastest sequential algorithms
- Parallel depth is polylogarithmic
- Memory usage matches the best known bounds
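One way to picture how small depth is possible is a reduction whose combine step is associative. The following is a hedged illustration, again assuming random-key (bottom-s) sampling: the s smallest keys of a minibatch can be found by computing each chunk's local bottom-s and then merging partial results pairwise, which takes ceil(log2 p) rounds for p chunks. The helpers `assign_keys`, `bottom_s`, and `tree_bottom_s` are hypothetical, and this is not claimed to be the paper's construction or to attain its bounds.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Tuple

Keyed = Tuple[float, object]  # (random key, element)


def assign_keys(batch: Iterable[object], seed: Optional[int] = None) -> List[Keyed]:
    """Tag each element with an independent Uniform[0, 1) key."""
    rng = random.Random(seed)
    return [(rng.random(), x) for x in batch]


def bottom_s(items: List[Keyed], s: int) -> List[Keyed]:
    """The s entries with the smallest keys, in ascending key order."""
    return heapq.nsmallest(s, items, key=lambda kv: kv[0])


def tree_bottom_s(keyed: List[Keyed], s: int, chunks: int = 4) -> List[Keyed]:
    """Bottom-s of `keyed` via chunk-local bottom-s plus pairwise merge rounds."""
    if not keyed:
        return []
    size = -(-len(keyed) // chunks)  # ceiling division
    level = [keyed[i:i + size] for i in range(0, len(keyed), size)]
    with ThreadPoolExecutor() as pool:
        # Round 0: every chunk reduces itself independently.
        level = list(pool.map(lambda part: bottom_s(part, s), level))
        # Each round halves the number of partial samples.
        while len(level) > 1:
            pairs = [level[i] + (level[i + 1] if i + 1 < len(level) else [])
                     for i in range(0, len(level), 2)]
            level = list(pool.map(lambda merged: bottom_s(merged, s), pairs))
    return level[0]
```

The result equals a direct bottom-s over the whole batch, because the s smallest keys of a union are always among the s smallest keys of its parts. Python threads do not run CPU-bound code in parallel under the GIL, so the thread pool here only makes the dependency structure explicit; it is not a speedup claim.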

## Abstract

This paper investigates parallel random sampling from a potentially unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream has been extensively studied sequentially, not much has been explored in the parallel context, with prior parallel random-sampling algorithms focusing on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of most-recently observed elements, and (2) infinite window, which draws samples from all the elements received. Our algorithms are computationally and memory efficient: their work matches the fastest sequential counterpart, their parallel depth is small (polylogarithmic), and their memory usage matches the best known.
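For the sliding-window setting, a sketch under the same hedges: it assumes random keys again, so the sample is the s elements with the smallest keys among the W most recent arrivals. The subtle part is deciding which elements to retain. An element can be discarded once s newer elements have smaller keys, because those newer elements stay in every future window that still contains it. This pruning rule is a common approach to sliding-window sampling, not necessarily the paper's; `SlidingWindowSampler`, `s`, and `window` are hypothetical names.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple


class SlidingWindowSampler:
    """Hypothetical sketch: uniform s-sample of the `window` most recent elements."""

    def __init__(self, s: int, window: int, seed: Optional[int] = None) -> None:
        if s < 1 or window < 1:
            raise ValueError("s and window must both be at least 1")
        self.s = s
        self.window = window
        self.rng = random.Random(seed)
        self.n = 0  # number of elements seen so far
        # Surviving candidates as (arrival index, key, element), oldest first.
        self.cand: List[Tuple[int, float, object]] = []

    def add_minibatch(self, batch: Iterable[object]) -> None:
        for x in batch:
            self.cand.append((self.n, self.rng.random(), x))
            self.n += 1
        # Expire elements that are no longer among the `window` most recent.
        oldest = self.n - self.window
        live = [c for c in self.cand if c[0] >= oldest]
        # Scan newest to oldest, tracking the s smallest keys among newer
        # elements in a max-heap of negated keys.
        kept: List[Tuple[int, float, object]] = []
        newer: List[float] = []
        for idx, key, x in reversed(live):
            if len(newer) == self.s and key > -newer[0]:
                continue  # s newer elements have smaller keys: never sampled again
            kept.append((idx, key, x))
            if len(newer) < self.s:
                heapq.heappush(newer, -key)
            else:
                heapq.heapreplace(newer, -key)
        kept.reverse()
        self.cand = kept

    def sample(self) -> list:
        # The s smallest keys among surviving candidates, all inside the window.
        return [x for _, _, x in heapq.nsmallest(self.s, self.cand, key=lambda c: c[1])]
```

Each minibatch costs one expiry pass and one pruning pass over the candidates. In expectation only on the order of s·log W candidates survive pruning, which is far fewer than W when the window is large.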

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.04120/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1906.04120/full.md

---
Source: https://tomesphere.com/paper/1906.04120