# Optimal Random Sampling from Distributed Streams Revisited

**Authors:** Srikanta Tirthapura, David P. Woodruff

arXiv: 1903.12065 · 2019-03-29

## TL;DR

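This paper presents an improved algorithm for continuously maintaining a uniform random sample of a data stream whose elements arrive at multiple sites. It lowers both the total number of messages and the coordinator's computation. Its message count is optimal up to a constant factor with high probability, and the same approach yields an improved algorithm for finding heavy hitters across sites.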

## Contribution

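The paper introduces a new distributed sampling algorithm that asymptotically reduces communication and coordinator computation. It also proves a matching lower bound on the number of messages and, as a byproduct, improves heavy hitter detection across distributed sites.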

## Key findings

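- Asymptotically fewer total messages than prior algorithms, and less computation at the coordinator
- A matching lower bound: the protocol's message count is optimal up to a constant factor with high probability
- As a byproduct, an improved algorithm for finding heavy hitters across multiple distributed sites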

## Abstract

We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represents a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.
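The abstract requires the coordinator to hold a uniform random sample at every moment. One common way to achieve this, sketched below, gives each element an independent uniform random weight and has the coordinator keep the `s` smallest weights (a bottom-s sample). Each site caches a possibly stale copy of the coordinator's threshold and forwards only elements that could still enter the sample. This is a minimal illustration of that general technique under our own assumptions. It does not reproduce the paper's protocol or its message bounds. The names `Coordinator`, `Site` and `estimate_heavy_hitters`, the parameters `s` and `phi`, and the one-reply-per-message accounting are all hypothetical. The heavy-hitter helper is the simplest sample-based estimator and is included only to show why a uniform sample yields heavy-hitter candidates.

```python
# Hypothetical sketch of bottom-s (min-weight) distributed sampling;
# not the paper's exact protocol.
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple


class Coordinator:
    """Holds the s elements with the smallest random weights seen so far."""

    def __init__(self, s: int) -> None:
        self.s = s
        self.sample: List[Tuple[float, Hashable]] = []  # kept sorted by weight
        self.messages = 0  # site->coordinator sends plus coordinator replies

    def threshold(self) -> float:
        # Until s elements are held, any weight in [0, 1) qualifies.
        return self.sample[-1][0] if len(self.sample) == self.s else 1.0

    def receive(self, weight: float, element: Hashable) -> float:
        self.messages += 1  # the site's message
        if weight < self.threshold():
            self.sample.append((weight, element))
            self.sample.sort()
            del self.sample[self.s:]  # evict the largest weight if over capacity
        self.messages += 1  # reply carrying the current global threshold
        return self.threshold()


class Site:
    """A site that forwards an element only if it might enter the sample."""

    def __init__(self, coordinator: Coordinator, rng: random.Random) -> None:
        self.coordinator = coordinator
        self.rng = rng
        # Possibly stale copy of the global threshold; never below the true one.
        self.local_threshold = 1.0

    def observe(self, element: Hashable) -> None:
        weight = self.rng.random()  # i.i.d. uniform weight in [0, 1)
        if weight < self.local_threshold:
            self.local_threshold = self.coordinator.receive(weight, element)


def estimate_heavy_hitters(
    sample: List[Tuple[float, Hashable]], n: int, phi: float
) -> Dict[Hashable, float]:
    """Report elements whose sampled share is at least phi, scaled up to n."""
    counts = Counter(element for _, element in sample)
    s = len(sample)
    return {x: c * n / s for x, c in counts.items() if c / s >= phi}


# Usage: 4 sites, n elements dealt round-robin; element 0 is one third of the stream.
rng = random.Random(0)
coord = Coordinator(s=50)
sites = [Site(coord, rng) for _ in range(4)]
n = 10_000
for i in range(n):
    sites[i % 4].observe(0 if i % 3 == 0 else i)

heavy = estimate_heavy_hitters(coord.sample, n, phi=0.1)
print(len(coord.sample), coord.messages, sorted(heavy))
```

The global threshold never increases, so each site's cached threshold is never below it. As a result, no element that belongs in the bottom-s set is ever withheld. The coordinator's set is therefore exactly the `s` smallest-weight elements, which is a uniform sample without replacement. Stale thresholds cost only extra messages, never correctness.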

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.12065/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1903.12065/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1903.12065/full.md

---
Source: https://tomesphere.com/paper/1903.12065