$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness
Honghao Lin, Zhao Song, David P. Woodruff, Shenghao Xie, Samson Zhou

TL;DR
This paper develops optimal distributed algorithms for perfect $L_p$ sampling and robust $F_p$ moment estimation, enabling efficient, adversarially-robust data stream analysis across multiple servers.
Contribution
It introduces the first optimal algorithms for perfect $L_p$ sampling in distributed streams for all $p \geq 1$, and applies these to achieve adversarially-robust distributed monitoring protocols.
Findings
Optimal communication complexity for perfect $L_p$ sampling for all $p\geq 1$.
Robust $F_p$ moment estimation algorithms matching lower bounds.
Near-optimal adversarially-robust protocols for counting, heavy hitters, and distinct elements.
Abstract
In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central coordinator. In such settings, the ability to efficiently collect a random sample from the global stream is a powerful primitive, enabling a wide array of downstream tasks such as estimating frequency moments, detecting heavy hitters, or performing sparse recovery. Of particular interest is the task of producing a perfect $L_p$ sample, which given a frequency vector $f \in \mathbb{R}^n$, outputs an index $i$ with probability $\frac{|f_i|^p}{\|f\|_p^p} + \frac{1}{\mathrm{poly}(n)}$. In this paper, we resolve the problem of perfect $L_p$ sampling for all $p \geq 1$ in the distributed monitoring model. Specifically, our algorithm runs in $k^{p-1} \cdot…
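As a point of reference, the target distribution of a perfect $L_p$ sampler (index $i$ drawn with probability proportional to $|f_i|^p$) can be computed directly when the whole frequency vector is available in one place. The sketch below is a centralized, exact illustration of that definition only. It is not the paper's communication-efficient protocol, and the function names and toy per-server streams are hypothetical.

```python
import random
from collections import Counter


def merge_streams(server_streams):
    """Build the global frequency vector f from per-server item streams.

    Assumption: every server's full stream is available locally, which
    ignores the communication constraints of the monitoring model.
    """
    freq = Counter()
    for stream in server_streams:
        freq.update(stream)
    return freq


def lp_distribution(freq, p):
    """Return the exact L_p sampling distribution |f_i|^p / ||f||_p^p."""
    fp = sum(abs(count) ** p for count in freq.values())  # F_p moment
    return {item: abs(count) ** p / fp for item, count in freq.items()}


def lp_sample(freq, p, rng):
    """Draw one index with probability proportional to |f_i|^p."""
    items = list(freq)
    weights = [abs(freq[item]) ** p for item in items]
    return rng.choices(items, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Two hypothetical servers, each holding part of the stream.
    streams = [["a", "a"], ["a", "b"]]
    f = merge_streams(streams)
    print(lp_distribution(f, p=2))
    print(lp_sample(f, p=2, rng=random.Random(0)))
```

With counts $f_a = 3$ and $f_b = 1$ and $p = 2$, the distribution assigns probability $9/10$ to `a` and $1/10$ to `b`, which shows how larger $p$ concentrates the sample on heavy items.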
