$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness

Honghao Lin; Zhao Song; David P. Woodruff; Shenghao Xie; Samson Zhou

arXiv:2510.22816 · cs.DS · October 28, 2025

TL;DR

This paper develops optimal distributed algorithms for perfect $L_p$ sampling and robust $F_p$ moment estimation, enabling efficient, adversarially-robust data stream analysis across multiple servers.

Contribution

It introduces the first optimal algorithms for perfect $L_p$ sampling in distributed streams for all $p \geq 1$, and applies these to achieve adversarially-robust distributed monitoring protocols.

Findings

01. Optimal communication complexity for perfect $L_p$ sampling for all $p \geq 1$.

02. Robust $F_p$ moment estimation algorithms matching lower bounds.

03. Near-optimal adversarially-robust protocols for counting, heavy hitters, and distinct elements.

Abstract

In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central coordinator. In such settings, the ability to efficiently collect a random sample from the global stream is a powerful primitive, enabling a wide array of downstream tasks such as estimating frequency moments, detecting heavy hitters, or performing sparse recovery. Of particular interest is the task of producing a perfect $L_p$ sample, which, given a frequency vector $f \in \mathbb{R}^n$, outputs an index $i$ with probability $\frac{f_i^p}{\|f\|_p^p} + \frac{1}{\mathrm{poly}(n)}$. In this paper, we resolve the problem of perfect $L_p$ sampling for all $p \geq 1$ in the distributed monitoring model. Specifically, our algorithm runs in $k^{p-1} \cdot…
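
To make the target distribution in the abstract concrete, the following is a minimal centralized sketch of exact $L_p$ sampling from a fully known frequency vector. It is not the paper's distributed algorithm and says nothing about communication cost. The function names `lp_distribution` and `lp_sample` are hypothetical, and taking absolute values of the entries is an assumption so that the sketch also handles negative frequencies.

```python
import random


def lp_distribution(f, p):
    """Return the exact L_p sampling probabilities |f_i|^p / ||f||_p^p.

    Assumption: absolute values are used so negative entries are allowed.
    """
    weights = [abs(x) ** p for x in f]
    total = sum(weights)
    if total == 0:
        raise ValueError("frequency vector is all zeros")
    return [w / total for w in weights]


def lp_sample(f, p, rng=random):
    """Draw one index i with probability |f_i|^p / ||f||_p^p.

    This is a centralized, exact sampler for illustration only; it reads
    the whole vector, which a streaming or distributed sampler cannot do.
    """
    probs = lp_distribution(f, p)
    return rng.choices(range(len(f)), weights=probs, k=1)[0]


# Illustrative frequency vector (made up for this example).
f = [3, 1, 0, 2]
probs = lp_distribution(f, p=2)  # weights 9, 1, 0, 4 over a total of 14
index = lp_sample(f, p=2, rng=random.Random(0))
```

For $p = 2$ the largest coordinate receives probability $9/14$, compared with $3/6$ under $p = 1$, which illustrates how larger $p$ concentrates the sample on heavy coordinates.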
