Streaming Lower Bounds and Asymmetric Set-Disjointness
Shachar Lovett, Jiapeng Zhang

TL;DR
This paper establishes near-optimal space lower bounds for frequency estimation in random-order and stochastic data streams. It does so by analyzing the needle problem and extending communication complexity techniques to asymmetric set-disjointness.
Contribution
It introduces new lower bounds for the needle problem in stochastic streams and develops techniques for asymmetric multi-party set-disjointness, closing gaps in existing bounds.
Findings
Lower bounds match upper bounds up to logarithmic factors.
New techniques for sampling needles in planted models.
Extended communication complexity methods to asymmetric settings.
Abstract
Frequency estimation in data streams is one of the classical problems in streaming algorithms. Following much research, there are now almost matching upper and lower bounds for the trade-off needed between the number of samples and the space complexity of the algorithm, when the data streams are adversarial. However, in the case where the data stream is given in a random order, or is stochastic, only weaker lower bounds exist. In this work we close this gap, up to logarithmic factors. In order to do so we consider the needle problem, which is a natural hard problem for frequency estimation studied in (Andoni et al. 2008, Crouch et al. 2016). Here, the goal is to distinguish between two distributions over data streams with t samples. The first is uniform over a large enough domain. The second is a planted model; a secret "needle" is uniformly chosen, and then each element in the…
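The two distributions in the needle problem can be illustrated with a short sampling sketch. Because the abstract above is cut off, the planting rule used here is an assumption: each stream element independently equals the needle with probability p and is otherwise uniform over the domain. The names `uniform_stream`, `planted_stream`, `max_frequency`, and the parameters `t` (stream length), `n` (domain size), and `p` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def uniform_stream(t, n, rng):
    """Sample t elements uniformly from the domain {0, ..., n-1}."""
    return [rng.randrange(n) for _ in range(t)]


def planted_stream(t, n, p, rng):
    """Sample a planted stream (assumed model, see the lead-in).

    A secret needle is drawn uniformly from the domain. Each element
    equals the needle with probability p, and is uniform otherwise.
    Returns the stream together with the needle.
    """
    needle = rng.randrange(n)
    stream = [needle if rng.random() < p else rng.randrange(n)
              for _ in range(t)]
    return stream, needle


def max_frequency(stream):
    """Count of the most frequent element, computed with full memory."""
    return Counter(stream).most_common(1)[0][1]


if __name__ == "__main__":
    rng = random.Random(0)
    t, n, p = 10_000, 10**9, 0.01
    print("uniform max frequency:", max_frequency(uniform_stream(t, n, rng)))
    stream, _ = planted_stream(t, n, p, rng)
    print("planted max frequency:", max_frequency(stream))
```

Note that `max_frequency` stores every distinct element it sees, so it is not a small-space streaming algorithm. It only makes the statistical difference between the two distributions visible; the lower bounds concern how much space is needed to detect that difference.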
Taxonomy
Topics: Cryptography and Data Security · Complexity and Algorithms in Graphs · Privacy-Preserving Technologies in Data
