# Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

**Authors:** Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P. Woodruff, Mobin Yahyazadeh

arXiv: 1704.00633 · 2017-04-04

## TL;DR

This paper proves a tight lower bound on the randomized one-way communication complexity of the universal relation problem. As a corollary it obtains optimal lower bounds for $\ell_p$-sampling and for finding duplicates in data streams. The main result is proved twice: once by an encoding argument and once by a randomized reduction from Augmented Indexing.

## Contribution

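The paper determines, up to constant factors, the randomized one-way communication complexity of the universal relation problem in the public coin model. It gives two proofs of this result: an encoding argument that queries the algorithm adaptively with injected random noise, and a new randomized reduction from Augmented Indexing.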

## Key findings

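- The randomized one-way public-coin communication complexity of universal relation is $\Theta(\min\{n,\log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ for failure probability $\delta$, so the lower bound matches the upper bound.
- Optimal lower bounds follow for $\ell_p$-sampling in strict turnstile streams for $0\le p < 2$.
- Optimal lower bounds follow for finding duplicates in a stream; they need no large weights and hold even when $x\in\{0,1\}^n$ at all points in the stream.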

## Abstract

In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n,\log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ for failure probability $\delta$. Our lower bound holds even if promised $\operatorname{support}(y)\subset \operatorname{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0\le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if promised $x\in\{0,1\}^n$ at all points in the stream.

We give two different proofs of our main result. The first proof demonstrates that any algorithm $\mathcal A$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $\mathcal A$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal A$, which is loosely motivated by techniques in differential privacy. Our second proof is via a novel randomized reduction from Augmented Indexing [MNSW98] which needs to interact with $\mathcal A$ adaptively. To handle the adaptivity we identify certain likely interaction patterns and union bound over them to guarantee correct interaction on all of them. To guarantee correctness, it is important that the interaction hides some of its randomness from $\mathcal A$ in the reduction.
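To make the problem statement and the bound concrete, here is a minimal Python sketch. It is illustrative only and not taken from the paper: the function names `alice_message`, `bob_output`, and `ur_bound` are hypothetical, the protocol shown is the trivial one in which Alice sends all $n$ bits (matching the $n$ term of the minimum), and `ur_bound` evaluates the expression inside the $\Theta(\cdot)$ with all constants dropped and base-2 logarithms assumed. Returning $n$ when $\log(1/\delta)\ge n$ is an assumption about how to read the expression in that degenerate regime.

```python
import math
from typing import List


def alice_message(x: List[int]) -> List[int]:
    """Trivial one-way protocol (assumed for illustration): Alice sends all n bits of x."""
    return list(x)


def bob_output(message: List[int], y: List[int]) -> int:
    """Bob returns some index i with x_i != y_i, which exists under the promise x != y."""
    for i, (xi, yi) in enumerate(zip(message, y)):
        if xi != yi:
            return i
    raise ValueError("promise violated: x == y")


def ur_bound(n: int, delta: float) -> float:
    """Evaluate min{n, log(1/delta) * log^2(n / log(1/delta))}, constants dropped.

    Base-2 logarithms are an assumption; the Theta(.) hides the base.
    """
    if n < 1 or not (0.0 < delta < 1.0):
        raise ValueError("need n >= 1 and 0 < delta < 1")
    t = math.log2(1.0 / delta)
    if t >= n:
        # Assumed reading of the degenerate regime: the n term dominates.
        return float(n)
    return min(float(n), t * math.log2(n / t) ** 2)


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    y = [1, 0, 0, 1]  # support(y) is a subset of support(x), as in the stronger promise
    print(bob_output(alice_message(x), y))
    print(ur_bound(2**20, 0.5))
```

For constant $\delta$ the second term behaves like $\log^2 n$, far below the $n$ bits used by the trivial protocol; the paper's result is that this expression cannot be improved by more than a constant factor.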

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.00633/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1704.00633/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1704.00633/full.md

---
Source: https://tomesphere.com/paper/1704.00633