Perfect $L_p$ Sampling in a Data Stream

Rajesh Jayaram; David P. Woodruff

arXiv:1808.05497·cs.DS·November 12, 2019

TL;DR

This paper presents the first perfect $L_p$ sampling algorithm in a data stream for $p \neq 2$, achieving optimal space complexity and derandomization, resolving longstanding open questions in streaming algorithms.

Contribution

It introduces a perfect $L_p$ sampler with optimal space complexity for $p \neq 2$, and provides a general derandomization method for linear sketches.

Findings

01

Achieves $O(\log^2 n \log \frac{1}{\delta})$ bits of space for $p \neq 2$

02

Matches prior bounds for $p = 2$ without dependence on $\nu$

03

Can be derandomized with only a $(\log \log n)^2$ space blow-up

Abstract

In this paper, we resolve the one-pass space complexity of $L_p$ sampling for $p \in (0, 2)$. Given a stream of updates (insertions and deletions) to the coordinates of an underlying vector $f \in \mathbb{R}^n$, a perfect $L_p$ sampler must output an index $i$ with probability $|f_i|^p / \|f\|_p^p$, and is allowed to fail with some probability $\delta$. So far, for $p > 0$ no algorithm has been shown to solve the problem exactly using $\operatorname{poly}(\log n)$ bits of space. In 2010, Monemizadeh and Woodruff introduced an approximate $L_p$ sampler, which outputs $i$ with probability $(1 \pm \nu) |f_i|^p / \|f\|_p^p$, using space polynomial in $\nu^{-1}$ and $\log(n)$. The space complexity was later reduced by Jowhari, Sağlam, and Tardos to roughly $O(\nu^{-p} \log^2 n \log \delta^{-1})$ for $p \in (0, 2)$, which tightly matches the $\Omega(\log^2 n \log \delta^{-1})$ lower bound in terms of…
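To make the sampling target concrete, the following is a minimal offline sketch of the distribution a perfect $L_p$ sampler must match. It is not the paper's streaming algorithm: it stores the whole vector $f$ explicitly, which takes space linear in the number of nonzero coordinates rather than $\operatorname{poly}(\log n)$ bits. The function names (`apply_stream`, `lp_distribution`, `sample_index`) and the example updates are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def apply_stream(updates):
    """Accumulate (index, delta) updates into a sparse vector f.

    Deltas may be negative (deletions); coordinates that cancel
    to zero are dropped from the result.
    """
    f = defaultdict(float)
    for i, delta in updates:
        f[i] += delta
    return {i: v for i, v in f.items() if v != 0}


def lp_distribution(f, p):
    """Exact target distribution: Pr[i] = |f_i|^p / ||f||_p^p."""
    weights = {i: abs(v) ** p for i, v in f.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("f is the zero vector; nothing to sample")
    return {i: w / total for i, w in weights.items()}


def sample_index(f, p, rng=random):
    """Draw one index from the exact L_p distribution (offline reference)."""
    dist = lp_distribution(f, p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


# Hypothetical stream: coordinate 1 is inserted and then deleted.
updates = [(0, 3), (1, 1), (2, 5), (2, -1), (1, -1)]
f = apply_stream(updates)
dist = lp_distribution(f, p=1)
```

With these updates the final vector has $f_0 = 3$ and $f_2 = 4$, so for $p = 1$ the target probabilities are $3/7$ and $4/7$. An approximate sampler in the sense described above would be allowed to deviate from each of these by a $(1 \pm \nu)$ factor, whereas a perfect sampler must match them exactly, aside from its failure probability $\delta$.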
