Query complexity of heavy hitter estimation

Sahasrajit Sarmasarkar, Kota Srinivas Reddy, and Nikhil Karamchandani

arXiv:2005.14425·cs.IT·February 11, 2021

TL;DR

This paper studies the query complexity of identifying the heavy hitters of a distribution, that is, the elements whose probability exceeds a given threshold, by actively querying an oracle about i.i.d. samples. It proposes algorithms and bounds for two query models, including noisy scenarios.

Contribution

The paper introduces sequential algorithms and query complexity bounds for heavy hitter estimation under two query models, and addresses robustness to noisy responses.

Findings

01 Upper bounds on query complexity for both models.

02 Lower bounds establishing optimality of algorithms.

03 Robust estimators effective under noisy responses.

Abstract

We consider the problem of identifying the subset $S_{P}^{\gamma}$ of elements in the support of an underlying distribution $P$ whose probability value is larger than a given threshold $\gamma$, by actively querying an oracle to gain information about a sequence $X_{1}, X_{2}, \dots$ of i.i.d. samples drawn from $P$. We consider two query models: (a) each query is an index $i$ and the oracle returns the value $X_{i}$, and (b) each query is a pair $(i, j)$ and the oracle gives a binary answer confirming whether $X_{i} = X_{j}$ or not. For each of these query models, we design sequential estimation algorithms which, at each round, either decide what query to send to the oracle depending on the entire history of responses, or decide to stop and output an estimate of $S_{P}^{\gamma}$, which is required to be correct with some pre-specified…
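To make the two query models concrete, here is a minimal Python sketch. It is not the paper's algorithm: the class `SampleOracle` and both estimator functions are hypothetical names introduced for illustration, and the estimators use a fixed budget of `n` samples with a plain empirical-frequency threshold, whereas the abstract describes sequential algorithms that choose when to stop. The sketch only shows what information each kind of query reveals.

```python
import random
from collections import Counter


class SampleOracle:
    """Hypothetical oracle holding i.i.d. samples X_1, X_2, ... drawn from P."""

    def __init__(self, probs, seed=0):
        self._rng = random.Random(seed)
        self._support = list(probs)
        self._weights = [probs[x] for x in self._support]
        self._samples = []  # X_1, X_2, ... generated lazily
        self.num_queries = 0

    def _sample(self, i):
        # Extend the sequence until X_i exists (indices are 1-based).
        while len(self._samples) < i:
            self._samples.append(self._rng.choices(self._support, self._weights)[0])
        return self._samples[i - 1]

    def value(self, i):
        """Query model (a): return the value X_i."""
        self.num_queries += 1
        return self._sample(i)

    def same(self, i, j):
        """Query model (b): return whether X_i == X_j."""
        self.num_queries += 1
        return self._sample(i) == self._sample(j)


def heavy_hitters_value_queries(oracle, gamma, n):
    """Naive fixed-budget estimate: values whose empirical frequency exceeds gamma."""
    counts = Counter(oracle.value(i) for i in range(1, n + 1))
    return {x for x, c in counts.items() if c / n > gamma}


def heavy_hitter_bins_pair_queries(oracle, gamma, n):
    """Naive fixed-budget estimate using only pairwise equality answers.

    Sample values are never revealed, so the result is a list of bins of
    sample indices; each bin stands for one (unnamed) support element.
    """
    bins = []  # each bin lists indices whose samples are equal
    for i in range(1, n + 1):
        for b in bins:
            # Compare X_i with one representative index of the bin.
            if oracle.same(i, b[0]):
                b.append(i)
                break
        else:
            bins.append([i])  # X_i matched no existing bin
    return [b for b in bins if len(b) / n > gamma]
```

For a hypothetical distribution such as `{"a": 0.6, "b": 0.3, "c": 0.05, "d": 0.05}` with `gamma = 0.2`, the first function returns a set of values, while the second returns index bins and typically spends several pair queries per sample, since a new sample may need to be compared against every existing bin.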
