Optimal Identity Testing with High Probability

Ilias Diakonikolas; Themis Gouleakis; John Peebles; Eric Price

arXiv:1708.02728 · cs.DS · January 17, 2019 · 1 cites

Open Access

TL;DR

This paper establishes the optimal sample complexity of identity testing in the high-confidence regime (small failure probability $\delta$), shows that black-box amplification is suboptimal there, and introduces a simple, optimal 'plug-in' tester for uniformity testing.

Contribution

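The paper proves matching upper and lower bounds on the sample complexity of identity testing for small $\delta$, and introduces a simple, optimal uniformity tester that thresholds the total variation (TV) distance between the empirical distribution of the samples and the uniform distribution.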
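The thresholding idea can be illustrated with a minimal Python sketch. It computes the TV distance between the empirical distribution of the samples and the uniform distribution on `{0, ..., n-1}`, then accepts when that statistic is at most a threshold. The function names are hypothetical, and `calibrate_threshold` is an assumption made for illustration: it picks the threshold by simulating the statistic under the uniform distribution, which is not necessarily how the paper sets it.

```python
import random
from collections import Counter


def empirical_tv_to_uniform(samples, n):
    """TV distance between the empirical distribution of `samples`
    and the uniform distribution on {0, ..., n-1}."""
    m = len(samples)
    counts = Counter(samples)
    # Each element that never appears contributes |0 - 1/n| = 1/n to the L1 sum.
    unseen = n - len(counts)
    l1 = sum(abs(c / m - 1.0 / n) for c in counts.values()) + unseen / n
    return 0.5 * l1


def plug_in_uniformity_test(samples, n, threshold):
    """Return True ("looks uniform") when the statistic is at most `threshold`."""
    return empirical_tv_to_uniform(samples, n) <= threshold


def calibrate_threshold(n, m, trials=200, quantile=0.95, seed=0):
    """Illustrative assumption, not the paper's rule: simulate the statistic
    under the uniform distribution and return an upper quantile."""
    rng = random.Random(seed)
    stats = sorted(
        empirical_tv_to_uniform([rng.randrange(n) for _ in range(m)], n)
        for _ in range(trials)
    )
    return stats[min(trials - 1, int(quantile * trials))]
```

With few samples relative to `n`, the statistic is large even under the uniform distribution, because most elements are unseen. The threshold therefore has to depend on both `n` and the number of samples, which is why this sketch calibrates it by simulation instead of using a fixed constant.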

Findings

01

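The optimal sample complexity of identity testing with confidence $1 - \delta$ is $\Theta\left(\frac{1}{\epsilon^{2}}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta)\right)\right)$.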

02

Black-box amplification is suboptimal for any $\delta = o(1)$.

03

A simple 'plug-in' estimator is optimal for uniformity testing.

Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0 < \epsilon, \delta < 1$, we wish to distinguish, with probability at least $1 - \delta$, whether the distributions are identical versus $\epsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n} / \epsilon^{2})$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that…
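The black-box amplification baseline mentioned in the abstract can be sketched as a majority vote over independent runs of a constant-confidence tester. This is a minimal illustration, assuming a base tester that is correct with probability at least 2/3 on each fresh batch. The names `num_rounds`, `amplify`, `base_tester` and `draw_samples` are hypothetical, and the constant 18 comes from a Hoeffding bound under that 2/3 assumption, not from the paper.

```python
import math


def num_rounds(delta, c=18.0):
    """Number of independent runs for a majority vote to fail with probability
    at most delta, assuming each run is correct with probability >= 2/3.
    Hoeffding: P(majority wrong) <= exp(-2 k (1/6)^2) = exp(-k / 18)."""
    k = math.ceil(c * math.log(1.0 / delta))
    if k % 2 == 0:
        k += 1  # use an odd number of rounds so the vote cannot tie
    return k


def amplify(base_tester, draw_samples, m, delta):
    """Run `base_tester` on fresh batches of m samples and return the majority verdict.

    `base_tester(batch)` returns True to accept; `draw_samples(m)` returns a
    fresh batch of m samples. Both are caller-supplied (hypothetical) callables.
    """
    k = num_rounds(delta)
    accepts = sum(bool(base_tester(draw_samples(m))) for _ in range(k))
    return accepts > k // 2
```

The total number of samples drawn is `num_rounds(delta) * m`, which is the multiplicative $\log(1/\delta)$ overhead that the abstract describes as suboptimal.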

Taxonomy

Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs