Optimal Identity Testing with High Probability
Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price

TL;DR
This paper establishes the optimal sample complexity for identity testing with high confidence, showing that traditional amplification methods are suboptimal and introducing a simple, optimal 'plug-in' tester for uniformity testing.
Contribution
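The paper proves tight upper and lower bounds on the sample complexity of identity testing when the failure probability delta is small, and introduces a simple, sample-optimal uniformity tester that thresholds the empirical total variation (TV) distance between the observed samples and the uniform distribution.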
Findings
Optimal sample complexity formula for identity testing with high confidence.
Black-box amplification is suboptimal for small delta.
A simple 'plug-in' estimator is optimal for uniformity testing.
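A minimal sketch of the 'plug-in' idea, assuming samples are integers in range(n): compute the TV distance between the empirical distribution and the uniform distribution, then accept if it falls below a threshold. The function names and the Monte Carlo calibration of the threshold are illustrative assumptions; the paper sets its threshold analytically, and that choice is not reproduced here.

```python
import random
from collections import Counter


def empirical_tv_to_uniform(samples, n):
    """TV distance between the empirical distribution of `samples`
    (integers in range(n)) and the uniform distribution on range(n)."""
    m = len(samples)
    counts = Counter(samples)
    # Observed elements contribute |count/m - 1/n| each.
    seen = sum(abs(c / m - 1.0 / n) for c in counts.values())
    # Each unobserved element contributes |0 - 1/n| = 1/n.
    unseen = (n - len(counts)) / n
    return 0.5 * (seen + unseen)


def calibrate_threshold(n, m, delta, trials=2000, seed=0):
    """Hypothetical calibration (an assumption, not the paper's threshold):
    simulate the statistic under the uniform distribution and return its
    empirical (1 - delta) quantile."""
    rng = random.Random(seed)
    stats = sorted(
        empirical_tv_to_uniform([rng.randrange(n) for _ in range(m)], n)
        for _ in range(trials)
    )
    index = min(trials - 1, int((1.0 - delta) * trials))
    return stats[index]


def plug_in_uniformity_test(samples, n, threshold):
    """Return True (accept 'uniform') iff the empirical TV distance
    to uniform is at most `threshold`."""
    return empirical_tv_to_uniform(samples, n) <= threshold
```

The simulated quantile is only meaningful when `trials` is much larger than 1/delta, so this calibration is impractical in exactly the small-delta regime the paper targets; it is shown only to make the thresholding step concrete.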
Abstract
We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0 < \epsilon, \delta < 1$, we wish to distinguish, with probability at least $1-\delta$, whether the distributions are identical versus $\epsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that…
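To make the gap concrete, the sketch below compares the black-box amplification scaling, sqrt(n)/eps^2 · log(1/delta), with the optimal bound reported in the paper, (1/eps^2) · (sqrt(n · log(1/delta)) + log(1/delta)). Both are asymptotic orders; setting the hidden constants to 1 is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math


def amplified_samples(n, eps, delta):
    """Order of samples used by a constant-confidence tester repeated
    log(1/delta) times (hidden constant assumed to be 1)."""
    return math.sqrt(n) / eps**2 * math.log(1.0 / delta)


def optimal_samples(n, eps, delta):
    """Order of the optimal sample complexity
    (1/eps^2) * (sqrt(n*log(1/delta)) + log(1/delta)),
    hidden constant assumed to be 1."""
    log_term = math.log(1.0 / delta)
    return (math.sqrt(n * log_term) + log_term) / eps**2


if __name__ == "__main__":
    n, eps = 10**6, 0.1
    for delta in (1e-3, 1e-9, 1e-30):
        ratio = amplified_samples(n, eps, delta) / optimal_samples(n, eps, delta)
        print(f"delta={delta:g}: amplified/optimal ratio = {ratio:.2f}")
```

When n is much larger than log(1/delta), the ratio grows roughly like sqrt(log(1/delta)), which is the saving over amplification in that regime.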
Taxonomy
Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs
