Sharp Constants in Uniformity Testing via the Huber Statistic

Shivam Gupta; Eric Price

arXiv:2206.10722 · stat.ML · June 23, 2022 · 1 cites

TL;DR

This paper studies the constant factors in uniformity testing and introduces a new tester based on the Huber loss that matches the collisions tester's optimal separation and attains nearly optimal sample complexity in the dominant regime.

Contribution

The paper identifies sharp constants in uniformity testing and proposes a novel tester based on the Huber loss that achieves the optimal separation with Gaussian tails, which improves the sample complexity.

Findings

01 The collisions tester achieves a sharp maximal constant in the number of standard deviations separating uniform from non-uniform inputs.

02 The Huber-based tester matches this separation and has Gaussian tail behavior.

03 Sample complexity is improved to nearly optimal in the regime where the leading term dominates.
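To make the second finding concrete, here is a minimal sketch of what a Huber-style statistic on histogram counts could look like. It assumes the statistic sums the standard Huber loss over the deviations of each bin count from its uniform expectation `n / m`; the function names, the normalization, and the choice of the threshold parameter `beta` are illustrative assumptions, not details taken from this summary.

```python
import numpy as np

def huber_loss(x, beta):
    """Standard Huber loss: quadratic for |x| <= beta, linear beyond it."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x <= beta, 0.5 * x**2, beta * (abs_x - 0.5 * beta))

def huber_statistic(counts, beta):
    """Hypothetical Huber statistic: sum of Huber losses of (count - n/m).

    counts[j] is the number of samples that landed on element j.
    The form and scaling here are assumptions made for illustration.
    """
    counts = np.asarray(counts)
    n = counts.sum()   # total number of samples
    m = counts.size    # domain size
    return float(np.sum(huber_loss(counts - n / m, beta)))

# Example: 6 samples over 4 elements, with counts [2, 1, 3, 0].
counts = np.array([2, 1, 3, 0])
stat_small_beta = huber_statistic(counts, beta=1.0)
stat_large_beta = huber_statistic(counts, beta=10.0)
```

In this sketch, a large `beta` makes every term quadratic, so the statistic is an affine function of the collision count, while a small `beta` makes it approximately proportional to the empirical TV distance. A tester would reject uniformity when the statistic exceeds a threshold, which is not specified here.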

Abstract

Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1 - \delta$ probability is $n = \Theta\left(\frac{\sqrt{m \log(1/\delta)}}{\epsilon^{2}} + \frac{\log(1/\delta)}{\epsilon^{2}}\right)$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the constant factors required by the algorithms. We show that the collisions tester achieves a sharp…
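The three classical statistics named in the abstract can be sketched directly from a histogram of the samples. This is a minimal illustration under the usual textbook definitions; the function names are hypothetical, and the rejection thresholds each tester would use are not given in this summary.

```python
import numpy as np

def histogram(samples, m):
    """Count how many samples fall on each of the m domain elements (0..m-1)."""
    return np.bincount(samples, minlength=m)

def collision_statistic(counts):
    """Number of colliding sample pairs: sum over bins of C(count, 2)."""
    return int(np.sum(counts * (counts - 1) // 2))

def singleton_statistic(counts):
    """Number of domain elements that were sampled exactly once."""
    return int(np.sum(counts == 1))

def empirical_tv_statistic(counts):
    """Total variation distance between the empirical and uniform distributions."""
    n = counts.sum()
    m = counts.size
    return float(0.5 * np.sum(np.abs(counts / n - 1.0 / m)))

# Example: 6 samples from a domain of size m = 4.
samples = np.array([0, 0, 1, 2, 2, 2])
counts = histogram(samples, m=4)          # counts are [2, 1, 3, 0]
collisions = collision_statistic(counts)  # 1 + 0 + 3 + 0 = 4
singletons = singleton_statistic(counts)  # only element 1 appears once
tv = empirical_tv_statistic(counts)       # 1/3
```

Under the uniform distribution, collisions and the empirical TV distance tend to be small and singletons tend to be numerous, so each tester compares its statistic against a threshold chosen for the target error probability $\delta$.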

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Statistical Methods in Clinical Trials

Methods: Huber loss · Test