Optimal Algorithms for Augmented Testing of Discrete Distributions

Maryam Aliakbarpour; Piotr Indyk; Ronitt Rubinfeld; Sandeep Silwal

arXiv:2412.00974·cs.LG·December 3, 2024

TL;DR

This paper introduces adaptive algorithms for hypothesis testing of discrete distributions that leverage predictive models to reduce sample complexity, achieving optimal bounds and robustness without prior knowledge of prediction accuracy.

Contribution

The paper presents novel adaptive algorithms that use predicted distributions to improve sample efficiency in hypothesis testing, with proven optimality and robustness.

Findings

01. Sample complexity reduction depends on predictor quality

02. Algorithms adaptively self-adjust to prediction accuracy

03. Experimental results outperform worst-case guarantees

Abstract

We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor's quality, measured by its total variation distance from $p$. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their…
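The abstract measures predictor quality by the total variation distance between the predicted distribution and $p$. As a minimal illustration of that quantity only (not of the paper's testing algorithms), the sketch below computes it for two distributions over the same finite domain, using the standard definition of half the sum of absolute differences. The function name `total_variation` and the example distributions are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions.

    Both arguments are probability vectors over the same domain,
    indexed identically. Uses the standard definition
    TV(p, q) = (1/2) * sum_i |p_i - q_i|.
    """
    if len(p) != len(q):
        raise ValueError("distributions must be defined over the same domain")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical example: the true distribution is uniform on 4 elements,
# and a predictor (e.g. fitted on historical data) is somewhat skewed.
true_dist = [0.25, 0.25, 0.25, 0.25]
predicted = [0.40, 0.30, 0.20, 0.10]

tv = total_variation(true_dist, predicted)
print(f"TV distance between prediction and truth: {tv:.3f}")
```

In practice $p$ is unknown and only accessible through samples, so this distance cannot be computed directly; the point of the sketch is only to make concrete the quantity that, per the abstract, governs how much the predictor reduces sample complexity.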

Videos

Optimal Algorithms for Augmented Testing of Discrete Distributions · SlidesLive

Taxonomy

Topics: Fault Detection and Control Systems · Advanced Statistical Process Monitoring · Control Systems and Identification