Active Nonparametric Two-Sample Testing by Betting on Heterogeneous Data Sources
Chia-Yu Hsu, Shubhanshu Shekhar

TL;DR
This paper introduces an active nonparametric sequential two-sample test that decides whether paired samples drawn from multiple heterogeneous sources come from the same distribution, using a betting framework to guide both source selection and testing.
Contribution
It proposes a novel active nonparametric testing procedure combining adaptive source selection with a betting-based framework, applicable under minimal distributional assumptions.
Findings
Controls type-I error at a preset level.
Achieves power one under the alternative hypothesis (the test rejects with probability one).
Provides bounds on expected sample size.
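The first finding follows a pattern that is standard for betting-based tests. The sketch below uses generic notation introduced here for illustration (wealth $W_t$, bounded payoff $g_t$, bet $\lambda_t$, stopping time $\tau$); these symbols are assumptions and may not match the paper's own definitions.

```latex
% Wealth process of a bettor who starts with one unit of capital.
% Assumptions: g_t \in [-1, 1], \lambda_t \in [-1/2, 1/2] is chosen using
% only data observed before time t, and E[g_t | F_{t-1}] = 0 under H_0.
\begin{align*}
  W_0 &= 1, \qquad W_t = W_{t-1}\,\bigl(1 + \lambda_t\, g_t\bigr), \\
  \tau &= \inf\{\, t \ge 1 : W_t \ge 1/\alpha \,\}.
\end{align*}
% Under H_0, (W_t) is a nonnegative martingale, so Ville's inequality gives
\[
  \mathbb{P}_{H_0}\bigl(\tau < \infty\bigr)
  = \mathbb{P}_{H_0}\bigl(\exists\, t \ge 1 : W_t \ge 1/\alpha\bigr)
  \le \alpha\, \mathbb{E}[W_0] = \alpha .
\]
```

Because the bound holds for any predictable choice of bets, it is unaffected by which source is queried at each step, as long as that choice also depends only on past data.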
Abstract
We study the problem of active nonparametric sequential two-sample testing over multiple heterogeneous data sources. In each time slot, a decision-maker adaptively selects one of the data sources and receives a paired sample generated from that source for testing. The goal is to decide as quickly as possible whether the pairs are generated from the same distribution or not. The gain achieved by such adaptive sampling (in terms of smaller expected stopping time or larger error exponents) has been well characterized for parametric models via Chernoff's adaptive MLE selection rule [1]. However, analogous results are not known for nonparametric problems, such as two-sample testing, where we place no restrictions on the distributions. Our main contribution is a general active nonparametric testing procedure that combines an adaptive source-selecting strategy within the…
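To make the setting concrete, here is a minimal Python sketch of active sequential two-sample testing by betting. It is not the paper's procedure: the fixed `tanh` payoff, the plug-in bet size, the epsilon-greedy source choice, and all names (`payoff`, `active_betting_test`, `eps`) are hypothetical choices made for illustration. A fixed payoff only detects differences in the mean of `tanh`, whereas a genuinely nonparametric test would learn the payoff from data.

```python
import math
import random


def payoff(x, y):
    """Bounded, antisymmetric payoff in [-1, 1]; mean zero if x and y share a distribution."""
    return 0.5 * (math.tanh(x) - math.tanh(y))


def active_betting_test(sources, alpha=0.05, max_steps=10_000, eps=0.1, seed=0):
    """Hypothetical sketch. Returns (rejected, steps_used, pulls_per_source)."""
    rng = random.Random(seed)
    k = len(sources)
    pulls = [0] * k
    sum_g = [0.0] * k    # running sum of payoffs per source
    sum_g2 = [0.0] * k   # running sum of squared payoffs per source
    wealth = 1.0

    for t in range(1, max_steps + 1):
        # Source selection uses only past observations (assumed heuristic):
        # one initial pull per source, then epsilon-greedy on |mean payoff|.
        if t <= k:
            i = t - 1
        elif rng.random() < eps:
            i = rng.randrange(k)
        else:
            i = max(range(k), key=lambda j: abs(sum_g[j] / pulls[j]))

        # Bet size also uses only past data; clipping keeps wealth positive.
        lam = max(-0.5, min(0.5, sum_g[i] / (sum_g2[i] + 1.0)))

        x, y = sources[i](rng)
        g = payoff(x, y)
        wealth *= 1.0 + lam * g

        pulls[i] += 1
        sum_g[i] += g
        sum_g2[i] += g * g

        # Reject the null once wealth crosses 1/alpha.
        if wealth >= 1.0 / alpha:
            return True, t, pulls
    return False, max_steps, pulls


# Usage: two uninformative sources and one whose pairs clearly differ.
null_source = lambda rng: (rng.gauss(0, 1), rng.gauss(0, 1))
shifted_source = lambda rng: (rng.gauss(2, 1), rng.gauss(-2, 1))
rejected, steps, pulls = active_betting_test([null_source, null_source, shifted_source])
print(rejected, steps, pulls)
```

In this sketch the source choice and the bet at each step depend only on earlier observations, which is the property that keeps the wealth a nonnegative martingale under the null regardless of how sources are chosen.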
Taxonomy
Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research
