A label-efficient two-sample test
Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

TL;DR
This paper introduces a three-stage, label-efficient two-sample testing framework. It combines classifier-based modeling, a novel bimodal query scheme, and the Friedman-Rafsky test to distinguish between two distributions using only a small number of label queries.
Contribution
It proposes a new three-stage framework that reduces label requirements for two-sample testing by integrating classifier modeling, a bimodal query scheme, and classical testing.
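The bimodal query scheme is named here but not defined. One reading, consistent with its name and with the contrast against certainty-based querying in the findings, is assumed for the sketch below: split the label budget between the samples the stage-1 classifier most confidently assigns to class 1 and those it most confidently assigns to class 0, so that both modes of the posterior are queried. The helpers `bimodal_query` and `certainty_query` are hypothetical names, and the even budget split is an assumption, not the authors' specification.

```python
import numpy as np


def bimodal_query(posterior_1, budget):
    """Hypothetical helper (assumed reading of the bimodal query).

    posterior_1[i] is a stage-1 classifier's estimate of P(y=1 | x_i).
    Half the budget (rounded down) goes to the samples most confidently
    predicted as class 1; the rest goes to those most confidently predicted
    as class 0.
    """
    p = np.asarray(posterior_1, dtype=float)
    if not 0 <= budget <= len(p):
        raise ValueError("budget must be between 0 and the number of samples")
    order = np.argsort(p)                  # ascending P(y=1|x)
    n_high = budget // 2
    n_low = budget - n_high
    low = order[:n_low]                    # most confident class-0 samples
    high = order[len(p) - n_high:][::-1]   # most confident class-1 samples, best first
    return np.concatenate([low, high])


def certainty_query(posterior_1, budget):
    """Hypothetical baseline: the `budget` samples with the largest max-class posterior."""
    p = np.asarray(posterior_1, dtype=float)
    certainty = np.maximum(p, 1.0 - p)
    return np.argsort(-certainty)[:budget]


posteriors = [0.99, 0.98, 0.97, 0.4, 0.3]
print(certainty_query(posteriors, 2).tolist())  # both picks lean toward class 1
print(bimodal_query(posteriors, 2).tolist())    # one pick from each class
```

With posteriors [0.99, 0.98, 0.97, 0.4, 0.3] and a budget of 2, `certainty_query` returns indices 0 and 1 (both likely class 1), whereas `bimodal_query` returns 4 and 0, one from each class. This illustrates why a certainty-only rule can leave one class under-represented in the queried set.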
Findings
The proposed test controls Type I error effectively.
It achieves lower Type II error than uniform and certainty-based querying.
Extensive experiments validate the method's efficiency and accuracy.
Abstract
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or of two different distributions (the alternative hypothesis). We consider a new setting for this problem in which sample features are easy to measure, whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework for performing an effective two-sample test with only a small number of label queries: first, a classifier is trained on samples whose labels are queried uniformly at random, to model the posterior probabilities of the labels; second, a novel query scheme dubbed "bimodal query" is used to query the labels of samples from both classes; and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test…
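The last stage applies the Friedman-Rafsky test to the queried samples. The FR test builds a Euclidean minimum spanning tree (MST) over the pooled samples and counts the edges that join points from different samples; unusually few cross edges are evidence that the two distributions differ. The sketch below is a minimal illustration, not the authors' code. The original FR test standardizes the count with a normal approximation, whereas this sketch swaps in a permutation null. The function names `mst_edges`, `cross_edge_count`, and `friedman_rafsky_test` and the defaults `n_perm=500`, `seed=0` are assumptions made for illustration.

```python
import numpy as np


def mst_edges(X):
    """Euclidean minimum spanning tree of the rows of X via Prim's algorithm (O(n^2))."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest edge weight linking each node to the tree
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best      # nodes now reachable more cheaply through j
        parent[closer] = j
        best = np.where(closer, dist[j], best)
    return np.array(edges, dtype=int).reshape(-1, 2)


def cross_edge_count(edges, labels):
    """Number of MST edges whose endpoints carry different sample labels."""
    labels = np.asarray(labels)
    return int(np.sum(labels[edges[:, 0]] != labels[edges[:, 1]]))


def friedman_rafsky_test(X, labels, n_perm=500, seed=0):
    """FR statistic with a permutation p-value (small counts reject the null)."""
    rng = np.random.default_rng(seed)
    edges = mst_edges(X)             # the MST depends only on the pooled features
    labels = np.asarray(labels)
    r_obs = cross_edge_count(edges, labels)
    r_null = np.array([cross_edge_count(edges, rng.permutation(labels))
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(r_null <= r_obs)) / (1 + n_perm)
    return r_obs, float(p_value)


# Usage: two Gaussian samples with shifted means should yield a small p-value.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (40, 2))])
labels = np.repeat([0, 1], 40)
r, p = friedman_rafsky_test(X, labels)
print(r, p)
```

Because the MST is built from features alone, it is computed once and only the labels are permuted. This keeps the permutation loop cheap. Under the null, the expected cross-edge count is 2mn/N for sample sizes m and n with N = m + n, which is 40 in the usage example.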
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference
