Active Sequential Two-Sample Testing
Weizhi Li, Prad Kadambi, Pouria Saidi, Karthikeyan Natesan Ramamurthy, Gautam Dasarathy, Visar Berisha

TL;DR
This paper introduces an active sequential two-sample testing framework that adaptively chooses which samples to label, so that it can efficiently determine whether two distributions are identical when labels are costly to obtain.
Contribution
The paper presents the first active sequential two-sample testing method, which adaptively selects which features to label in order to increase testing power under a label budget.
Findings
The framework produces anytime-valid p-values.
Testing power increases significantly with active querying.
Type I error remains controlled.
Abstract
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first active sequential two-sample testing framework that not only sequentially but also actively queries. Our test statistic is a likelihood ratio where one likelihood is found by maximization over all class priors, and the other is provided by a probabilistic classification model. The classification model is adaptively updated and used to predict where the (unlabelled) features have a high dependency on labels; labeling the "high-dependency" features leads to the increased power of the proposed testing…
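The likelihood-ratio statistic described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D feature, the Gaussian class-conditional classifier (`fit_classifier`), the warm-up of random queries, and the query score |q(y=1|x) - prior| are all hypothetical choices made for the example. The numerator multiplies the classifier's predictive probabilities q_{i-1}(y_i | x_i), each taken from a model fitted only on earlier labels; the denominator is the Bernoulli likelihood of the labels maximized over the class prior pi, which is attained at pi = n1/n. Assuming that under H0 each queried label is Bernoulli(pi) independently of its feature and of the past, the same ratio with the true pi in the denominator is a nonnegative martingale with mean 1, and maximizing the denominator can only shrink the statistic W_t. Ville's inequality then gives the anytime-valid p-value p_t = min(1, 1 / max_{s<=t} W_s).

```python
import math
import random


def sigmoid(z):
    # Logistic function, split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_classifier(xs, ys):
    """Hypothetical stand-in model: 1-D Gaussian class-conditionals, pooled variance."""
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    prior1 = (len(x1) + 1) / (len(xs) + 2)  # Laplace-smoothed P(y = 1)
    if not x1 or not x0:
        return lambda x: prior1  # only one class seen so far: fall back to the prior
    m1, m0 = sum(x1) / len(x1), sum(x0) / len(x0)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m0) ** 2 for x in x0)
    var = (ss + 1.0) / (len(xs) + 1)  # +1 keeps the variance strictly positive
    log_prior_odds = math.log(prior1 / (1.0 - prior1))

    def predict(x):
        # Posterior P(y = 1 | x) from the log-odds of the two fitted Gaussians.
        z = ((x - m0) ** 2 - (x - m1) ** 2) / (2.0 * var) + log_prior_odds
        return sigmoid(z)

    return predict


def max_prior_loglik(ys):
    # log max_pi prod_i pi^{y_i} (1 - pi)^{1 - y_i}, attained at pi = n1 / n.
    n, n1 = len(ys), sum(ys)
    n0 = n - n1
    ll = 0.0
    if n1:
        ll += n1 * math.log(n1 / n)
    if n0:
        ll += n0 * math.log(n0 / n)
    return ll


def active_sequential_test(pool_x, oracle, budget, alpha=0.05, warmup=10,
                           eps=1e-3, seed=0):
    """Returns (rejected, p_value, labels_used). The query rule is a hypothetical heuristic."""
    rng = random.Random(seed)
    unlabeled = list(range(len(pool_x)))
    xs, ys = [], []
    log_num = 0.0     # sum_i log q_{i-1}(y_i | x_i)
    best_log_w = 0.0  # running max of log W_s, starting from W_0 = 1
    p_value = 1.0
    for t in range(min(budget, len(pool_x))):
        q = fit_classifier(xs, ys)  # fitted on labels queried BEFORE step t only
        if t < warmup:
            idx = rng.choice(unlabeled)  # random queries until the model has data
        else:
            prior = sum(ys) / len(ys)
            # Query where q(y | x) departs most from the marginal ("high dependency").
            idx = max(unlabeled, key=lambda i: abs(q(pool_x[i]) - prior))
        unlabeled.remove(idx)
        y = oracle(idx)  # the costly label
        p1 = min(max(q(pool_x[idx]), eps), 1.0 - eps)  # clipped, still sums to 1
        log_num += math.log(p1 if y == 1 else 1.0 - p1)
        xs.append(pool_x[idx])
        ys.append(y)
        best_log_w = max(best_log_w, log_num - max_prior_loglik(ys))
        p_value = math.exp(-best_log_w)  # = min(1, 1 / max_s W_s) since best_log_w >= 0
        if p_value <= alpha:
            return True, p_value, t + 1
    return False, p_value, len(ys)


def make_pool(n, shift, seed=1):
    # Balanced labels with x | y ~ N(shift * y, 1); shift = 0 means H0 holds.
    rng = random.Random(seed)
    ys = [i % 2 for i in range(n)]
    xs = [rng.gauss(shift * y, 1.0) for y in ys]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_pool(400, shift=4.0)  # H1: the two groups differ in mean
    print(active_sequential_test(xs, lambda i: ys[i], budget=100))
```

Clipping the predicted probability to [eps, 1 - eps] keeps q a valid distribution over {0, 1}, so the martingale argument is unaffected, while a single confident mistake costs at most log(1/eps). Drawing labels from a finite pool without replacement, as `make_pool` does, only approximates the independence assumption above; the sketch shows how the pieces fit together rather than certifying validity.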
Taxonomy
Topics: Machine Learning and Algorithms · SARS-CoV-2 detection and testing · Mobile Crowdsensing and Crowdsourcing
Methods: Test
