Advanced Tutorial: Label-Efficient Two-Sample Tests
Weizhi Li, Visar Berisha, Gautam Dasarathy

TL;DR
This paper introduces a label-efficient approach to two-sample testing that reduces the need for costly sample labels while maintaining statistical validity and power, extending active learning concepts to hypothesis testing.
Contribution
It extends active learning techniques to two-sample testing in label-costly scenarios, providing a practical and statistically valid method for high-dimensional data.
Findings
Maintains statistical validity in label-efficient two-sample tests
Achieves high testing power with fewer labeled samples
Applicable to high-dimensional data scenarios
Abstract
Hypothesis testing is a statistical inference approach used to determine whether data support a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are drawn from the same distribution. The test is widely used; for example, clinical researchers use it to compare treatment effectiveness. This tutorial explores two-sample testing in a setting where an analyst has many features from two samples, but determining the sample membership (or labels) of these features is costly. In machine learning, a similar scenario is studied in active learning. This tutorial extends active learning concepts to two-sample testing within this label-costly setting while maintaining statistical validity and high testing power. Additionally, the tutorial discusses practical applications of these label-efficient two-sample tests.
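To make the label-costly setting concrete, the following is a minimal sketch of a passive baseline, not the label-efficient procedure the tutorial presents. It assumes a pool of features whose sample membership can only be obtained through a paid query, spends a fixed label budget on points chosen uniformly at random, and runs an ordinary permutation two-sample test on the labeled subset. The function names, the `label_oracle` callback, the mean-difference statistic, and the synthetic Gaussian data are all illustrative assumptions.

```python
import numpy as np


def permutation_two_sample_test(features, labels, n_permutations=2000, seed=0):
    """Permutation p-value for H0: both samples share one distribution.

    The statistic (norm of the difference in sample means) is an
    illustrative choice; it only detects shifts in the mean.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    features = features.reshape(len(features), -1)  # force shape (n, d)
    labels = np.asarray(labels)

    def statistic(lab):
        diff = features[lab == 1].mean(axis=0) - features[lab == 0].mean(axis=0)
        return float(np.linalg.norm(diff))

    observed = statistic(labels)
    exceed = 0
    for _ in range(n_permutations):
        # Under H0 the labels are exchangeable, so shuffling them
        # produces draws from the null distribution of the statistic.
        if statistic(rng.permutation(labels)) >= observed:
            exceed += 1
    # The +1 correction keeps the p-value valid at finite n_permutations.
    return (exceed + 1) / (n_permutations + 1)


def passive_label_budget_test(features, label_oracle, budget, seed=0):
    """Query `budget` labels uniformly at random, then test on that subset.

    `label_oracle(i)` is a hypothetical callback returning the 0/1 sample
    membership of pool point i; each call stands for one costly label.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    queried = rng.choice(len(features), size=budget, replace=False)
    labels = np.array([label_oracle(int(i)) for i in queried])
    return permutation_two_sample_test(features[queried], labels, seed=seed)


# Synthetic pool (assumed data): 200 points per sample in 5 dimensions.
data_rng = np.random.default_rng(1)
true_labels = np.repeat([0, 1], 200)
same_pool = data_rng.normal(size=(400, 5))          # H0 holds
shifted_pool = same_pool + true_labels[:, None]     # sample 1 shifted by 1.0

p_null = passive_label_budget_test(same_pool, lambda i: true_labels[i], budget=100)
p_shift = passive_label_budget_test(shifted_pool, lambda i: true_labels[i], budget=100)
print("p-value, identical distributions:", p_null)
print("p-value, shifted distributions:", p_shift)
```

Because the queried points are chosen without looking at the features, the labels stay exchangeable under the null hypothesis and the permutation p-value remains valid. An active scheme would instead choose which labels to query based on the features, which is where preserving validity requires more care.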
Taxonomy
Topics: Pharmacy and Medical Practices · Analytical Methods in Pharmaceuticals
