Advanced Tutorial: Label-Efficient Two-Sample Tests

Weizhi Li; Visar Berisha; Gautam Dasarathy

arXiv:2501.03568 · cs.LG · January 8, 2025

Open Access

TL;DR

This paper introduces a label-efficient approach to two-sample testing that reduces the need for costly sample labels while maintaining statistical validity and power, extending active learning concepts to hypothesis testing.

Contribution

It extends active learning techniques to two-sample testing in label-costly scenarios, providing a practical and statistically valid method for high-dimensional data.

Findings

01 Maintains statistical validity in label-efficient two-sample tests

02 Achieves high testing power with fewer labeled samples

03 Applicable to high-dimensional data scenarios

Abstract

Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are drawn from identical distributions. This test is widely used, for example by clinical researchers comparing treatment effectiveness. This tutorial explores two-sample testing in a setting where an analyst has many features from two samples, but determining the sample membership (or labels) of these features is costly. In machine learning, a similar scenario is studied in active learning. This tutorial extends active learning concepts to two-sample testing within this label-costly setting while maintaining statistical validity and high testing power. Additionally, the tutorial discusses practical applications of these label-efficient two-sample tests.
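To make the label-costly setting in the abstract concrete, here is a minimal Python sketch of a passive baseline: it queries a fixed budget of labels chosen uniformly at random and runs a standard permutation two-sample test on that subset. This is not the tutorial's active method. The function names, the `label_oracle` callback, the `budget` parameter, and the mean-difference statistic are all assumptions made for illustration.

```python
import numpy as np


def permutation_two_sample_test(features, labels, n_permutations=999, seed=0):
    """Permutation p-value for H0: both samples come from one distribution.

    Illustrative statistic (an assumption, not the tutorial's choice):
    Euclidean distance between the two group means.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels).astype(bool)

    def statistic(lab):
        diff = features[lab].mean(axis=0) - features[~lab].mean(axis=0)
        return np.linalg.norm(diff)

    observed = statistic(labels)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling labels simulates the null: membership is unrelated to features.
        if statistic(rng.permutation(labels)) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


def passive_label_budget_test(features, label_oracle, budget, seed=0):
    """Hypothetical baseline: query `budget` labels uniformly at random, then test."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    idx = rng.choice(len(features), size=budget, replace=False)
    # Each oracle call stands in for one costly label query.
    queried = np.array([label_oracle(i) for i in idx])
    return permutation_two_sample_test(features[idx], queried, seed=seed)
```

Because the queried points are selected independently of their labels, the permutation test on the subset remains valid; the cost is that uniformly random queries may spend the budget on uninformative points, which is the gap an active query strategy aims to close.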

Taxonomy

Topics: Pharmacy and Medical Practices · Analytical Methods in Pharmaceuticals