A Witness Two-Sample Test
Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet

TL;DR
This paper introduces a new two-sample test based on the Maximum Mean Discrepancy that uses training data to define weights and basis points, improving data efficiency and maintaining control over type-I error.
Contribution
It proposes a novel test that leverages training data for basis and weight selection, enhancing data efficiency and test power compared to existing MMD-based tests.
Findings
The new test is consistent and has well-controlled type-I error.
It achieves comparable or better power than existing tests.
Empirical results on synthetic and real data validate its effectiveness.
Abstract
The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a weighted sum of kernel evaluations on a set of basis points. Typically the kernel is optimized on a training set, and hypothesis testing is performed on a separate test set to avoid overfitting (i.e., control type-I error). That is, the test set is used to simultaneously estimate the expectations and define the basis points, while the training set only serves to select the kernel and is discarded. In this work, we propose to use the training data to also define the weights and the basis points for better data efficiency. We show that 1) the new test is consistent and has a well-controlled type-I error; 2) the optimal witness function is given by a…
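To make the train/test split described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a fixed Gaussian kernel (no kernel optimization), uses the plain MMD witness (the difference of empirical kernel mean embeddings on the training set) rather than the optimal witness the paper derives, and calibrates the statistic with a permutation test on the held-out witness values. The function names `gaussian_kernel`, `fit_witness`, and `witness_test` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def fit_witness(x_train, y_train, bandwidth=1.0):
    """Build a witness from TRAINING data only.

    Assumption: the basis points are all training points, with weights
    +1/n on the X sample and -1/m on the Y sample (the MMD witness).
    """
    basis = np.vstack([x_train, y_train])
    weights = np.concatenate([
        np.full(len(x_train), 1.0 / len(x_train)),
        np.full(len(y_train), -1.0 / len(y_train)),
    ])

    def witness(z):
        # Weighted sum of kernel evaluations on the basis points.
        return gaussian_kernel(z, basis, bandwidth) @ weights

    return witness


def witness_test(x_test, y_test, witness, n_permutations=500, seed=0):
    """One-sided permutation test on held-out witness values.

    Statistic: mean witness value on X minus mean witness value on Y.
    """
    rng = np.random.default_rng(seed)
    values = np.concatenate([witness(x_test), witness(y_test)])
    n = len(x_test)
    observed = values[:n].mean() - values[n:].mean()

    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(values)
        if perm[:n].mean() - perm[n:].mean() >= observed:
            exceed += 1
    # Add-one correction keeps the p-value valid and strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

Because the witness is fixed before the test set is touched, each test point reduces to a single scalar, and the two-sample problem becomes a comparison of two sets of real numbers. In this sketch that comparison is a permutation test; other calibrations are possible.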
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
