Two-Sample Test Based on Classification Probability
Haiyan Cai, Bryan Goggin, Qingtang Jiang

TL;DR
This paper introduces a nonparametric two-sample test built on classification probabilities estimated by a trained classifier, and evaluates its power and efficiency on complex, high-dimensional data using simulations and real-life data.
Contribution
The paper recasts the classical two-sample problem as a classification problem, proposes a test statistic based on estimated classification probabilities, and compares its power and efficiency with those of other recently proposed tests.
Findings
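The proposed test is nonparametric and applies to complex, high-dimensional data wherever a classifier provides a consistent estimate of the classification probability.
In the reported comparisons, it shows higher power and efficiency than some existing tests.
It is effective on both simulated and real-world datasets.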
Abstract
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of classification probabilities from a classifier trained on the samples, a test statistic is proposed. We explain why such a test can be powerful and compare its power and efficiency with those of some other recently proposed tests, using simulated and real-life data. The proposed test is nonparametric and can be applied to complex and high-dimensional data wherever there is a classifier that provides a consistent estimate of the classification probability for such data.
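The general idea in the abstract, testing equality of two distributions through a classifier's estimated class probabilities, can be illustrated with a minimal sketch. The abstract does not state the paper's exact statistic, so everything specific here is an assumption made for illustration: the statistic (difference in mean held-out predicted probability between the two samples), the permutation calibration, the plain logistic regression standing in for "a classifier", and the hypothetical function names `fit_logistic`, `predict_proba`, and `classifier_two_sample_test`. It is not the authors' method.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, steps=300, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability that x came from the first sample."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def classifier_two_sample_test(sample_a, sample_b, n_perm=500, seed=0):
    """Return (statistic, p_value) for H0: both samples share one distribution.

    Illustrative assumption, not the paper's statistic: compare the mean
    held-out predicted probability between the two samples, and calibrate
    it by permuting the held-out labels. Assumes the held-out half
    contains points from both samples.
    """
    rng = random.Random(seed)
    data = [(x, 1) for x in sample_a] + [(x, 0) for x in sample_b]
    rng.shuffle(data)
    half = len(data) // 2
    train, held_out = data[:half], data[half:]

    # Train on one half, estimate classification probabilities on the other.
    model = fit_logistic([x for x, _ in train], [y for _, y in train])
    probs = [predict_proba(model, x) for x, _ in held_out]
    labels = [y for _, y in held_out]

    def statistic(lbls):
        in_a = [p for p, l in zip(probs, lbls) if l == 1]
        in_b = [p for p, l in zip(probs, lbls) if l == 0]
        return sum(in_a) / len(in_a) - sum(in_b) / len(in_b)

    observed = statistic(labels)
    exceed = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        if statistic(permuted) >= observed:
            exceed += 1
    # One-sided permutation p-value with the usual +1 correction.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: two 2-D Gaussian samples whose means differ by 1.5 per coordinate.
gen = random.Random(1)
a = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(100)]
b = [[gen.gauss(1.5, 1.0), gen.gauss(1.5, 1.0)] for _ in range(100)]
stat, p_value = classifier_two_sample_test(a, b)
print(stat, p_value)
```

Splitting the data keeps the held-out points exchangeable under the null hypothesis once the classifier is fixed, which is what justifies permuting their labels. Any classifier that outputs probability estimates could replace the logistic regression in this sketch.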
Taxonomy
Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications · Advanced Statistical Process Monitoring
