Two-Sample Test Based on Classification Probability

Haiyan Cai; Bryan Goggin; Qingtang Jiang

arXiv:1909.07836·math.ST·September 18, 2019·Stat. Anal. Data Min.

Open Access

TL;DR

This paper introduces a new nonparametric two-sample test leveraging classification probabilities, demonstrating its effectiveness and efficiency for complex, high-dimensional data through simulations and real-world applications.

Contribution

It proposes a novel two-sample testing method based on classification probabilities, applicable to complex data, and compares its performance with existing tests.

Findings

01 The proposed test is nonparametric and versatile for high-dimensional data.

02 It shows higher power and efficiency than some existing tests.

03 It performs well on both simulated and real-world datasets.

Abstract

Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. We propose a test statistic based on the estimates of classification probabilities from a classifier trained on the samples. We explain why such a test can be powerful and compare its power and efficiency with those of some other recently proposed tests, using simulated and real-life data. The proposed test is nonparametric and can be applied to complex and high-dimensional data wherever there is a classifier that provides a consistent estimate of the classification probability for such data.
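The general idea of recasting a two-sample test as classification can be illustrated with a minimal sketch. The abstract does not state the paper's test statistic, so everything below is an assumption made for illustration: the function names are hypothetical, a leave-one-out k-nearest-neighbor estimate stands in for the classifier, the statistic is the mean squared deviation of the estimated class probability from the pooled class proportion, and the null distribution is obtained by permuting labels. It is not the authors' method.

```python
import random


def knn_neighbors(points, k):
    """For each point, return the indices of its k nearest other points (squared Euclidean distance)."""
    n = len(points)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def probability_statistic(labels, neighbors):
    """Mean squared gap between the leave-one-out k-NN estimate of
    P(label = 1 | x) and the overall proportion of label 1.
    (Illustrative statistic, assumed here; not taken from the paper.)"""
    prior = sum(labels) / len(labels)
    total = 0.0
    for nbrs in neighbors:
        p_hat = sum(labels[j] for j in nbrs) / len(nbrs)
        total += (p_hat - prior) ** 2
    return total / len(labels)


def classifier_two_sample_test(x, y, k=5, n_perm=500, seed=0):
    """Return (observed statistic, permutation p-value) for samples x and y,
    each a sequence of equal-length numeric tuples."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    # Neighbor sets depend only on the pooled points, so compute them once.
    neighbors = knn_neighbors(pooled, k)
    observed = probability_statistic(labels, neighbors)
    exceed = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)  # under H0 the labels are exchangeable
        if probability_statistic(permuted, neighbors) >= observed:
            exceed += 1
    # The +1 correction keeps the permutation p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `classifier_two_sample_test(x, y)` on two lists of feature tuples returns the statistic and a p-value. If the two samples come from the same distribution, the estimated class probabilities should stay close to the pooled class proportion and the statistic should be small; a large value relative to the permutation distribution is evidence against equality. Any classifier that yields probability estimates could replace the k-NN step in this sketch.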

Taxonomy

Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications · Advanced Statistical Process Monitoring