A Bipartite Ranking Approach to the Two-Sample Problem
Stephan Clémençon, Myrto Limnios, Nicolas Vayatis

TL;DR
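This paper casts the two-sample problem as a bipartite ranking task: a real-valued scoring function is learned from the labeled samples, and a univariate rank test is then applied to the resulting scores. Because the comparison happens on one-dimensional scores rather than on the empirical distributions themselves, the approach is designed to sidestep the curse of dimensionality that affects discrepancy-based tests.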
Contribution
It develops a new approach combining ranking algorithms and rank tests, extending univariate methods to multivariate data without severe dimensionality issues.
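The combination of a ranking step and a rank test can be illustrated with a minimal sketch. It is not the authors' algorithm: the scorer below (a linear score along the difference of the two sample means) is a deliberately simple stand-in for a bipartite ranking algorithm, the half/half split and the normal approximation to the Mann-Whitney-Wilcoxon statistic are assumptions of this example, and all function names are hypothetical.

```python
import math
import random


def learn_scorer(pos, neg):
    """Step 1: learn a real-valued scoring function on the first half.

    Assumption: a linear score along the difference of the sample means
    stands in for a proper bipartite ranking algorithm.
    """
    d = len(pos[0])
    mean_pos = [sum(x[j] for x in pos) / len(pos) for j in range(d)]
    mean_neg = [sum(x[j] for x in neg) / len(neg) for j in range(d)]
    w = [a - b for a, b in zip(mean_pos, mean_neg)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def mann_whitney_pvalue(scores_pos, scores_neg):
    """Step 2: one-sided rank test on held-out scores (normal approximation)."""
    n, m = len(scores_pos), len(scores_neg)
    # U counts pairs where the "positive" score is larger; ties count 1/2.
    u = sum((sp > sn) + 0.5 * (sp == sn)
            for sp in scores_pos for sn in scores_neg)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail P(Z >= z)


def two_sample_ranking_test(x, y):
    """Split each sample in two, learn scores on one part, test on the other."""
    hx, hy = len(x) // 2, len(y) // 2
    score = learn_scorer(x[:hx], y[:hy])
    return mann_whitney_pvalue([score(v) for v in x[hx:]],
                               [score(v) for v in y[hy:]])


def gaussian_sample(rng, n, d, shift):
    """Draw n points from N(shift * 1, I_d); synthetic data for illustration."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
# Alternative: the two samples differ by a mean shift in every coordinate.
p_shift = two_sample_ranking_test(gaussian_sample(rng, 400, 10, 0.5),
                                  gaussian_sample(rng, 400, 10, 0.0))
# Null: both samples come from the same distribution.
p_null = two_sample_ranking_test(gaussian_sample(rng, 400, 10, 0.0),
                                 gaussian_sample(rng, 400, 10, 0.0))
print(f"p-value under shift: {p_shift:.3g}, under the null: {p_null:.3g}")
```

Learning the scorer on one part of the data and testing on the other keeps the held-out scores exchangeable under the null hypothesis, so the rank statistic keeps its usual null distribution regardless of the dimension of the original observations.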
Findings
The method outperforms existing techniques in high-dimensional settings.
Provides nonasymptotic error bounds for the proposed test.
Experimental results demonstrate superior detection power.
Abstract
The two-sample problem, which consists in testing whether independent samples on $\mathbb{R}^d$ are drawn from the same (unknown) distribution, finds applications in many areas. Its study in high-dimension is the subject of much attention, especially because the information acquisition processes at work in the Big Data era often involve various sources, poorly controlled, leading to datasets possibly exhibiting a strong sampling bias. While classic methods relying on the computation of a discrepancy measure between the empirical distributions face the curse of dimensionality, we develop an alternative approach based on statistical learning and extending rank tests, capable of detecting small departures from the null assumption in the univariate case when appropriately designed. Overcoming the lack of natural order on $\mathbb{R}^d$ when $d \geq 2$, it is implemented in two steps.…
Taxonomy
Topics: Advanced Statistical Process Monitoring · Advanced Statistical Methods and Models · Data Quality and Management
