A Bipartite Ranking Approach to the Two-Sample Problem

Stephan Clémençon; Myrto Limnios; Nicolas Vayatis

arXiv:2302.03592 · math.ST · February 9, 2023

TL;DR

This paper proposes a bipartite ranking approach to the two-sample problem: a scoring function is learned from the data so that differences between two distributions on high-dimensional data can be detected with rank tests, avoiding the curse of dimensionality that affects methods based on discrepancies between empirical distributions.

Contribution

The paper combines bipartite ranking algorithms with rank tests, extending univariate rank-based methods to multivariate data while avoiding the curse of dimensionality.

Findings

01. The method outperforms existing techniques in high-dimensional settings.

02. Nonasymptotic error bounds are provided for the proposed test.

03. Experimental results demonstrate superior detection power.

Abstract

The two-sample problem, which consists in testing whether independent samples on $\mathbb{R}^{d}$ are drawn from the same (unknown) distribution, finds applications in many areas. Its study in high dimension is the subject of much attention, especially because the information acquisition processes at work in the Big Data era often involve various, poorly controlled sources, leading to datasets that may exhibit a strong sampling bias. While classic methods relying on the computation of a discrepancy measure between the empirical distributions face the curse of dimensionality, we develop an alternative approach based on statistical learning and extending rank tests, which, when appropriately designed, are capable of detecting small departures from the null assumption in the univariate case. Overcoming the lack of natural order on $\mathbb{R}^{d}$ when $d \geq 2$, it is implemented in two steps.…
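
The abstract is truncated before the two steps are spelled out, so the following is only a minimal sketch of the general idea named on this page (learn a scoring function, then apply a univariate rank test to the scores). Everything specific in it is an assumption made for illustration: the half/half sample split, the toy mean-difference linear scorer (a stand-in, not the paper's ranking algorithm), the one-sided Mann-Whitney-Wilcoxon statistic with its normal approximation, and all function names.

```python
import math
import random


def fit_linear_scorer(xs, ys):
    """Toy scorer s(z) = <w, z> with w = mean(xs) - mean(ys).

    Hypothetical stand-in for a learned bipartite ranking rule: it tends to
    score points from the first sample higher than points from the second.
    """
    d = len(xs[0])
    mean_x = [sum(p[j] for p in xs) / len(xs) for j in range(d)]
    mean_y = [sum(p[j] for p in ys) / len(ys) for j in range(d)]
    w = [mean_x[j] - mean_y[j] for j in range(d)]
    return lambda z: sum(wj * zj for wj, zj in zip(w, z))


def mann_whitney_pvalue(sx, sy):
    """One-sided Mann-Whitney-Wilcoxon p-value via the normal approximation.

    The alternative is "scores in sx tend to be larger than scores in sy".
    The variance formula assumes no ties among the scores.
    """
    n, m = len(sx), len(sy)
    # U counts pairs (a, b) with a > b; ties contribute 1/2.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in sx for b in sy)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for standard normal Z


def ranking_two_sample_test(xs, ys, seed=0):
    """Step 1: learn a scorer on one half. Step 2: rank-test the other half."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    rng.shuffle(xs)
    rng.shuffle(ys)
    hx, hy = len(xs) // 2, len(ys) // 2
    scorer = fit_linear_scorer(xs[:hx], ys[:hy])   # step 1 (training halves)
    sx = [scorer(p) for p in xs[hx:]]              # step 2 (held-out halves)
    sy = [scorer(p) for p in ys[hy:]]
    return mann_whitney_pvalue(sx, sy)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.gauss(1.0, 1.0) for _ in range(5)] for _ in range(200)]
    ys = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
    print("p-value under a mean shift:", ranking_two_sample_test(xs, ys))
```

Because the scorer is fitted only on the training halves, the held-out scores of the two samples are exchangeable when both samples share the same distribution, which is what makes the univariate rank test on the scores meaningful in this sketch. The choice of scorer affects only the power, not that property.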


Taxonomy

Topics: Advanced Statistical Process Monitoring · Advanced Statistical Methods and Models · Data Quality and Management