A new ranking scheme for modern data and its application to two-sample hypothesis testing
Doudou Zhou, Hao Chen

TL;DR
This paper introduces two novel graph-based ranking methods for complex, high-dimensional, and non-Euclidean data, enabling effective two-sample hypothesis testing with controlled error rates and improved power over existing approaches.
Contribution
It proposes two new types of ranks derived from similarity graphs, extending rank-based testing to complex data types and demonstrating their effectiveness in practical applications.
Findings
The new similarity-graph-based ranks perform well on high-dimensional data.
The proposed tests control type-I error effectively under mild conditions.
The methods show superior power compared to existing nonparametric tests.
Abstract
Rank-based approaches are among the most popular nonparametric methods for univariate data in tackling statistical problems such as hypothesis testing due to their robustness and effectiveness. However, they are unsatisfactory for more complex data. In the era of big data, high-dimensional and non-Euclidean data, such as networks and images, are ubiquitous and pose challenges for statistical analysis. Existing multivariate ranks such as component-wise, spatial, and depth-based ranks do not apply to non-Euclidean data and have limited performance for high-dimensional data. Instead of dealing with the ranks of observations, we propose two types of ranks applicable to complex data based on a similarity graph constructed on observations: a graph-induced rank defined by the inductive nature of the graph and an overall rank defined by the weight of edges in the graph. To illustrate their…
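The abstract names a graph-induced rank but, as excerpted here, does not spell out its construction. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the similarity graph is a nested family of nearest-neighbor graphs (1-NN inside 2-NN, up to k-NN), that an edge's rank is the number of graphs in the family containing it, and that the rank matrix is symmetrized. The two-sample statistic (within-sample rank sums combined in a quadratic form) and its permutation calibration are likewise illustrative choices. The function names `knn_graph_induced_ranks`, `within_sample_sums`, and `permutation_test` are hypothetical.

```python
import numpy as np


def knn_graph_induced_ranks(dist, k):
    """Rank matrix from nested l-NN graphs, l = 1..k (assumed construction).

    If j is the l-th nearest neighbor of i (l <= k), the directed edge (i, j)
    appears in the l-NN, ..., k-NN graphs, so it gets rank k - l + 1.
    The result is symmetrized by averaging with its transpose (an assumption).
    """
    n = dist.shape[0]
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)              # a point is never its own neighbor
    order = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors of each point
    R = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    R[rows, order.ravel()] = np.tile(np.arange(k, 0, -1), n)
    return 0.5 * (R + R.T)


def within_sample_sums(R, labels):
    """Sum of ranks over pairs that fall inside sample 0 and inside sample 1."""
    x = labels == 0
    y = labels == 1
    return R[np.ix_(x, x)].sum(), R[np.ix_(y, y)].sum()


def permutation_test(dist, labels, k=5, n_perm=500, seed=0):
    """Permutation p-value for a quadratic form in the two within-sample sums.

    The mean and covariance are estimated from the permuted labelings; this
    calibration is an illustrative stand-in, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    R = knn_graph_induced_ranks(dist, k)
    obs = np.array(within_sample_sums(R, labels))
    perm = np.array(
        [within_sample_sums(R, rng.permutation(labels)) for _ in range(n_perm)]
    )
    mu = perm.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(perm, rowvar=False))

    def quad(u):
        return float((u - mu) @ cov_inv @ (u - mu))

    t_obs = quad(obs)
    t_perm = np.array([quad(u) for u in perm])
    p_value = (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)
    return t_obs, p_value
```

Because the sketch only needs a pairwise distance matrix `dist` and a 0/1 NumPy array `labels`, it can in principle be applied to any data type with a dissimilarity measure, including non-Euclidean objects such as networks, which is the setting the abstract targets.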
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Complex Network Analysis Techniques · Advanced Statistical Methods and Models
