A weighted edge-count two-sample test for multivariate and object data
Hao Chen, Xu Chen, Yi Su

TL;DR
This paper introduces a weighted edge-count two-sample test for multivariate and object data that improves power and handles unequal sample sizes, applicable to various data types with a suitable similarity measure.
Contribution
The paper proposes a weighted graph-based nonparametric test that corrects the classic edge-count test's problem with unequal sample sizes and reports power gains over that test in simulations.
Findings
Substantial power gains demonstrated in simulations
Asymptotic null distribution derived and validated
Effective application shown on real network data
Abstract
Two-sample tests for multivariate data and non-Euclidean data are widely used in many fields. Parametric tests are mostly restricted to certain types of data that meet the assumptions of the parametric models. In this paper, we study a nonparametric testing procedure that utilizes graphs representing the similarity among observations. It can be applied to any data type as long as an informative similarity measure on the sample space can be defined. The classic test based on a similarity graph has a problem when the two sample sizes are different. We solve the problem by applying appropriate weights to different components of the classic test statistic. The new test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well in finite samples, facilitating its application to large datasets. The new test is…
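The procedure described in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions, not details given on this page: the similarity graph is taken to be a Euclidean minimum spanning tree, the weights on the two within-sample edge counts are assumed to be (n2 - 1)/(N - 2) and (n1 - 1)/(N - 2), and the p-value comes from a Monte Carlo permutation of the labels rather than from the asymptotic null distribution the paper derives. All function names are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n - 1 edges (i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def weighted_edge_count(edges, labels):
    """Weighted sum of within-sample edge counts (labels are 0 or 1).

    Assumed weighting: each within-sample count is weighted by the other
    sample's (size - 1) / (N - 2), so neither sample dominates the statistic.
    """
    n1 = sum(1 for g in labels if g == 0)
    n2 = len(labels) - n1
    total = n1 + n2
    r1 = sum(1 for i, j in edges if labels[i] == 0 and labels[j] == 0)
    r2 = sum(1 for i, j in edges if labels[i] == 1 and labels[j] == 1)
    return ((n2 - 1) * r1 + (n1 - 1) * r2) / (total - 2)


def permutation_pvalue(edges, labels, n_perm=999, seed=0):
    """Monte Carlo permutation p-value; large statistics count as evidence of a difference."""
    rng = random.Random(seed)
    observed = weighted_edge_count(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if weighted_edge_count(edges, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative synthetic data with unequal sample sizes (10 vs 40 points).
    rng = random.Random(1)
    sample1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
    sample2 = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(40)]
    points = sample1 + sample2
    labels = [0] * len(sample1) + [1] * len(sample2)
    stat, p_value = permutation_pvalue(mst_edges(points), labels)
    print(f"weighted statistic = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

Because only the graph depends on the data, the same statistic applies to non-Euclidean objects by swapping `math.dist` for any informative dissimilarity. The graph is built once and only the labels are permuted, which keeps each permutation cheap.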
