Limiting distributions of graph-based test statistics on sparse and dense graphs
Yejiong Zhu, Hao Chen

TL;DR
This paper develops the theory of graph-based two-sample tests for graphs ranging from sparse to dense, which broadens their applicability in high-dimensional data analysis and allows the denser graphs that often give better power.
Contribution
It extends the asymptotic theory of graph-based tests to much denser graphs than previously studied, and relaxes the strong conditions imposed in earlier work.
Findings
Theoretical results for test statistics on dense graphs.
Validation of test performance across various graph densities.
Broader applicability of graph-based tests in high-dimensional settings.
Abstract
Two-sample tests that use a similarity graph on the observations are useful for high-dimensional and non-Euclidean data because they are flexible and perform well under a wide range of alternatives. Existing work has mainly focused on sparse graphs, such as graphs whose number of edges is of the order of the number of observations, and its asymptotic results impose strong conditions on the graph that are easily violated by the very graphs that work recommends constructing. Moreover, in many settings graph-based tests perform better with denser graphs. In this work, we establish the theoretical foundation for graph-based tests on graphs ranging from those recommended in the current literature to much denser ones.
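To make the setup concrete, here is a minimal sketch of one classical graph-based statistic, the edge-count test, under stated assumptions. The names `knn_graph_edges` and `edge_count_test`, the k-nearest-neighbour graph on Euclidean distance, and the Monte Carlo permutation null are illustrative choices of ours, not the paper's construction. The sketch counts edges that join the two samples; unusually few such edges suggest the samples separate in the graph. In this line of work, asymptotic results are typically used to approximate such a permutation null without resampling, which is where conditions on graph density matter.

```python
import numpy as np

def knn_graph_edges(X, k):
    """Undirected k-NN graph on the rows of X, as an (E, 2) array of pairs i < j.
    Assumption: Euclidean distance; any similarity could be used instead."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]             # k nearest neighbours per row
    edges = set()
    for i in range(X.shape[0]):
        for j in nbrs[i]:
            j = int(j)
            edges.add((min(i, j), max(i, j)))        # mutual neighbours counted once
    return np.array(sorted(edges))

def between_sample_edges(edges, labels):
    """Number of edges joining an observation of sample 1 to one of sample 2."""
    return int(np.sum(labels[edges[:, 0]] != labels[edges[:, 1]]))

def edge_count_test(X, Y, k=5, n_perm=1000, seed=0):
    """Edge-count test on a pooled k-NN graph with a Monte Carlo permutation null.
    Returns (observed count, standardized count, permutation p-value)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)]
    edges = knn_graph_edges(Z, k)
    obs = between_sample_edges(edges, labels)
    perm = np.array([between_sample_edges(edges, rng.permutation(labels))
                     for _ in range(n_perm)])
    # Small counts are evidence against H0, so this is a one-sided (lower-tail) test.
    p_value = (1 + np.sum(perm <= obs)) / (1 + n_perm)
    z = (obs - perm.mean()) / perm.std(ddof=1)
    return obs, z, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(50, 20))
    Y = rng.normal(0.5, 1.0, size=(50, 20))
    for k in (1, 5, 15):  # larger k gives a denser graph
        obs, z, p = edge_count_test(X, Y, k=k)
        print(f"k={k:2d}  between-sample edges={obs}  z={z:.2f}  p={p:.3f}")
```

For fixed k the graph has at most Nk edges (N = total sample size), i.e. of the order of the number of observations, which is the sparse regime the abstract describes; letting k grow with N gives the denser graphs the paper's theory is meant to cover.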
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Data-Driven Disease Surveillance
