A Robust Framework for Graph-based Two-Sample Tests Using Weights

Yichuan Bai; Lynna Chu

arXiv:2307.12325 · stat.ME · June 23, 2025

Open Access

TL;DR

This paper introduces a robust, graph-based two-sample testing framework that improves reliability and power in high-dimensional data analysis by employing an edge-weighting strategy to mitigate problematic graph structures.

Contribution

The paper proposes new robust test statistics for graph-based two-sample tests that are less sensitive to problematic graph structures such as hubs, which can cause low power and unstable performance in existing graph-based tests.

Findings

01. Improved test power in high-dimensional settings.
02. Robustness to problematic graph structures demonstrated.
03. Effective in real-world data analysis, e.g., Chicago taxi trips.

Abstract

Graph-based tests are a class of non-parametric two-sample tests useful for analyzing high-dimensional data. The test statistics are constructed from similarity graphs (such as K-minimum spanning tree), and consequently, their performance is sensitive to the structure of the graph. When the graph has problematic structures (for example, hubs), as is common for high-dimensional data, this can result in low power and unstable performance among existing graph-based tests. We address this challenge by proposing new test statistics that are robust to problematic structures of the graph and can provide reliable inferences. We employ an edge-weighting strategy using intrinsic characteristics of the graph that are computationally simple and efficient to obtain. The limiting null distribution of the robust test statistics is derived and shown to work well for finite sample sizes. Simulation…
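To make the idea of a graph-based test concrete, the following minimal Python sketch builds a minimum spanning tree on the pooled sample, counts within-sample and between-sample edges, and reports node degrees so that hubs become visible. It is an illustration under stated assumptions, not the authors' method: the function names, the simulated Gaussian data, the permutation p-value, and in particular the `degree_weights` rule (down-weighting an edge by the larger degree of its endpoints) are hypothetical choices made here only to show what an edge-weighting step could look like. The paper's actual weights and its limiting null distribution are not reproduced.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def edge_counts(edges, labels, weights=None):
    """Weighted totals of within-sample-1, within-sample-2 and between-sample edges."""
    if weights is None:
        weights = [1.0] * len(edges)
    r1 = r2 = r_between = 0.0
    for (i, j), w in zip(edges, weights):
        if labels[i] != labels[j]:
            r_between += w
        elif labels[i] == 0:
            r1 += w
        else:
            r2 += w
    return r1, r2, r_between


def degree_weights(edges, n):
    """Hypothetical weights (assumption, not from the paper): 1 / max endpoint degree."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [1.0 / max(deg[i], deg[j]) for i, j in edges], deg


def permutation_pvalue(edges, labels, weights, n_perm=500, seed=0):
    """Share of random relabelings with between-sample weight <= the observed one."""
    rng = random.Random(seed)
    observed = edge_counts(edges, labels, weights)[2]
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if edge_counts(edges, perm, weights)[2] <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated data (assumption): two 20-dimensional Gaussian samples with a mean shift.
rng = random.Random(1)
dim, m, n = 20, 30, 30
sample1 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
sample2 = [[rng.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(n)]
pooled = sample1 + sample2
labels = [0] * m + [1] * n

edges = mst_edges(pooled)
weights, degrees = degree_weights(edges, len(pooled))
print("max node degree:", max(degrees))
print("unweighted counts (within 1, within 2, between):", edge_counts(edges, labels))
print("weighted permutation p-value:", permutation_pvalue(edges, labels, weights))
```

Few between-sample edges relative to random relabeling suggest that the two samples differ. A node with an unusually large degree is a hub, and every edge attached to it moves together when labels are permuted, which is one way to see why unweighted counts can become unstable.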

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Statistical Methods and Inference · Complex Network Analysis Techniques