Two-Sample Testing with a Graph-Based Total Variation Integral Probability Metric

Alden Green; Sivaraman Balakrishnan; Ryan J. Tibshirani

arXiv:2409.15628 · math.ST · September 25, 2024

TL;DR

This paper introduces a graph-based two-sample test built on the total variation integral probability metric (TV IPM). It proves the test is minimax rate-optimal against alternatives separated in the TV IPM and validates this empirically.

Contribution

It proposes the graph TV test, a nonparametric method that is minimax rate-optimal against TV IPM-separated alternatives. For spatially localized alternatives it is optimal, whereas the traditional $\chi^2$ test is provably suboptimal.

Findings

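1. The graph TV test is minimax rate-optimal for detecting alternatives separated in the TV IPM.
2. It is optimal for detecting spatially localized alternatives, against which the $\chi^2$ test is provably suboptimal.
3. Numerical experiments on simulated and real data support the theoretical results.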

Abstract

We consider a novel multivariate nonparametric two-sample testing problem where, under the alternative, distributions $P$ and $Q$ are separated in an integral probability metric over functions of bounded total variation (TV IPM). We propose a new test, the graph TV test, which uses a graph-based approximation to the TV IPM as its test statistic. We show that this test, computed with an $\varepsilon$-neighborhood graph and calibrated by permutation, is minimax rate-optimal for detecting alternatives separated in the TV IPM. As an important special case, we show that this implies the graph TV test is optimal for detecting spatially localized alternatives, whereas the $\chi^2$ test is provably suboptimal. Our theory is supported with numerical experiments on simulated and real data.
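The procedure in the abstract has three steps: build an $\varepsilon$-neighborhood graph, compute a graph TV statistic, and calibrate it by permutation. The following is a minimal brute-force sketch under stated assumptions, not the authors' implementation.

We assume the statistic is $\sup_\theta |\bar\theta_X - \bar\theta_Y|$ over vectors $\theta$ on the pooled sample whose graph TV, $\sum_{(i,j) \in E} |\theta_i - \theta_j|$, is at most 1. The paper's normalization constants and any additional constraints are omitted. By the coarea formula, this supremum equals $\max_S |w(S)| / \mathrm{cut}(S)$ over vertex subsets $S$, with weights $w_i = 1/n$ for $X$-points and $-1/m$ for $Y$-points. The sketch enumerates subsets, which is exponential and only usable on tiny inputs; a practical version would solve the equivalent linear program. The names `eps_graph_edges`, `graph_tv_statistic` and `graph_tv_test` are hypothetical.

```python
import itertools
import math
import random


def eps_graph_edges(points, eps):
    """Edges (i, j), i < j, of the epsilon-neighborhood graph on `points`.

    Two points are joined when their Euclidean distance is at most `eps`.
    """
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= eps
    ]


def graph_tv_statistic(labels, edges):
    """sup |mean_X(theta) - mean_Y(theta)| over theta with graph TV <= 1.

    Assumption: graph TV is the unnormalized sum over edges of
    |theta_i - theta_j|. By the coarea formula the supremum equals the
    maximum of |w(S)| / cut(S) over vertex subsets S, where w_i = 1/n for
    X-points and -1/m for Y-points. Brute force: exponential in the
    pooled sample size, so only for illustration on tiny inputs.
    """
    N = len(labels)
    n = sum(labels)
    m = N - n
    if n == 0 or m == 0:
        raise ValueError("need at least one point from each sample")
    w = [1.0 / n if lab else -1.0 / m for lab in labels]
    best = 0.0
    # Vertex 0 is kept outside S: S and its complement give the same ratio.
    for mask in range(1, 2 ** (N - 1)):
        in_s = [False] + [bool((mask >> k) & 1) for k in range(N - 1)]
        w_s = sum(w[i] for i in range(N) if in_s[i])
        cut = sum(1 for i, j in edges if in_s[i] != in_s[j])
        if cut == 0:
            if abs(w_s) > 1e-12:
                # S is a union of connected components whose weights do not
                # cancel, so the supremum is unbounded.
                return math.inf
            continue
        best = max(best, abs(w_s) / cut)
    return best


def graph_tv_test(x, y, eps, num_perms=199, seed=0):
    """Permutation-calibrated graph TV test; returns (statistic, p-value)."""
    points = list(x) + list(y)
    labels = [True] * len(x) + [False] * len(y)
    # The graph depends only on the pooled sample, so it is built once and
    # reused for every relabeling.
    edges = eps_graph_edges(points, eps)
    observed = graph_tv_statistic(labels, edges)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_perms):
        perm = labels[:]
        rng.shuffle(perm)
        if graph_tv_statistic(perm, edges) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + num_perms)
```

Take $X$-points $0, 1$ and $Y$-points $2, 3$ on a line with $\varepsilon = 1$. The graph is the path $0 - 1 - 2 - 3$, and the statistic is 1.0, attained by $\theta$ equal to the indicator of the $X$-points. Building the graph once from the pooled sample is what makes the permutation calibration cheap relative to the statistic itself.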

Taxonomy

Topics: Fault Detection and Control Systems