Kernel Two-Sample Tests for Manifold Data
Xiuyuan Cheng, Yao Xie

TL;DR
This paper analyzes the effectiveness of kernel-based two-sample tests for high-dimensional data lying on low-dimensional manifolds, providing theoretical guarantees and practical validation for their power and robustness.
Contribution
The study characterizes the test's level and power for manifold data and extends the analysis to noisy data and to manifolds with boundary, showing that the test does not suffer from the curse of dimensionality when the data lie on a low-dimensional manifold.
Findings
Test power depends on the intrinsic manifold dimension, the sample size, and the divergence between the two densities.
The kernel two-sample test effectively detects density differences on low-dimensional manifolds.
Theoretical guarantees show robustness to noise and boundary effects.
Abstract
We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, when data densities $p$ and $q$ are supported on a $d$-dimensional sub-manifold $\mathcal{M}$ embedded in an $m$-dimensional space and are Hölder with order $\beta$ (up to 2) on $\mathcal{M}$, we prove a guarantee of the test power for finite sample size $n$ that exceeds a threshold depending on $d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$ on the manifold, and with a properly chosen kernel bandwidth $\gamma$. For small density departures, we show that with large $n$ they can be detected…
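To make the setting concrete, here is a minimal sketch of a Gaussian-kernel two-sample test on data lying on a one-dimensional manifold (a circle) embedded in a higher-dimensional space. It is not the authors' implementation: it uses the standard biased MMD² estimate with a permutation-based p-value, whereas the abstract only says the statistic is related to MMD. The function names, the circle example, the von Mises alternative, and the bandwidth value 0.5 are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth):
    # Standard biased (V-statistic) estimate of squared MMD.
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

def permutation_test(x, y, bandwidth, n_perm=200, seed=0):
    # Calibrate the statistic by reshuffling the pooled sample (an assumed
    # calibration scheme; the paper's threshold may be chosen differently).
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        exceed += int(stat >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value

def sample_circle(n, ambient_dim, rng, kappa=0.0):
    # Angles are uniform when kappa == 0, otherwise von Mises around 0.
    if kappa > 0:
        theta = rng.vonmises(0.0, kappa, size=n)
    else:
        theta = rng.uniform(-np.pi, np.pi, size=n)
    # Embed the unit circle in the first two coordinates of R^ambient_dim.
    pts = np.zeros((n, ambient_dim))
    pts[:, 0] = np.cos(theta)
    pts[:, 1] = np.sin(theta)
    return pts

rng = np.random.default_rng(1)
x = sample_circle(200, ambient_dim=50, rng=rng)             # uniform density
y = sample_circle(200, ambient_dim=50, rng=rng, kappa=2.0)  # concentrated density
stat, p_value = permutation_test(x, y, bandwidth=0.5)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Because the Gaussian kernel depends only on pairwise distances, the ambient dimension of 50 does not change the statistic here; only the geometry of the circle and the two densities matter, which is the intuition behind the intrinsic-dimension dependence described in the abstract.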
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
