A fast and effective kernel two-sample test for large-scale data

Hoseung Song; Hao Chen

arXiv:2110.03118·stat.ME·October 3, 2025

TL;DR

This paper introduces a new kernel two-sample test that is fast, powerful across various alternatives, and robust to high-dimensional data, addressing computational and effectiveness limitations of existing methods.

Contribution

The paper proposes a novel kernel two-sample test that is computationally efficient, effective in high-dimensional settings, and does not rely on data splitting for parameter tuning.

Findings

01. Performs well on synthetic data

02. Effective on real-world large-scale data

03. More robust to high dimensions than existing methods

Abstract

Kernel two-sample tests have been widely used, and the development of efficient methods for high-dimensional, large-scale data is receiving increasing attention in the big data era. However, existing methods, such as the maximum mean discrepancy (MMD) and recently proposed kernel-based tests for large-scale data, are computationally intensive and/or ineffective for some common alternatives in high-dimensional data. In this paper, we propose a new test that exhibits high power across a wide range of alternatives. Furthermore, the new test is more robust to high dimensions than existing methods and does not require optimization procedures for choosing kernel bandwidth and other parameters through data splitting. Numerical studies demonstrate that the new approach performs well on both synthetic and real-world data.
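For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the classical quadratic-time MMD two-sample test, not the new test proposed in the paper. It assumes a Gaussian kernel, a median-heuristic bandwidth, and a permutation null; these choices and all function names are illustrative and not taken from the paper. It uses only the Python standard library, so it is intended for small samples, and it assumes the pooled points are not all identical (otherwise the bandwidth is zero).

```python
import math
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gaussian_gram(points):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    n = len(points)
    d2 = [[sq_dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Median heuristic (assumed here): sigma^2 is the median of the
    # pairwise squared distances over all pairs i < j.
    sigma2 = statistics.median(
        d2[i][j] for i in range(n) for j in range(i + 1, n)
    )
    return [
        [math.exp(-d2[i][j] / (2.0 * sigma2)) for j in range(n)]
        for i in range(n)
    ]


def mmd2_unbiased(gram, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a pooled Gram matrix."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(gram[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    kyy = sum(gram[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    kxy = sum(gram[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(x, y, n_perm=200, seed=0):
    """Return (observed MMD^2, permutation p-value) for samples x and y."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    gram = gaussian_gram(pooled)  # O(N^2) time and memory in the pooled size N
    m = len(x)
    labels = list(range(len(pooled)))
    observed = mmd2_unbiased(gram, labels[:m], labels[m:])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign sample membership at random
        if mmd2_unbiased(gram, labels[:m], labels[m:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `mmd_permutation_test(x, y)` with two lists of equal-length numeric vectors returns the statistic and a p-value. The pooled Gram matrix and the repeated permutations are what make this baseline expensive as the sample size grows, which is the computational limitation the abstract refers to.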

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Algorithms

Methods: Test