A fast and effective kernel two-sample test for large-scale data
Hoseung Song, Hao Chen

TL;DR
This paper introduces a new kernel two-sample test that is fast, powerful across various alternatives, and robust to high-dimensional data, addressing computational and effectiveness limitations of existing methods.
Contribution
The paper proposes a novel kernel two-sample test that is computationally efficient, effective in high-dimensional settings, and does not rely on data splitting for parameter tuning.
Findings
Performs well on synthetic data
Effective on real-world large-scale data
More robust to high dimensions than existing methods
Abstract
Kernel two-sample tests have been widely used, and the development of efficient methods for high-dimensional, large-scale data is receiving increasing attention in the big data era. However, existing methods, such as the maximum mean discrepancy (MMD) and recently proposed kernel-based tests for large-scale data, are computationally intensive and/or ineffective for some common alternatives in high-dimensional data. In this paper, we propose a new test that exhibits high power across a wide range of alternatives. Furthermore, the new test is more robust to high dimensions than existing methods and does not require optimization procedures for choosing kernel bandwidth and other parameters through data splitting. Numerical studies demonstrate that the new approach performs well on both synthetic and real-world data.
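For context, the MMD baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard quadratic-time MMD permutation test with a Gaussian kernel and a median-heuristic bandwidth. It is not the new test proposed in the paper, whose construction is not described here. The function names (`gaussian_gram`, `median_bandwidth`, `mmd2_unbiased`, `mmd_permutation_test`) and the default of 200 permutations are hypothetical choices made for this example.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def median_bandwidth(points):
    """Median heuristic: (upper) median of all pairwise distances in the pooled sample."""
    n = len(points)
    dists = sorted(
        math.sqrt(sq_dist(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    median = dists[len(dists) // 2]
    return median if median > 0 else 1.0  # guard against all-identical points


def gaussian_gram(points, bandwidth):
    """Full N x N Gaussian kernel matrix; this is the O(N^2) time and memory cost."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            k = math.exp(-sq_dist(points[i], points[j]) / (2.0 * bandwidth ** 2))
            gram[i][j] = gram[j][i] = k
    return gram


def mmd2_unbiased(gram, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a precomputed Gram matrix."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(gram[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    kyy = sum(gram[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    kxy = sum(gram[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(x, y, num_permutations=200, seed=0):
    """Return (observed statistic, permutation p-value) for samples x and y."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    gram = gaussian_gram(pooled, median_bandwidth(pooled))
    m = len(x)
    indices = list(range(len(pooled)))
    observed = mmd2_unbiased(gram, indices[:m], indices[m:])
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(indices)  # relabel observations under the null hypothesis
        if mmd2_unbiased(gram, indices[:m], indices[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labeling as one of the permutations.
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
    y = [[rng.gauss(2.0, 1.0) for _ in range(5)] for _ in range(30)]
    print(mmd_permutation_test(x, y))
```

The Gram matrix over the pooled sample grows quadratically with the total sample size, and each permutation re-sums over it, which illustrates why this kind of baseline becomes expensive on large-scale data.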
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Algorithms
Methods: Test
