A Kernel Method for the Two-Sample Problem
Arthur Gretton, Karsten Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander J. Smola

TL;DR
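This paper introduces a kernel-based statistical test of whether two samples are drawn from the same distribution. The test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). It can be computed in quadratic time, has linear-time approximations, and applies to many data types.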
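For reference, the statistic described above is the maximum mean discrepancy (MMD). The block below writes out its definition, the standard kernel expansion of its square (with x, x' drawn independently from p and y, y' drawn independently from q), and an unbiased estimator from samples X = {x_1, ..., x_m} and Y = {y_1, ..., y_n}. It is a summary under these notational assumptions, not a transcription of the paper's equations. The estimator needs O((m+n)^2) kernel evaluations, which is the quadratic cost mentioned in the abstract.

```latex
\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim p}[f(x)] - \mathbb{E}_{y \sim q}[f(y)] \right),
\qquad \mathcal{F} = \{ f \in \mathcal{H} : \|f\|_{\mathcal{H}} \le 1 \}

\mathrm{MMD}^2[\mathcal{F}, p, q] = \mathbb{E}_{x,x'}[k(x,x')] - 2\,\mathbb{E}_{x,y}[k(x,y)] + \mathbb{E}_{y,y'}[k(y,y')]

\mathrm{MMD}_u^2 = \frac{1}{m(m-1)} \sum_{i \ne j} k(x_i, x_j)
                 + \frac{1}{n(n-1)} \sum_{i \ne j} k(y_i, y_j)
                 - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)
```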
Contribution
It presents a new two-sample test framework based on reproducing kernel Hilbert spaces, with practical algorithms and applications to structured data such as graphs and databases.
Findings
Effective in attribute matching for databases
First tests for distribution comparison over graphs
Performs well with both the quadratic-time statistic and its linear-time approximation
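The linear-time approximation mentioned in the findings can be sketched as follows. This is a minimal pure-Python illustration, assuming equal sample sizes and a Gaussian kernel with a hand-picked bandwidth `sigma`; the names `gaussian_kernel` and `mmd2_linear` are hypothetical. It averages h(z, z') = k(x, x') + k(y, y') - k(x, y') - k(x', y) over disjoint consecutive pairs z_i = (x_i, y_i), so each sample is used once and the cost grows linearly with the sample size.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_linear(X: Sequence[Point], Y: Sequence[Point], sigma: float = 1.0) -> float:
    """Linear-time MMD^2 estimate averaged over disjoint consecutive pairs.

    Assumes len(X) == len(Y); with an odd sample size the last point is ignored.
    """
    if len(X) != len(Y):
        raise ValueError("linear-time estimator assumes equal sample sizes")
    m2 = len(X) // 2
    if m2 == 0:
        raise ValueError("need at least two samples from each distribution")
    total = 0.0
    for i in range(m2):
        x, x_next = X[2 * i], X[2 * i + 1]
        y, y_next = Y[2 * i], Y[2 * i + 1]
        # h(z, z') = k(x, x') + k(y, y') - k(x, y') - k(x', y)
        total += (gaussian_kernel(x, x_next, sigma)
                  + gaussian_kernel(y, y_next, sigma)
                  - gaussian_kernel(x, y_next, sigma)
                  - gaussian_kernel(x_next, y, sigma))
    return total / m2
```

For example, `mmd2_linear([(0.0,), (0.1,), (0.2,), (0.3,)], [(5.0,), (5.1,), (5.2,), (5.3,)])` gives a clearly positive value, while passing the same sample twice gives zero. The trade-off is variance: each point contributes to only one term, so this estimate is noisier than the quadratic-time one for the same number of samples.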
Abstract
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where…
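A minimal sketch of the quadratic-time statistic turned into a test follows. It computes the unbiased estimate MMD_u^2 with an assumed Gaussian kernel. The abstract's tests use large deviation bounds or the statistic's asymptotic distribution; this sketch instead calibrates the decision with a permutation test, a different, swapped-in technique. All names (`mmd2_unbiased`, `permutation_test`) and defaults (`sigma`, `n_perm`, `seed`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel; the bandwidth sigma is an assumption, not a tuned value."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(X: Sequence[Point], Y: Sequence[Point], sigma: float = 1.0) -> float:
    """Quadratic-time unbiased MMD^2 estimate (diagonal terms excluded within samples)."""
    m, n = len(X), len(Y)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    k_xx = sum(gaussian_kernel(X[i], X[j], sigma)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(Y[i], Y[j], sigma)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def permutation_test(X: Sequence[Point], Y: Sequence[Point], sigma: float = 1.0,
                     n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (observed MMD_u^2, permutation p-value).

    Under the null hypothesis p = q the sample labels are exchangeable, so the
    statistic on randomly relabelled pooled data approximates its null distribution.
    """
    rng = random.Random(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled: List[Point] = list(X) + list(Y)
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], sigma) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself (a common convention).
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

With two well-separated one-dimensional samples, e.g. `[(0.1 * i,) for i in range(10)]` against `[(5.0 + 0.1 * i,) for i in range(10)]`, the observed statistic is large and the p-value is small, so the null hypothesis p = q would be rejected at the 5% level. Each permutation repeats the full quadratic computation, which is why the bound-based and asymptotic approaches in the abstract, or the linear-time statistic, matter for large samples.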
Taxonomy
Topics: Data Stream Mining Techniques · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms
