AutoML Two-Sample Test
Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf

TL;DR
This paper introduces an AutoML approach to two-sample testing that uses the mean discrepancy of a learned witness function as the test statistic, achieving competitive results across diverse distribution shift scenarios without user input.
Contribution
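It proposes an AutoML framework for two-sample testing built on the result that minimizing a squared loss yields a witness function with optimal test power. Off-the-shelf AutoML tools can therefore fit the witness, which simplifies application while achieving competitive performance.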
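The procedure can be written out as follows. This is our own notation and a sketch under stated assumptions, not the paper's exact formulation: samples $X_1,\dots,X_m \sim P$ and $Y_1,\dots,Y_n \sim Q$ are each split into a training part ($S^P_{\mathrm{tr}}, S^Q_{\mathrm{tr}}$) and a held-out part ($S^P_{\mathrm{te}}, S^Q_{\mathrm{te}}$, of sizes $m'$ and $n'$). The label encoding (1 for $P$, 0 for $Q$) and the model class $\mathcal{H}$ (standing in for the AutoML search space) are assumptions.

```latex
\hat{h} = \operatorname*{arg\,min}_{h \in \mathcal{H}}
  \sum_{x \in S^{P}_{\mathrm{tr}}} \bigl(h(x) - 1\bigr)^{2}
  + \sum_{y \in S^{Q}_{\mathrm{tr}}} \bigl(h(y) - 0\bigr)^{2},
\qquad
\hat{T} = \frac{1}{m'} \sum_{x \in S^{P}_{\mathrm{te}}} \hat{h}(x)
  - \frac{1}{n'} \sum_{y \in S^{Q}_{\mathrm{te}}} \hat{h}(y).
```

Large values of $\hat{T}$ are evidence against $H_0\colon P = Q$. Because $\hat{h}$ is fit on data disjoint from the held-out split, $\hat{T}$ can be calibrated, for example, with a permutation test on the held-out witness values.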
Findings
Achieves competitive performance on distribution shift benchmarks
Uses the mean discrepancy of a witness function as a simple test statistic
Provides an open-source Python implementation
Abstract
Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery and for detecting distribution shifts. This has led to the development of many sophisticated test procedures that go beyond standard supervised learning frameworks and whose use can require specialized knowledge of two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advancements in AutoML. Without any user input about the problems at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML…
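A minimal sketch of the test described in the abstract, in Python with NumPy. The assumptions are ours and do not reflect the authors' implementation: the AutoML model is replaced by ordinary least squares on a hypothetical quadratic feature map `features`; labels are 1 for the first sample and 0 for the second; each sample is split in half for training and testing; and the null distribution is calibrated with a one-sided permutation test. All function names (`features`, `fit_witness`, `mean_discrepancy`, `two_sample_test`) are hypothetical and are not the API of the authors' package. Inputs are 2D arrays of shape (n_samples, n_features).

```python
import numpy as np


def features(z: np.ndarray) -> np.ndarray:
    """Hypothetical feature map (bias, linear, squared terms); a stand-in for AutoML."""
    return np.hstack([np.ones((z.shape[0], 1)), z, z ** 2])


def fit_witness(x_tr: np.ndarray, y_tr: np.ndarray):
    """Fit a witness h by least squares: target 1 for x (from P), 0 for y (from Q)."""
    z = np.vstack([x_tr, y_tr])
    t = np.concatenate([np.ones(len(x_tr)), np.zeros(len(y_tr))])
    coef, *_ = np.linalg.lstsq(features(z), t, rcond=None)
    return lambda u: features(u) @ coef


def mean_discrepancy(h_x: np.ndarray, h_y: np.ndarray) -> float:
    """Test statistic: mean witness value on x minus mean witness value on y."""
    return float(np.mean(h_x) - np.mean(h_y))


def two_sample_test(x: np.ndarray, y: np.ndarray, n_perm: int = 500, seed: int = 0):
    """Return (statistic, one-sided permutation p-value) for H0: P == Q."""
    rng = np.random.default_rng(seed)
    # Shuffle rows, then use one half of each sample for training and one for testing.
    x_tr, x_te = np.array_split(rng.permutation(x), 2)
    y_tr, y_te = np.array_split(rng.permutation(y), 2)
    h = fit_witness(x_tr, y_tr)
    h_x, h_y = h(x_te), h(y_te)
    stat = mean_discrepancy(h_x, h_y)
    # The witness was trained to be larger on P, so large positive values reject H0.
    # Under H0 the held-out witness values are exchangeable, so we permute them.
    pooled = np.concatenate([h_x, h_y])
    m = len(h_x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if mean_discrepancy(perm[:m], perm[m:]) >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=(400, 2))
    y = rng.normal(0.5, 1.0, size=(400, 2))  # mean-shifted alternative
    print(two_sample_test(x, y))
```

Splitting the data keeps the witness independent of the held-out points, which is what makes the permutation calibration valid. Swapping `fit_witness` for any regressor trained with squared loss (for example, an AutoML regressor) leaves the rest of the test unchanged.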
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Data Stream Mining Techniques
Methods: Test
