Revisiting Classifier Two-Sample Tests

David Lopez-Paz; Maxime Oquab

arXiv:1610.06545 · stat.ML · March 14, 2018 · ICLR · 59 cites


TL;DR

This paper explores classifier-based two-sample tests (C2ST), demonstrating their theoretical properties, competitive performance, and novel applications in evaluating generative models and causal discovery.

Contribution

It provides a comprehensive analysis of C2ST, compares it with existing methods, and introduces new uses in evaluating generative models and causal inference.

Findings

1. C2ST learns a suitable representation of the data on the fly.
2. C2ST has a simple null distribution and returns test statistics in interpretable units.
3. C2ST performs competitively against state-of-the-art two-sample tests.
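A worked form of the simple null distribution mentioned in the second finding, sketched under stated assumptions: the test statistic is the accuracy $\hat{t}$ of a binary classifier on $n_{\mathrm{te}}$ held-out examples, the two samples are balanced ($n = m$), and under $P = Q$ each held-out prediction is correct independently with probability $1/2$. The count of correct predictions is then binomial, and its normal approximation gives a one-sided p-value in closed form ($\Phi$ is the standard normal CDF):

```latex
n_{\mathrm{te}}\,\hat{t} \sim \mathrm{Binomial}\!\left(n_{\mathrm{te}}, \tfrac{1}{2}\right)
\quad\Longrightarrow\quad
\hat{t} \;\dot\sim\; \mathcal{N}\!\left(\tfrac{1}{2}, \tfrac{1}{4 n_{\mathrm{te}}}\right),
\qquad
p \approx 1 - \Phi\!\left(\frac{\hat{t} - \tfrac{1}{2}}{\sqrt{1/(4 n_{\mathrm{te}})}}\right)
```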

Abstract

The goal of two-sample tests is to assess whether two samples, $S_{P} \sim P^{n}$ and $S_{Q} \sim Q^{m}$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, one can construct a dataset by pairing the $n$ examples in $S_{P}$ with a positive label, and by pairing the $m$ examples in $S_{Q}$ with a negative label. If the null hypothesis " $P = Q$ " is true, then the classification accuracy of a binary classifier on a held-out subset of this dataset should remain near chance level. As we will show, such Classifier Two-Sample Tests (C2ST) learn a suitable representation of the data on the fly, return test statistics in interpretable units, have a simple null distribution, and their predictive uncertainty allows one to interpret where $P$ and $Q$ differ. The goal of this paper is to establish the…
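The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code from the repository below: the helper name `c2st` and its `train_frac` and `seed` parameters are hypothetical, and a nearest-centroid classifier is swapped in as a deliberately simple stand-in for whatever binary classifier one would actually use. The p-value relies on the normal approximation $\mathcal{N}(1/2, 1/(4 n_{\mathrm{te}}))$, which assumes balanced samples ($n = m$).

```python
import math

import numpy as np


def c2st(sp, sq, train_frac=0.5, seed=0):
    """Hypothetical classifier two-sample test sketch.

    Labels S_P as 1 and S_Q as 0, fits a nearest-centroid classifier
    (a stand-in for any binary classifier) on a random training split,
    and returns (held-out accuracy, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([sp, sq])
    y = np.concatenate([np.ones(len(sp)), np.zeros(len(sq))])

    # Random train / held-out split; accuracy is measured only on held-out data.
    idx = rng.permutation(len(x))
    n_tr = int(train_frac * len(x))
    tr, te = idx[:n_tr], idx[n_tr:]

    # Stand-in classifier: assign each held-out point to the closer class mean.
    mu1 = x[tr][y[tr] == 1].mean(axis=0)
    mu0 = x[tr][y[tr] == 0].mean(axis=0)
    d1 = np.linalg.norm(x[te] - mu1, axis=1)
    d0 = np.linalg.norm(x[te] - mu0, axis=1)
    pred = (d1 < d0).astype(float)

    acc = float(np.mean(pred == y[te]))
    n_te = len(te)

    # Under H0 (and n == m), acc is approximately N(1/2, 1/(4 n_te));
    # the p-value is the upper-tail probability P(Z > z).
    z = (acc - 0.5) / math.sqrt(0.25 / n_te)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return acc, p_value
```

For example, calling `c2st` on two samples drawn from the same Gaussian should give a held-out accuracy near 0.5, while shifting the mean of one sample should push the accuracy well above 0.5 and the p-value toward zero. The held-out split matters: accuracy measured on the training points would be optimistically biased, and the binomial null distribution would no longer apply.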


Code & Models

Repositories

lopezpaz/classifier_tests (official)


Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Anomaly Detection Techniques and Applications