A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference)

Anirban Chatterjee; Ziang Niu; Bhaswar B. Bhattacharya

arXiv:2407.16550·stat.ME·August 30, 2024

TL;DR

This paper introduces a kernel-based, nearest-neighbor method for testing differences between two conditional distributions, with applications in calibration, regression, and simulation-based inference, offering consistent, efficient, and versatile statistical tests.

Contribution

It proposes a novel kernel-based measure and a nearly linear time estimator for conditional distribution comparison, with a resampling test that controls Type I error and is applicable to various modern statistical problems.

Findings

01

The method accurately detects differences in conditional distributions in simulations.

02

It effectively assesses neural network calibration on CIFAR-10 data.

03

The approach successfully compares regression functions and validates emulator models.

Abstract

In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the 'kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed in nearly linear time (for a fixed number of nearest neighbors). Moreover, when the two conditional distributions are the same, the estimate has a Gaussian limit and its asymptotic variance has a simple form that can be easily estimated from the data. The resulting test attains precise asymptotic level and is universally consistent for detecting differences between two conditional distributions. We also provide a resampling-based test using our estimate that applies to the conditional goodness-of-fit problem; this test controls Type I error in finite samples and is asymptotically consistent with only a finite number of resamples. A method to de-randomize…
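To make the idea of combining a kernel with a nearest-neighbor graph concrete, here is a minimal Python sketch. It is not taken from the paper or its repository. It assumes paired samples (x_i, y_i, z_i) and averages an MMD-type core, k(y_i, y_j) - k(y_i, z_j) - k(z_i, y_j) + k(z_i, z_j), over the edges of a k-nearest-neighbor graph built on x. The function names, the Gaussian kernel, and the bandwidth are all illustrative assumptions. The neighbor search here is brute force (quadratic in n) to keep the example self-contained; a tree-based search would be needed to approach the nearly linear cost described in the abstract.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel evaluated along the last axis (broadcasts over leading axes)."""
    sq_dist = np.sum((a - b) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def knn_indices(x, k):
    """Indices of the k nearest neighbors of each row of x (brute force, O(n^2))."""
    d = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def knn_kernel_statistic(x, y, z, k=5, bandwidth=1.0):
    """Average an MMD-type core over k-NN edges in x (hypothetical helper).

    x, y, z are arrays of shape (n, d_x), (n, d_y), (n, d_y).
    """
    nbrs = knn_indices(x, k)               # shape (n, k)
    yi, zi = y[:, None, :], z[:, None, :]  # shape (n, 1, d_y)
    yj, zj = y[nbrs], z[nbrs]              # shape (n, k, d_y)
    core = (gaussian_kernel(yi, yj, bandwidth)
            - gaussian_kernel(yi, zj, bandwidth)
            - gaussian_kernel(zi, yj, bandwidth)
            + gaussian_kernel(zi, zj, bandwidth))
    return float(core.mean())

# Toy data (assumed for illustration): y depends on x; z is either
# identical to y or drawn from a conditional law shifted by 2.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=(n, 1))
y = x + 0.5 * rng.normal(size=(n, 1))
z_shift = x + 2.0 + 0.5 * rng.normal(size=(n, 1))

stat_same = knn_kernel_statistic(x, y, y.copy())
stat_shift = knn_kernel_statistic(x, y, z_shift)
```

In this toy setup the statistic is zero when z coincides with y and clearly positive when the conditional law of z is shifted. Turning the statistic into a test would additionally require the variance estimate or the resampling scheme mentioned in the abstract, which this sketch does not attempt.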

Code & Models

Repositories

anirbanc96/ecmmd-condtwosamp (Official)
