Two-Sample Testing with Missing Data via Energy Distance: Weighting and Imputation Approaches

Danijel G. Aleksić; Bojana Milošević

arXiv:2508.11421·stat.ME·August 18, 2025

TL;DR

This paper develops and compares weighted and imputation-based energy distance two-sample tests for data with missing values, providing theoretical null distribution derivations and practical resampling methods.

Contribution

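It introduces a weighted modification of the energy distance test statistic that uses all available data, derives its asymptotic null distribution, and proposes two resampling procedures for approximating the p-values. It also proposes a new bootstrap method designed for test statistics computed on samples completed by common imputation methods.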

Findings

1. Weighted and imputation methods control type I error well.

2. Imputation-based bootstrap improves p-value accuracy.

3. Method performance varies with missingness mechanisms and sample sizes.

Abstract

In this paper, we address the problem of two-sample testing in the presence of missing data under a variety of missingness mechanisms. Our focus is on the well-known energy distance-based two-sample test. In addition to the standard complete-case approach, we propose a modification of the test statistic that incorporates all available data, utilizing appropriate weights. The asymptotic null distribution of the test statistic is derived and two resampling procedures for approximating the corresponding p-values are proposed. We also propose a new bootstrap method specifically designed for a test statistic based on samples completed via common imputation methods. Through an extensive simulation study, we compare all methods in terms of type I error control and statistical power across a set of sample sizes, dimensions, distributions, missingness mechanisms, and missingness rates. Based on…
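
As a point of reference for the complete-case baseline mentioned in the abstract, the following is a minimal sketch of an energy distance two-sample test that first drops every row containing a missing value. It is not the authors' weighted statistic or their bootstrap for imputed samples. The function names `energy_statistic` and `complete_case_energy_test`, the use of NaN to mark missing entries, the V-statistic form of the estimator, and the permutation scheme for the p-value are all assumptions made for illustration.

```python
import numpy as np


def energy_statistic(x, y):
    """Sample energy distance between the rows of x (n, d) and y (m, d).

    Uses the V-statistic form 2*mean|x - y| - mean|x - x'| - mean|y - y'|
    with Euclidean norms. This is the textbook estimator, not the weighted
    version proposed in the paper.
    """
    def mean_dist(a, b):
        # Pairwise Euclidean distances via broadcasting, then their average.
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def complete_case_energy_test(x, y, n_perm=499, seed=0):
    """Complete-case energy test with a permutation p-value (illustrative).

    Assumes missing entries are encoded as NaN. Rows with any NaN are
    discarded before the statistic is computed.
    """
    rng = np.random.default_rng(seed)
    x = x[~np.isnan(x).any(axis=1)]
    y = y[~np.isnan(y).any(axis=1)]
    n = len(x)
    pooled = np.vstack([x, y])
    observed = energy_statistic(x, y)

    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random under the null of equal distributions.
        perm = rng.permutation(len(pooled))
        stat = energy_statistic(pooled[perm[:n]], pooled[perm[n:]])
        exceed += int(stat >= observed)

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `complete_case_energy_test(x, y)` on two arrays of shape `(n, d)` and `(m, d)` returns the observed statistic and a permutation p-value. Because incomplete rows are discarded, this baseline loses information as the missingness rate grows, which is the limitation the weighted and imputation-based approaches aim to address.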
