Two-Sample Testing with Missing Data via Energy Distance: Weighting and Imputation Approaches
Danijel G. Aleksić, Bojana Milošević

TL;DR
This paper develops and compares weighted and imputation-based energy distance two-sample tests for data with missing values, providing theoretical null distribution derivations and practical resampling methods.
Contribution
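It introduces a modified energy distance statistic that uses all available data through appropriate weights, derives its asymptotic null distribution together with two resampling procedures for approximating p-values, and proposes a new bootstrap method for statistics computed on samples completed by imputation.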
Findings
Weighted and imputation methods control type I error well.
Imputation-based bootstrap improves p-value accuracy.
Method performance varies with missingness mechanisms and sample sizes.
Abstract
In this paper, we address the problem of two-sample testing in the presence of missing data under a variety of missingness mechanisms. Our focus is on the well-known energy distance-based two-sample test. In addition to the standard complete-case approach, we propose a modification of the test statistic that incorporates all available data, utilizing appropriate weights. The asymptotic null distribution of the test statistic is derived, and two resampling procedures for approximating the corresponding p-values are proposed. We also propose a new bootstrap method specifically designed for a test statistic based on samples completed via common imputation methods. Through an extensive simulation study, we compare all methods in terms of type I error control and statistical power across a set of sample sizes, dimensions, distributions, missingness mechanisms, and missingness rates. Based on…
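As a rough illustration of the two baselines the abstract contrasts with the weighted statistic (complete-case deletion, and testing after imputation), the sketch below computes the classical two-sample energy statistic of Székely and Rizzo in its V-statistic form and attaches a plain permutation p-value. It is a minimal sketch under stated assumptions, not the authors' method: their weighted statistic, its resampling procedures, and their imputation-specific bootstrap are not reproduced. Mean imputation pooled over both samples stands in for "common imputation methods", ordinary label permutation stands in for the authors' resampling, and all function names (`energy_statistic`, `complete_cases`, `mean_impute`, `permutation_pvalue`) are hypothetical.

```python
# Hypothetical sketch: classical energy two-sample test with complete-case and
# pooled mean-imputation baselines. NOT the paper's weighted statistic or bootstrap.
import math
import random
from typing import List, Optional, Sequence, Tuple

# A point is a list of coordinates; None marks a missing value.
Point = Sequence[Optional[float]]
Sample = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance(A: Sample, B: Sample) -> float:
    """Average Euclidean distance over all pairs (a, b) with a in A, b in B."""
    return sum(euclidean(a, b) for a in A for b in B) / (len(A) * len(B))


def energy_statistic(X: Sample, Y: Sample) -> float:
    """Scaled energy statistic in V-statistic form:
    nm/(n+m) * (2*mean|X-Y| - mean|X-X'| - mean|Y-Y'|)."""
    n, m = len(X), len(Y)
    e = 2 * mean_distance(X, Y) - mean_distance(X, X) - mean_distance(Y, Y)
    return n * m / (n + m) * e


def complete_cases(S: Sequence[Point]) -> Sample:
    """Complete-case analysis: drop every point with at least one missing coordinate."""
    return [list(p) for p in S if all(v is not None for v in p)]


def mean_impute(X: Sequence[Point], Y: Sequence[Point]) -> Tuple[Sample, Sample]:
    """Fill each missing coordinate with the pooled observed mean of that coordinate.
    Assumes every coordinate is observed at least once in the pooled sample."""
    pooled = list(X) + list(Y)
    d = len(pooled[0])
    means = []
    for j in range(d):
        observed = [p[j] for p in pooled if p[j] is not None]
        means.append(sum(observed) / len(observed))

    def fill(S: Sequence[Point]) -> Sample:
        return [[means[j] if p[j] is None else p[j] for j in range(d)] for p in S]

    return fill(X), fill(Y)


def permutation_pvalue(X: Sample, Y: Sample, n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p-value for the energy statistic on two fully observed samples."""
    rng = random.Random(seed)
    observed = energy_statistic(X, Y)
    pooled = list(X) + list(Y)
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled points
        if energy_statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)

    def draw(shift: float, size: int) -> Sample:
        return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(size)]

    def mcar(S: Sample, rate: float) -> List[List[Optional[float]]]:
        # Delete each coordinate independently with probability `rate` (MCAR).
        return [[None if rng.random() < rate else v for v in p] for p in S]

    X, Y = mcar(draw(0.0, 40), 0.2), mcar(draw(1.0, 40), 0.2)
    print("complete-case p:", permutation_pvalue(complete_cases(X), complete_cases(Y)))
    Xi, Yi = mean_impute(X, Y)
    print("mean-imputed p: ", permutation_pvalue(Xi, Yi))
```

Imputing with means pooled over both samples keeps the completed data symmetric in the group labels, so relabeling is at least a reasonable heuristic under MCAR. It does not account for the variance shrinkage and distortion that mean imputation introduces, and it gives no guarantees under MAR or MNAR missingness.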
