Two-sample Testing with Block-wise Missingness in Multi-source Data
Kejian Zhang, Muxuan Liang, Robert Maile, Doudou Zhou

TL;DR
This paper introduces the Block-Pattern Enhanced Test framework and a specific test built on it, BRISE, for two-sample hypothesis testing in multi-source datasets with block-wise missingness, including settings where data are missing not at random.
Contribution
The paper proposes the Block-Pattern Enhanced Test framework and the BRISE test, which accommodate block-wise missingness and heterogeneous modalities, with theoretical guarantees and practical validation.
Findings
BRISE controls the type-I error rate effectively.
BRISE achieves high power in simulations.
The method performs well on real-world datasets.
Abstract
Multi-source and multi-modal datasets are increasingly common in scientific research, yet they often exhibit block-wise missingness, where entire modalities are systematically absent in some sources or no single source contains all modalities. This structured missingness poses major challenges for two-sample hypothesis testing. Standard approaches, such as imputation or complete-case analysis, may introduce bias or suffer efficiency loss, especially under missing-not-at-random mechanisms. To address this challenge, we propose the Block-Pattern Enhanced Test, a general framework for constructing two-sample test statistics that explicitly account for block-wise missingness. We show that the framework yields valid tests under a new condition that allows for missing-not-at-random mechanisms. Building on this general framework, we further propose the Block-wise Rank In Similarity graph…
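To make the notion of block-wise missingness concrete, the following is a minimal sketch (not the authors' method, and not BRISE) that builds a toy multi-source dataset in which each source lacks one modality, groups observations by their observed-column pattern, and counts complete cases. The layout (three sources, three modalities, two features per modality), the helper name `block_patterns`, and the use of NaN to mark missing entries are all assumptions made for illustration.

```python
import numpy as np

def block_patterns(X):
    """Group row indices by observed-column pattern (True = observed).

    Hypothetical helper for illustration; missing entries are assumed
    to be encoded as NaN.
    """
    observed = ~np.isnan(X)
    patterns = {}
    for i, row in enumerate(observed):
        key = tuple(bool(v) for v in row)
        patterns.setdefault(key, []).append(i)
    return patterns

rng = np.random.default_rng(0)

# Assumed toy layout: 3 sources with 4 rows each, 3 modalities with 2 features each.
n_per_source, d_per_modality = 4, 2
X = rng.normal(size=(3 * n_per_source, 3 * d_per_modality))

# Source k lacks modality k entirely, so no single source contains all modalities.
for k in range(3):
    rows = slice(k * n_per_source, (k + 1) * n_per_source)
    cols = slice(k * d_per_modality, (k + 1) * d_per_modality)
    X[rows, cols] = np.nan

patterns = block_patterns(X)

# Complete-case analysis keeps only rows with every feature observed.
n_complete = int((~np.isnan(X)).all(axis=1).sum())

print(len(patterns), "missingness patterns;", n_complete, "complete cases")
```

In this toy layout every row is missing one full modality, so complete-case analysis would discard the entire dataset, which is the extreme form of the efficiency loss the abstract describes.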
