PKLM: A flexible MCAR test using Classification
Meta-Lina Spohn, Jeffrey Näf, Loris Michel, Nicolai Meinshausen

TL;DR
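The paper introduces PKLM, a non-parametric test for the MCAR assumption. It compares the distributions of different missingness patterns on random projections of the variables, measuring their differences with a Kullback-Leibler divergence estimated by probability Random Forests. The test applies to high-dimensional and mixed (discrete and continuous) data.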
Contribution
It presents a novel, powerful, and easy-to-use MCAR test leveraging random projections and machine learning, with guaranteed finite-sample level control.
Findings
Consistently high power across simulated and real data
Maintains correct type-I error rates
Applicable to high-dimensional and mixed data types
Abstract
We develop a fully non-parametric, easy-to-use, and powerful test for the missing completely at random (MCAR) assumption on the missingness mechanism of a dataset. The test compares distributions of different missing patterns on random projections in the variable space of the data. The distributional differences are measured with the Kullback-Leibler Divergence, using probability Random Forests. We thus refer to it as "Projected Kullback-Leibler MCAR" (PKLM) test. The use of random projections makes it applicable even if very few or no fully observed observations are available or if the number of dimensions is large. An efficient permutation approach guarantees the level for any finite sample size, resolving a major shortcoming of most other available tests. Moreover, the test can be used on both discrete and continuous data. We show empirically on a range of simulated data…
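The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the PKLM implementation: the forest-based Kullback-Leibler statistic is replaced here by a simple stand-in (the absolute difference in means of a fully observed variable between two missingness patterns), and all names (`mean_gap`, `permutation_pvalue`, `labels_mcar`, `labels_mar`) are hypothetical. What carries over is the general mechanism: under MCAR the pattern labels are exchangeable, so recomputing the statistic on shuffled labels yields a p-value that is valid for any finite sample size.

```python
import random

def mean_gap(values, labels):
    """Stand-in statistic (NOT the PKLM statistic): absolute difference in
    means of `values` between missingness pattern 0 and pattern 1."""
    group0 = [v for v, g in zip(values, labels) if g == 0]
    group1 = [v for v, g in zip(values, labels) if g == 1]
    return abs(sum(group0) / len(group0) - sum(group1) / len(group1))

def permutation_pvalue(values, labels, statistic, n_perm=199, seed=0):
    """Permutation p-value for H0: `values` has the same distribution in
    every missingness pattern (as implied by MCAR)."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # under H0 the labels are exchangeable
        if statistic(values, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return (1 + exceed) / (1 + n_perm)

# Toy data (assumption for illustration): x1 is fully observed, and the
# label marks whether a second variable is missing (1) or observed (0).
rng = random.Random(1)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]

# MCAR: missingness is a coin flip, independent of x1.
labels_mcar = [1 if rng.random() < 0.5 else 0 for _ in x1]
# Not MCAR: missingness depends on the observed x1.
labels_mar = [1 if v > 0 else 0 for v in x1]

p_mcar = permutation_pvalue(x1, labels_mcar, mean_gap)
p_mar = permutation_pvalue(x1, labels_mar, mean_gap)
print("p-value, MCAR labels:", p_mcar)
print("p-value, x1-dependent labels:", p_mar)
```

In this sketch the second p-value should be small, because missingness is driven by `x1`, while the first has no systematic reason to be. A mean difference only detects location shifts; the test described in the abstract instead uses a classifier-based divergence on random projections, which is what lets it pick up more general distributional differences.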
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Neural Networks and Applications
