PKLM: A flexible MCAR test using Classification

Meta-Lina Spohn; Jeffrey Näf; Loris Michel; Nicolai Meinshausen

arXiv:2109.10150 · stat.ME · December 1, 2022 · 1 citation

Open Access · 1 Repo

TL;DR

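The paper introduces PKLM, a non-parametric test of the missing completely at random (MCAR) assumption. It compares the distributions of different missingness patterns on random projections of the variables, measures the differences with the Kullback-Leibler divergence estimated by probability Random Forests, and applies to high-dimensional data and to both discrete and continuous variables.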

Contribution

It presents a novel, powerful, and easy-to-use MCAR test that combines random projections with Random Forest classification, and uses a permutation approach that guarantees the level for any finite sample size.

Findings

01 Consistently high power across simulated and real data

02 Maintains correct type-I error rates

03 Applicable to high-dimensional and mixed data types

Abstract

We develop a fully non-parametric, easy-to-use, and powerful test for the missing completely at random (MCAR) assumption on the missingness mechanism of a dataset. The test compares distributions of different missing patterns on random projections in the variable space of the data. The distributional differences are measured with the Kullback-Leibler Divergence, using probability Random Forests. We thus refer to it as "Projected Kullback-Leibler MCAR" (PKLM) test. The use of random projections makes it applicable even if very few or no fully observed observations are available or if the number of dimensions is large. An efficient permutation approach guarantees the level for any finite sample size, resolving a major shortcoming of most other available tests. Moreover, the test can be used on both discrete and continuous data. We show empirically on a range of simulated data…
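The permutation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the PKLM implementation: the paper's Random Forest based Kullback-Leibler statistic is replaced here by a simple mean-gap statistic on a single fully observed column, and all function names (`permutation_pvalue`, `mean_gap`, `mcar_permutation_test`) are hypothetical. It only shows why permuting the missingness labels and using the `(1 + exceedances) / (1 + permutations)` p-value gives a valid level at any sample size.

```python
import random


def permutation_pvalue(observed, permuted):
    """Permutation p-value with the +1 correction (never exactly zero)."""
    exceed = sum(1 for s in permuted if s >= observed)
    return (1 + exceed) / (1 + len(permuted))


def mean_gap(values, labels):
    """Stand-in statistic (an assumption, not the paper's KL estimate):
    absolute difference in means between rows labelled 1 and rows labelled 0."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g0 = [v for v, l in zip(values, labels) if l == 0]
    if not g1 or not g0:
        return 0.0
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def mcar_permutation_test(observed_col, missing_indicator, n_perm=199, seed=0):
    """Compare the observed statistic with its distribution under shuffled
    missingness labels; under MCAR the labels are exchangeable."""
    rng = random.Random(seed)
    stat = mean_gap(observed_col, missing_indicator)
    labels = list(missing_indicator)
    perm_stats = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_stats.append(mean_gap(observed_col, labels))
    return permutation_pvalue(stat, perm_stats)


# Toy data: x is always observed; the indicators say whether a second
# (hypothetical) variable is missing in each row.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(200)]
mcar_mask = [1 if data_rng.random() < 0.3 else 0 for _ in x]  # independent of x
mar_mask = [1 if v > 0.5 else 0 for v in x]                   # depends on x

p_mcar = mcar_permutation_test(x, mcar_mask)
p_mar = mcar_permutation_test(x, mar_mask)
```

In this toy setup the missingness in `mar_mask` depends on `x`, so `p_mar` should be small, while `p_mcar` is computed from labels that are independent of `x` and has no reason to be. The actual test replaces `mean_gap` with a classifier-based divergence estimate computed over many random projections.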

Code & Models

Repositories

missvalteam/pklmtest (Official)

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Neural Networks and Applications