A Complete Characterisation of Structured Missingness

James Jackson; Robin Mitra; Niels Hagenbuch; Sarah McGough; Chris Harbron

arXiv:2307.02650 · stat.ME · July 7, 2023

Open Access

TL;DR

This paper introduces a comprehensive taxonomy for Structured Missingness (SM) in large datasets, extending existing models to account for dependencies among missingness indicators, and demonstrates its impact on inference and prediction.

Contribution

The paper develops a new framework for characterizing SM in which missingness indicators can depend on each other as well as on the data, broadening the traditional MCAR, MAR, MNAR taxonomy.

Findings

01. SM significantly affects inference and prediction accuracy.

02. Simulations illustrate the impact of SM on data analysis.

03. Application to a clinico-genomic database demonstrates real-world relevance.

Abstract

Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that variables' missingness indicator vectors $M_{1}, M_{2}, \ldots, M_{p}$ are independent after conditioning on the relevant portion of the data matrix $X$. As this is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM, where each $M_{j}$ can depend on $M_{-j}$ (i.e., all missingness indicator vectors except $M_{j}$), in addition to $X$. We embed this new framework within the well-established decomposition of…
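The distinction the abstract draws between independent and mutually dependent missingness indicators can be illustrated with a small simulation. The sketch below is not taken from the paper: the sample size, the missingness probabilities, and all variable names are assumptions chosen for illustration, and the dependence of the indicators on $X$ is left out for simplicity. It generates an indicator $M_{1}$ completely at random, then a second indicator either independently of $M_{1}$ or with a probability that depends on $M_{1}$, and compares the resulting correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # illustrative sample size (assumption)

# Hypothetical data matrix X with two standard normal columns.
X = rng.normal(size=(n, 2))

# M1: missingness indicator for column 0, completely at random (rate 0.3).
M1 = rng.random(n) < 0.3

# Unstructured case: M2 is generated without reference to M1.
M2_indep = rng.random(n) < 0.3

# Structured case: the probability that column 1 is missing depends on M1
# (0.8 if column 0 is missing, 0.1 otherwise; illustrative values).
p2 = np.where(M1, 0.8, 0.1)
M2_struct = rng.random(n) < p2

def indicator_corr(a, b):
    """Pearson correlation between two boolean indicator vectors."""
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]

corr_indep = indicator_corr(M1, M2_indep)
corr_struct = indicator_corr(M1, M2_struct)

# Apply the structured masks to obtain the observed data matrix.
X_obs = X.copy()
X_obs[M1, 0] = np.nan
X_obs[M2_struct, 1] = np.nan

print(f"corr(M1, M2) without structure: {corr_indep:.3f}")
print(f"corr(M1, M2) with structure:    {corr_struct:.3f}")
```

In the unstructured case the two indicators are close to uncorrelated, while in the structured case they are strongly positively correlated, which is the kind of dependence between $M_{j}$ and $M_{-j}$ that the conditional-independence assumption rules out.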

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications · Statistical Methods and Bayesian Inference