A Complete Characterisation of Structured Missingness
James Jackson, Robin Mitra, Niels Hagenbuch, Sarah McGough, Chris Harbron

TL;DR
This paper introduces a comprehensive taxonomy for Structured Missingness (SM) in large datasets, extending existing models to account for dependencies among missingness indicators, and demonstrates its impact on inference and prediction.
Contribution
It develops a new framework for characterizing SM where missingness indicators depend on each other and the data, broadening the traditional MCAR, MAR, MNAR taxonomy.
Findings
SM significantly affects inference and prediction accuracy
Simulations illustrate the impact of SM on data analysis
Application to a clinico-genomic database demonstrates real-world relevance
Abstract
Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that variables' missingness indicator vectors $M_1$, $M_2$, ..., $M_p$ are independent after conditioning on the relevant portion of the data matrix $\mathbf{X}$. As this is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM, where each $M_j$ can depend on $\mathbf{M}_{-j}$ (i.e., all missingness indicator vectors except $M_j$), in addition to $\mathbf{X}$. We embed this new framework within the well-established decomposition of…
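To make the idea of dependent missingness indicators concrete, here is a minimal Python sketch. It is not the paper's simulation design: the function name `simulate_structured_missingness`, the two-variable setup, and the probabilities 0.3, 0.8, and 0.1 are all illustrative assumptions. Missingness in the first variable is drawn independently of everything, while missingness in the second depends only on whether the first is missing, so the indicators are dependent even though neither depends on the data values.

```python
import numpy as np


def simulate_structured_missingness(n=10_000, seed=0):
    """Toy example (hypothetical, not from the paper): M2 depends on M1."""
    rng = np.random.default_rng(seed)

    # Complete data: two independent standard normal variables.
    X = rng.normal(size=(n, 2))

    # M1: missing with probability 0.3, independent of the data.
    m1 = rng.random(n) < 0.3

    # M2: missing with probability 0.8 if M1 is missing, else 0.1.
    # These probabilities are assumptions chosen for illustration.
    p_m2 = np.where(m1, 0.8, 0.1)
    m2 = rng.random(n) < p_m2

    # Stack the indicators (True = missing) and mask the data.
    M = np.column_stack([m1, m2])
    X_obs = np.where(M, np.nan, X)
    return X_obs, M


X_obs, M = simulate_structured_missingness()

# Correlation between the two missingness indicators.
corr = np.corrcoef(M[:, 0].astype(float), M[:, 1].astype(float))[0, 1]
print(f"corr(M1, M2) = {corr:.2f}")
```

Under a mechanism that treats the indicators as conditionally independent, this correlation would be close to zero; here it is clearly positive, which is the kind of structure an independence assumption would miss.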
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications · Statistical Methods and Bayesian Inference
