Prediction with Missing Data: Target Probabilities and Missingness Mechanisms
Pierre Catoire, Robin Genuer, Cecile Proust-Lima

TL;DR
This paper introduces a formal framework for describing missing data in prediction tasks and shows that methods considered biased in inference can still predict optimally under missingness conditions broader than MAR.
Contribution
It proposes a formal framework that distinguishes two prediction targets, according to whether the indicator of predictor observation is used to predict the outcome, and identifies conditions under which consistent prediction is possible beyond the MAR assumption.
Findings
Both prediction targets can be consistently estimated under conditions weaker than MAR.
Methods considered biased in inference, such as pattern sub-modelling, can achieve optimal predictive performance under MNAR.
The framework is demonstrated with simulated data and real-world trauma injury prediction.
Abstract
Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply. However, methods considered biased in inference, such as pattern sub-modelling or unconditional imputation, have been shown to achieve optimal predictive performance under any missingness mechanism, including non-MAR (MNAR). To explain this apparent contradiction, we introduce a new formal framework for describing missingness in prediction. Central to this framework is a distinction between two prediction targets, defined according to whether or not the indicator of observation of the predictors is exploited to predict the outcome. This distinction leads to a classification of the missingness mechanisms describing the conditions under which these…
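To make pattern sub-modelling concrete, here is a minimal sketch, not the authors' implementation. It fits one model per missingness pattern, using only the predictors observed in that pattern. The function names, the choice of linear least squares, and the simulated data (in which the second predictor is missing whenever its own value exceeds 0.5, an MNAR mechanism) are all assumptions made for illustration.

```python
import numpy as np

def fit_pattern_submodels(X, y):
    """Fit one least-squares model per missingness pattern (NaN marks a missing value)."""
    mask = np.isnan(X)
    models = {}
    for pattern in {tuple(r) for r in mask.tolist()}:
        rows = (mask == np.array(pattern)).all(axis=1)  # rows sharing this pattern
        obs = ~np.array(pattern)                        # predictors observed in this pattern
        A = np.column_stack([np.ones(rows.sum()), X[rows][:, obs]])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models[pattern] = coef                          # intercept first, then slopes
    return models

def predict_pattern_submodels(models, X):
    """Predict each row with the sub-model matching its own missingness pattern."""
    mask = np.isnan(X)
    preds = np.empty(X.shape[0])
    for i, row in enumerate(X):
        coef = models[tuple(mask[i].tolist())]          # KeyError if the pattern was never seen
        preds[i] = coef[0] + row[~mask[i]] @ coef[1:]
    return preds

# Hypothetical simulation: x2 is missing whenever x2 itself exceeds 0.5 (MNAR).
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, np.where(x2 > 0.5, np.nan, x2)])

models = fit_pattern_submodels(X, y)
preds = predict_pattern_submodels(models, X)
```

In this sketch the sub-model for rows with `x2` missing absorbs the information carried by the missingness itself into its intercept, which is how the observation indicator can be exploited for prediction even though no parameter of the full-data model is estimated.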
Taxonomy
Topics: Trauma and Emergency Care Studies · Statistical Methods and Bayesian Inference · Sepsis Diagnosis and Treatment
