Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem
Charles F Manski, Michael Gmeiner, Anat Tambur

TL;DR
This paper examines the use of observed covariates to impute missing covariates in conditional prediction. It shows that simple imputation estimates are inconsistent even when covariate data are missing at random (MAR) and that they suffer from a shrinkage problem, which challenges common practice.
Contribution
The paper provides a theoretical analysis showing that simple imputation of missing covariates yields inconsistent estimates of conditional means under MAR. It also identifies a shrinkage problem that affects these estimates.
Findings
Imputed estimates are not consistent under MAR.
Estimates converge to a biased intermediate point.
Imputation causes a shrinkage bias in conditional prediction.
Abstract
Researchers regularly perform conditional prediction using imputed values of missing data. However, applications of imputation often lack a firm foundation in statistical theory. This paper originated when we were unable to find analysis substantiating claims that imputation of missing data has good frequentist properties when data are missing at random (MAR). We focused on the use of observed covariates to impute missing covariates when estimating conditional means of the form E(y|x, w). Here y is an outcome whose realizations are always observed, x is a covariate whose realizations are always observed, and w is a covariate whose realizations are sometimes unobserved. We examine the probability limit of simple imputation estimates of E(y|x, w) as sample size goes to infinity. We find that these estimates are not consistent when covariate data are MAR. To the contrary, the estimates…
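The setup in the abstract can be explored with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: x and w are binary, P(w=1|x) is 0.8 when x=1 and 0.2 when x=0, E(y|x, w) = w, and w is missing completely at random (a special case of MAR). The imputation rule used here, filling in the most likely value of w given x, is a hypothetical stand-in for "simple imputation" and may differ from the estimators the authors analyze. The function name `simulate` and its parameters are illustrative only.

```python
import random


def simulate(n=200_000, p_missing=0.5, seed=0):
    """Toy check of shrinkage when missing w is imputed from x.

    Hypothetical design (not from the paper): x and w are binary,
    P(w=1 | x) is 0.8 for x=1 and 0.2 for x=0, and E(y | x, w) = w.
    w is missing completely at random, a special case of MAR.
    Returns three estimates, all restricted to cases with x=1.
    """
    rng = random.Random(seed)
    cc_sum = cc_n = 0.0    # complete cases with x=1 and observed w=1
    imp_sum = imp_n = 0.0  # cases with x=1 and observed-or-imputed w=1
    x_sum = x_n = 0.0      # all cases with x=1
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        w = 1 if rng.random() < (0.8 if x == 1 else 0.2) else 0
        y = w + rng.gauss(0.0, 1.0)
        missing = rng.random() < p_missing
        # Assumed imputation rule: fill in the most likely w given x,
        # which in this design is simply w_hat = x.
        w_used = x if missing else w
        if x != 1:
            continue
        x_sum += y
        x_n += 1
        if not missing and w == 1:
            cc_sum += y
            cc_n += 1
        if w_used == 1:
            imp_sum += y
            imp_n += 1
    return cc_sum / cc_n, imp_sum / imp_n, x_sum / x_n


complete_case, imputed, mean_given_x = simulate()
print(f"complete-case estimate of E(y|x=1,w=1): {complete_case:.3f}")
print(f"imputation estimate of E(y|x=1,w=1):    {imputed:.3f}")
print(f"sample mean of y given x=1:             {mean_given_x:.3f}")
```

In this toy design the population values can be worked out by hand: E(y|x=1, w=1) = 1 and E(y|x=1) = 0.8, while the imputation estimate has probability limit (0.4 · 1 + 0.5 · 0.8) / 0.9 = 8/9 ≈ 0.889. The imputed cases pull the estimate away from the conditional mean of interest and toward the mean that conditions on x alone, which is one way to read the "intermediate point" described in the findings.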
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Advanced Causal Inference Techniques
