Inference with Imputed Data: The Allure of Making Stuff Up
Charles F. Manski

TL;DR
This paper critically examines the use of random multiple imputation for missing data, highlighting its assumptions, limitations, and the risks of relying on imputed data for inference.
Contribution
It offers a transparent critique of Rubin's random multiple imputation (RMI) approach, analyzing its Bayesian and frequentist foundations and proposing ways to address its potential pitfalls.
Findings
RMI relies on untestable assumptions that may not hold in practice.
Imputation can lead to misleading inferences if assumptions are violated.
The paper suggests strategies to mitigate the risks of making unsupported assumptions.
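A small simulation can make the second finding concrete. The sketch below is not taken from the paper. It assumes a hypothetical binary outcome whose chance of being missing depends on the outcome itself, so the data are not missing at random. It compares a single random (hot-deck) imputation with worst-case bounds that use no assumption about the missing values. The population, the missingness rates, and the function names are all illustrative assumptions.

```python
import random


def worst_case_bounds(observed, n_missing, y_min=0.0, y_max=1.0):
    """Bounds on the population mean with no assumption about missing values.

    Every missing outcome is set to y_min for the lower bound and to
    y_max for the upper bound.
    """
    n_obs = len(observed)
    p_obs = n_obs / (n_obs + n_missing)
    mean_obs = sum(observed) / n_obs
    lower = mean_obs * p_obs + y_min * (1 - p_obs)
    upper = mean_obs * p_obs + y_max * (1 - p_obs)
    return lower, upper


def random_imputation_mean(observed, n_missing, rng):
    """Fill each missing value with a random draw from the observed values.

    This is only sensible if missing and observed outcomes share the
    same distribution (missing at random).
    """
    imputed = [rng.choice(observed) for _ in range(n_missing)]
    return (sum(observed) + sum(imputed)) / (len(observed) + n_missing)


rng = random.Random(0)

# Hypothetical population: binary outcome with P(y = 1) = 0.5.
y = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(10_000)]

# Assumed missingness that depends on the outcome itself:
# y = 1 is missing with probability 0.6, y = 0 with probability 0.1.
observed = [v for v in y if rng.random() >= (0.6 if v == 1.0 else 0.1)]
n_missing = len(y) - len(observed)

true_mean = sum(y) / len(y)
imputed_mean = random_imputation_mean(observed, n_missing, rng)
lower, upper = worst_case_bounds(observed, n_missing)

print(f"true mean:            {true_mean:.3f}")
print(f"random imputation:    {imputed_mean:.3f}")
print(f"worst-case bounds:    [{lower:.3f}, {upper:.3f}]")
```

In this setup the imputed estimate sits well below the true mean, because the imputation treats missing outcomes as if they resembled the observed ones. The worst-case interval is wide, but it contains the true mean by construction, since it only uses the logical range of the outcome.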
Abstract
Incomplete observability of data generates an identification problem. There is no panacea for missing data. What one can learn about a population parameter depends on the assumptions one finds credible to maintain. The credibility of assumptions varies with the empirical setting. No specific assumptions can provide a realistic general solution to the problem of inference with missing data. Yet Rubin has promoted random multiple imputation (RMI) as a general way to deal with missing values in public-use data. This recommendation has been influential to empirical researchers who seek a simple fix to the nuisance of missing data. This paper adds to my earlier critiques of imputation. It provides a transparent assessment of the mix of Bayesian and frequentist thinking used by Rubin to argue for RMI. It evaluates random imputation to replace missing outcome or covariate data when the…
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse · Census and Population Estimation
