Regression with missing Ys: An improved strategy for analyzing multiply imputed data
Paul T. von Hippel

TL;DR
This paper proposes MID (multiple imputation, then deletion), a strategy for analyzing multiply imputed data when the dependent variable Y has missing values. Compared with the standard approach of analyzing the imputed Y values, MID reduces noise and usually increases efficiency.
Contribution
The paper introduces the MID strategy: all cases are used for imputation, but cases with imputed Y values are then excluded from the analysis. This protects the estimates from problematic imputations of Y.
Findings
MID reduces noise in estimates compared to standard MI.
MID usually provides somewhat more efficient estimates when the imputed Y values are acceptable.
MID protects against problematic imputed Y values.
Abstract
When fitting a generalized linear model -- such as a linear regression, a logistic regression, or a hierarchical linear model -- analysts often wonder how to handle missing values of the dependent variable Y. If missing values have been filled in using multiple imputation, the usual advice is to use the imputed Y values in analysis. We show, however, that using imputed Ys can add needless noise to the estimates. Better estimates can usually be obtained using a modified strategy that we call multiple imputation, then deletion (MID). Under MID, all cases are used for imputation, but following imputation, cases with imputed Y values are excluded from the analysis. When there is something wrong with the imputed Y values, MID protects the estimates from the problematic imputations. And when the imputed Y values are acceptable, MID usually offers somewhat more efficient estimates than an…
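The two MID steps (impute using all cases, then drop cases whose Y was imputed before fitting) can be sketched in Python. This is a minimal illustration under our own assumptions, not the paper's implementation: one predictor x, both x and y partly missing but never in the same row, imputation by normal linear regressions fitted to the complete cases with parameters drawn from their flat-prior posterior, and Rubin's rules for pooling. The function names (`draw_regression`, `ols_slope`, `pooled_slope`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random
import statistics


def draw_regression(xs, ys, rng):
    """Draw (intercept, slope, sigma) for ys = a + b*xs + e from its posterior
    under a flat prior, so imputations carry parameter uncertainty."""
    n = len(xs)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    rss = sum((y - ybar - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2)  # RSS / chi-square(n - 2)
    b = rng.gauss(b_hat, math.sqrt(sigma2 / sxx))
    c = rng.gauss(ybar, math.sqrt(sigma2 / n))  # intercept at x = xbar
    return c - b * xbar, b, math.sqrt(sigma2)


def ols_slope(xs, ys):
    """OLS slope of ys on xs and its standard error."""
    n = len(xs)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    rss = sum((y - ybar - b * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    return b, math.sqrt(rss / (n - 2) / sxx)


def pooled_slope(x, y, m=20, seed=0, delete_imputed_y=True):
    """Impute m times, fit y ~ x in each imputed data set, pool with Rubin's
    rules. delete_imputed_y=True is MID; False is conventional MI.
    Assumption: x, y are lists with None for missing; no row misses both."""
    rng = random.Random(seed)
    complete = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    cx, cy = [p[0] for p in complete], [p[1] for p in complete]
    estimates, variances = [], []
    for _ in range(m):
        # Imputation step: every case is kept and both variables are imputed.
        ay, by, sy = draw_regression(cx, cy, rng)  # model for y given x
        ax, bx, sx = draw_regression(cy, cx, rng)  # model for x given y
        x_filled = [xi if xi is not None else ax + bx * yi + rng.gauss(0, sx)
                    for xi, yi in zip(x, y)]
        y_filled = [yi if yi is not None else ay + by * xi + rng.gauss(0, sy)
                    for xi, yi in zip(x, y)]
        # Deletion step (MID only): keep just the cases whose y was observed.
        rows = [i for i in range(len(y)) if y[i] is not None or not delete_imputed_y]
        b, se = ols_slope([x_filled[i] for i in rows], [y_filled[i] for i in rows])
        estimates.append(b)
        variances.append(se ** 2)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    qbar = statistics.fmean(estimates)
    total = statistics.fmean(variances) + (1 + 1 / m) * statistics.variance(estimates)
    return qbar, math.sqrt(total)


def simulate(n=2000, slope=2.0, seed=1):
    """Hypothetical MCAR data: y = 1 + slope*x + e, with about 20% of x and
    20% of y missing, never both in the same row."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xv = rng.gauss(0, 1)
        yv = 1 + slope * xv + rng.gauss(0, 1)
        u = rng.random()
        x.append(None if u < 0.2 else xv)
        y.append(None if 0.2 <= u < 0.4 else yv)
    return x, y
```

`pooled_slope(*simulate())` returns the MID estimate and its standard error; passing `delete_imputed_y=False` gives the conventional MI estimate for comparison. In this sketch, if x were complete and only y were missing, the deletion step would leave exactly the complete cases, so MID would coincide with listwise deletion. The deletion step matters here because cases with an imputed x but an observed y are retained.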
