Should a Normal Imputation Model Be Modified to Impute Skewed Variables?
Paul T. von Hippel

TL;DR
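This paper investigates whether modifying a normal imputation model to accommodate skewed variables reduces bias. The modifications do not reliably reduce bias, and some make it much worse. Under the unmodified normal model, bias is usually mild for means, standard deviations, and linear regression coefficients, but can be severe for shape-dependent estimands such as percentiles and skewness.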
Contribution
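The study tests several modifications to a normal imputation model for skewed data: transforming, truncating, or censoring (rounding) normally imputed values, and imputing from a quadratic or truncated regression. None of them reliably reduces bias, and some increase it.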
Findings
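Under a normal imputation model, bias is usually mild for means, standard deviations, and linear regression coefficients.
Bias can be severe for shape-dependent estimands such as percentiles and the coefficient of skewness.
None of the tested modifications reliably reduces bias, and some make it much worse.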
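The contrast between mild and severe bias in the findings above can be illustrated with a small simulation. The sketch below is not the paper's simulation design. It assumes a single exponential variable, values missing completely at random, and a deliberately simplified normal imputation that draws from a normal distribution with the observed mean and standard deviation, ignoring parameter uncertainty. The function names `skewness` and `normal_impute_demo` are hypothetical.

```python
import numpy as np


def skewness(x):
    """Moment-based coefficient of skewness."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


def normal_impute_demo(n=200_000, p_missing=0.5, seed=0):
    """Compare estimands on complete data vs. normally imputed data.

    Assumptions (illustrative only, not taken from the paper):
    - X is Exponential(1), which has mean 1, SD 1, and skewness 2.
    - Values are missing completely at random.
    - Imputations are drawn from Normal(observed mean, observed SD).
    """
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0, size=n)

    # Delete values completely at random.
    missing = rng.random(n) < p_missing
    observed = x[~missing]

    # Fit the (misspecified) normal model to the observed values.
    mu, sigma = observed.mean(), observed.std(ddof=1)

    # Replace each missing value with a draw from the fitted normal.
    imputed = x.copy()
    imputed[missing] = rng.normal(mu, sigma, size=missing.sum())

    def summarize(v):
        return {
            "mean": v.mean(),
            "sd": v.std(ddof=1),
            "p5": np.percentile(v, 5),
            "skewness": skewness(v),
        }

    return summarize(x), summarize(imputed)


if __name__ == "__main__":
    complete, imputed = normal_impute_demo()
    for name in complete:
        print(f"{name:>8}: complete={complete[name]: .3f}  imputed={imputed[name]: .3f}")
```

In this simplified setting the mean and standard deviation are nearly unchanged by imputation. The 5th percentile and the skewness shift noticeably, because the symmetric normal draws add a left tail (including negative values) that the exponential variable does not have.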
Abstract
Researchers often impute continuous variables under an assumption of normality, yet many incomplete variables are skewed. We find that imputing skewed continuous variables under a normal model can lead to bias; the bias is usually mild for popular estimands such as means, standard deviations, and linear regression coefficients, but the bias can be severe for more shape-dependent estimands such as percentiles or the coefficient of skewness. We test several methods for adapting a normal imputation model to accommodate skewness, including methods that transform, truncate, or censor (round) normally imputed values, as well as methods that impute values from a quadratic or truncated regression. None of these modifications reliably reduces the biases of the normal model, and some modifications can make the biases much worse. We conclude that, if one has to impute a skewed variable under a…
