Imputation for High-Dimensional Linear Regression
Kabir Aladin Chandrasekher, Ahmed El Alaoui, Andrea Montanari

TL;DR
This paper shows that conditional-mean imputation, combined with standard procedures such as the LASSO and square-root LASSO, retains the minimax estimation rate in high-dimensional linear regression with missing covariates, even when the conditional mean itself must be estimated.
Contribution
It shows that imputing missing covariates with their conditional means, then applying an existing procedure such as the LASSO, retains minimax rates in high-dimensional regression with missing data. This offers a simpler alternative to the specialized, often non-convex algorithms proposed in prior work.
Findings
Imputation with conditional mean preserves minimax rates.
Square-root LASSO remains pivotal under missing data.
Estimators perform well even with approximate conditional means.
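For reference, the two estimators named in the findings can be written in their usual textbook forms. The notation here is an assumption and is not taken from the paper: \hat{X} denotes the design matrix after imputation, y the response vector, n the sample size, d the number of covariates, and \lambda the regularization parameter.

```latex
\hat\theta^{\mathrm{LASSO}} \in \arg\min_{\theta \in \mathbb{R}^d}
  \Big\{ \frac{1}{2n}\,\|y - \hat{X}\theta\|_2^2 + \lambda\,\|\theta\|_1 \Big\},
\qquad
\hat\theta^{\sqrt{\mathrm{LASSO}}} \in \arg\min_{\theta \in \mathbb{R}^d}
  \Big\{ \frac{1}{\sqrt{n}}\,\|y - \hat{X}\theta\|_2 + \lambda\,\|\theta\|_1 \Big\}.
```

In the fully observed setting, "pivotal" refers to the fact that a theoretically valid choice of \lambda for the square-root LASSO does not depend on the noise level, whereas the usual choice for the LASSO scales with it. The finding above states that this property carries over to the imputed design.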
Abstract
We study high-dimensional regression with missing entries in the covariates. A common strategy in practice is to impute the missing entries with an appropriate substitute and then implement a standard statistical procedure acting as if the covariates were fully observed. Recent literature on this subject proposes instead to design a specific, often complicated or non-convex, algorithm tailored to the case of missing covariates. We investigate a simpler approach where we fill in the missing entries with their conditional mean given the observed covariates. We show that this imputation scheme coupled with standard off-the-shelf procedures such as the LASSO and square-root LASSO retains the minimax estimation rate in the random-design setting where the covariates are i.i.d. sub-Gaussian. We further show that the square-root LASSO remains pivotal in this setting. It is…
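The impute-then-regress recipe described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the covariates are taken to be zero-mean Gaussian with a known covariance Sigma (so the conditional mean has the closed form Sigma_MO Sigma_OO^{-1} x_O), entries are missing completely at random, and the LASSO is solved with a plain proximal-gradient loop. The function names impute_conditional_mean and lasso_ista, and all numerical settings, are hypothetical.

```python
import numpy as np


def impute_conditional_mean(X, mask, Sigma):
    """Fill entries where mask is False with E[x_missing | x_observed].

    Assumes zero-mean Gaussian rows with known covariance Sigma
    (an illustrative assumption, not the paper's general setting).
    """
    X_imp = np.where(mask, X, 0.0)  # missing entries start at the mean, 0
    for i in range(X.shape[0]):
        obs = mask[i]
        mis = ~obs
        if mis.any() and obs.any():
            # Conditional mean: Sigma_MO @ Sigma_OO^{-1} @ x_O
            coef = np.linalg.solve(Sigma[np.ix_(obs, obs)], X_imp[i, obs])
            X_imp[i, mis] = Sigma[np.ix_(mis, obs)] @ coef
    return X_imp


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - X theta||_2^2 + lam * ||theta||_1 by ISTA."""
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        z = theta - step * grad
        # Soft-thresholding is the proximal step for the l1 penalty
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return theta


# Synthetic example: sparse signal, correlated Gaussian design, 20% missing.
rng = np.random.default_rng(0)
n, d, s = 200, 50, 3
idx = np.arange(d)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
theta_star = np.zeros(d)
theta_star[:s] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)

mask = rng.random((n, d)) > 0.2  # True = observed
X_imp = impute_conditional_mean(X, mask, Sigma)
theta_hat = lasso_ista(X_imp, y, lam=0.1)  # lam chosen arbitrarily for the demo
```

The regression step treats X_imp exactly as if it were a fully observed design, which is the point of the approach. When Sigma is unknown, the conditional mean would have to be approximated, which is the situation the findings describe as "approximate conditional means".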
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Statistical Methods and Bayesian Inference
Methods: Linear Regression
