Nonparametric imputation by data depth
Pavlo Mozharovskyi, Julie Josse, Francois Husson

TL;DR
This paper introduces a distribution-free, data topology-aware single and multiple imputation method using data depth, which improves imputation accuracy and robustness over traditional techniques, especially for elliptical data distributions.
Contribution
It proposes a novel nonparametric imputation approach based on data depth, extending to multiple imputation, with connections to existing methods like PCA and regression.
Findings
Performs well in simulations and real data comparisons
Offers robustness and asymptotic properties under elliptical symmetry
Implemented as an R package for practical use
Abstract
We present a single imputation method for missing values that borrows the idea of data depth, a measure of centrality defined for an arbitrary point of a space with respect to a probability distribution or data cloud. The method consists of iteratively maximizing the depth of each observation with missing values, and it can be employed with any properly defined statistical depth function. In each iteration, imputation reduces to the optimization of quadratic, linear, or quasiconcave functions, which are solved analytically, by linear programming, or by the Nelder-Mead method. Because it accounts for the underlying data topology, the procedure is distribution free, allows imputation close to the data geometry, can make predictions in situations where local imputation (k-nearest neighbors, random forest) cannot, and has attractive robustness and asymptotic properties under elliptical symmetry. It is…
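The iterative depth maximization described in the abstract can be illustrated with a minimal sketch, assuming the Mahalanobis depth D(x) = 1 / (1 + (x - mu)' Sigma^{-1} (x - mu)) with plain moment estimates of mu and Sigma. Under that assumption, maximizing the depth over the missing coordinates of a row is a quadratic problem whose closed-form solution is the regression of the missing coordinates on the observed ones. The function name `impute_mahalanobis_depth`, its parameters, and the mean-imputation start are hypothetical choices made for illustration; this is not the authors' R package and does not cover other depth functions.

```python
import numpy as np


def impute_mahalanobis_depth(X, n_iter=50, tol=1e-8):
    """Hypothetical sketch: single imputation by iteratively maximizing
    Mahalanobis depth. Assumes at least two columns and a non-singular
    covariance block for the observed coordinates of each row."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    X_imp = X.copy()

    # Start from column-mean imputation (an assumption of this sketch).
    col_means = np.nanmean(X, axis=0)
    X_imp[missing] = np.take(col_means, np.where(missing)[1])

    rows = np.where(missing.any(axis=1))[0]
    for _ in range(n_iter):
        X_old = X_imp.copy()
        # Moment estimates from the currently completed data.
        mu = X_imp.mean(axis=0)
        sigma = np.cov(X_imp, rowvar=False)
        for i in rows:
            m = missing[i]
            o = ~m
            if not o.any():
                # Nothing observed: the deepest point is the center.
                X_imp[i] = mu
                continue
            # Minimizer of the Mahalanobis distance over the missing
            # coordinates, with the observed coordinates held fixed.
            coef = np.linalg.solve(sigma[np.ix_(o, o)], X_imp[i, o] - mu[o])
            X_imp[i, m] = mu[m] + sigma[np.ix_(m, o)] @ coef
        if np.max(np.abs(X_imp - X_old)) < tol:
            break
    return X_imp


# Usage on synthetic correlated data with about 20% of one column removed.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=200)
data_missing = data.copy()
data_missing[rng.random(200) < 0.2, 1] = np.nan
completed = impute_mahalanobis_depth(data_missing)
```

Observed entries are never modified; only the missing coordinates move toward the point of maximal depth given the current estimates. Swapping in a robust location and scatter estimate, or a different depth function, would change the inner update and is outside this sketch.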
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Optimal Experimental Design Methods
