Conditional expectation with regularization for missing data imputation
Mai Anh Vu, Thu Nguyen, Tu T. Do, Nhan Phan, Nitesh V. Chawla, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen

TL;DR
This paper introduces DIMV, a scalable, explainable, and effective imputation method for missing data that achieves low RMSE and provides confidence regions, suitable for critical applications like medicine and finance.
Contribution
The paper proposes DIMV, a novel regularized conditional distribution-based imputation algorithm that outperforms state-of-the-art methods in accuracy, scalability, and explainability.
Findings
DIMV achieves lower RMSE than existing methods.
DIMV is fast, scalable, and explainable.
DIMV provides confidence regions for imputed values.
Abstract
Missing data frequently occurs in datasets across various domains, such as medicine, sports, and finance. In many cases, to enable proper and reliable analyses of such data, the missing values are often imputed, and it is necessary that the method used has a low root mean square error (RMSE) between the imputed and the true values. In addition, for some critical applications, it is also often a requirement that the imputation method is scalable and the logic behind the imputation is explainable, which is especially difficult for complex methods that are, for example, based on deep learning. Based on these considerations, we propose a new algorithm named "conditional Distribution-based Imputation of Missing Values with Regularization" (DIMV). DIMV operates by determining the conditional distribution of a feature that has missing entries, using the information from the fully observed…
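The idea of imputing from a regularized conditional distribution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes jointly Gaussian features, estimates the covariance from mean-filled data as a simplification, conditions on whichever entries are observed in each row, and adds a ridge term to the observed-block covariance. The function names and the parameter `alpha` are hypothetical and chosen for illustration only.

```python
import numpy as np

def conditional_mean_impute(X, alpha=0.1):
    """Fill NaNs with a ridge-regularized Gaussian conditional mean.

    Illustrative sketch only (assumed Gaussian model, hypothetical alpha);
    not the DIMV reference implementation.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    mu = np.nanmean(X, axis=0)  # per-feature mean over observed entries

    # Simplification: covariance estimated from mean-filled, centered data.
    Xc = np.where(mask, mu, X) - mu
    sigma = Xc.T @ Xc / (X.shape[0] - 1)

    X_imp = X.copy()
    for i in np.flatnonzero(mask.any(axis=1)):
        m = mask[i]   # missing features in this row
        o = ~m        # observed features in this row
        if not o.any():
            X_imp[i, m] = mu[m]  # nothing to condition on
            continue
        # E[x_m | x_o] = mu_m + S_mo (S_oo + alpha I)^{-1} (x_o - mu_o)
        S_oo = sigma[np.ix_(o, o)] + alpha * np.eye(o.sum())
        S_mo = sigma[np.ix_(m, o)]
        X_imp[i, m] = mu[m] + S_mo @ np.linalg.solve(S_oo, X[i, o] - mu[o])
    return X_imp

def rmse(a, b):
    """Root mean square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

if __name__ == "__main__":
    # Synthetic, strongly correlated data with entries removed from column 1.
    rng = np.random.default_rng(0)
    cov = np.array([[1.0, 0.9], [0.9, 1.0]])
    X_true = rng.multivariate_normal([0.0, 0.0], cov, size=500)
    X_miss = X_true.copy()
    holes = rng.random(500) < 0.2
    X_miss[holes, 1] = np.nan

    X_hat = conditional_mean_impute(X_miss, alpha=0.1)
    print("RMSE on imputed entries:", rmse(X_hat[holes, 1], X_true[holes, 1]))
```

The ridge term keeps the linear solve well conditioned when observed features are nearly collinear, at the cost of shrinking imputed values toward the feature mean. RMSE is computed only on the entries that were removed, which matches the evaluation criterion the abstract describes.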
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Health, Environment, Cognitive Aging
