On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets
Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul

TL;DR
This study compares seven imputation techniques on healthcare datasets with missing values, evaluating their accuracy and impact on feature selection, and finds Missforest and MICE to be the most effective methods.
Contribution
It provides a comprehensive comparison of multiple imputation methods on healthcare data and investigates the effect of feature selection order on imputation performance.
Findings
Missforest outperforms other imputation techniques in accuracy.
Performing imputation before feature selection yields better results.
Imputation methods significantly affect downstream machine learning performance.
Abstract
Missing values are a common characteristic of real-world datasets, especially healthcare data. This can be frustrating when applying machine learning algorithms to such datasets, because most machine learning models perform poorly in the presence of missing values. The aim of this study is to compare the performance of seven imputation techniques, namely Mean imputation, Median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, Interpolation imputation, Missforest imputation, and Multiple Imputation by Chained Equations (MICE), on three healthcare datasets. Missing values were introduced into the datasets at rates of 10%, 15%, 20% and 25%, and the imputation techniques were then used to impute them. Their performance was compared using root mean squared error (RMSE) and…
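The evaluation protocol described in the abstract (mask a known fraction of values, impute them, score the imputed entries with RMSE) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is a synthetic single column rather than a healthcare dataset, values are masked completely at random, all function names are hypothetical, and only the four simplest techniques (mean, median, LOCF, linear interpolation) are shown. KNN, Missforest, and MICE are omitted because they need third-party libraries.

```python
import math
import random
import statistics


def mask_values(values, rate, rng):
    """Return a copy of `values` with round(rate * n) entries replaced by None."""
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mean(column):
    """Fill gaps with the mean of the observed values."""
    fill = statistics.fmean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def impute_median(column):
    """Fill gaps with the median of the observed values."""
    fill = statistics.median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def impute_locf(column):
    """Carry the last observed value forward.

    Assumption: gaps before the first observation take the first observed value.
    """
    last = next(v for v in column if v is not None)
    out = []
    for v in column:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(column):
    """Interpolate linearly between observed neighbours.

    Assumption: gaps at either edge take the nearest observed value.
    """
    known = [i for i, v in enumerate(column) if v is not None]
    out = list(column)
    for i, v in enumerate(column):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = column[right]
        elif right is None:
            out[i] = column[left]
        else:
            w = (i - left) / (right - left)
            out[i] = column[left] + w * (column[right] - column[left])
    return out


def rmse(truth, imputed, masked):
    """Root mean squared error over the positions that were masked."""
    errs = [(t - p) ** 2 for t, p, m in zip(truth, imputed, masked) if m is None]
    return math.sqrt(sum(errs) / len(errs))


METHODS = {
    "mean": impute_mean,
    "median": impute_median,
    "locf": impute_locf,
    "interpolation": impute_linear,
}


def evaluate(values, rates=(0.10, 0.15, 0.20, 0.25), seed=0):
    """Mask `values` at each rate and score every method on the masked entries."""
    rng = random.Random(seed)
    results = {}
    for rate in rates:
        masked = mask_values(values, rate, rng)
        results[rate] = {
            name: rmse(values, impute(masked), masked)
            for name, impute in METHODS.items()
        }
    return results


if __name__ == "__main__":
    # Synthetic, smoothly varying signal with noise (illustrative data only).
    rng = random.Random(42)
    toy = [50 + 10 * math.sin(i / 8) + rng.gauss(0, 1) for i in range(200)]
    for rate, scores in evaluate(toy).items():
        print(rate, {name: round(score, 3) for name, score in scores.items()})
```

Scoring only the masked positions keeps the RMSE from being diluted by entries that were never missing. Whether this matches the exact scoring used in the paper is an assumption.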
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning in Healthcare · Artificial Intelligence in Healthcare
Methods: Feature Selection
