missForestPredict—Missing data imputation for prediction settings
Elena Albu, Shan Gao, Laure Wynants, Ben Van Calster, Santiago Callegari, Leona Cilar Budler

TL;DR
The missForestPredict package offers a fast and user-friendly method for imputing missing data in prediction models using random forests.
Contribution
It introduces an adaptation of the missForest algorithm tailored for prediction settings with customizable imputation models.
Findings
missForestPredict provides competitive prediction results compared to other imputation methods.
The algorithm uses a single convergence criterion that applies to both continuous and categorical variables.
It performs well on both simulated and public datasets with short computation times.
Abstract
Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occur at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion, unified for continuous and categorical variables, is met. The imputation models are saved for each variable and iteration and can be applied later to new observations at prediction time. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is compared to mean/mode imputation, linear regression…
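The fit-then-reuse idea described in the abstract can be illustrated with a small sketch. This is not the missForestPredict API and it is not written in R: it is a toy Python example under stated assumptions. A least-squares line stands in for the random forest, the data have only two numeric columns, and the stopping rule (largest change in imputed values below a tolerance) is a simplification rather than the package's unified criterion. The names `fit_line`, `fit_imputer` and `apply_imputer` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (an assumed stand-in for a random forest)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_imputer(rows, max_iter=50, tol=1e-6):
    """Impute two-column training rows (None = missing) and keep every fitted model."""
    # Initialization: column means of the observed values, saved for later reuse.
    init = [mean(r[j] for r in rows if r[j] is not None) for j in (0, 1)]
    filled = [[init[j] if r[j] is None else r[j] for j in (0, 1)] for r in rows]
    models = []  # one {column: (a, b)} dict per iteration
    for _ in range(max_iter):
        step, change = {}, 0.0
        for j in (0, 1):
            k = 1 - j  # the other column acts as the predictor
            obs = [i for i, r in enumerate(rows) if r[j] is not None]
            a, b = fit_line([filled[i][k] for i in obs], [filled[i][j] for i in obs])
            step[j] = (a, b)
            for i, r in enumerate(rows):
                if r[j] is None:
                    new = a + b * filled[i][k]
                    change = max(change, abs(new - filled[i][j]))
                    filled[i][j] = new
        models.append(step)
        if change < tol:  # simplified stopping rule (an assumption of this sketch)
            break
    return {"init": init, "models": models}, filled


def apply_imputer(imputer, rows):
    """Impute new rows by replaying the saved initialization and models in order."""
    filled = [[imputer["init"][j] if r[j] is None else r[j] for j in (0, 1)] for r in rows]
    for step in imputer["models"]:
        for j in (0, 1):
            a, b = step[j]
            for i, r in enumerate(rows):
                if r[j] is None:
                    filled[i][j] = a + b * filled[i][1 - j]
    return filled


# Training data roughly following y = 2x + 1, with one missing value per column.
train = [(1, 3), (2, 5), (3, 7), (4, None), (None, 11)]
imputer, train_filled = fit_imputer(train)

# At prediction time no refitting happens: only the saved models are applied.
new_filled = apply_imputer(imputer, [(6, None), (None, 15)])
```

The point of the sketch is the separation of the two steps: everything learned during development (initial values and the per-iteration models) is stored, so new observations are imputed without touching the training data again.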
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Data Analysis with R · Markov Chains and Monte Carlo Methods
