# missForestPredict—Missing data imputation for prediction settings

**Authors:** Elena Albu, Shan Gao, Laure Wynants, Ben Van Calster, Santiago Callegari, Leona Cilar Budler

PMC · DOI: 10.1371/journal.pone.0334125 · 2025-11-07

## TL;DR

The missForestPredict R package offers a fast, user-friendly random forest method for imputing missing input data, both when a prediction model is developed and when it is applied to new observations.

## Contribution

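The paper adapts the missForest algorithm to prediction settings: imputation models are saved for each variable and iteration so they can be applied to new observations, and users get extended error monitoring, control over the variables used in the imputation, and custom initialization.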

## Key findings

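- In prediction settings, missForestPredict gives results competitive with mean/mode imputation, linear regression imputation, mice, k-nearest neighbours, bagging, miceRanger and IterativeImputer.
- The iterative imputation stops on a single convergence criterion that is unified for continuous and categorical variables.
- The comparison covers eight simulated datasets with simulated missingness (48 scenarios) and eight large public datasets, and computation times are short.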

## Abstract

Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occur at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest imputation algorithm that is fast, user-friendly and tailored for prediction settings. The algorithm iteratively imputes variables using random forests until a convergence criterion, unified for continuous and categorical variables, is met. The imputation models are saved for each variable and iteration and can be applied later to new observations at prediction time. The missForestPredict package offers extended error monitoring, control over variables used in the imputation and custom initialization. This allows users to tailor the imputation to their specific needs. The missForestPredict algorithm is compared to mean/mode imputation, linear regression imputation, mice, k-nearest neighbours, bagging, miceRanger and IterativeImputer on eight simulated datasets with simulated missingness (48 scenarios) and eight large public datasets using different prediction models. missForestPredict provides competitive results in prediction settings within short computation times.
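The fit-then-replay idea in the abstract (impute iteratively, store one model per variable and iteration, apply the stored models to new observations) can be illustrated with a minimal Python sketch. This is not the package's R API or its algorithm in detail. Several parts are assumptions made for brevity: a small k-nearest-neighbour averager stands in for the random forest, only continuous variables are handled, the stopping rule is "imputed values stopped changing" rather than the package's unified criterion, and the names `fit_imputer` and `apply_imputer` are hypothetical.

```python
from statistics import fmean

def drop(row, j):
    """Return the row without column j (the predictors for column j)."""
    return [v for c, v in enumerate(row) if c != j]

def knn_fit(X, y):
    # Stand-in learner (assumption): the "model" is simply the stored training data.
    return (X, y)

def knn_predict(model, x, k=3):
    X, y = model
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), i)
                   for i, row in enumerate(X))
    return fmean(y[i] for _, i in dists[:k])

def fit_imputer(data, max_iter=5, tol=1e-9):
    """Impute `data` (rows of floats, None = missing) and keep every fitted model."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Initialization: column means of the observed values.
    init = [fmean(r[j] for r in data if r[j] is not None) for j in range(n_cols)]
    filled = [[init[j] if v is None else v for j, v in enumerate(r)] for r in data]
    models = []  # models[it][j] is the model for column j at iteration it
    for _ in range(max_iter):
        iter_models, change = {}, 0.0
        for j in range(n_cols):
            obs = [i for i in range(len(data)) if not missing[i][j]]
            model = knn_fit([drop(filled[i], j) for i in obs],
                            [filled[i][j] for i in obs])
            iter_models[j] = model
            for i, row in enumerate(filled):
                if missing[i][j]:
                    new = knn_predict(model, drop(row, j))
                    change += (new - row[j]) ** 2
                    row[j] = new
        models.append(iter_models)
        if change < tol:  # simplified stopping rule (assumption)
            break
    return filled, init, models

def apply_imputer(row, init, models):
    """Replay the stored models, in fitting order, on one new observation."""
    miss = [v is None for v in row]
    filled = [init[j] if v is None else v for j, v in enumerate(row)]
    for iter_models in models:
        for j, model in iter_models.items():
            if miss[j]:
                filled[j] = knn_predict(model, drop(filled, j))
    return filled

train = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, None, 9.0],
    [4.0, 8.0, None],
    [5.0, 10.0, 15.0],
    [6.0, 12.0, 18.0],
]
imputed, init, models = fit_imputer(train)
new_row = apply_imputer([3.5, None, 10.0], init, models)
print(imputed)
print(new_row)
```

The new observation is imputed using only what was learned at development time (initial values and stored models), with no refitting on the new data. That is the property that makes an imputation method usable at prediction time.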

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12594382/full.md

---
Source: https://tomesphere.com/paper/PMC12594382