MissForest - nonparametric missing value imputation for mixed-type data
Daniel J. Stekhoven, Peter Bühlmann

TL;DR
MissForest is a nonparametric, random forest-based imputation method that handles mixed-type data with complex interactions. It outperforms existing methods in accuracy and efficiency, and it estimates its own imputation error.
Contribution
This paper introduces missForest, a novel iterative imputation method that simultaneously handles mixed variable types and estimates imputation error without a test set.
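The iterative scheme can be illustrated with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' R implementation (the missForest package). Categorical columns are assumed to be stored as integer codes in a float array, with `np.nan` marking missing entries. The codes are fed to the forests as ordinary numeric predictors, which simplifies how R's randomForest splits on factors. The function name `miss_forest` and its parameters are hypothetical. Each variable starts from a mean or mode guess, and variables are revisited in order of increasing missingness. As we understand the full paper, the loop stops once the change between successive imputations no longer decreases for any variable type, and the previous imputation is returned.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def miss_forest(X, is_cat, max_iter=10, n_estimators=100, random_state=0):
    """Hypothetical missForest-style imputer (sketch, not the R package).

    X      : 2-D array; np.nan marks missing entries, categorical columns
             hold integer codes stored as floats (assumption of this sketch).
    is_cat : boolean per column, True for categorical variables.
    """
    X = np.asarray(X, dtype=float)
    is_cat = np.asarray(is_cat, dtype=bool)
    miss = np.isnan(X)
    n_miss = miss.sum(axis=0)

    # Initial guess: column mean (continuous) or most frequent code (categorical).
    X_imp = X.copy()
    for j in np.where(n_miss > 0)[0]:
        obs = X[~miss[:, j], j]
        if is_cat[j]:
            values, counts = np.unique(obs, return_counts=True)
            X_imp[miss[:, j], j] = values[np.argmax(counts)]
        else:
            X_imp[miss[:, j], j] = obs.mean()

    # Visit variables with missing values in order of increasing missingness.
    order = [j for j in np.argsort(n_miss, kind="stable") if n_miss[j] > 0]
    if not order:
        return X_imp
    num_cols = np.where(~is_cat)[0]
    cat_cols = np.where(is_cat)[0]
    n_cat_missing = miss[:, cat_cols].sum()

    prev_deltas = None
    for _ in range(max_iter):
        X_old = X_imp.copy()
        for j in order:
            rows = miss[:, j]
            others = np.delete(np.arange(X.shape[1]), j)
            Forest = RandomForestClassifier if is_cat[j] else RandomForestRegressor
            # max_features="sqrt" samples about sqrt(p) predictors per split (mtry-like).
            rf = Forest(n_estimators=n_estimators, max_features="sqrt",
                        random_state=random_state)
            # Train where variable j is observed, predict where it is missing.
            rf.fit(X_imp[~rows][:, others], X_imp[~rows, j])
            X_imp[rows, j] = rf.predict(X_imp[rows][:, others])

        # Change between successive imputations, per variable type present.
        deltas = []
        if num_cols.size:
            new, old = X_imp[:, num_cols], X_old[:, num_cols]
            deltas.append(np.sum((new - old) ** 2) / np.sum(new ** 2))
        if n_cat_missing:
            changed = X_imp[:, cat_cols] != X_old[:, cat_cols]
            deltas.append(changed.sum() / n_cat_missing)
        deltas = np.array(deltas)

        # Stop once no variable type shows a smaller change than before,
        # returning the imputation from the previous iteration.
        if prev_deltas is not None and not np.any(deltas < prev_deltas):
            return X_old
        prev_deltas = deltas
    return X_imp
```

The defaults `max_iter=10` and `n_estimators=100` are chosen to mirror the R package's `maxiter` and `ntree` defaults. Because every forest is refit in every iteration, run time grows with the number of variables that contain missing values.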
Findings
Outperforms other imputation methods on mixed-type data.
Estimates the imputation error from the random forest's out-of-bag (OOB) error, without a separate test set.
Handles high-dimensional data efficiently.
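The out-of-bag idea behind the error-estimation finding can be illustrated with scikit-learn's `oob_score=True` option. In this hedged sketch, the hypothetical function `oob_imputation_error` refits a forest for each variable that had missing values, using the observed rows of an already completed matrix, and turns the forest's OOB score into an error. For a regressor, `oob_score_` is the OOB R², so 1 - R² is the OOB mean squared error divided by the variance of the observed values. For a classifier, `oob_score_` is the OOB accuracy, so 1 - accuracy is the OOB proportion of falsely classified entries. Refitting after imputation and averaging the per-variable errors are simplifications of this sketch. As we understand it, missForest records the OOB errors of the forests grown during imputation itself, and its aggregation may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def oob_imputation_error(X_imp, miss, is_cat, n_estimators=100, random_state=0):
    """Hypothetical OOB estimate of imputation error on a completed matrix.

    X_imp  : completed 2-D float array (no NaNs), e.g. an imputer's output.
    miss   : boolean mask of the originally missing entries.
    is_cat : boolean per column, True for categorical variables.
    Returns (nrmse_estimate, pfc_estimate); an entry is None when no
    variable of that type had missing values.
    """
    X_imp = np.asarray(X_imp, dtype=float)
    miss = np.asarray(miss, dtype=bool)
    is_cat = np.asarray(is_cat, dtype=bool)
    num_err, cat_err = [], []
    for j in np.where(miss.any(axis=0))[0]:
        obs = ~miss[:, j]
        others = np.delete(np.arange(X_imp.shape[1]), j)
        Forest = RandomForestClassifier if is_cat[j] else RandomForestRegressor
        rf = Forest(n_estimators=n_estimators, max_features="sqrt",
                    oob_score=True, random_state=random_state)
        rf.fit(X_imp[obs][:, others], X_imp[obs, j])
        # Classifier: oob_score_ is OOB accuracy, so this is a misclassification rate.
        # Regressor: oob_score_ is OOB R^2, so this is OOB MSE / Var(observed y).
        (cat_err if is_cat[j] else num_err).append(1.0 - rf.oob_score_)
    nrmse = float(np.sqrt(np.mean(num_err))) if num_err else None
    pfc = float(np.mean(cat_err)) if cat_err else None
    return nrmse, pfc
```

The first value plays the role of a normalized root mean squared error (NRMSE) for continuous variables and the second that of a proportion of falsely classified entries (PFC) for categorical ones; neither requires held-out data.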
Abstract
Modern data acquisition based on high-throughput technology often faces the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete data set. Missing value imputation offers a solution to this problem. However, the majority of available imputation methods are restricted to one type of variable only: continuous or categorical. For mixed-type data, the different types are usually handled separately; therefore, these methods ignore possible relations between variable types. We propose a nonparametric method which can cope with different types of variables simultaneously. We compare several state-of-the-art methods for the imputation of missing values. We propose and evaluate an iterative imputation method (missForest) based on a random forest. By averaging over many unpruned classification or regression trees, random forest…
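The abstract describes missForest as iterative but does not say when the iteration ends. As we understand the full paper, the loop monitors the relative change between the imputed matrices of two successive iterations, computed separately for the set of continuous variables N and the set of categorical variables F. Here x^new_ij and x^old_ij denote entries of the current and previous imputed matrices, and #NA is the number of missing entries in the categorical variables:

```latex
\Delta_N = \frac{\sum_{j \in N} \sum_{i=1}^{n} \left( x^{\mathrm{new}}_{ij} - x^{\mathrm{old}}_{ij} \right)^2}
                {\sum_{j \in N} \sum_{i=1}^{n} \left( x^{\mathrm{new}}_{ij} \right)^2},
\qquad
\Delta_F = \frac{\sum_{j \in F} \sum_{i=1}^{n} \mathbf{1}\!\left( x^{\mathrm{new}}_{ij} \neq x^{\mathrm{old}}_{ij} \right)}
                {\#\mathrm{NA}}
```

Iteration stops the first time these differences increase for every variable type present, and the imputation from the previous iteration is returned.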
