On the Consistency of a Random Forest Algorithm in the Presence of Missing Entries
Irving Gómez-Méndez, Emilien Joly

TL;DR
This paper proves the consistency of a random forest algorithm for regression when data contains missing entries, using a partial imputation method suitable for missing completely at random (MCAR) data.
Contribution
It introduces a novel partial imputation technique integrated with random forests and proves its consistency under MCAR missing data conditions.
Findings
Proves the consistency of the random forest estimator with missing data
Develops a partial imputation method compatible with random forests
Ensures reliable regression estimation despite missing entries
Abstract
This paper tackles the problem of constructing a non-parametric predictor when the latent variables are given with incomplete information. A convenient predictor for this task is the random forest algorithm used in conjunction with the so-called CART criterion. The proposed technique enables a partial imputation of the missing values in the data set in a way that suits both a consistent estimator of the regression function and a partial recovery of the missing values. A proof of the consistency of the random forest estimator is given in the case where each latent variable is missing completely at random (MCAR).
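To make the MCAR assumption concrete, the following is a minimal sketch of how such missingness could be simulated: each entry of a feature is hidden by an independent coin flip that ignores both the data values and the response. The function names (`mcar_mask`, `apply_mask`), the per-feature probabilities, and the use of `None` as the missing marker are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's partial imputation or its random forest.

```python
import random

def mcar_mask(n_rows, p_missing, seed=0):
    """Build a missingness mask; True marks a hidden entry.

    p_missing holds one (hypothetical) missingness probability per feature.
    Under MCAR the coin flips depend on nothing but these probabilities.
    """
    rng = random.Random(seed)
    return [[rng.random() < p for p in p_missing] for _ in range(n_rows)]

def apply_mask(X, mask):
    """Replace masked entries with None and keep observed values unchanged."""
    return [[None if hidden else value for value, hidden in zip(row, mask_row)]
            for row, mask_row in zip(X, mask)]

# Synthetic data: 1000 observations of 3 features (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]

mask = mcar_mask(n_rows=1000, p_missing=[0.2, 0.2, 0.2])
X_missing = apply_mask(X, mask)

# Empirical fraction of hidden entries; it should be close to 0.2.
missing_rate = sum(v is None for row in X_missing for v in row) / (1000 * 3)
```

Because the mask is drawn without looking at `X`, the observed entries remain a representative sample of each feature, which is the property that distinguishes MCAR from missingness mechanisms that depend on the data.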
Taxonomy
Topics: Neural Networks and Applications · Statistical Methods and Inference
