Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

Irving Gómez-Méndez; Emilien Joly

arXiv:2110.09333·math.ST·October 19, 2021

TL;DR

This paper compares various random forest techniques for handling missing data in regression tasks, introducing a new algorithm that demonstrates practical benefits and analyzing its performance and complexity across different missing data mechanisms.

Contribution

The paper introduces a new random forest algorithm for missing data imputation and provides a comprehensive comparison with existing methods in terms of accuracy and complexity.

Findings

01 The new algorithm performs well across different missing data mechanisms.

02 It shows lower quadratic errors and bias compared to existing methods.

03 The algorithm's complexity is analyzed and found to be practical for real-world use.

Abstract

In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and to describe the performance of our new algorithm as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing-value random forest algorithms in the literature. In particular, we compare those techniques for both regression and prediction purposes. This work follows a first paper, Gómez-Méndez and Joly (2020), on the consistency of this new algorithm.
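To make the three missingness mechanisms named in the abstract concrete, the following is a minimal sketch of how each one might be simulated on a numeric design matrix. It is not the authors' simulation code: the helper name `make_missing`, the choice to delete values only in column 0, and the rank-based missingness probability are all assumptions made for illustration.

```python
import numpy as np

def make_missing(X, mechanism, rate=0.2, seed=0):
    """Return a copy of X with NaNs injected into column 0.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not 0.0 <= rate <= 0.5:
        raise ValueError("rate must lie in [0, 0.5] so probabilities stay in [0, 1]")
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]

    if mechanism == "MCAR":
        # Missing completely at random: the same probability for every row.
        prob = np.full(n, rate)
    elif mechanism in ("MAR", "MNAR"):
        # MAR: missingness in column 0 depends on column 1, which stays observed.
        # MNAR: missingness in column 0 depends on the value being deleted.
        driver = X[:, 1] if mechanism == "MAR" else X[:, 0]
        # Ranks 0..n-1 scaled to [0, 2*rate]; the average probability is rate.
        ranks = driver.argsort().argsort()
        prob = 2.0 * rate * ranks / max(n - 1, 1)
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")

    mask = rng.random(n) < prob
    X[mask, 0] = np.nan
    return X

# Usage: compare the observed mean of column 0 under each mechanism.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(2000, 2))
for mech in ("MCAR", "MAR", "MNAR"):
    X_miss = make_missing(X_full, mech, rate=0.2)
    frac = np.isnan(X_miss[:, 0]).mean()
    print(mech, "missing fraction:", round(frac, 3),
          "observed mean:", round(np.nanmean(X_miss[:, 0]), 3))
```

In this sketch larger values of the driving variable are more likely to trigger deletion, so under MNAR the observed values of column 0 are a biased sample of the full column. That kind of distortion is what makes the mechanism relevant when comparing the quadratic error and bias of different methods.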

Code & Models

Repositories

IrvingGomez/RandomForestsSimulations (Official)

Taxonomy

Topics: Neural Networks and Applications