Missing Features Reconstruction and Its Impact on Classification Accuracy

Magda Friedjungová; Daniel Vašata; Marcel Jiřina

arXiv:1911.03658 · cs.LG · November 12, 2019

TL;DR

This paper investigates how different feature imputation methods affect classification accuracy when entire features are missing, comparing traditional and modern techniques through extensive experiments.

Contribution

It introduces new approaches for using MLP and XGBT in feature imputation and provides an empirical analysis of their effectiveness versus traditional methods.

Findings

01. MICE and linear regression are generally reliable imputers.

02. The performance of MLP and XGBT varies significantly across datasets.

03. The choice of imputation method critically affects classification accuracy.

Abstract

In real-world applications, we can encounter situations in which a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally research the influence of various imputation methods on the performance of several classification models. We study the impact of traditional imputation methods such as k-NN, linear regression, and MICE and compare them with modern imputation methods such as multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also…
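The paper's code and exact experimental setup are not shown on this page, so the following is only a minimal NumPy sketch of the scenario the abstract describes, under stated assumptions: a classifier is trained on complete data, one entire feature is then missing for every instance at prediction time, and that feature is reconstructed from the remaining ones by imputers fitted on the training data, where it is still present. Least-squares linear regression and k-NN averaging stand in for two of the traditional methods the abstract names; MICE, MLP, and XGBT are not shown. The synthetic data, the nearest-centroid classifier used as a hypothetical stand-in for the "well-trained model", the column-mean fill used as a naive baseline (not one of the compared methods), and all function names are illustrative assumptions, not the authors' method.

```python
import numpy as np


def fit_linear_imputer(X_train, j):
    """Least-squares fit of feature j on the remaining features, with an intercept."""
    others = np.delete(X_train, j, axis=1)
    A = np.column_stack([others, np.ones(len(others))])
    coef, *_ = np.linalg.lstsq(A, X_train[:, j], rcond=None)
    return coef


def impute_linear(X_rest, coef):
    """Predict the missing column from X_rest (test data with column j removed)."""
    A = np.column_stack([X_rest, np.ones(len(X_rest))])
    return A @ coef


def impute_knn(X_train, j, X_rest, k=5):
    """Average feature j over the k training rows nearest in the remaining features."""
    others = np.delete(X_train, j, axis=1)
    dist = np.linalg.norm(X_rest[:, None, :] - others[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return X_train[nearest, j].mean(axis=1)


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in for the trained model: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def predict(model, X):
    """Assign each row to the class with the closest centroid (Euclidean)."""
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]


def rebuild(X_rest, j, column):
    """Re-insert a reconstructed column at position j."""
    return np.insert(X_rest, j, column, axis=1)


# Synthetic data (assumption): feature 2 is nearly a linear function of 0 and 1.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
x0 = rng.normal(loc=y, scale=1.0)
x1 = rng.normal(loc=y, scale=1.0)
x2 = x0 + x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x0, x1, x2])
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

j = 2  # index of the feature that is entirely missing at test time
model = fit_nearest_centroid(X_train, y_train)  # trained on complete data
X_rest = np.delete(X_test, j, axis=1)  # what the damaged test set looks like

reconstructed = {
    "mean": np.full(len(X_rest), X_train[:, j].mean()),  # naive baseline
    "linear": impute_linear(X_rest, fit_linear_imputer(X_train, j)),
    "knn": impute_knn(X_train, j, X_rest, k=5),
}

pred_intact = predict(model, X_test)  # reference: predictions with no damage
results = {}
for name, col in reconstructed.items():
    rmse = float(np.sqrt(np.mean((col - X_test[:, j]) ** 2)))
    pred = predict(model, rebuild(X_rest, j, col))
    acc = float(np.mean(pred == y_test))
    agree = float(np.mean(pred == pred_intact))
    results[name] = (rmse, acc, agree)
    print(f"{name:>6}: RMSE={rmse:.3f}  accuracy={acc:.2f}  agreement={agree:.2f}")
```

For each imputer the script prints the reconstruction RMSE of the missing column, the classifier's accuracy on the rebuilt test set, and how often its predictions match those made on the intact test set. Fitting the imputers only on training data reflects the setting in which the feature is unavailable for every test instance, so nothing can be learned from the test rows themselves; the numbers depend entirely on the assumed synthetic data and are not results from the paper.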

Taxonomy

Methods: k-Nearest Neighbors · Linear Regression