Missing Features Reconstruction and Its Impact on Classification Accuracy
Magda Friedjungová, Daniel Vašata, Marcel Jiřina

TL;DR
This paper investigates how different feature imputation methods affect classification accuracy when entire features are missing, comparing traditional and modern techniques through extensive experiments.
Contribution
It introduces new approaches for using MLP and XGBT in feature imputation and provides an empirical analysis of their effectiveness versus traditional methods.
Findings
MICE and linear regression are generally reliable imputers.
MLP and XGBT performance varies significantly across datasets.
Imputation method choice critically impacts classification accuracy.
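The linear-regression imputer from the first finding can be sketched as ordinary least squares. On the training data, where the feature is still present, regress it on all remaining features plus an intercept. Then use the fitted weights to rebuild that column for test rows that lack it entirely. This is a minimal, dependency-free sketch under that assumption, not the authors' implementation. The function names (`fit_linear_imputer`, `impute_linear`) and the toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_linear_imputer(train: Matrix, missing: int) -> List[float]:
    """Least-squares weights [intercept, w_1, ...] that predict column `missing`
    from the other columns, kept in their original order."""
    xs = [[1.0] + [v for j, v in enumerate(row) if j != missing] for row in train]
    ys = [row[missing] for row in train]
    p = len(xs[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(xs, ys)) for i in range(p)]
    return _solve(xtx, xty)


def impute_linear(test_without: Matrix, weights: List[float], missing: int) -> Matrix:
    """Rebuild column `missing` for rows that contain only the remaining features."""
    out = []
    for row in test_without:
        pred = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
        out.append(row[:missing] + [pred] + row[missing:])
    return out


if __name__ == "__main__":
    # Toy training data in which column 2 equals 1 + 2*x0 - x1 exactly.
    train = [[0.0, 0.0, 1.0], [1.0, 0.0, 3.0], [0.0, 1.0, 0.0],
             [1.0, 1.0, 2.0], [2.0, 1.0, 4.0]]
    w = fit_linear_imputer(train, missing=2)  # approximately [1.0, 2.0, -1.0]
    print(impute_linear([[3.0, 2.0]], w, missing=2))  # column 2 rebuilt as about 5.0
```

Solving the normal equations directly is adequate for a handful of features. For larger or ill-conditioned problems, a QR- or SVD-based least-squares solver is numerically safer.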
Abstract
In real-world applications, we may encounter situations where a well-trained model has to make predictions on a damaged dataset. The damage caused by missing or corrupted values can occur either at the level of individual instances or at the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally investigate the influence of various imputation methods on the performance of several classification models. We study the impact of imputation by comparing traditional methods such as k-NN, linear regression, and MICE with modern imputation methods such as multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT, we also…
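The scenario described in the abstract can be illustrated with a minimal sketch, assuming a toy two-feature dataset and a nearest-centroid classifier as a hypothetical stand-in for the "well-trained model" (the paper evaluates several classifiers, not necessarily this one). The classifier is trained once on complete data. At prediction time, feature 1 is absent from every test row and is rebuilt either with its training mean (a simple baseline) or with k-NN regression on the features that remain. All function names and data are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List

Row = List[float]


def mean_impute(train: List[Row], test_without: List[Row], missing: int) -> List[Row]:
    """Baseline: fill the missing column with its training-set mean."""
    mean = sum(r[missing] for r in train) / len(train)
    return [row[:missing] + [mean] + row[missing:] for row in test_without]


def knn_impute(train: List[Row], test_without: List[Row], missing: int, k: int = 2) -> List[Row]:
    """Fill the missing column with the average of that column over the k training
    rows nearest to each test row, measured on the remaining features only."""
    reference = [([v for j, v in enumerate(r) if j != missing], r[missing]) for r in train]
    out = []
    for row in test_without:
        nearest = sorted(reference, key=lambda ref: math.dist(ref[0], row))[:k]
        value = sum(target for _, target in nearest) / len(nearest)
        out.append(row[:missing] + [value] + row[missing:])
    return out


def fit_centroids(train: List[Row], labels: List[int]) -> Dict[int, Row]:
    """Hypothetical stand-in classifier: one mean vector per class, fitted on complete data."""
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row, y in zip(train, labels):
        groups[y].append(row)
    return {y: [sum(col) / len(rows) for col in zip(*rows)] for y, rows in groups.items()}


def predict(centroids: Dict[int, Row], rows: List[Row]) -> List[int]:
    """Assign each row to the class whose centroid is closest (Euclidean distance)."""
    return [min(centroids, key=lambda y: math.dist(centroids[y], r)) for r in rows]


def accuracy(pred: List[int], truth: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    # Toy data (illustrative only): feature 1 separates the classes strongly
    # and is predictable from feature 0; it is missing from every test row.
    train = [[0.0, 0.0], [0.2, 1.0], [0.4, 0.0], [0.6, 1.0], [3.0, 100.0], [3.2, 101.0]]
    labels = [0, 0, 0, 0, 1, 1]
    test_without = [[0.1], [3.1]]  # feature 1 removed
    y_test = [0, 1]

    model = fit_centroids(train, labels)  # trained once, on complete data
    for name, rebuilt in [("mean", mean_impute(train, test_without, 1)),
                          ("k-NN", knn_impute(train, test_without, 1, k=2))]:
        print(name, accuracy(predict(model, rebuilt), y_test))
```

On this toy data, mean imputation pulls the class-1 test row toward the majority class and misclassifies it (accuracy 0.5), while k-NN imputation rebuilds feature 1 from the neighbouring training rows and restores accuracy 1.0. This gives a small-scale view of why the choice of imputer can matter for a fixed, already-trained model.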
Taxonomy
Methods: k-Nearest Neighbors · Linear Regression
