Imputation using training labels and classification via label imputation
Thu Nguyen, Tuan L. Vo, P{\aa}l Halvorsen, Michael A. Riegler

TL;DR
This paper introduces a novel imputation method that incorporates training labels into the imputation process, improving accuracy in classification tasks with missing data, especially for imbalanced and categorical data.
Contribution
It proposes the CBMI classification strategy and the IUL imputation algorithm, which jointly utilize labels and input data for enhanced imputation and classification performance.
Findings
CBMI achieves high classification accuracy with missing test data.
IUL significantly improves imputation quality over input-only methods.
Methods perform well on imbalanced and categorical datasets.
Abstract
Missing data is a common problem in practical data science settings. Various imputation methods have been developed to deal with missing data. However, even though the labels are available in the training data in many situations, the common practice of imputation usually only relies on the input and ignores the label. We propose Classification Based on MissForest Imputation (CBMI), a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation, allowing the label and the input to be imputed simultaneously. In addition, we propose the imputation using labels (IUL) algorithm, an imputation strategy that stacks the label into the input and illustrates how it can significantly improve the imputation quality. Experiments show that CBMI has classification accuracy when the test set contains missing data, especially…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning and Data Classification · Human Pose and Action Recognition
MethodsSparse Evolutionary Training
