Missing Data Imputation for Classification Problems
Arkopal Choudhury, Michael R. Kosorok

TL;DR
This paper introduces a new iterative kNN imputation method that uses class weighted grey distance and mutual information to improve missing data imputation for classification tasks, especially with heterogeneous data.
Contribution
It proposes a novel imputation technique combining class weighted grey distance and mutual information, enhancing classification accuracy with missing data.
Findings
Outperforms existing kNN, MICE, and missForest imputation methods.
Improves classification performance on simulated and UCI datasets.
Effective with various missing data rates.
Abstract
Imputation of missing data is a common application in various classification problems where the feature training matrix has missingness. A widely used solution to this imputation problem is based on the lazy learning technique, k-nearest neighbor (kNN) approach. However, most of the previous work on missing data does not take into account the presence of the class label in the classification problem. Also, existing kNN imputation methods use variants of Minkowski distance as a measure of distance, which does not work well with heterogeneous data. In this paper, we propose a novel iterative kNN imputation technique based on class weighted grey distance between the missing datum and all the training data. Grey distance works well in heterogeneous data with missing instances. The distance is weighted by Mutual Information (MI) which is a measure of feature relevance between the features…
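To make the idea in the abstract concrete, here is a minimal sketch of kNN imputation driven by a feature-weighted grey relational grade. It is not the authors' algorithm: the function names `grey_relational_grades` and `impute_knn_grey` are hypothetical, the `weights` argument merely stands in for MI-derived feature weights (which the caller would have to estimate separately), the distinguishing coefficient `rho = 0.5` is the conventional default from grey relational analysis, and features missing in either row are simply skipped. The sketch handles numeric features only and performs a single pass, whereas the paper describes an iterative procedure with class weighting.

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def grey_relational_grades(target: Row, candidates: Sequence[Row],
                           weights: Sequence[float], rho: float = 0.5) -> List[float]:
    """Weighted grey relational grade of each candidate to the target (higher = more similar)."""
    # Absolute differences on features observed in both rows; None otherwise.
    diffs = [[abs(t - c) if t is not None and c is not None else None
              for t, c in zip(target, cand)] for cand in candidates]
    observed = [d for row in diffs for d in row if d is not None]
    if not observed:
        return [0.0] * len(candidates)
    d_min, d_max = min(observed), max(observed)
    grades = []
    for row in diffs:
        num = den = 0.0
        for d, w in zip(row, weights):
            if d is None:
                continue  # assumption: ignore features missing in either row
            # Grey relational coefficient; defined as 1 when every difference is zero.
            coef = 1.0 if d_max == 0 else (d_min + rho * d_max) / (d + rho * d_max)
            num += w * coef
            den += w
        grades.append(num / den if den > 0 else 0.0)
    return grades


def impute_knn_grey(data: Sequence[Row], weights: Sequence[float],
                    k: int = 2, rho: float = 0.5) -> List[Row]:
    """Single pass: fill each missing entry with the mean of that feature
    over the k most similar rows that observe it."""
    filled = [list(r) for r in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for m, r in enumerate(data) if m != i and r[j] is not None]
            if not donors:
                continue  # nothing to borrow from; leave the entry missing
            grades = grey_relational_grades(row, donors, weights, rho)
            ranked = sorted(zip(grades, range(len(donors))), reverse=True)[:k]
            filled[i][j] = sum(donors[idx][j] for _, idx in ranked) / len(ranked)
    return filled


# Toy usage: the first row is missing its third feature.
toy = [[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 3.2], [5.0, 6.0, 9.0]]
completed = impute_knn_grey(toy, weights=[1.0, 1.0, 1.0], k=2)
```

Because the grade is a similarity bounded in (0, 1], neighbors are ranked from highest to lowest rather than by smallest distance. An iterative variant could re-run the pass on `completed` until the imputed values stabilize, though the stopping rule used in the paper is not given in the excerpt above.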
Taxonomy
Topics: Face and Expression Recognition · Statistical Methods and Inference · Multi-Criteria Decision Making
