Principal Component Analysis based frameworks for efficient missing data imputation algorithms
Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, Pål Halvorsen, Hugo L. Hammer

TL;DR
This paper introduces PCAI, a PCA-based framework that speeds up missing data imputation, scales to high-dimensional data, and handles categorical features and large numbers of missing features while maintaining imputation quality.
Contribution
The paper proposes PCAI, a versatile PCA-based framework that accelerates imputation processes and reduces memory usage without sacrificing accuracy, applicable to various imputation algorithms.
Findings
PCAI significantly speeds up imputation compared to traditional methods.
PCAI maintains competitive imputation accuracy and classification performance.
The framework effectively handles categorical data and large numbers of missing features.
Abstract
Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, data nowadays tends to be high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate memory issues of many available imputation techniques, without sacrificing the imputation quality in terms of MSE. In addition, the framework can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI for classification problems with some…
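To make the idea concrete, here is a minimal sketch of one way a PCA-based imputation wrapper could be arranged. It rests on assumptions that the abstract does not spell out: that PCA is applied only to the fully observed features, and that the imputer is then run on the reduced components together with the features that contain missing values. The function names (`pca_reduce`, `regression_impute`, `pcai_sketch`) are hypothetical, the least-squares imputer is a simple stand-in for whatever imputation method is being accelerated, and categorical features are not handled here.

```python
import numpy as np

def pca_reduce(F, n_components):
    """Project fully observed features F onto their top principal components."""
    centered = F - F.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

def regression_impute(R, M):
    """Stand-in imputer (an assumption): fill each column of M by least squares on R."""
    design = np.column_stack([np.ones(len(R)), R])
    filled = M.copy()
    for j in range(M.shape[1]):
        missing = np.isnan(M[:, j])
        if not missing.any():
            continue
        # Fit on rows where column j is observed, predict on rows where it is not.
        coef, *_ = np.linalg.lstsq(design[~missing], M[~missing, j], rcond=None)
        filled[missing, j] = design[missing] @ coef
    return filled

def pcai_sketch(X, n_components):
    """Assumed workflow: reduce the complete features first, then impute."""
    has_missing = np.isnan(X).any(axis=0)
    F, M = X[:, ~has_missing], X[:, has_missing]
    R = pca_reduce(F, n_components)  # the imputer sees R instead of all of F
    X_filled = X.copy()
    X_filled[:, has_missing] = regression_impute(R, M)
    return X_filled

# Synthetic example: 20 fully observed features and 2 features with gaps,
# all driven by 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
F_true = Z @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(200, 20))
M_true = Z @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(200, 2))
X_true = np.hstack([F_true, M_true])

X = X_true.copy()
mask = rng.random((200, 2)) < 0.2  # drop roughly 20% of the last two columns
X[:, 20:][mask] = np.nan

X_filled = pcai_sketch(X, n_components=3)
```

In this sketch the imputer works with 3 components instead of 20 observed features, which is where any speed and memory savings would come from. The number of components is a tuning choice, and the details of the published method may differ.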
Taxonomy
Topics: Face and Expression Recognition · Gene expression and cancer classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Principal Components Analysis
