FCMI: Feature Correlation based Missing Data Imputation

Prateek Mishra; Kumar Divya Mani; Prashant Johri; Dikhsa Arya

arXiv:2107.00100 · cs.LG · July 2, 2021 · 1 citation

TL;DR

The paper introduces FCMI, a novel missing data imputation method that leverages feature correlation to improve data quality for better analysis and prediction accuracy.

Contribution

It presents a new correlation-based imputation algorithm that selects highly correlated features to maintain data integrity during imputation.

Findings

01 Outperforms existing imputation algorithms in experiments
02 Effective on both classification and regression datasets
03 Maintains feature correlation during imputation

Abstract

Processed data are insightful, whereas raw data are opaque. Missing values are a serious threat to data reliability: they lead to inaccurate analysis and wrong predictions. We propose FCMI (Feature Correlation based Missing Data Imputation), an efficient technique that imputes missing values in a dataset using the correlation among its attributes, which is our central idea. The proposed algorithm picks the highly correlated attributes of the dataset and uses them to build a regression model whose parameters are optimized so that the correlation of the dataset is maintained. Experiments on both classification and regression datasets show that the proposed imputation technique outperforms existing imputation algorithms.
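
The general idea of correlation-driven regression imputation can be illustrated with a minimal sketch. This is not the authors' FCMI implementation: the function name `correlation_impute`, the `threshold` parameter, the use of Pearson correlation, the ordinary least-squares fit, and the column-mean fallback are all assumptions made for illustration. The correlation-preserving parameter optimization mentioned in the abstract is not reproduced here.

```python
import numpy as np

def correlation_impute(X, threshold=0.5):
    """Illustrative sketch (not the paper's algorithm): fill NaNs in each
    column with a least-squares fit on its most correlated columns."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_cols = X.shape[1]

    for j in range(n_cols):
        missing = np.isnan(X[:, j])
        if not missing.any():
            continue

        # Select predictor columns whose |Pearson r| with column j meets the
        # (assumed) threshold, using rows where both columns are observed.
        predictors = []
        for k in range(n_cols):
            if k == j:
                continue
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            if both.sum() < 3:
                continue
            if np.std(X[both, j]) == 0 or np.std(X[both, k]) == 0:
                continue  # correlation is undefined for a constant column
            r = np.corrcoef(X[both, j], X[both, k])[0, 1]
            if abs(r) >= threshold:
                predictors.append(k)

        col_mean = np.nanmean(X[:, j])  # fallback value (an assumption)
        if not predictors:
            filled[missing, j] = col_mean
            continue

        P = X[:, predictors]
        predictors_observed = ~np.isnan(P).any(axis=1)
        train = ~missing & predictors_observed
        if train.sum() <= len(predictors):
            # Too few complete rows to fit intercept plus slopes.
            filled[missing, j] = col_mean
            continue

        # Ordinary least squares with an intercept term.
        A = np.column_stack([np.ones(train.sum()), P[train]])
        coef, *_ = np.linalg.lstsq(A, X[train, j], rcond=None)

        # Predict where the target is missing and all predictors are present.
        can_predict = missing & predictors_observed
        B = np.column_stack([np.ones(can_predict.sum()), P[can_predict]])
        filled[can_predict, j] = B @ coef

        # Rows with missing predictors fall back to the column mean.
        filled[missing & ~can_predict, j] = col_mean

    return filled
```

Calling `correlation_impute(data, threshold=0.5)` on a 2-D numeric array returns a copy with the NaN entries filled. Restricting the regression to strongly correlated columns keeps weakly related attributes from adding noise to the imputed values, which is the intuition the abstract describes.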

Taxonomy

Topics: Human Mobility and Location-Based Analysis · Bayesian Methods and Mixture Models · Data-Driven Disease Surveillance