An Interdisciplinary and Cross-Task Review on Missing Data Imputation
Jicong Fan

TL;DR
This comprehensive review synthesizes missing data imputation methods across disciplines, covering classical, modern, and deep learning approaches, and discusses integration with downstream tasks, challenges, and future directions.
Contribution
It provides an interdisciplinary, cross-task categorization of imputation techniques, connecting statistical foundations with machine learning advances, and highlights future research challenges.
Findings
Classical and deep learning imputation methods are systematically categorized.
Integration of imputation with downstream tasks like classification is analyzed.
Future directions include privacy-preserving methods and generalizable models.
Abstract
Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring. Despite decades of research and numerous imputation methods, the literature remains fragmented across fields, creating a critical need for a comprehensive synthesis that connects statistical foundations with modern machine learning advances. This work systematically reviews core concepts (including missingness mechanisms, single versus multiple imputation, and different imputation goals) and examines problem characteristics across various domains. It provides a thorough categorization of imputation methods, spanning classical techniques (e.g., regression, the EM algorithm) to modern approaches like low-rank and high-rank matrix completion, deep learning…
