A Context-Aware Approach for Enhancing Data Imputation with Pre-trained Language Models
Ahatsham Hayat, Mohammad Rashedul Hasan

TL;DR
This paper introduces CRILM, a novel data imputation method using pre-trained language models to generate contextually relevant descriptors for missing data, improving performance across various missing data scenarios.
Contribution
CRILM handles missing tabular data with pre-trained language models in two stages: a large LM generates contextually relevant descriptors for the missing values, and a small LM is then fine-tuned on the enriched dataset.
Findings
CRILM outperforms the best-performing baselines by up to 10%.
CRILM is robust across MCAR, MAR, and MNAR scenarios.
CRILM mitigates imputation biases, particularly in MNAR settings.
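The three scenarios named in the findings differ in what drives the missingness: under MCAR (missing completely at random) every value has the same chance of being dropped, under MAR (missing at random) the chance depends only on other observed columns, and under MNAR (missing not at random) it depends on the unobserved value itself. The following is a minimal sketch of how such masks could be simulated. The function names, the threshold rule, and the toy columns `age` and `income` are illustrative assumptions, not the paper's experimental protocol.

```python
import random

def mask_mcar(rows, target, rate, rng):
    """MCAR: every row has the same chance of losing `target`."""
    return [{**r, target: None} if rng.random() < rate else dict(r) for r in rows]

def mask_mar(rows, target, driver, threshold, rate, rng):
    """MAR: missingness depends only on another, fully observed column."""
    return [
        {**r, target: None} if r[driver] > threshold and rng.random() < rate else dict(r)
        for r in rows
    ]

def mask_mnar(rows, target, threshold, rate, rng):
    """MNAR: missingness depends on the value that goes missing."""
    return [
        {**r, target: None} if r[target] > threshold and rng.random() < rate else dict(r)
        for r in rows
    ]

# Toy data (hypothetical): income in thousands.
rows = [
    {"age": 25, "income": 30},
    {"age": 60, "income": 40},
    {"age": 35, "income": 90},
    {"age": 70, "income": 95},
]
rng = random.Random(0)

mcar = mask_mcar(rows, "income", rate=0.5, rng=rng)
# With rate=1.0 the rule is deterministic, which makes the dependence visible.
mar = mask_mar(rows, "income", driver="age", threshold=50, rate=1.0, rng=rng)
mnar = mask_mnar(rows, "income", threshold=50, rate=1.0, rng=rng)
```

In the MNAR case the high incomes are exactly the ones removed, so any estimate computed from the remaining values is biased downward. That is why this scenario is usually the hardest for imputation methods.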
Abstract
This paper presents a novel approach named Contextually Relevant Imputation leveraging pre-trained Language Models (CRILM) for handling missing data in tabular datasets. Instead of relying on traditional numerical estimations, CRILM uses pre-trained language models (LMs) to create contextually relevant descriptors for missing values. This method aligns datasets with LMs' strengths, allowing large LMs to generate these descriptors and small LMs to be fine-tuned on the enriched datasets for enhanced downstream task performance. Our evaluations demonstrate CRILM's superior performance and robustness across MCAR, MAR, and challenging MNAR scenarios, with up to a 10% improvement over the best-performing baselines. By mitigating biases, particularly in MNAR settings, CRILM improves downstream task performance and offers a cost-effective…
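The core idea in the abstract, replacing a missing cell with a textual descriptor rather than a numeric estimate, can be sketched as follows. This is only an illustration under stated assumptions: `placeholder_descriptor` is a hypothetical stand-in for the large LM that would generate the descriptor, and the "feature is value" sentence template is an assumed serialization, not the format used by the authors.

```python
from typing import Callable, Mapping, Optional, Union

Value = Optional[Union[int, float, str]]

def placeholder_descriptor(feature: str) -> str:
    # Hypothetical stand-in: in CRILM a large LM would generate this phrase.
    return f"{feature} was not recorded"

def row_to_text(
    row: Mapping[str, Value],
    describe: Callable[[str], str] = placeholder_descriptor,
) -> str:
    """Serialize one tabular row into text, describing missing cells in words."""
    parts = []
    for feature, value in row.items():
        if value is None:
            # Missing cell: insert a descriptor instead of a numeric estimate.
            parts.append(describe(feature))
        else:
            parts.append(f"{feature} is {value}")
    return "; ".join(parts) + "."

example = row_to_text({"age": 42, "income": None, "city": "Lincoln"})
print(example)
```

Strings produced this way, paired with the downstream task labels, would form the enriched dataset on which a smaller LM could be fine-tuned.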
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
