LLMDR: Large language model driven framework for missing data recovery in mixed data under low resource regime
Durga Keshav, GVD Praneeth, Chetan Kumar Patruni, Vivek Yelleti, U Sai Ram

TL;DR
This paper introduces LLMDR, a framework that combines clustering with large language models to recover missing values in mixed-type datasets. It targets low-resource settings and is reported to outperform traditional imputation methods.
Contribution
The paper presents a two-stage LLM-based framework combining clustering and consensus mechanisms for improved missing data recovery in mixed datasets, addressing limitations of existing methods.
Findings
Effective data recovery demonstrated on various mixed datasets.
Outperforms traditional imputation methods in accuracy and statistical measures.
Consensus mechanism enhances the reliability of data imputation.
Abstract
Missing data is one of the important problems to address in achieving data quality. While imputation-based methods are designed to achieve data completeness, their efficacy is observed to diminish as the percentage of missing values increases. Further, extant approaches often struggle to handle mixed-type datasets, typically supporting either numerical or categorical data. In this work, we propose LLMDR, an automatic data recovery framework that operates in two stages. In Stage I, the DBSCAN clustering algorithm is employed to select the most representative samples. In Stage II, multiple LLMs are employed for data recovery, considering the local and global representative samples. The framework then invokes a consensus algorithm to recommend a more accurate value based on the other LLMs' results on the local and global effective samples.…
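The truncated abstract does not say how the consensus step combines the values proposed by the different LLMs. As a minimal sketch only, the example below assumes one simple rule: take the median of numeric candidates and the most frequent value among categorical candidates. The function name `consensus` and the sample values are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import median


def consensus(candidates):
    """Combine candidate imputations for a single missing cell.

    Assumed rule (not from the paper): numeric candidates are reduced
    to their median, anything else to the most frequent value.
    """
    if not candidates:
        raise ValueError("need at least one candidate value")
    numeric = all(
        isinstance(c, (int, float)) and not isinstance(c, bool)
        for c in candidates
    )
    if numeric:
        # The median is less sensitive to a single outlying proposal than the mean.
        return median(candidates)
    # most_common orders equal counts by first occurrence, so ties go to
    # the earliest candidate in the list.
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical proposals from four models for one numeric and one categorical cell.
numeric_votes = [41.0, 39.5, 40.0, 58.0]
categorical_votes = ["married", "single", "married", "married"]

print(consensus(numeric_votes))
print(consensus(categorical_votes))
```

With this rule a single divergent proposal, such as 58.0 above, has limited influence on the recommended value. The paper's own consensus algorithm may differ.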
Taxonomy
Topics: Data Quality and Management · Survey Methodology and Nonresponse · Mobile Crowdsensing and Crowdsourcing
