Does Prompt Design Impact Quality of Data Imputation by LLMs?

Shreenidhi Srinivasan; Lydia Manikonda

arXiv:2506.04172 · cs.LG · June 5, 2025

TL;DR

This paper investigates how prompt design influences the quality of data imputation using large language models, proposing a novel token-aware method that improves imputation for class-imbalanced datasets while reducing prompt size.

Contribution

It introduces a structured prompting technique and demonstrates its effectiveness in enhancing LLM-based data imputation for imbalanced datasets, addressing a key gap in the field.

Findings

1. Significantly reduces input prompt size
2. Maintains or improves imputation quality
3. Effective for small, class-imbalanced datasets

Abstract

Generating realistic synthetic tabular data presents a critical challenge in machine learning. Class imbalance in the data adds another layer of complexity. This paper presents a novel token-aware data imputation method that leverages the in-context learning capabilities of large language models. The method combines a structured, group-wise, CSV-style prompting technique with the elimination of irrelevant contextual information from the input prompt. We test this approach on two class-imbalanced binary classification datasets and evaluate the effectiveness of imputation using classification-based evaluation metrics. The experimental results demonstrate that our approach significantly reduces the input prompt size while maintaining or improving imputation quality compared to our baseline prompt, especially for datasets that are of relatively…
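
To make the idea of a group-wise, CSV-style prompt concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the prompt wording, the toy columns (`age`, `income`, `label`), and the sentence-per-row baseline are all assumptions chosen for illustration, and character count is used only as a crude stand-in for token count.

```python
from collections import defaultdict


def build_groupwise_csv_prompt(rows, columns, label_col, target_row):
    """Hypothetical helper: group complete rows by class label and render
    each group as a compact CSV block, then append the row to impute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)

    # The label is stated once per group, so it is dropped from the CSV columns.
    feature_cols = [c for c in columns if c != label_col]
    lines = []
    for label in sorted(groups):
        lines.append(f"{label_col}={label}")
        lines.append(",".join(feature_cols))
        for row in groups[label]:
            lines.append(",".join(str(row[c]) for c in feature_cols))
        lines.append("")  # blank line between groups

    # Missing cells in the target row are marked with '?'.
    lines.append(
        f"Fill in the '?' values for this {label_col}={target_row[label_col]} row:"
    )
    lines.append(",".join(feature_cols))
    lines.append(
        ",".join(
            "?" if target_row.get(c) is None else str(target_row[c])
            for c in feature_cols
        )
    )
    return "\n".join(lines)


def build_verbose_prompt(rows, columns, target_row):
    """Hypothetical baseline: one natural-language sentence per row."""
    lines = []
    for row in rows:
        lines.append("Record: " + ", ".join(f"{c} is {row[c]}" for c in columns) + ".")
    known = ", ".join(
        f"{c} is {'unknown' if target_row.get(c) is None else target_row[c]}"
        for c in columns
    )
    lines.append("Fill in the unknown values for this record: " + known + ".")
    return "\n".join(lines)


# Toy, made-up data: three majority-class rows and one minority-class row.
columns = ["age", "income", "label"]
rows = [
    {"age": 34, "income": 52000, "label": 0},
    {"age": 45, "income": 61000, "label": 0},
    {"age": 29, "income": 48000, "label": 0},
    {"age": 51, "income": 30000, "label": 1},
]
target = {"age": 47, "income": None, "label": 1}

compact = build_groupwise_csv_prompt(rows, columns, "label", target)
verbose = build_verbose_prompt(rows, columns, target)

print(compact)
print("compact chars:", len(compact), "| verbose chars:", len(verbose))
```

In this sketch the savings come from two places: column names appear once per group rather than once per cell, and the class label is stated once per group rather than on every row. A real comparison would count tokens with the tokenizer of the target model rather than characters.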

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification