Empowering Tabular Data Preparation with Language Models: Why and How?

Mengshi Chen; Yuxiang Sun; Tengchao Li; Jianwei Wang; Kai Wang; Xuemin Lin; Ying Zhang; Wenjie Zhang

arXiv:2508.01556 · cs.AI · August 5, 2025

Open Access

TL;DR

This paper systematically explores how Large Language Models can be effectively utilized across all phases of tabular data preparation, addressing current challenges and proposing integrated approaches.

Contribution

It provides a comprehensive analysis of the role of LMs in data acquisition, integration, cleaning, and transformation for tabular data preparation.

Findings

01 LMs can automate complex data cleaning tasks

02 Integrated pipelines enhance data preparation efficiency

03 Key advancements in LM applications for data tasks

Abstract

Data preparation is a critical step in enhancing the usability of tabular data and thus boosting downstream data-driven tasks. Traditional methods often struggle to capture the intricate relationships within tables and to adapt to the tasks involved. Recent advances in Language Models (LMs), especially Large Language Models (LLMs), offer new opportunities to automate and support tabular data preparation. However, why LMs suit tabular data preparation (i.e., how their capabilities match task demands) and how to use them effectively across phases have yet to be systematically explored. In this survey, we systematically analyze the role of LMs in enhancing tabular data preparation processes, focusing on four core phases: data acquisition, integration, cleaning, and transformation. For each phase, we present an integrated analysis of how LMs can be combined with other…

Taxonomy

Topics: Data Quality and Management · Computational and Text Analysis Methods · Topic Modeling