UniDM: A Unified Framework for Data Manipulation with Large Language Models
Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou

TL;DR
UniDM introduces a unified, automatic framework leveraging large language models to perform diverse data manipulation tasks in data lakes, reducing manual effort and improving performance across multiple benchmarks.
Contribution
This paper presents UniDM, the first general framework that formalizes data manipulation tasks and employs LLMs with automatic context retrieval and prompting for broad applicability.
Findings
Achieves state-of-the-art results on various data manipulation benchmarks
Demonstrates high generality across multiple data tasks
Reduces manual effort in data lake management
Abstract
Designing effective data manipulation methods is a long-standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human effort to collect training data and tune models. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They show clear performance benefits but still require a customized design for each specific task. This is very costly and cannot keep pace with the requirements of big data lake platforms. In this paper, inspired by the cross-task generality of LLMs on NLP tasks, we take the first step toward an automatic and general solution to data manipulation tasks. We propose UniDM, a unified framework that establishes a new paradigm for processing data manipulation tasks using LLMs. UniDM formalizes a number of data manipulation tasks in a…
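The abstract is cut off before it states UniDM's exact formalization, so the sketch below only illustrates the general idea of driving different data manipulation tasks through one interface: an instruction, a target record, and automatically retrieved context records assembled into a single LLM prompt. Everything here is a hypothetical stand-in, not the paper's method: the `DataTask` record, the Jaccard token-overlap retrieval in `retrieve_context`, and the prompt layout in `build_prompt` are illustrative assumptions, and the LLM call itself is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataTask:
    """Hypothetical unified task: an instruction, one target record, and a pool of candidate context records."""
    instruction: str
    target: dict
    pool: list = field(default_factory=list)


def record_tokens(record: dict) -> set[str]:
    """Lower-cased word tokens drawn from a record's non-missing values."""
    return {w for v in record.values() if v is not None for w in str(v).lower().split()}


def retrieve_context(task: DataTask, k: int = 2) -> list[dict]:
    """Return the k pool records with the highest Jaccard token overlap with the target.

    Assumption: simple lexical overlap stands in for whatever retrieval UniDM actually uses.
    sorted() is stable, so ties keep their original pool order.
    """
    target = record_tokens(task.target)

    def score(rec: dict) -> float:
        other = record_tokens(rec)
        union = target | other
        return len(target & other) / len(union) if union else 0.0

    return sorted(task.pool, key=score, reverse=True)[:k]


def serialize(record: dict) -> str:
    """Flatten a record into 'attr: value' text, marking missing values with '?'."""
    return "; ".join(f"{k}: {v if v is not None else '?'}" for k, v in record.items())


def build_prompt(task: DataTask, k: int = 2) -> str:
    """Assemble instruction, retrieved context, and target into one prompt string."""
    lines = [task.instruction, "Context:"]
    lines += [f"- {serialize(r)}" for r in retrieve_context(task, k)]
    lines.append(f"Target: {serialize(task.target)}")
    lines.append("Answer:")
    return "\n".join(lines)


# Usage: a data-imputation task phrased through the same interface (made-up records).
pool = [
    {"name": "Blue Door Cafe", "phone": "415-555-0100", "city": "San Francisco"},
    {"name": "Red Lantern", "phone": "212-555-0199", "city": "New York"},
    {"name": "Blue Door Bistro", "phone": "415-555-0142", "city": "San Francisco"},
]
task = DataTask(
    "Fill in the missing value marked '?'.",
    {"name": "Blue Door Diner", "phone": "415-555-0177", "city": None},
    pool,
)
print(build_prompt(task))  # this string would then be sent to an LLM (call not shown)
```

Swapping the instruction (for example, "Do these two records refer to the same entity?") and the target would reuse the same retrieval and prompt-building path, which is the kind of cross-task reuse the abstract motivates; a real system would replace the lexical retrieval with something more robust.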
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
