AIGT: AI Generative Table Based on Prompt
Mingming Zhang, Zhiqing Xiao, Guoshan Lu, Sai Wu, Weiqiang Wang, Xing Fu, Can Yi, Junbo Zhao

TL;DR
AIGT leverages prompt-enhanced large language models with novel partitioning algorithms to generate high-quality synthetic tabular data, addressing privacy and scale challenges in enterprise data management.
Contribution
Introduces AIGT, a prompt-based LLM approach with long-token partitioning for scalable, high-quality synthetic tabular data generation.
Findings
Achieves state-of-the-art results on 14 of 20 datasets.
Effectively models large-scale tables with partitioning algorithms.
Demonstrates practical utility in industry risk control systems.
Abstract
Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcoming the challenges of high-dimensional data that arise from one-hot encoding. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table (AIGT) based on prompt enhancement, a novel approach that utilizes metadata, such as table descriptions and schemas, as prompts to generate ultra-high-quality synthetic data. To overcome the token limit constraints of LLMs, we propose long-token partitioning algorithms…
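As a rough illustration of what "metadata as prompts" could look like in practice, the sketch below prepends a table description and schema to a textual serialization of one row. The abstract does not specify the prompt layout, so the "Table: ... Columns: ... Row: ..." format, the "column is value" encoding, and all function names here are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Mapping, Sequence


def build_metadata_prompt(description: str, schema: Mapping[str, str]) -> str:
    """Turn a table description and schema into a text prefix (assumed layout)."""
    columns = ", ".join(f"{name} ({dtype})" for name, dtype in schema.items())
    return f"Table: {description}. Columns: {columns}."


def serialize_row(row: Mapping[str, object], columns: Sequence[str]) -> str:
    """Encode one row as 'column is value' clauses, in schema order."""
    return ", ".join(f"{col} is {row[col]}" for col in columns)


def make_training_text(
    description: str, schema: Mapping[str, str], row: Mapping[str, object]
) -> str:
    """Concatenate the metadata prompt and the serialized row into one string."""
    prompt = build_metadata_prompt(description, schema)
    return f"{prompt} Row: {serialize_row(row, list(schema))}"


if __name__ == "__main__":
    schema = {"age": "integer", "income": "float"}
    row = {"age": 42, "income": 5300.5}
    print(make_training_text("Loan applicants", schema, row))
```

A language model fine-tuned on strings of this shape could then be given only the metadata prefix at generation time and asked to complete the row. Wide tables whose serialized rows exceed the model's token limit would need to be split first, which is the problem the long-token partitioning algorithms are meant to address.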
Taxonomy
Topics: Advanced Database Systems and Queries · AI-based Problem Solving and Planning · Data Mining Algorithms and Applications
