Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day
Milad Abdollahzadeh, Abdul Raheem, Zilong Zhao, Uzair Javaid, Kevin Yee, Nalam Venkata Abhishek, Tram Truong-Huu, Biplab Sikdar

TL;DR
This paper demonstrates that instruction tuning of open-source large language models with a high-quality dataset can significantly enhance their ability to generate tabular data, achieving results comparable to GPT-4o in under six hours using limited resources.
Contribution
It introduces a novel approach to improve tabular data generation in LLMs through efficient instruction tuning with limited data and computational resources.
Findings
Achieved performance comparable to GPT-4o in tabular data generation
Used only 7K instructions and less than 6 hours of training on an A100 GPU
Created a high-quality instruction dataset for tabular data understanding
Abstract
Tabular instruction tuning has emerged as a promising research direction for improving LLMs' understanding of tabular data. However, most existing works consider only question-answering and reasoning tasks over tabular data, leaving tabular data generation largely unexplored. In this work, we explore for the first time the efficacy of instruction tuning in improving LLMs' tabular data generation capabilities. More specifically, given the high data and computation requirements of tabular instruction tuning, we investigate whether instruction tuning for tabular data generation is feasible with limited data and computational resources. To achieve this, we first create a high-quality instruction dataset for tabular data, enabling efficient LLM comprehension. We then instruction-tune an open-source LLM (Llama3.1-8B-Instruct) on the training set of this dataset to improve its…
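The abstract describes building an instruction dataset for tabular data generation but does not give the record format. Purely as an illustration, the sketch below assumes an Alpaca-style `instruction`/`input`/`output` record in which seed rows and target rows are serialized as CSV. The function names, prompt wording, and toy rows are hypothetical and are not the authors' actual dataset format.

```python
import csv
import io
import json


def rows_to_csv(columns, rows):
    """Serialize a header line plus rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue().strip()


def make_generation_record(table_name, columns, seed_rows, target_rows):
    """Build one hypothetical Alpaca-style record for tabular data generation.

    seed_rows are shown to the model as context; target_rows are the rows the
    model is trained to produce. Every row must follow the column order.
    """
    for row in list(seed_rows) + list(target_rows):
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match {len(columns)} columns")
    # Assumed prompt wording; the paper's actual instructions may differ.
    instruction = (
        f"Generate {len(target_rows)} new realistic rows for the table "
        f"'{table_name}'. Follow the column order and value formats of the "
        "examples and answer in CSV with a header line."
    )
    return {
        "instruction": instruction,
        "input": rows_to_csv(columns, seed_rows),
        "output": rows_to_csv(columns, target_rows),
    }


if __name__ == "__main__":
    # Toy rows loosely modeled on a census-style table, for illustration only.
    columns = ["age", "workclass", "income"]
    seed = [[39, "State-gov", "<=50K"], [50, "Self-emp-not-inc", "<=50K"]]
    target = [[38, "Private", ">50K"]]
    record = make_generation_record("adult", columns, seed, target)
    print(json.dumps(record, indent=2))
```

Records of this kind could then be fed to any standard supervised fine-tuning pipeline. Keeping the target as plain CSV makes the model's output easy to parse back into rows and validate against the column schema.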
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
