Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day

Milad Abdollahzadeh; Abdul Raheem; Zilong Zhao; Uzair Javaid; Kevin Yee; Nalam Venkata Abhishek; Tram Truong-Huu; Biplab Sikdar

arXiv:2511.23220·cs.CV·December 1, 2025

TL;DR

This paper demonstrates that instruction tuning of open-source large language models with a high-quality dataset can significantly enhance their ability to generate tabular data, achieving results comparable to GPT-4o in under six hours using limited resources.

Contribution

It introduces a novel approach to improve tabular data generation in LLMs through efficient instruction tuning with limited data and computational resources.

Findings

01. Achieved performance comparable to GPT-4o in tabular data generation

02. Used only 7K instructions and less than 6 hours of training on an A100 GPU

03. Created a high-quality instruction dataset for tabular data understanding

Abstract

Tabular instruction tuning has emerged as a promising research direction for improving LLMs' understanding of tabular data. However, the majority of existing works only consider question-answering and reasoning tasks over tabular data, leaving tabular data generation largely unnoticed. In this work, for the first time, we explore the efficacy of instruction tuning in improving LLMs' tabular data generation capabilities. More specifically, given the high data and computation requirements of tabular instruction tuning, we aim to address the possibility of instruction tuning for tabular data generation with limited data and computational resources. To achieve this, we first create a high-quality instruction dataset for tabular data, enabling efficient LLM comprehension. We then instruction-tune an open-source LLM (Llama3.1-8B-Instruct) on the training set of this dataset to improve its…
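To make the idea of an instruction dataset for tabular data generation concrete, the following is a minimal sketch of how one training record might be assembled. The record layout (`instruction` / `input` / `output` keys, as in Alpaca-style datasets), the CSV serialization, the helper names, and the toy table are all assumptions made for illustration; the abstract does not specify the authors' actual dataset format.

```python
import csv
import io


def to_csv_text(rows, header=None):
    """Serialize rows (and an optional header) into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().rstrip("\n")


def build_instruction_record(header, example_rows, target_rows):
    """Build one hypothetical instruction-tuning record for table generation.

    The example rows (with header) become the model input; the target rows
    (without header) become the expected response.
    """
    instruction = (
        f"Generate {len(target_rows)} new rows that follow the same columns "
        "and value patterns as the example table. Return CSV rows only."
    )
    return {
        "instruction": instruction,
        "input": to_csv_text(example_rows, header=header),
        "output": to_csv_text(target_rows),
    }


# Toy table, invented purely for illustration.
header = ["age", "occupation", "income"]
example_rows = [[34, "nurse", 52000], [51, "engineer", 98000]]
target_rows = [[29, "teacher", 47000]]

record = build_instruction_record(header, example_rows, target_rows)
print(record["instruction"])
print(record["input"])
print(record["output"])
```

A collection of such records could then be passed to any supervised fine-tuning pipeline; the choice of CSV over other serializations (Markdown tables, JSON lines) is a design decision this sketch does not try to settle.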

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification