TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning
Mingyu Zheng, Zhifan Feng, Jia Wang, Lanrui Wang, Zheng Lin, Yang Hao, Weiping Wang

TL;DR
TableDreamer introduces a progressive, weakness-guided data synthesis framework for table instruction tuning. It improves data diversity and model performance by iteratively exploring the input space under the guidance of the target LLM's identified weaknesses.
Contribution
It presents a novel iterative synthesis method that uses the target model's weaknesses to guide the generation of diverse, high-quality training data for table understanding tasks.
Findings
Boosts Llama3.1-8B-instruct accuracy by 11.62%.
Outperforms state-of-the-art data synthesis methods.
Uses only 27K synthetic training examples.
Abstract
Despite the commendable progress of recent LLM-based data synthesis methods, they face two limitations in generating table instruction tuning data. First, they cannot thoroughly explore the vast input space of table understanding tasks, leading to limited data diversity. Second, they ignore the weaknesses in the table understanding ability of the target LLM and blindly pursue larger data quantity, resulting in suboptimal data efficiency. In this paper, we introduce a progressive and weakness-guided data synthesis framework tailored for table instruction tuning, named TableDreamer, to mitigate the above issues. Specifically, we first synthesize diverse tables and related instructions as seed data, and then perform an iterative exploration of the input space under the guidance of the newly identified weakness data, which eventually serve as the final training data for fine-tuning…
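The seed-then-iterate loop described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: `Example`, `weakness_guided_synthesis`, and the three callables (`target_answer`, `is_correct`, `expand`) are assumed names standing in for the target LLM, an answer judge, and the generator that explores new tables and instructions around each weakness. Keeping every failed example as training data and stopping after a fixed number of rounds are also assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    """A synthetic (table, instruction, reference answer) triple; fields are illustrative."""
    table: str
    instruction: str
    reference: str


def weakness_guided_synthesis(
    seeds: List[Example],
    target_answer: Callable[[Example], str],     # hypothetical: query the target LLM
    is_correct: Callable[[Example, str], bool],  # hypothetical: judge an answer against the reference
    expand: Callable[[Example], List[Example]],  # hypothetical: synthesize new data near a weak example
    num_iterations: int = 3,
) -> List[Example]:
    """Grow the dataset around inputs the target model fails on.

    Assumption of this sketch: every failed example is kept as training data,
    and exploration stops after a fixed number of rounds.
    """
    training_data: List[Example] = []
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(num_iterations):
        # Weakness data: examples the target model currently answers incorrectly.
        weak = [ex for ex in frontier if not is_correct(ex, target_answer(ex))]
        training_data.extend(weak)
        # Explore the input space around each weakness to form the next round.
        next_frontier: List[Example] = []
        for ex in weak:
            for new_ex in expand(ex):
                if new_ex not in seen:  # skip inputs already explored
                    seen.add(new_ex)
                    next_frontier.append(new_ex)
        frontier = next_frontier
        if not frontier:
            break
    return training_data


# Toy stand-ins (hypothetical): the "model" fails on tables with more than three rows.
def make(n_rows: int) -> Example:
    table = ";".join(f"r{i}" for i in range(n_rows))
    return Example(table, "How many rows?", str(n_rows))


def toy_target(ex: Example) -> str:
    return ex.reference if ex.table.count(";") < 3 else "?"


def toy_judge(ex: Example, answer: str) -> bool:
    return answer == ex.reference


def toy_expand(ex: Example) -> List[Example]:
    return [make(int(ex.reference) + 1)]  # a slightly larger table


data = weakness_guided_synthesis([make(2), make(4)], toy_target, toy_judge, toy_expand)
print([ex.reference for ex in data])  # → ['4', '5', '6']
```

In the toy run, the two-row seed is answered correctly and discarded, while the four-row seed is a weakness that seeds progressively larger tables. The `seen` set keeps exploration from revisiting inputs; in a real pipeline the stubs would be LLM calls, and the judge and expansion strategy are where most of the design effort would go.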
