Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

Junjie Xing; Yeye He; Mengyu Zhou; Haoyu Dong; Shi Han; Dongmei Zhang; Surajit Chaudhuri

arXiv:2410.12164·cs.CL·March 25, 2026·2 cites

Open Access

TL;DR

This paper introduces Table-LLM-Specialist, a self-trained fine-tuning paradigm for language models on table tasks. It exploits the dual generative and classification formulations of many table tasks to iteratively generate and validate training data, so fine-tuning requires no manually labeled data.

Contribution

It proposes a Generator-Validator paradigm in which language models iteratively generate and then validate their own training data for table tasks, removing the dependence on manually labeled data.

Findings

01

Models fine-tuned with Table-LLM-Specialist outperform base models on multiple benchmarks.

02

The approach enables smaller models to achieve high-quality results at lower cost.

03

Fine-tuned models are integrated into Microsoft Excel for real-world data cleaning.

Abstract

Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code and data cleaning) remains suboptimal. Improving performance typically requires task-specific fine-tuning, which depends on expensive human labeling and is prone to overfitting. In this work, we propose Table-LLM-Specialist, a self-trained fine-tuning paradigm designed for table tasks. Our key insight is that many table tasks admit two dual formulations: a generative version and a classification version. Leveraging this duality, we introduce a Generator-Validator paradigm that iteratively generates and validates training data using language models, enabling effective fine-tuning without manually labeled data. Extensive evaluations on Llama, GPT-3.5, and GPT-4 show that Table-LLM-Specialist achieves (1) strong…
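The Generator-Validator idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy task (flagging an erroneous cell in a column), the heuristic `toy_generator` and `toy_validator` functions that stand in for language model calls, and the no-op `fine_tune` callback are all assumptions made for illustration. The sketch only shows the control flow: the generative formulation proposes a training example, the classification formulation accepts or rejects it, and the surviving examples feed the next round.

```python
from typing import Callable, List, Tuple

Column = List[str]
Example = Tuple[Column, str]  # (table column, cell flagged as erroneous)
Generator = Callable[[Column], str]
Validator = Callable[[Column, str], bool]


def toy_generator(column: Column) -> str:
    """Generative formulation (hypothetical stand-in for a model call):
    propose the cell whose length is rarest in the column."""
    lengths = [len(cell) for cell in column]
    return min(column, key=lambda cell: lengths.count(len(cell)))


def toy_validator(column: Column, cell: str) -> bool:
    """Classification formulation (hypothetical stand-in for a model call):
    accept `cell` as an error only if its numeric-ness differs from the majority."""
    majority_numeric = sum(c.isdigit() for c in column) > len(column) / 2
    return cell.isdigit() != majority_numeric


def generate_and_validate(
    columns: List[Column], generator: Generator, validator: Validator
) -> List[Example]:
    """Keep only generated examples that the validator agrees with."""
    kept: List[Example] = []
    for column in columns:
        candidate = generator(column)
        if validator(column, candidate):
            kept.append((column, candidate))
    return kept


def self_train(
    columns: List[Column],
    generator: Generator,
    validator: Validator,
    fine_tune: Callable[[Generator, Validator, List[Example]], Tuple[Generator, Validator]],
    rounds: int = 2,
) -> List[Example]:
    """Alternate data generation/validation with a (placeholder) fine-tuning step."""
    data: List[Example] = []
    for _ in range(rounds):
        data = generate_and_validate(columns, generator, validator)
        generator, validator = fine_tune(generator, validator, data)
    return data


# Usage: the first column's candidate passes validation, the second is rejected.
columns = [["2021", "2022", "n/a", "2023"], ["12", "7", "abc", "45"]]
no_op_fine_tune = lambda g, v, d: (g, v)  # placeholder; a real step would update the models
training_data = self_train(columns, toy_generator, toy_validator, no_op_fine_tune)
```

In the second column the generator proposes `"7"`, which the validator rejects, so that example never reaches the training set. Filtering of this kind is what lets the loop run without human labels, under the assumption that the two formulations make different mistakes.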

Taxonomy

Topics: Data Quality and Management · Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Cosine Annealing · Absolute Position Encodings · Label Smoothing · Transformer · Dropout