Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation

Congjing Zhang; Ryan Feng Lin; Ruoxuan Bao; Shuai Huang

arXiv:2602.04785·cs.LG·February 5, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces T$^2$, a novel framework using collaborative LLMs and a rigorous quality control pipeline to generate high-quality synthetic tabular data, addressing data scarcity and quality issues in ML applications.

Contribution

The paper presents a new assembly-line LLM framework with a three-stage QC pipeline for synthesizing superior tabular data, advancing data generation techniques.

Findings

01

T$^2$ outperforms existing methods in data quality metrics.

02

Empirical results show improved downstream model performance.

03

Framework effectively addresses class imbalance and bias issues.

Abstract

While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity of observations, tabular datasets often exhibit critical deficiencies, such as class imbalance, selection bias, and low fidelity. To address these challenges, building on recent advances in Large Language Models (LLMs), this paper introduces Team-then-Trim (T$^2$), a framework that synthesizes high-quality tabular data through a collaborative team of LLMs, followed by a rigorous three-stage plug-in data quality control (QC) pipeline. In T$^2$, tabular data generation is conceptualized as a manufacturing process: specialized LLMs, guided by domain knowledge, are tasked with generating different data components sequentially, and the resulting products, i.e., the synthetic data, are systematically…
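The assembly-line idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The worker functions are hypothetical stand-ins for LLM calls, and the column names (`age`, `income`), the score function, the thresholds, and the distance rule are all assumptions chosen for illustration. The three trim stages follow the sanity, objective-driven filtering, and diversity ordering named in the reviews, in a heavily simplified row-level form.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]
Worker = Callable[[Row, random.Random], Row]


def age_worker(partial: Row, rng: random.Random) -> Row:
    # Hypothetical stand-in for an LLM that generates the first component.
    # It deliberately emits some invalid (negative) ages.
    return {**partial, "age": float(rng.randint(-5, 90))}


def income_worker(partial: Row, rng: random.Random) -> Row:
    # Hypothetical stand-in for an LLM that conditions on earlier components.
    base = 1000.0 * max(partial["age"], 0.0)
    return {**partial, "income": base + rng.uniform(0.0, 5000.0)}


def team_generate(workers: List[Worker], n: int, seed: int = 0) -> List[Row]:
    """Team step: each row passes through the workers in sequence."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        row: Row = {}
        for worker in workers:
            row = worker(row, rng)
        rows.append(row)
    return rows


def sanity_check(rows: List[Row]) -> List[Row]:
    # Stage 1 (assumed rule): drop rows with out-of-range values.
    return [r for r in rows if 0 <= r["age"] <= 120 and r["income"] >= 0]


def objective_filter(
    rows: List[Row], score: Callable[[Row], float], threshold: float
) -> List[Row]:
    # Stage 2 (assumed rule): keep rows whose score meets a threshold.
    return [r for r in rows if score(r) >= threshold]


def diversity_filter(rows: List[Row], min_dist: float) -> List[Row]:
    # Stage 3 (assumed rule): greedily drop rows too close to a kept row.
    kept: List[Row] = []
    for r in rows:
        if all(abs(r["age"] - k["age"]) >= min_dist for k in kept):
            kept.append(r)
    return kept


def trim(
    rows: List[Row],
    score: Callable[[Row], float],
    threshold: float,
    min_dist: float,
) -> List[Row]:
    """Trim step: apply the three QC stages in order."""
    passed = sanity_check(rows)
    passed = objective_filter(passed, score, threshold)
    return diversity_filter(passed, min_dist)


if __name__ == "__main__":
    rows = team_generate([age_worker, income_worker], n=200, seed=0)
    kept = trim(rows, score=lambda r: r["income"], threshold=10000.0, min_dist=1.0)
    print(f"generated {len(rows)} rows, kept {len(kept)} after QC")
```

Keeping generation and QC as separate functions mirrors the "plug-in" description: any stage can be swapped out without touching the workers.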

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- The team-then-trim structure separates generation from post-hoc quality control, providing robustness against LLM hallucination.
- The three-stage quality control pipeline (sanity, objective-driven filtering, diversity enforcement) is systematic and targets well-known challenges in synthetic data generation, including invalid entries, distributional bias, and limited incremental information.
- The use of model-based scoring and information-gain comparison to filter batches offers a principled

Weaknesses

- The quality control pipeline assumes access to a reasonably performant base model and sufficient initial real data to bootstrap quality signals, which can limit applicability in low-data or scarce-label settings (including the simulated data-incompleteness setting in the paper).
- The method incurs non-trivial computational overhead due to repeated generation, batch scoring, and rejection loops. The generation resource trade-offs are not fully addressed.
- The reliance on a single trained classifi

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Leverages structural knowledge of the data during generation.
- Incorporates multi-level quality checks to ensure high-quality data from different points of view: sanity, utility, and diversity.
- Allows for the recovery of data subgroups missing in the original data.

Weaknesses

- Evaluation against related work misses typical tabular generators, e.g., GReaT [1] and Tabula [2], and in particular any other agentic LLM approach, e.g., [3], or diffusion-based ones, e.g., [4].
- All LLMs in the evaluation seem to be of the same type, i.e., Llama 3.3 70B Instruct, but a strength of this method could be the use of more targeted LLMs for the different roles (coordinator vs. worker) or for specific features. No evaluation in this direction has been done.
- Following that, the same LL

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The overall idea and the analogy of assembly-line workers are intuitive.
- The paper is clear to understand and well presented.

Weaknesses

- [**Experiment on recent baselines**] Adding more recent baselines, especially ones that explore the use of LLMs for tabular generation [1, 2, 3], will strengthen the paper. Moreover, ‘team-then-trim’ has some similarities with [1] in terms of using specialized model components per column or subset of columns (MoEs for [1], worker LLMs here), so it is also important to compare and contrast the pros and cons against related work.
- [**Experiment on model sizes**] Varying model sizes will be

Taxonomy

Topics: Data Quality and Management · Imbalanced Data Classification Techniques · Machine Learning and Data Classification