Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training

Yunbo Long, Tejumade Afonja, Guangya Hao, Alexandra Brintrup, Mario Fritz

arXiv:2604.18966 · cs.LG · May 19, 2026

TL;DR

This paper introduces TabGRAA, a reward-guided post-training method for tabular language models that improves synthetic data quality and utility through an iterative generate-score-align protocol.

Contribution

The paper proposes TabGRAA, a group-relative alignment method for reward-guided post-training of self-improving tabular language models, and reports that it outperforms existing baselines.

Findings

01. TabGRAA improves the trade-off between fidelity and utility across benchmarks.

02. Stable group-level updates are crucial for the gains.

03. Both classifier-based and classifier-free rewards are effective.

Abstract

Tabular language models can generate synthetic tables by modeling rows as token sequences, but they are typically trained once with supervised fine-tuning and then used as static synthesizers. This is limiting because next-token likelihood does not directly optimize the distributional, utility, and indistinguishability properties used to evaluate synthetic data. We study iterative reward-guided post-training for tabular language models through a generate-score-align protocol, where a generator samples synthetic rows, a task-specified reward ranks them, and the model is updated relative to a fixed supervised reference. Within this protocol, we propose TabGRAA (Tabular Group-Relative Advantage Alignment), a group-relative alignment method that compares high- and low-reward generated groups using group-averaged policy/reference…
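The generate-score-align loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "generator" is a categorical distribution over five discrete row values rather than a language model, `reward_fn` is a hypothetical reward, and the update is a DPO-style logistic objective on the gap between group-averaged policy/reference log-ratios. The abstract is truncated before it states TabGRAA's objective, so this stand-in should not be read as the paper's actual loss.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability vector (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_avg_log_ratio(rows, probs, ref_probs):
    """Mean of log pi(row) - log pi_ref(row) over one group of rows."""
    return sum(math.log(probs[r] / ref_probs[r]) for r in rows) / len(rows)


def align_step(logits, ref_logits, reward_fn, rng, n_samples=64, beta=1.0, lr=0.5):
    """One generate-score-align round on a toy categorical generator.

    ASSUMPTION: the objective log(sigmoid(beta * margin)) is a hypothetical
    stand-in, not the objective from the paper.
    """
    probs = softmax(logits)
    ref_probs = softmax(ref_logits)  # fixed reference, never updated

    # Generate: sample synthetic "rows" from the current policy.
    rows = rng.choices(range(len(logits)), weights=probs, k=n_samples)

    # Score: rank rows by reward, then split into high and low groups.
    ranked = sorted(rows, key=reward_fn, reverse=True)
    half = n_samples // 2
    high, low = ranked[:half], ranked[half:]

    # Align: margin between group-averaged policy/reference log-ratios.
    margin = (group_avg_log_ratio(high, probs, ref_probs)
              - group_avg_log_ratio(low, probs, ref_probs))
    # d/d(margin) of log(sigmoid(beta * margin)); exponent clamped for safety.
    weight = beta / (1.0 + math.exp(min(beta * margin, 50.0)))

    # For a softmax policy, d(margin)/d(logit_k) is the difference in how
    # often value k appears in the high group versus the low group
    # (sampled rows are treated as fixed).
    new_logits = list(logits)
    for k in range(len(logits)):
        diff = high.count(k) / len(high) - low.count(k) / len(low)
        new_logits[k] += lr * weight * diff  # gradient ascent
    return new_logits


def run_demo(seed=0, steps=30, n_values=5, target=3):
    """Iterate the loop from a uniform reference; return final probabilities."""
    rng = random.Random(seed)
    ref_logits = [0.0] * n_values
    logits = list(ref_logits)
    reward_fn = lambda row: -abs(row - target)  # hypothetical reward
    for _ in range(steps):
        logits = align_step(logits, ref_logits, reward_fn, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the policy shifts probability mass toward the value with the highest reward, while the reference stays fixed and only enters through the log-ratio. A real tabular language model would replace the categorical sampler with row generation and the closed-form gradient with backpropagation through sequence log-probabilities.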
