Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models

Rihui Jin; Zheyu Xin; Xing Xie; Zuoyi Li; Guilin Qi; Yongrui Chen; Xinbang Dai; Tongtong Wu; Gholamreza Haffari

arXiv:2506.06137 · cs.LG · June 9, 2025

TL;DR

This paper introduces Table-r1, a two-stage method that combines self-supervised learning with reinforcement learning to improve program-based table reasoning in small language models (SLMs), narrowing their performance gap with large language models.

Contribution

The paper proposes a two-stage method for SLMs. A self-supervised layout-inference stage improves robustness to heterogeneous table layouts, and a reinforcement learning stage makes program-based table reasoning more reliable.
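
The following toy sketch shows one way a self-supervised layout-inference sample could be built and checked. A known layout transformation is applied to a table, the model is asked to name the transformation, and the prediction is verified by re-executing it. The transformation catalogue (`transpose`, `reverse_rows`, `swap_first_two_columns`), the helper names, and the execution-based check are hypothetical illustrations. They are not the paper's actual Layout Transformation Inference specification.

```python
import random
from typing import Callable, Dict, List, Tuple

Table = List[List[str]]  # row-major; row 0 is the header row


def transpose(t: Table) -> Table:
    """Swap rows and columns (e.g., vertical <-> horizontal layout)."""
    return [list(col) for col in zip(*t)]


def reverse_rows(t: Table) -> Table:
    """Reverse the order of body rows, keeping the header first."""
    return [t[0]] + t[1:][::-1]


def swap_first_two_columns(t: Table) -> Table:
    """Exchange the first two columns in every row (assumes >= 2 columns)."""
    return [[r[1], r[0]] + r[2:] for r in t]


# Hypothetical catalogue of layout transformations; names are illustrative only.
TRANSFORMS: Dict[str, Callable[[Table], Table]] = {
    "transpose": transpose,
    "reverse_rows": reverse_rows,
    "swap_first_two_columns": swap_first_two_columns,
}


def make_example(table: Table, rng: random.Random) -> Tuple[Table, Table, str]:
    """Build one self-supervised sample: (source table, transformed table, label).

    No human annotation is needed: the label is the transformation we applied.
    """
    name = rng.choice(sorted(TRANSFORMS))
    return table, TRANSFORMS[name](table), name


def verify_prediction(pred: str, source: Table, target: Table) -> bool:
    """Accept a predicted transformation if executing it on source yields target."""
    fn = TRANSFORMS.get(pred)
    return fn is not None and fn(source) == target


if __name__ == "__main__":
    table = [["city", "pop_m"], ["Oslo", "0.7"], ["Lima", "10.1"]]
    src, tgt, label = make_example(table, random.Random(0))
    print(label, verify_prediction(label, src, tgt))
```

The sketch checks a prediction by execution rather than by string match. Under this design, any prediction that reproduces the target layout counts as correct, which matters when two transformations happen to coincide on a particular table.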

Findings

01

Table-r1 improves SLM accuracy by at least 15% over baseline models.

02

The method achieves performance comparable to large language models on multiple benchmarks.

03

It effectively generalizes across diverse table layouts and reasoning tasks.

Abstract

Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group…
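
The abstract contrasts text-based TR with program-based TR. The sketch below makes that contrast concrete. Instead of writing out arithmetic in natural language, the model emits a program, and executing that program produces the answer. The program string is a hand-written stand-in for SLM output. `run_program`, the convention that the program assigns `answer`, and the whitelisted builtins are illustrative assumptions, not the paper's interface.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, str]

# Whitelisted builtins the generated program may call. NOTE: this is NOT a
# security sandbox; real systems should run untrusted code in an isolated process.
SAFE_BUILTINS = {
    "float": float, "int": int, "len": len, "max": max,
    "min": min, "round": round, "sorted": sorted, "sum": sum,
}


def run_program(program: str, table: List[Row]) -> Optional[Any]:
    """Execute a generated program that is expected to assign `answer`.

    Returns the value of `answer`, or None if the program fails to compile,
    raises at runtime, or never assigns `answer`.
    """
    env: Dict[str, Any] = {"__builtins__": SAFE_BUILTINS, "table": table}
    try:
        exec(program, env)
    except Exception:
        return None
    return env.get("answer")


if __name__ == "__main__":
    table = [
        {"year": "2021", "revenue": "12.5"},
        {"year": "2022", "revenue": "7.25"},
    ]
    # Hand-written stand-in for an SLM's output to "What is the total revenue?"
    generated = (
        "total = sum(float(r['revenue']) for r in table)\n"
        "answer = round(total, 2)\n"
    )
    print(run_program(generated, table))  # 19.75
```

A program that fails returns `None`, for example one that references a column the table does not have. This is one concrete way the code-generation inconsistency named in the abstract can surface.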

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification