Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors
Pengxiang Cai, Zihao Gao, Wanchen Lian, Jintai Chen

TL;DR
This paper introduces PRPO, a reinforcement learning framework that enhances large language models' numerical reasoning for tabular prediction by leveraging structural priors, achieving strong zero-shot performance and outperforming larger models.
Contribution
The paper presents Permutation Relative Policy Optimization (PRPO), a reinforcement learning method that encodes column-permutation invariance as a structural prior to improve LLM numerical reasoning on tabular data, especially in zero-shot settings.
Findings
PRPO matches fully supervised baselines on tabular prediction tasks.
The method outperforms larger LLMs, with up to 53.17% improvement.
8B LLMs with PRPO outperform 685B models in certain tasks.
Abstract
Tabular prediction traditionally relies on gradient-boosted decision trees and deep learning models, which excel in specific tasks but lack interpretability and transferability. Reasoning large language models (LLMs) promise cross-task adaptability with transparent reasoning traces, yet their potential for tabular data remains unrealized. To bridge this gap, we propose a reasoning framework centered on Permutation Relative Policy Optimization (PRPO), a reinforcement learning method that encodes column-permutation invariance as a structural prior. By estimating advantages across label-preserving permutations, PRPO transforms sparse rewards into dense signals, activating latent numerical reasoning capabilities of LLMs with limited supervision. Extensive experiments show that our method matches fully supervised baselines and dominates in zero-shot settings, performing on par with 32-shot…
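The abstract's central idea, estimating advantages across label-preserving column permutations, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`permute_columns`, `serialize`, `pooled_advantages`), the prompt format, and the choice of mean/standard-deviation normalization pooled over all permutations are assumptions made for illustration, since the abstract does not give the exact formula.

```python
import random
import statistics


def permute_columns(row, rng):
    """Return the same record with its columns in a shuffled order.

    Reordering columns leaves the values, and therefore the label, unchanged.
    """
    items = list(row.items())
    rng.shuffle(items)
    return dict(items)


def serialize(row):
    """Render a record as text, one 'column: value' pair per field (hypothetical format)."""
    return ", ".join(f"{name}: {value}" for name, value in row.items())


def pooled_advantages(rewards_by_permutation):
    """Normalize each reward against statistics pooled over all permutations.

    rewards_by_permutation[i][j] is the reward of the j-th sampled response
    to the i-th column permutation of the same record. Assumption: advantages
    are (reward - mean) / std with mean and std taken over every response.
    """
    flat = [r for group in rewards_by_permutation for r in group]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    if std == 0.0:
        # All rewards identical: no learning signal in this batch.
        return [[0.0 for _ in group] for group in rewards_by_permutation]
    return [[(r - mean) / std for r in group] for group in rewards_by_permutation]


if __name__ == "__main__":
    rng = random.Random(0)
    row = {"age": 42, "income": 55000, "tenure": 3}
    for _ in range(2):
        print(serialize(permute_columns(row, rng)))

    # Only one response, under the first permutation, was rewarded.
    print(pooled_advantages([[1.0, 0.0], [0.0, 0.0]]))
```

In this toy case the second permutation's responses all receive zero reward, so normalizing within that group alone would yield no signal. Pooling across permutations of the same record gives them a nonzero (negative) advantage, which is one plausible reading of how sparse rewards become denser.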
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
