TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified Synthesis

Zerui Cheng; Jiashuo Liu; Jianzhu Yao; Pramod Viswanath; Ge Zhang; Wenhao Huang

arXiv:2602.02523·cs.LG·February 4, 2026

Open Access

TL;DR

This paper introduces TabularMath, a benchmark for evaluating tabular models' ability to perform computational extrapolation on deterministic problems, revealing strengths and limitations of current models like TabPFN and GPT-based ICL.

Contribution

The paper presents TabularMath, a new benchmark for assessing computational extrapolation in tabular learning, and compares nine tabular architectures and in-context learning (ICL) with GPT-OSS-120B on it.

Findings

01. TabPFN achieves high R^2 but low exact-match accuracy out-of-distribution.

02. GPT-OSS-120B ICL maintains better exact-match accuracy under distribution shift.

03. Models excel at smooth function approximation but struggle with precise extrapolation.
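
The gap between R^2 and exact match in the findings above can be illustrated with a toy example. This is a minimal sketch, not the paper's evaluation code: the target function, the slightly-off predictor, and the round-to-nearest-integer definition of exact match are all assumptions made for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def exact_match(y_true, y_pred):
    """Fraction of predictions that round to the true integer (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == t)
    return hits / len(y_true)


# Hypothetical deterministic target: y = 3x + 7 on integer inputs.
xs = list(range(100, 200))
y_true = [3 * x + 7 for x in xs]

# Hypothetical predictor with a slightly wrong slope (2.98 instead of 3).
y_pred = [2.98 * x + 7 for x in xs]

r2 = r2_score(y_true, y_pred)
em = exact_match(y_true, y_pred)
print(f"R^2 = {r2:.4f}, exact match = {em:.2f}")
```

Under these assumptions R^2 stays above 0.99 while no prediction rounds to the correct integer, because every prediction is off by at least 2. A smooth approximation can score well on a regression metric and still fail an exactness metric.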

Abstract

Standard tabular benchmarks mainly evaluate a model's ability to interpolate values inside a data manifold, which rewards models that are good at local statistical smoothing. However, a very large category of high-value tabular data, including financial modeling and physical simulations, is generated by deterministic computational processes rather than stochastic, noisy relationships. We therefore investigate whether tabular models can extend from statistical interpolation to computational extrapolation. We propose TabularMath, a diagnostic benchmark of 114 deterministic problems (233,472 rows) generated from verified programs based on GSM8K and AIME. We evaluate 9 tabular architectures and in-context learning (ICL) with GPT-OSS-120B. On standard regression metrics, TabPFN v2.5 performs remarkably well, achieving…
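
To make the idea of a program-generated table concrete, here is a minimal sketch of how a deterministic word-problem program could be turned into rows, with a held-out split whose inputs lie outside the training range. The function `total_cost`, the column names, and the range-based split are hypothetical and are not taken from the paper.

```python
import itertools


def total_cost(price, quantity, discount_pct):
    """Hypothetical GSM8K-style program: discounted total, integer arithmetic only."""
    return price * quantity * (100 - discount_pct) // 100


def make_rows(prices, quantities, discounts):
    """Enumerate every input combination and label it by running the program."""
    return [
        {"price": p, "quantity": q, "discount_pct": d, "answer": total_cost(p, q, d)}
        for p, q, d in itertools.product(prices, quantities, discounts)
    ]


# In-distribution rows: small prices and quantities.
train_rows = make_rows(range(1, 21), range(1, 11), (0, 10, 20))

# Out-of-distribution rows: inputs outside the training range (assumed split rule).
ood_rows = make_rows(range(21, 41), range(11, 21), (0, 10, 20))

print(len(train_rows), len(ood_rows))
print(train_rows[0], ood_rows[0])
```

Because every label comes from running the program, each row has exactly one correct answer and no noise, which is what makes an exact-match metric meaningful on data of this kind.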

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Numerical Methods and Algorithms