The Illusion of Generalization: Re-examining Tabular Language Model Evaluation

Aditya Gorla; Ratish Puduppully

arXiv:2602.04031·cs.LG·February 5, 2026

Open Access

TL;DR

This paper critically re-evaluates the claimed generalization abilities of Tabular Language Models, revealing that their performance is largely due to dataset contamination and evaluation artifacts rather than true tabular reasoning.

Contribution

The paper systematically re-evaluates Tabula-8B, taken as a representative TLM, on 165 datasets from the UniPredict benchmark. It exposes evaluation flaws and shows that much of the reported performance is attributable to dataset contamination and format familiarity.

Findings

01. Near-zero median lift over baselines in classification tasks
02. Detection of pervasive dataset contamination and leakage
03. Instruction-tuning without tabular data recovers most performance
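
The simplest form of contamination named in the findings, direct train-test overlap, can be checked with a few lines of code. The following is a minimal sketch, not the authors' procedure: the function names `row_key` and `train_test_overlap` and the normalization rule (stringify, strip, lowercase) are assumptions made for illustration.

```python
def row_key(row):
    """Normalize a row into a hashable key (assumed rule: stringify, strip, lowercase)."""
    return tuple(str(value).strip().lower() for value in row)


def train_test_overlap(train_rows, test_rows):
    """Return the fraction of test rows that also appear, after normalization, in train."""
    if not test_rows:
        return 0.0
    train_keys = {row_key(row) for row in train_rows}
    hits = sum(row_key(row) in train_keys for row in test_rows)
    return hits / len(test_rows)


if __name__ == "__main__":
    train = [(1, "A"), (2, "b")]
    test = [(1, "a "), (3, "c")]
    print(train_test_overlap(train, test))
```

A value of 1.0 would indicate complete train-test overlap. A check like this only catches duplicated rows; it would not detect task-level leakage, which by definition evades row-level deduplication.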

Abstract

Tabular Language Models (TLMs) have been claimed to achieve emergent generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, using 165 datasets from the UniPredict benchmark. Our investigation reveals three findings. First, binary and categorical classification achieve near-zero median lift over majority-class baselines, and strong aggregate performance is driven entirely by quartile classification tasks. Second, top-performing datasets exhibit pervasive contamination, including complete train-test overlap and task-level leakage that evades standard deduplication. Third, instruction-tuning without tabular exposure recovers 92.2% of standard classification performance; on quartile classification, format familiarity closes 71.3% of the gap, with the residual attributable to contaminated datasets. These findings suggest…
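
To make "median lift over majority-class baselines" concrete, here is a minimal sketch. It rests on two assumptions that the abstract does not confirm: lift is taken as the difference between model accuracy and baseline accuracy, and the majority class is computed on the evaluation labels. All function names are hypothetical.

```python
from collections import Counter
from statistics import median


def majority_class_accuracy(labels):
    """Accuracy of always predicting the most frequent label (assumed baseline)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def lift_over_majority(y_true, y_pred):
    """Model accuracy minus majority-class accuracy (assumed definition of lift)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy - majority_class_accuracy(y_true)


def median_lift(datasets):
    """Median lift across datasets, each given as a (y_true, y_pred) pair."""
    return median(lift_over_majority(y_true, y_pred) for y_true, y_pred in datasets)


if __name__ == "__main__":
    datasets = [
        (["a", "a", "a", "b"], ["a", "a", "a", "a"]),  # model matches the baseline
        ([0, 1, 0, 1], [0, 1, 0, 1]),                  # model beats the baseline
        ([1, 1, 0, 0], [0, 0, 0, 0]),                  # model matches the baseline
    ]
    print(median_lift(datasets))
```

Under these assumptions, a model that mostly predicts the dominant class on imbalanced datasets can report high accuracy while its median lift stays near zero, which is the pattern the first finding describes.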

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification