The Illusion of Generalization: Re-examining Tabular Language Model Evaluation
Aditya Gorla, Ratish Puduppully

TL;DR
This paper critically re-evaluates the claimed generalization abilities of Tabular Language Models, revealing that their performance is largely due to dataset contamination and evaluation artifacts rather than true tabular reasoning.
Contribution
It systematically re-evaluates Tabula-8B, a representative TLM, on 165 datasets from the UniPredict benchmark, exposing evaluation flaws and demonstrating that much of the reported performance is attributable to dataset contamination and format familiarity.
Findings
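Near-zero median lift over majority-class baselines in binary and categorical classification
Pervasive contamination in top-performing datasets, including train-test overlap and task-level leakage
Instruction-tuning without tabular data recovers 92.2% of standard classification performance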
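To make the first finding concrete, here is a minimal sketch of how a median lift over a majority-class baseline could be computed. It assumes that "lift" means model accuracy minus baseline accuracy; the summary does not give the exact definition, so the paper may use a different one. The function names and the toy datasets are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import median


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return hits / len(test_labels)


def lift_over_majority(model_accuracy, train_labels, test_labels):
    """Assumed definition: model accuracy minus majority-class accuracy."""
    return model_accuracy - majority_baseline_accuracy(train_labels, test_labels)


# Hypothetical per-dataset results (illustrative values only).
datasets = [
    {"acc": 0.75, "train": ["a", "a", "b"], "test": ["a", "a", "a", "b"]},
    {"acc": 0.50, "train": [0, 0, 1], "test": [0, 1]},
    {"acc": 1.00, "train": [1, 1, 0], "test": [1, 0, 1, 0]},
]

lifts = [lift_over_majority(d["acc"], d["train"], d["test"]) for d in datasets]
median_lift = median(lifts)
print(lifts, median_lift)
```

In this toy example one dataset shows a large lift, yet the median is zero. A mean would hide this, which is why a median is the more informative summary when a few tasks dominate the aggregate.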
Abstract
Tabular Language Models (TLMs) have been claimed to achieve emergent generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, using 165 datasets from the UniPredict benchmark. Our investigation reveals three findings. First, binary and categorical classification achieve near-zero median lift over majority-class baselines, and strong aggregate performance is driven entirely by quartile classification tasks. Second, top-performing datasets exhibit pervasive contamination, including complete train-test overlap and task-level leakage that evades standard deduplication. Third, instruction-tuning without tabular exposure recovers 92.2% of standard classification performance; on quartile classification, format familiarity closes 71.3% of the gap, with the residual attributable to contaminated datasets. These findings suggest…
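As an illustration of the "complete train-test overlap" mentioned in the abstract, the following sketch measures the fraction of test rows that also appear in the training split after light normalization. This is an assumed, simplified check and not the authors' detection procedure; `row_key` and `train_test_overlap` are hypothetical names. Being row-level, it would not catch the task-level leakage that, according to the abstract, evades standard deduplication.

```python
def row_key(row):
    """Normalize a row so trivial formatting differences do not hide duplicates."""
    return tuple(str(value).strip().lower() for value in row)


def train_test_overlap(train_rows, test_rows):
    """Fraction of test rows whose normalized form also occurs in the training rows."""
    if not test_rows:
        return 0.0
    train_keys = {row_key(r) for r in train_rows}
    hits = sum(1 for r in test_rows if row_key(r) in train_keys)
    return hits / len(test_rows)


# Hypothetical rows: the first test row duplicates a training row up to case and whitespace.
train = [(1, "Yes"), (2, "no")]
test = [(1, "yes "), (3, "no")]
overlap = train_test_overlap(train, test)
print(overlap)
```

An overlap of 1.0 would correspond to a test split fully contained in the training data.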
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
