TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

Alan Arazi; Eilam Shapira; Roi Reichart

arXiv:2505.18125 · cs.LG · October 31, 2025

TL;DR

TabSTAR is a tabular foundation model that combines target-aware textual representations with transfer learning to improve performance on tabular data with text fields.

Contribution

The paper presents TabSTAR, an architecture that uses target-aware textual embeddings and unfreezes a pretrained text encoder, instead of relying on static, target-agnostic text representations.

Findings

01. Achieves state-of-the-art results on classification benchmarks with text features.

02. Demonstrates effective transfer learning across multiple datasets.

03. Exhibits scaling laws in pretraining, indicating potential for further improvements.

Abstract

While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosting decision trees. However, recent advancements are paving the way for Tabular Foundation Models, which can leverage real-world knowledge and generalize across diverse datasets, particularly when the data contains free-text. Although incorporating language model capabilities into tabular tasks has been explored, most existing methods utilize static, target-agnostic textual representations, limiting their effectiveness. We introduce TabSTAR: a Tabular Foundation Model with Semantically Target-Aware Representations. TabSTAR is designed to enable transfer learning on tabular data with textual features, with an architecture free of dataset-specific parameters. It unfreezes a pretrained text encoder and takes as input…
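To make the contrast between target-agnostic and target-aware inputs concrete, here is a minimal sketch in plain Python. It is not TabSTAR's implementation: the `Column` class, both helper functions, and the `"Target. ..."` string format are hypothetical names and conventions chosen only for illustration. The sketch assumes that each cell is verbalized as text, and that a target-aware model also receives a textual description of the prediction target, so a trainable text encoder could condition on the task.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Column:
    """One cell of a row: a column name and its raw value (hypothetical type)."""
    name: str
    value: object


def target_agnostic_inputs(row: Sequence[Column]) -> List[str]:
    """Static view: verbalize each cell with no information about the task."""
    return [f"{col.name}: {col.value}" for col in row]


def target_aware_inputs(
    row: Sequence[Column], target_name: str, classes: Sequence[str]
) -> List[str]:
    """Target-aware view (illustrative assumption): prepend one text element
    per candidate class, so the encoder sees the task next to the features."""
    targets = [f"Target. {target_name}: {cls}" for cls in classes]
    return targets + target_agnostic_inputs(row)


row = [Column("review", "Great battery life"), Column("price", 19.9)]
agnostic = target_agnostic_inputs(row)
aware = target_aware_inputs(row, "rating", ["low", "high"])
```

In this toy setup the feature strings are identical in both views. The only difference is that the target-aware view also carries the task description, which a fine-tuned encoder could use to produce task-specific embeddings.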

Code & Models

Models

alana89/TabSTAR (Hugging Face model) · 56k downloads · 24 likes

Videos

TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields · SlidesLive