Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification
Quangao Liu, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou

TL;DR
FT-TabPFN introduces a feature tokenization layer to improve the handling of categorical features in tabular classification, building upon the TabPFN model to enhance accuracy and applicability.
Contribution
The paper proposes FT-TabPFN, an improved version of TabPFN with a novel feature tokenization layer for better categorical feature handling.
Findings
Significant accuracy improvements on tabular classification tasks.
Enhanced handling of categorical features through feature tokenization.
Open-source implementation available for community use.
Abstract
Traditional methods for tabular classification usually rely on supervised learning from scratch, which requires extensive training data to determine model parameters. However, a novel approach called the Tabular Prior-Data Fitted Network (TabPFN) has changed this paradigm. TabPFN uses a 12-layer transformer trained on large synthetic datasets to learn universal tabular representations. This method enables fast and accurate predictions on new tasks with a single forward pass and no need for additional training. Although TabPFN has been successful on small datasets, it generally shows weaker performance when dealing with categorical features. To overcome this limitation, we propose FT-TabPFN, an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle categorical features. By fine-tuning it for downstream tasks, FT-TabPFN not only expands the…
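The abstract excerpt does not describe the internals of the Feature Tokenization layer, so the following is only a minimal sketch of the general idea of feature tokenization: each column is mapped to its own fixed-size vector (a "token"), with categorical values looked up in a per-column embedding table. The class name `FeatureTokenizer`, its parameters, the random initialization, and the linear treatment of numerical columns are all assumptions made for illustration, not the authors' implementation.

```python
import random


class FeatureTokenizer:
    """Illustrative sketch (hypothetical, not the FT-TabPFN code):
    turns one table row into one d_token-sized vector per column."""

    def __init__(self, cardinalities, n_numerical, d_token, seed=0):
        rng = random.Random(seed)

        def vec():
            # Small random vector; a real model would learn these values.
            return [rng.uniform(-0.1, 0.1) for _ in range(d_token)]

        # One embedding table per categorical column: cardinality x d_token.
        self.cat_tables = [[vec() for _ in range(c)] for c in cardinalities]
        # One weight vector and one bias vector per numerical column.
        self.num_weights = [vec() for _ in range(n_numerical)]
        self.num_biases = [vec() for _ in range(n_numerical)]

    def __call__(self, cat_values, num_values):
        tokens = []
        # Categorical column: look up the row of its embedding table.
        for table, idx in zip(self.cat_tables, cat_values):
            tokens.append(list(table[idx]))
        # Numerical column: scale its weight vector by the value, add bias.
        for w, b, x in zip(self.num_weights, self.num_biases, num_values):
            tokens.append([x * wi + bi for wi, bi in zip(w, b)])
        return tokens


# Two categorical columns (3 and 5 categories) and two numerical columns.
tokenizer = FeatureTokenizer(cardinalities=[3, 5], n_numerical=2, d_token=4)
tokens = tokenizer(cat_values=[2, 0], num_values=[0.5, -1.2])
```

Here `tokens` holds four vectors of length four, one per column. The point of the design is that each category gets its own vector, rather than being passed to the network as an ordinal number.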
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
Methods: tabular data Prior-data Fitted Network
