TabFlex: Scaling Tabular Learning to Millions with Linear Attention
Yuchen Zeng, Tuan Dinh, Wonjun Kang, Andreas C Mueller

TL;DR
TabFlex scales TabPFN-style tabular learning to large datasets by replacing quadratic self-attention with linear attention, processing massive datasets quickly while keeping accuracy competitive.
Contribution
This work introduces TabFlex, a novel approach that scales tabular learning to millions of samples using linear attention mechanisms, outperforming existing methods in speed and efficiency.
Findings
TabFlex processes large datasets in seconds; for example, it handles the poker-hand dataset, with over a million samples, in 5 seconds.
TabFlex achieves over 2x speedup compared to TabPFN and 1.5x over XGBoost.
TabFlex outperforms 25 baselines in efficiency across diverse datasets.
Abstract
Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel on small-scale tabular datasets but struggle to scale to large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger datasets by incorporating linear attention mechanisms as a scalable alternative to quadratic-complexity self-attention. Our model, TabFlex, efficiently handles tabular datasets with thousands of features and hundreds of classes, scaling seamlessly to millions of samples. For instance, TabFlex processes the poker-hand dataset with over a million samples in just 5 seconds. Our extensive evaluations demonstrate that TabFlex can achieve over a 2x speedup compared to TabPFN and a 1.5x speedup over…
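To make the scaling argument in the abstract concrete, here is a minimal NumPy sketch of non-causal linear attention next to an equivalent quadratic computation. It is not TabFlex's implementation: the feature map elu(x) + 1 is one common choice from the linear-attention literature and is an assumption here, and the function names are hypothetical. The point is only that reordering the matrix products avoids building the n x n attention matrix, so cost grows linearly in the number of samples n.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map (assumed here, not taken from the paper)
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(Q, K, V):
    # Reference version: materializes the full (n_q x n) similarity matrix.
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                              # (n_q, n), quadratic in sample count
    return (A / A.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V):
    # Same result, but K and V are summarized first, so no (n_q x n) matrix is built.
    Qf, Kf = feature_map(Q), feature_map(K)
    kv = Kf.T @ V                              # (d, d_v) summary of keys and values
    z = Kf.sum(axis=0)                         # (d,) normalizer summary
    return (Qf @ kv) / (Qf @ z)[:, None]

# Small check that both orderings agree (sizes are arbitrary).
rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((50, 4))
V = rng.standard_normal((50, 3))
same = np.allclose(linear_attention(Q, K, V), quadratic_attention(Q, K, V))
print("outputs match:", same)
```

Because the key/value summary has a fixed size (d x d_v), adding more samples only lengthens the sums; this is the general property that lets linear attention act as a scalable alternative to softmax self-attention, though the kernelized form above is not numerically identical to softmax attention.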
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Topic Modeling
