Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
Yulun Wu, Doron L. Bergman

TL;DR
This paper introduces an Adversarially Pre-trained Transformer (APT) for zero-shot tabular prediction. It handles diverse datasets and classification tasks with an arbitrary number of classes, and matches state-of-the-art performance on small classification tasks with an average runtime under one second.
Contribution
The paper proposes a novel adversarial pre-training approach and a mixture block architecture for zero-shot tabular prediction, addressing class size limitations and enhancing generalization.
Findings
Matches state-of-the-art on small classification tasks
Enhances performance on benchmark datasets in classification and regression
Maintains an average runtime of under one second
Abstract
We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset, extending the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents, which continually shift their underlying data-generating distribution and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block architecture that is able to handle classification tasks with an arbitrary number of classes, addressing the class size limitation, a crucial weakness of prior deep tabular zero-shot learners. In experiments, we show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics such as number of classes and…
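The adversarial pre-training idea in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' method: APT pre-trains a transformer on whole synthetic datasets, whereas the toy below uses a single scalar learner weight `w` and a single hypothetical agent parameter `a` (the slope of the synthetic data distribution). The function name, learning rates, and the clipping range are all assumptions made for illustration. It only shows the min-max structure: the learner descends the loss while the data agent ascends it by shifting its distribution.

```python
import random


def adversarial_pretraining_sketch(steps=200, batch=64, lr_model=0.05,
                                   lr_agent=0.02, seed=0):
    """Toy min-max loop: a learner fits data produced by an adversarial agent.

    Hypothetical illustration only; not the APT training procedure.
    """
    rng = random.Random(seed)
    w = 0.0   # learner parameter (predicts y as w * x)
    a = 0.5   # agent parameter (synthetic data follows y = a * x + noise)
    history = []
    for _ in range(steps):
        # The agent generates a fresh synthetic batch from its current distribution.
        xs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        ys = [a * x + rng.gauss(0.0, 0.1) for x in xs]

        # Mean squared error of the learner on this batch.
        residuals = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(r * r for r in residuals) / batch

        # Gradients of the loss with respect to w and a.
        # Since residual = w*x - a*x - noise, d(residual)/da = -x.
        grad_w = sum(2.0 * r * x for r, x in zip(residuals, xs)) / batch
        grad_a = sum(-2.0 * r * x for r, x in zip(residuals, xs)) / batch

        # Learner minimizes the loss; agent maximizes it within a bounded range.
        w -= lr_model * grad_w
        a = max(-2.0, min(2.0, a + lr_agent * grad_a))

        history.append((loss, w, a))
    return history


if __name__ == "__main__":
    final_loss, final_w, final_a = adversarial_pretraining_sketch()[-1]
    print(f"loss={final_loss:.4f} w={final_w:.3f} a={final_a:.3f}")
```

The clipping of `a` stands in for whatever constraint keeps an adversarial data generator from producing arbitrarily hard or degenerate data; without some bound, the agent in this toy would simply run away from the learner.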
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
