Robust Tabular Foundation Models
Matthew Peroni, Franck Le, Vadim Sheinin

TL;DR
This paper introduces RTFM, an adversarial training framework for tabular foundation models that uses synthetic data to improve robustness and performance, achieving up to 6% AUC gains.
Contribution
We propose a novel adversarial training method for tabular foundation models (TFMs) that adapts the synthetic data generator during training to enhance model robustness and performance.
Findings
RTFM improves benchmark performance by up to 6% AUC.
Synthetic data generation can be effectively used for adversarial training.
The approach requires fewer than 100k additional synthetic datasets.
Abstract
The development of tabular foundation models (TFMs) has accelerated in recent years, showing strong potential to outperform traditional ML methods for structured data. A key finding is that TFMs can be pretrained entirely on synthetic datasets, opening opportunities to design data generators that encourage desirable model properties. Prior work has mainly focused on crafting high-quality priors over generators to improve overall pretraining performance. Our insight is that parameterizing the generator distribution enables an adversarial robustness perspective: during training, we can adapt the generator to emphasize datasets that are particularly challenging for the model. We formalize this by introducing an optimality gap measure, given by the difference between TFM performance and the best achievable performance as estimated by strong baselines such as XGBoost, CatBoost, and Random…
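To make the optimality gap idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the gap is taken as best baseline AUC minus TFM AUC (so a positive value means the TFM is behind), and it uses a softmax over gaps as one hypothetical way to emphasize challenging generator settings. The function names, the temperature parameter, and all AUC values are made up for illustration.

```python
import math

def optimality_gap(tfm_auc, baseline_aucs):
    """Best baseline AUC minus TFM AUC (assumed sign: positive = TFM is behind)."""
    return max(baseline_aucs.values()) - tfm_auc

def sampling_weights(gaps, temperature=0.05):
    """Softmax over gaps: settings with larger gaps receive more weight.

    This reweighting rule is an assumption for illustration only.
    """
    m = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp((g - m) / temperature) for g in gaps]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical results on datasets drawn from two generator settings.
results = [
    {"tfm": 0.90, "baselines": {"xgboost": 0.88, "catboost": 0.89}},
    {"tfm": 0.75, "baselines": {"xgboost": 0.82, "catboost": 0.84}},
]

gaps = [optimality_gap(r["tfm"], r["baselines"]) for r in results]
weights = sampling_weights(gaps)

for g, w in zip(gaps, weights):
    print(f"gap={g:+.3f}  weight={w:.3f}")
```

Under these assumptions the second setting, where the baselines beat the TFM, receives most of the sampling weight, which is the qualitative behavior the abstract describes: the generator is steered toward datasets that are hard for the model.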
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
