Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models
Magnus Bühler, Lennart Purucker, Frank Hutter

TL;DR
This paper introduces CausalMixFT, a causal data augmentation method that improves the robustness and performance of fine-tuning tabular foundation models with limited data by generating causally consistent synthetic samples.
Contribution
The paper presents CausalMixFT, a novel augmentation technique that uses Structural Causal Models to improve fine-tuning stability and accuracy for tabular foundation models in low-data scenarios.
Findings
Improves median normalized ROC-AUC from 0.10 (standard fine-tuning) to 0.12.
Reduces validation-test performance gap from 0.67 to 0.30.
Outperforms purely statistical synthetic data generators such as CTGAN, TabEBM, and TableAugment.
Abstract
Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We propose CausalMixFT, a method that enhances fine-tuning robustness and downstream performance by generating structurally consistent synthetic samples using Structural Causal Models (SCMs) fitted on the target dataset. This approach augments limited real data with causally informed synthetic examples, preserving feature dependencies while expanding training diversity. Evaluated across 33 classification datasets from TabArena and over 2300 fine-tuning runs, our CausalMixFT method consistently improves median normalized ROC-AUC from 0.10 (standard fine-tuning) to 0.12, outperforming purely statistical generators such as CTGAN (-0.01), TabEBM (-0.04), and TableAugment (-0.09). Moreover, it narrows the…
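The augmentation idea in the abstract (fit an SCM on the target data, sample from it, and mix the samples with the real rows) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how CausalMixFT fits its SCMs, so the sketch assumes a linear-Gaussian SCM over continuous features with a user-supplied causal graph whose columns are already in topological order. The names `fit_linear_scm`, `sample_scm`, and the `parents` mapping are hypothetical.

```python
import numpy as np


def fit_linear_scm(X, parents):
    """Fit one linear-Gaussian structural equation per column.

    parents[j] lists the columns assumed to be direct causes of column j.
    Assumption: the graph is given and every parent index is smaller than j.
    """
    n, d = X.shape
    scm = []
    for j in range(d):
        pa = list(parents.get(j, []))
        assert all(p < j for p in pa), "columns must be in topological order"
        # Design matrix: intercept plus the parent columns.
        A = np.column_stack([np.ones(n)] + [X[:, p] for p in pa])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        noise_std = (X[:, j] - A @ coef).std()
        scm.append((pa, coef, noise_std))
    return scm


def sample_scm(scm, n_samples, rng):
    """Ancestral sampling: draw each column from its parents plus noise."""
    S = np.zeros((n_samples, len(scm)))
    for j, (pa, coef, noise_std) in enumerate(scm):
        A = np.column_stack([np.ones(n_samples)] + [S[:, p] for p in pa])
        S[:, j] = A @ coef + rng.normal(0.0, noise_std, size=n_samples)
    return S


# Toy "real" data in which column 0 causes column 1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=200)
X_real = np.column_stack([x0, x1])

scm = fit_linear_scm(X_real, parents={1: [0]})
X_syn = sample_scm(scm, 500, rng)

# Mix real and synthetic rows to form the augmented training set.
X_aug = np.vstack([X_real, X_syn])
```

Because each synthetic column is generated from its parents, the dependency between the two features carries over to the synthetic rows, which is the property the abstract describes as preserving feature dependencies. Handling class labels, categorical features, and an unknown graph is outside the scope of this sketch.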
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Topic Modeling
