Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models

Magnus Bühler; Lennart Purucker; Frank Hutter

arXiv:2601.04110·cs.LG·January 22, 2026

TL;DR

This paper introduces CausalMixFT, a causal data augmentation method that improves the robustness and performance of fine-tuning tabular foundation models with limited data by generating causally consistent synthetic samples.

Contribution

The paper presents CausalMixFT, a novel augmentation technique using Structural Causal Models to enhance fine-tuning stability and accuracy in low-data scenarios for tabular models.

Findings

1. Improves median normalized ROC-AUC from 0.10 (standard fine-tuning) to 0.12.

2. Reduces the validation-test performance gap from 0.67 to 0.30.

3. Outperforms other synthetic data generators such as CTGAN, TabEBM, and TableAugment.

Abstract

Fine-tuning tabular foundation models (TFMs) under data scarcity is challenging, as early stopping on even scarcer validation data often fails to capture true generalization performance. We propose CausalMixFT, a method that enhances fine-tuning robustness and downstream performance by generating structurally consistent synthetic samples using Structural Causal Models (SCMs) fitted on the target dataset. This approach augments limited real data with causally informed synthetic examples, preserving feature dependencies while expanding training diversity. Evaluated across 33 classification datasets from TabArena and over 2300 fine-tuning runs, our CausalMixFT method consistently improves median normalized ROC-AUC from 0.10 (standard fine-tuning) to 0.12, outperforming purely statistical generators such as CTGAN (-0.01), TabEBM (-0.04), and TableAugment (-0.09). Moreover, it narrows the…
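The abstract describes fitting an SCM on the target dataset and sampling from it to augment the real rows. The sketch below illustrates that general idea only; it is not the authors' implementation, whose details are not given on this page. It assumes a known causal ordering, linear mechanisms, and numeric columns, and the names `fit_linear_scm`, `sample_scm`, and `parents` are hypothetical.

```python
import numpy as np


def fit_linear_scm(X, parents):
    """Fit one linear mechanism per column on its assumed parent columns.

    Assumption: parents[j] lists column indices smaller than j, so the
    column order is already a valid topological order of the causal graph.
    """
    scm = []
    for j, pa in enumerate(parents):
        assert all(p < j for p in pa), "parents must precede their child"
        # Design matrix: intercept plus parent columns (possibly none).
        A = np.column_stack([np.ones(len(X))] + [X[:, p] for p in pa])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        scm.append((list(pa), coef, resid))
    return scm


def sample_scm(scm, n, rng):
    """Ancestral sampling: generate each column from its sampled parents.

    Noise is bootstrapped from the fitted residuals of each mechanism.
    """
    out = np.zeros((n, len(scm)))
    for j, (pa, coef, resid) in enumerate(scm):
        A = np.column_stack([np.ones(n)] + [out[:, p] for p in pa])
        out[:, j] = A @ coef + rng.choice(resid, size=n)
    return out


# Toy "real" data with a chain-like dependency x0 -> x1 -> x2 (and x0 -> x2).
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=200)
x2 = x1 - x0 + rng.normal(scale=0.1, size=200)
X_real = np.column_stack([x0, x1, x2])

parents = [[], [0], [0, 1]]  # assumed graph, not learned from data
scm = fit_linear_scm(X_real, parents)
X_syn = sample_scm(scm, 400, rng)

# Augmented training set: real rows followed by synthetic rows.
X_aug = np.vstack([X_real, X_syn])
```

Because each synthetic column is generated from its sampled parents, dependencies such as the one between `x0` and `x1` carry over to the synthetic rows. A practical version would also need to handle the class label, categorical features, and a graph that is not known in advance.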

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Topic Modeling