Causal Fine-Tuning under Latent Confounded Shift
Jialin Yu, Yuxiang Zhou, Haoxuan Li, Junchi Yu, Mengyue Yang, Yulan He, Nevin L. Zhang, Philip Torr, Ricardo Silva

TL;DR
This paper introduces Causal Fine-Tuning (CFT), a fine-tuning method that uses a structural causal model to make models more robust to latent confounded shift, with a focus on text classification tasks.
Contribution
It proposes a fine-tuning framework that uses the assumed causal structure to decompose representations into causal and spurious components, improving model robustness.
Findings
CFT outperforms domain generalization baselines in text tasks.
Learning causal and spurious representations improves robustness.
The method effectively identifies stable high-level features.
Abstract
Adapting to latent confounded shift remains a core challenge in modern AI. This setting is driven by hidden variables that induce spurious correlations between inputs and outputs during training, leading models to rely on non-causal shortcuts. For example, a model may learn to treat metadata (e.g., data source like "Amazon") as a proxy for positive sentiment, causing failure when the source becomes predominantly negative during deployment. To address this latent confounded shift, we introduce Causal Fine-Tuning (CFT). Using a structural causal model as an inductive bias, we derive sufficient identification conditions that motivate a fine-tuning objective for decomposing representations into high-level stable and low-level shift-sensitive components. Instantiating this framework in BERT, we show that learning such causal/spurious representations and adjusting them accordingly yield a more…
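The "Amazon" example in the abstract can be made concrete with a toy simulation. This is an illustrative sketch only, not the authors' method or data: it assumes a binary sentiment label, a stable text cue that agrees with the label 80% of the time, and a source flag (1 = "Amazon") that agrees with the label 95% of the time in training but only 5% of the time at deployment. The hidden confounder is not modelled explicitly; its effect is summarized by that source/label agreement rate. The names `simulate`, `shortcut_rule` and `causal_rule` and all probabilities are hypothetical choices for illustration.

```python
import random

def simulate(n, p_source_agrees, p_text_agrees, rng):
    """Draw (text_cue, source, label) triples for binary sentiment (toy data).

    text_cue: stable, causal signal; agrees with the label at a fixed rate.
    source:   1 = "Amazon", 0 = other; its agreement with the label is what
              shifts between training and deployment (hypothetical rates).
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        t = y if rng.random() < p_text_agrees else 1 - y
        s = y if rng.random() < p_source_agrees else 1 - y
        data.append((t, s, y))
    return data

def accuracy(rule, data):
    return sum(rule(t, s) == y for t, s, y in data) / len(data)

def shortcut_rule(t, s):
    return s  # predict sentiment from the data source alone

def causal_rule(t, s):
    return t  # predict sentiment from the text cue alone

rng = random.Random(0)
train = simulate(5000, p_source_agrees=0.95, p_text_agrees=0.8, rng=rng)
deploy = simulate(5000, p_source_agrees=0.05, p_text_agrees=0.8, rng=rng)

rules = {"shortcut": shortcut_rule, "causal": causal_rule}
train_acc = {name: accuracy(r, train) for name, r in rules.items()}
deploy_acc = {name: accuracy(r, deploy) for name, r in rules.items()}
# Naive model selection: keep whichever rule fits the training data best.
chosen = max(train_acc, key=train_acc.get)

print("train:", train_acc)
print("deploy:", deploy_acc)
print("selected by training accuracy:", chosen)
```

Selecting by training accuracy picks the source shortcut, whose accuracy collapses once the source/label correlation flips, while the text-cue rule keeps roughly the same accuracy in both settings. This is the failure mode that motivates separating stable from shift-sensitive features.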
