StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment
Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin

TL;DR
StarFT is a novel fine-tuning framework that makes zero-shot models more robust by regularizing their output distributions so that they do not learn spurious features, which improves both group robustness and accuracy.
Contribution
StarFT introduces a regularization method that aligns outputs for spuriosity-injected labels, preventing models from learning irrelevant features during fine-tuning.
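As a rough illustration of what such an alignment regularizer could look like, the sketch below penalizes the divergence between a fine-tuned model's output distribution and a frozen zero-shot model's output distribution over prompts that describe only a spurious attribute. Everything here is an assumption made for illustration: the prompt texts, the function names, the use of KL divergence and its direction, and the toy logits. The summary above does not specify the exact loss used in StarFT.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def spuriosity_alignment_loss(finetuned_logits, zeroshot_logits):
    """Mean KL(zero-shot || fine-tuned) over a batch.

    Each row holds image-to-text similarity logits over prompts that
    mention only a spurious attribute (hypothetical setup, not
    confirmed by the source).
    """
    per_example = [
        kl_divergence(softmax(zs), softmax(ft))
        for ft, zs in zip(finetuned_logits, zeroshot_logits)
    ]
    return sum(per_example) / len(per_example)

# Hypothetical spuriosity-injected prompts (background only).
spurious_prompts = [
    "a photo of a bird on a land background",
    "a photo of a bird on a water background",
]

# Toy logits for one image: the zero-shot model is nearly indifferent
# to background, while the "drifted" fine-tuned model relies on it.
zeroshot = [[1.0, 1.2]]
drifted = [[4.0, -1.0]]

loss_aligned = spuriosity_alignment_loss(zeroshot, zeroshot)
loss_drifted = spuriosity_alignment_loss(drifted, zeroshot)
print(loss_aligned, loss_drifted)
```

In a training loop, a term like this would typically be added to the task loss with a weighting coefficient, so that the fine-tuned model is discouraged from becoming more sensitive to the spurious attribute than the zero-shot model was.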
Findings
Boosts worst-group accuracy by 14.30% on Waterbirds
Improves average accuracy by 3.02%
Maintains robustness against spuriosity during fine-tuning
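For readers unfamiliar with the metrics in these findings, the following minimal sketch shows how worst-group and average accuracy are commonly computed on a group-robustness benchmark such as Waterbirds, where each group is a pair of class label and spurious attribute (for example, bird type and background). The function names and the toy predictions are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from collections import defaultdict

def group_accuracies(predictions, labels, spurious_attrs):
    """Accuracy per (label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, attr in zip(predictions, labels, spurious_attrs):
        group = (label, attr)
        total[group] += 1
        correct[group] += int(pred == label)
    return {group: correct[group] / total[group] for group in total}

def worst_group_accuracy(predictions, labels, spurious_attrs):
    """Lowest accuracy across all groups."""
    return min(group_accuracies(predictions, labels, spurious_attrs).values())

def average_accuracy(predictions, labels):
    """Plain accuracy over all examples."""
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)

# Toy data: label 0 = landbird, 1 = waterbird;
# attribute 0 = land background, 1 = water background.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
attrs  = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
preds  = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

wga = worst_group_accuracy(preds, labels, attrs)
avg = average_accuracy(preds, labels)
print(wga, avg)
```

In this toy case the model is perfect on the two majority groups but gets only half of each minority group right, so average accuracy looks high while worst-group accuracy exposes the reliance on background.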
Abstract
Learning robust representations from data often requires scale, which has led to the success of recent zero-shot models such as CLIP. However, the obtained robustness can easily deteriorate when these models are fine-tuned on other downstream tasks (e.g., of smaller scales). Previous works often interpret this phenomenon in the context of domain shift, developing fine-tuning methods that aim to preserve the original domain as much as possible. However, in a different context, models fine-tuned with limited data are also prone to learning features that are spurious to humans, such as background or texture. In this paper, we propose StarFT (Spurious Textual Alignment Regularization), a novel framework for fine-tuning zero-shot models to enhance robustness by preventing them from learning spuriosity. We introduce a regularization that aligns the output distribution for…
Taxonomy
Topics: Model Reduction and Neural Networks · Medical Imaging Techniques and Applications · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training
