Context-Aware Robust Fine-Tuning
Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li

TL;DR
This paper introduces CAR-FT, a fine-tuning method for CLIP models that preserves their context-awareness and improves robustness and accuracy across diverse datasets, addressing the robustness loss caused by traditional fine-tuning.
Contribution
The paper proposes a novel regularization technique, CAR-FT, that maintains CLIP's context-aware features during fine-tuning, enhancing robustness and accuracy in downstream tasks.
Findings
CAR-FT achieves superior robustness on five out-of-distribution (OOD) datasets.
CAR-FT improves accuracy on nine downstream tasks.
CAR-FT surpasses previous domain generalization methods, setting a new state of the art.
Abstract
Contrastive Language-Image Pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Based on exhaustive text cues in "[CONTEXT]", the CLIP model is aware of different contexts, e.g., background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation to show that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture the context information. Specifically, we use zero-shot prompt weights to get the…
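The zero-shot classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for real CLIP image and text embeddings, and the function names, prompt strings, and temperature value are all hypothetical choices made for illustration. Scoring context prompts with the same similarity function is likewise an assumption about how context-awareness could be probed, since the abstract is truncated before it describes the actual procedure.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def softmax(scores, temperature=0.01):
    """Turn similarity scores into a probability distribution.

    The temperature is an illustrative value, not one taken from the paper.
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_predict(image_feature, prompt_features):
    """Score an image embedding against a dict of {prompt: text embedding}."""
    prompts = list(prompt_features)
    sims = [cosine(image_feature, prompt_features[p]) for p in prompts]
    return dict(zip(prompts, softmax(sims)))


# Toy embeddings (made up); a real setup would take them from CLIP's encoders.
image_feature = [0.9, 0.2, 0.1]

# Fixed context "photo", varying [CLASS].
class_prompts = {
    "a photo of a dog": [1.0, 0.1, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.1],
}

# Fixed class "dog", varying [CONTEXT] (assumed way to probe context-awareness).
context_prompts = {
    "a photo of a dog": [1.0, 0.1, 0.0],
    "a sketch of a dog": [0.6, 0.1, 0.8],
}

class_probs = zero_shot_predict(image_feature, class_prompts)
context_probs = zero_shot_predict(image_feature, context_prompts)

predicted_class = max(class_probs, key=class_probs.get)
predicted_context = max(context_probs, key=context_probs.get)
print(predicted_class)
print(predicted_context)
```

Both calls reuse the same scoring function; only the set of prompts changes. That mirrors the abstract's point that the "[CONTEXT]" slot of the prompt gives the model a handle on background, style, or viewpoint in addition to the class label.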
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Test · Attentive Walk-Aggregating Graph Neural Network · Contrastive Language-Image Pre-training
