Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan, Shiguang Shan

TL;DR
This paper introduces PMG-AFT, a novel fine-tuning method guided by pre-trained models to enhance zero-shot adversarial robustness of vision-language models like CLIP, without sacrificing generalization.
Contribution
The paper proposes a pre-trained model guided adversarial fine-tuning approach that preserves generalization features while improving robustness against adversarial attacks.
Findings
Significantly outperforms prior state-of-the-art methods in zero-shot adversarial robustness.
Improves clean accuracy alongside adversarial robustness.
Demonstrates effectiveness across 15 zero-shot datasets.
Abstract
Large-scale pre-trained vision-language models like CLIP have demonstrated impressive performance across various tasks and exhibit remarkable zero-shot generalization capability, yet they are also vulnerable to imperceptible adversarial examples. Existing works typically employ adversarial training (fine-tuning) as a defense against adversarial examples. However, applying it directly to the CLIP model may result in overfitting, compromising the model's capacity for generalization. In this paper, we propose the Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) method, which leverages supervision from the original pre-trained model through a carefully designed auxiliary branch to enhance the model's zero-shot adversarial robustness. Specifically, PMG-AFT minimizes the distance between the features of adversarial examples in the target model and those in the pre-trained model,…
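The guidance term described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details confirmed by the excerpt above: the "distance" is taken to be a KL divergence between class distributions obtained from image-text similarities, the guidance term is added to a standard cross-entropy loss with a hypothetical weight `alpha`, and `temperature` is a placeholder value. All function names are illustrative, and the generation of adversarial examples is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def class_probs(image_feats, text_feats, temperature=1.0):
    # CLIP-style zero-shot prediction: cosine similarity between image
    # features and one text feature per class, followed by a softmax.
    # The temperature value is a placeholder (assumption).
    logits = l2_normalize(image_feats) @ l2_normalize(text_feats).T / temperature
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    # Mean KL(p || q) over the batch.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def guided_loss(adv_feats_target, adv_feats_pretrained, text_feats, labels, alpha=1.0):
    # adv_feats_target:     features of adversarial images from the model being fine-tuned
    # adv_feats_pretrained: features of the same adversarial images from the frozen
    #                       pre-trained model (the auxiliary branch)
    # Assumption: the distance is a KL divergence over class distributions,
    # combined with cross-entropy through the hypothetical weight alpha.
    p_target = class_probs(adv_feats_target, text_feats)
    p_pretrained = class_probs(adv_feats_pretrained, text_feats)
    ce = float(-np.mean(np.log(p_target[np.arange(len(labels)), labels] + 1e-12)))
    guide = kl_divergence(p_target, p_pretrained)
    return ce + alpha * guide, ce, guide

rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 8))      # 3 classes, 8-dim embeddings
target_feats = rng.normal(size=(4, 8))    # 4 adversarial images, target model
pretrained_feats = rng.normal(size=(4, 8))
labels = np.array([0, 1, 2, 0])

total, ce, guide = guided_loss(target_feats, pretrained_feats, text_feats, labels)
_, _, guide_same = guided_loss(target_feats, target_feats, text_feats, labels)
```

In this sketch the guidance term vanishes when the fine-tuned model reproduces the pre-trained model's predictions on adversarial inputs and grows as the two diverge, which is one way to discourage the overfitting the abstract describes.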
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Contrastive Language-Image Pre-training
