Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness
Lu Yu, Haiyang Zhang, Changsheng Xu

TL;DR
This paper introduces a novel text-guided attention framework to improve zero-shot adversarial robustness of vision-language models like CLIP, achieving significant accuracy gains by refining attention mechanisms and incorporating complementary attention strategies.
Contribution
It proposes TGA-ZSR and Comp-TGA, innovative attention-based methods that enhance robustness and generalization of pre-trained vision-language models against adversarial attacks.
Findings
TGA-ZSR improves zero-shot robust accuracy by 9.58% over the current state of the art.
Comp-TGA improves zero-shot robust accuracy by 11.95% over the current state of the art.
Both methods are evaluated across 16 datasets.
Abstract
Due to their impressive zero-shot capabilities, pre-trained vision-language models (e.g., CLIP) have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we observe that adversarial perturbations induce shifts in text-guided attention. Building upon this observation, we propose a simple yet effective strategy: Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). This framework incorporates two components: the Local Attention Refinement Module and the Global Attention Constraint Module. Our goal is to maintain the generalization of the CLIP model and enhance its adversarial robustness. Additionally, the Global Attention Constraint Module acquires text-guided attention from both the target and original models using clean examples. Its objective is…
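The abstract does not give formulas, so the following is only a minimal sketch of what a text-guided attention map and the Global Attention Constraint could look like. It assumes that attention is the cosine similarity between each image patch feature and the text embedding, min-max normalized, and that the constraint is an L2 distance between the attention maps that the target and original models produce on the same clean example. The function names `text_guided_attention`, `attention_distance`, and `global_attention_constraint` are hypothetical and are not taken from the paper or from any CLIP library.

```python
import math


def text_guided_attention(patch_feats, text_feat):
    """Score each image patch against a text embedding.

    ASSUMPTION: attention = cosine similarity between patch feature and
    text feature, min-max normalized to [0, 1]. The paper may define it
    differently.
    """
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    t_norm = norm(text_feat)
    scores = [
        sum(p * t for p, t in zip(patch, text_feat)) / (norm(patch) * t_norm)
        for patch in patch_feats
    ]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # a constant map normalizes to all zeros
    return [(s - lo) / span for s in scores]


def attention_distance(attn_a, attn_b):
    """L2 distance between two attention maps over the same patches."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(attn_a, attn_b)))


def global_attention_constraint(clean_patches_target, clean_patches_original, text_feat):
    """Penalty on attention drift between target and original models.

    Both inputs are patch features of the SAME clean example, one set from
    the fine-tuned (target) model and one from the frozen (original) model.
    ASSUMPTION: the penalty is a plain L2 distance.
    """
    attn_target = text_guided_attention(clean_patches_target, text_feat)
    attn_original = text_guided_attention(clean_patches_original, text_feat)
    return attention_distance(attn_target, attn_original)
```

Under these assumptions the penalty is zero when the target model attends to the same patches as the original model on clean inputs, and it grows as the attention shifts. This matches the stated aim of preserving CLIP's generalization while training for robustness.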
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
