Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models
Zane Xu, Jason Sun

TL;DR
This paper reviews defense strategies for improving zero-shot adversarial robustness in vision-language models, highlighting the trade-offs, evolution of methods, and future research directions.
Contribution
It synthesizes key defense paradigms and methods for VLMs, providing a comprehensive overview of the field's evolution and challenges.
Findings
Analysis of adversarial fine-tuning and test-time defenses
Comparison of alignment-preserving and embedding re-engineering methods
Identification of future directions such as hybrid defense strategies and adversarial pre-training
Abstract
This report synthesizes eight seminal papers on the zero-shot adversarial robustness of vision-language models (VLMs) such as CLIP. A central challenge in this domain is the inherent trade-off between enhancing adversarial robustness and preserving the model's zero-shot generalization capabilities. We analyze two primary defense paradigms: Adversarial Fine-Tuning (AFT), which modifies model parameters, and Training-Free/Test-Time Defenses, which leave them unchanged. We trace the evolution from alignment-preserving methods (TeCoA) to embedding space re-engineering (LAAT, TIMA), and from input heuristics (AOM, TTC) to latent-space purification (CLIPure). Finally, we identify key challenges and future directions, including hybrid defense strategies and adversarial pre-training.
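As a rough orientation, the two paradigms can be contrasted using standard textbook formulations. This is a sketch under stated assumptions, not the notation of any of the reviewed papers: the image encoder $f_\theta$, text encoder $g$, class prompt $t_c$, perturbation budget $\epsilon$, temperature $\tau$, and purification map $P$ are all symbols introduced here for illustration.

```latex
% Zero-shot prediction: pick the class whose prompt embedding is closest to the image embedding.
\hat{y}(x) = \arg\max_{c} \; \cos\!\big(f_\theta(x),\, g(t_c)\big)

% Classification loss over the similarity scores (cross-entropy with temperature \tau).
\mathcal{L}(x, y; \theta) = -\log
  \frac{\exp\!\big(\cos(f_\theta(x), g(t_y)) / \tau\big)}
       {\sum_{c} \exp\!\big(\cos(f_\theta(x), g(t_c)) / \tau\big)}

% Adversarial fine-tuning (assumed generic min-max form): the parameters \theta are updated.
\theta^{*} = \arg\min_{\theta} \;
  \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(x + \delta, y; \theta) \Big]

% Test-time defense (assumed generic form): \theta stays frozen; only the input or its
% embedding is transformed by P before the zero-shot prediction is made.
\hat{y}(x') = \arg\max_{c} \; \cos\!\big(P(x'; \theta),\, g(t_c)\big)
```

Read this way, the first objective changes $\theta$ and can therefore shift the pre-trained image-text alignment that zero-shot generalization relies on, whereas the second leaves $\theta$ untouched and moves the defensive work to inference time.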
