Adversarial Defense in Vision-Language Models: An Overview

Xiaowei Fu; Lei Zhang

arXiv:2601.12443·cs.CV·January 21, 2026

TL;DR

This paper reviews recent strategies to defend vision-language models like CLIP against adversarial attacks, categorizing defenses into training-time, test-time adaptation, and training-free methods, and discusses their strengths and limitations.

Contribution

The paper provides a comprehensive overview of current adversarial defense techniques for VLMs, highlighting recent advances and open challenges in improving robustness.

Findings

01. Training-time defenses improve robustness but are resource-intensive.

02. Test-time adaptation offers flexibility but increases complexity.

03. Training-free methods mitigate attacks without additional training.

Abstract

The widespread use of Vision-Language Models (VLMs, e.g., CLIP) has raised concerns about their vulnerability to sophisticated and imperceptible adversarial attacks. These attacks could compromise model performance and system security in cross-modal tasks. To address this challenge, three main defense paradigms have been proposed: Training-time Defense, Test-time Adaptation Defense, and Training-free Defense. Training-time Defense modifies the training process, typically through adversarial fine-tuning, to improve robustness to adversarial examples. While effective, this approach requires substantial computational resources and may not generalize across all adversarial attacks. Test-time Adaptation Defense adapts the model at inference time by updating its parameters to handle unlabeled adversarial examples, offering flexibility but often at the cost of…
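
To make the training-time paradigm concrete, here is a minimal sketch of adversarial training on a toy problem. It is not the paper's method and does not involve CLIP: a two-feature logistic-regression classifier stands in for the model, and a single-step L-infinity attack (FGSM) is assumed as the attack used during training. All function names (`make_clusters`, `fgsm`, `train`, `robust_accuracy`) and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    return (v > 0) - (v < 0)


def make_clusters(n=100, seed=0):
    # Toy data (an assumption): two Gaussian blobs, label 1 near (2, 2), label 0 near (-2, -2).
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        center = 2.0 if y else -2.0
        data.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], y))
    return data


def fgsm(x, y, w, b, eps):
    # Single-step L-infinity attack. For logistic regression the gradient of
    # the log loss with respect to the input is (p - y) * w.
    p = sigmoid(dot(w, x) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, eps, epochs=30, lr=0.1, seed=0):
    # eps == 0 is standard training; eps > 0 trains on FGSM-perturbed inputs.
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    order = list(data)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            x_in = fgsm(x, y, w, b, eps) if eps > 0 else x
            g = sigmoid(dot(w, x_in) + b) - y  # d(loss)/d(logit)
            w = [wi - lr * g * xi for wi, xi in zip(w, x_in)]
            b -= lr * g
    return w, b


def robust_accuracy(data, w, b, eps):
    # Accuracy on FGSM-perturbed inputs; eps == 0 gives clean accuracy.
    correct = 0
    for x, y in data:
        pred = 1 if dot(w, fgsm(x, y, w, b, eps)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)


if __name__ == "__main__":
    data = make_clusters()
    for eps_train in (0.0, 0.5):
        w, b = train(data, eps_train)
        print(eps_train, robust_accuracy(data, w, b, 0.0), robust_accuracy(data, w, b, 0.5))
```

The sketch also hints at the cost noted in the abstract: each training step needs an extra gradient computation to build the perturbed input, and stronger multi-step attacks would multiply that overhead. For a linear model the single-step attack is already the worst case inside the L-infinity ball, which is not true of deep encoders.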

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Multimodal Machine Learning Applications