Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness
Zeyu Wang, Cihang Xie, Brian Bartoldson, Bhavya Kailkhura

TL;DR
This paper introduces a novel double visual defense approach involving large-scale adversarial pre-training and instruction tuning, significantly improving the robustness of vision-language models against adversarial attacks.
Contribution
It presents a new adversarial pre-training and instruction tuning framework that enhances vision-language model robustness beyond existing lightweight fine-tuning methods.
Findings
$\Delta$CLIP surpasses previous models by ~20% in adversarial robustness on ImageNet-1k.
$\Delta^2$LLaVA improves robustness by ~30% in image captioning and ~20% in visual question answering.
Models demonstrate stronger zero-shot recognition, fewer hallucinations, and better reasoning.
Abstract
This paper investigates the robustness of vision-language models against adversarial visual perturbations and introduces a novel "double visual defense" to enhance this robustness. Unlike previous approaches that resort to lightweight adversarial fine-tuning of a pre-trained CLIP model, we perform large-scale adversarial vision-language pre-training from scratch using web-scale data. We then strengthen the defense by incorporating adversarial visual instruction tuning. The resulting models from each stage, $\Delta$CLIP and $\Delta^2$LLaVA, show substantially enhanced zero-shot robustness and set a new state-of-the-art in adversarial defense for vision-language models. For example, the adversarial robustness of $\Delta$CLIP surpasses that of the previous best models on ImageNet-1k by ~20%.
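To make "adversarial visual perturbation" concrete, here is a minimal sketch of an L-infinity projected gradient descent (PGD) attack against a toy CLIP-style image-text similarity loss. It is not the paper's implementation: the linear encoder `W`, the text embeddings `T`, the budget `eps`, the step size `alpha`, and the step count are all hypothetical values chosen for illustration, and the model-update half of adversarial training (training on `x_adv` instead of `x`) is omitted.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(x, W, T, y):
    """Cross-entropy over image-text similarity logits, plus its gradient w.r.t. x.

    Toy stand-in for a CLIP-style objective: W is a linear image encoder
    (assumption), T holds one unit-norm text embedding per class.
    """
    logits = T @ (W @ x)
    p = softmax(logits)
    loss = -np.log(p[y] + 1e-12)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (T.T @ (p - onehot))   # chain rule through the linear maps
    return loss, grad_x

def pgd_linf(x, W, T, y, eps=4 / 255, alpha=1 / 255, steps=10):
    """L-infinity PGD: ascend the loss, then project back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, W, T, y)
        x_adv = x_adv + alpha * np.sign(g)         # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within eps of the clean image
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv

# Hypothetical toy data: a 64-"pixel" image, 16-dim embeddings, 5 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)) / np.sqrt(64)
T = rng.normal(size=(5, 16))
T /= np.linalg.norm(T, axis=1, keepdims=True)
x = rng.uniform(size=64)
y = 2

x_adv = pgd_linf(x, W, T, y)
clean_loss, _ = loss_and_grad(x, W, T, y)
adv_loss, _ = loss_and_grad(x_adv, W, T, y)
```

In adversarial training, a perturbation like `x_adv` would be regenerated for each batch and the model's parameters updated on it; the abstract describes doing this during both pre-training and visual instruction tuning, but the attack settings used there are not stated in this summary.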
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training
