Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness

Zeyu Wang; Cihang Xie; Brian Bartoldson; Bhavya Kailkhura

arXiv:2501.09446·cs.CV·April 9, 2025



Open Access 5 Models

TL;DR

This paper introduces a novel double visual defense approach involving large-scale adversarial pre-training and instruction tuning, significantly improving the robustness of vision-language models against adversarial attacks.

Contribution

It presents a new adversarial pre-training and instruction tuning framework that enhances vision-language model robustness beyond existing lightweight fine-tuning methods.

Findings

01

$Δ$CLIP surpasses previous models by ~20% in adversarial robustness on ImageNet-1k.

02

$Δ^{2}$LLaVA improves robustness by ~30% in image captioning and ~20% in visual question answering.

03

Models demonstrate stronger zero-shot recognition, fewer hallucinations, and better reasoning.

Abstract

This paper investigates the robustness of vision-language models against adversarial visual perturbations and introduces a novel "double visual defense" to enhance this robustness. Unlike previous approaches that resort to lightweight adversarial fine-tuning of a pre-trained CLIP model, we perform large-scale adversarial vision-language pre-training from scratch using web-scale data. We then strengthen the defense by incorporating adversarial visual instruction tuning. The resulting models from each stage, $Δ$CLIP and $Δ^{2}$LLaVA, show substantially enhanced zero-shot robustness and set a new state-of-the-art in adversarial defense for vision-language models. For example, the adversarial robustness of $Δ$CLIP surpasses that of the previous best models on ImageNet-1k by ~20%. …
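To make the idea of adversarial vision-language pre-training concrete, the following is a minimal sketch of one adversarial training step on a CLIP-style contrastive loss. It is not the authors' implementation. Everything in it is an assumption made for illustration: a toy linear image encoder `W`, fixed normalized text embeddings `T`, an L-infinity PGD attack with `eps=4/255`, `alpha=1/255`, and 10 steps, a temperature `tau=0.07`, and the function names themselves. Gradients are written out by hand in NumPy so the block runs without a deep learning framework.

```python
import numpy as np

def log_softmax(a, axis):
    # Numerically stable log-softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=axis, keepdims=True))

def contrastive_loss_and_grads(x, W, T, tau=0.07):
    # Toy CLIP-style symmetric contrastive loss (assumption: linear image
    # encoder z = x @ W, fixed text embeddings T, matched pairs on the diagonal).
    z = x @ W
    logits = z @ T.T / tau
    n = x.shape[0]
    eye = np.eye(n)
    ls_i2t = log_softmax(logits, axis=1)  # image -> text direction
    ls_t2i = log_softmax(logits, axis=0)  # text -> image direction
    loss = -0.5 * (np.trace(ls_i2t) + np.trace(ls_t2i)) / n
    # Gradient of the loss with respect to the logits, then chain rule.
    dlogits = 0.5 * ((np.exp(ls_i2t) - eye) + (np.exp(ls_t2i) - eye)) / n
    dz = dlogits @ T / tau
    return loss, dz @ W.T, x.T @ dz  # loss, dL/dx, dL/dW

def pgd_attack(x, W, T, eps=4 / 255, alpha=1 / 255, steps=10):
    # L-infinity PGD: ascend the loss, then project back into the eps-ball
    # around the clean image and into the valid pixel range [0, 1].
    x_adv = x.copy()
    for _ in range(steps):
        _, grad_x, _ = contrastive_loss_and_grads(x_adv, W, T)
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

def adversarial_train_step(x, W, T, lr=1e-2):
    # Inner maximization (attack), then outer minimization on the encoder.
    x_adv = pgd_attack(x, W, T)
    loss, _, grad_W = contrastive_loss_and_grads(x_adv, W, T)
    return W - lr * grad_W, loss

# Hypothetical toy data: 8 "images" with 32 pixels, 16-dim embeddings.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 32))
W = rng.normal(0.0, 0.1, size=(32, 16))
T = rng.normal(size=(8, 16))
T /= np.linalg.norm(T, axis=1, keepdims=True)

clean_loss, _, _ = contrastive_loss_and_grads(x, W, T)
x_adv = pgd_attack(x, W, T)
adv_loss, _, _ = contrastive_loss_and_grads(x_adv, W, T)
W_new, _ = adversarial_train_step(x, W, T)
```

In this sketch the attack perturbs only the image side, matching the abstract's focus on adversarial visual perturbations. The abstract's second stage, adversarial visual instruction tuning, would plausibly wrap a similar inner attack around an instruction-tuning loss, but that is not shown here.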


Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training