Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks

Md Zarif Hossain; Ahmed Imteaj

arXiv:2409.07353·cs.CV·September 12, 2024

Open Access

TL;DR

This paper introduces Sim-CLIP+, an adversarially fine-tuned robust vision encoder that defends large vision-language models against jailbreak and adversarial attacks while maintaining accuracy on clean data.

Contribution

We propose Sim-CLIP+, a novel adversarial fine-tuning method for CLIP's vision encoder that improves robustness without modifying existing LVLM architectures.

Findings

1. Sim-CLIP+ effectively defends against gradient-based adversarial attacks.

2. The method maintains high accuracy on clean datasets.

3. It significantly reduces vulnerability to jailbreak techniques.

Abstract

Large Vision-Language Models (LVLMs), trained on large multimodal datasets, have significantly advanced AI by excelling in vision-language tasks. However, these models remain vulnerable to adversarial attacks, particularly jailbreak attacks, which bypass safety protocols and cause the model to generate misleading or harmful responses. This vulnerability stems from both the inherent susceptibilities of LLMs and the expanded attack surface introduced by the visual modality. We propose Sim-CLIP+, a novel defense mechanism that adversarially fine-tunes the CLIP vision encoder by leveraging a Siamese architecture. This approach maximizes cosine similarity between perturbed and clean samples, making the encoder more resilient to adversarial manipulations. Sim-CLIP+ offers a plug-and-play solution, allowing seamless integration into existing LVLM architectures as a robust vision encoder. Unlike…
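
The objective described in the abstract, maximizing cosine similarity between the embeddings of a clean sample and its perturbed counterpart, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration: the linear `encode` function stands in for the CLIP vision encoder, `pgd_perturb` is a generic L-infinity sign-gradient attack using finite-difference gradients, and the names `siamese_loss`, `epsilon`, and `steps` are hypothetical. The abstract does not specify the attack, the loss details, or the training procedure, so this is not the authors' implementation.

```python
import math
import random


def encode(weights, x):
    """Toy linear encoder: a hypothetical stand-in for the CLIP vision encoder."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def siamese_loss(weights, x_clean, x_adv):
    """Negative cosine similarity between clean and perturbed embeddings.

    Both inputs pass through the same encoder (the Siamese setup);
    minimizing this loss maximizes their cosine similarity.
    """
    return -cosine_similarity(encode(weights, x_clean), encode(weights, x_adv))


def pgd_perturb(weights, x, epsilon, steps=10, h=1e-5, seed=0):
    """Search for an L-infinity bounded perturbation that increases the loss.

    Assumed attack for illustration: random start, sign-gradient ascent with
    finite-difference gradients, and projection back into the epsilon ball.
    """
    rng = random.Random(seed)
    step_size = epsilon / 4
    adv = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
    best, best_loss = list(adv), siamese_loss(weights, x, adv)
    for _ in range(steps):
        base = siamese_loss(weights, x, adv)
        grad = []
        for i in range(len(adv)):
            bumped = list(adv)
            bumped[i] += h
            grad.append((siamese_loss(weights, x, bumped) - base) / h)
        # Step in the sign of the gradient, then clip to the epsilon ball.
        adv = [
            min(max(a + step_size * ((g > 0) - (g < 0)), xi - epsilon), xi + epsilon)
            for a, g, xi in zip(adv, grad, x)
        ]
        loss = siamese_loss(weights, x, adv)
        if loss > best_loss:
            best, best_loss = list(adv), loss
    return best


# Hypothetical toy data: a 2x3 encoder and a 3-dimensional "image".
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
x = [1.0, 2.0, 3.0]
x_adv = pgd_perturb(W, x, epsilon=0.1)
clean_loss = siamese_loss(W, x, x)
adv_loss = siamese_loss(W, x, x_adv)
print("loss on identical inputs:", clean_loss)
print("loss on perturbed input:", adv_loss)
```

In an actual fine-tuning loop the encoder weights would then be updated to reduce `adv_loss`, pulling the perturbed embedding back toward the clean one; the sketch stops at computing the perturbation and the objective.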

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Contrastive Language-Image Pre-training