Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models

Jonggyu Jang; Hyeonsu Lyu; Jungyeon Koh; Hyun Jong Yang

arXiv:2411.00898 · cs.CV · November 5, 2024

TL;DR

This paper introduces a targeted adversarial attack for vision-language models that replaces objects via inpainting and uses a contrastive loss, preserving the overall integrity of the image so that the attacked model can still reason visually about the rest of the scene.

Contribution

The paper proposes Replace-then-Perturb, a new attack procedure, and Contrastive-Adv, a contrastive learning-based loss, to improve targeted adversarial attacks on vision-language models (VLMs).

Findings

01. Outperforms baseline adversarial attack algorithms.

02. Maintains overall image integrity during attacks.

03. Enhances the ability to generate visually coherent adversarial examples.

Abstract

Conventional targeted adversarial attacks add a small perturbation to an image so that a neural network classifies it as a predefined target class, even though that class is incorrect. For vision-language models (VLMs), recent targeted attacks instead aim to generate a perturbation that makes the VLM produce an intended target text output. For example, a small perturbation on an image changes the VLM's answer from "there is an apple" to "there is a baseball." However, producing only the intended text is insufficient for trickier questions such as "if there is a baseball, tell me what is below it." This is because the attack objective does not account for the overall integrity of the original image, which leads to a lack of visual reasoning. In this work, we focus on generating targeted adversarial examples with…
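The conventional targeted attack that the abstract starts from can be illustrated with a minimal sketch. It assumes a toy two-class linear softmax classifier instead of a VLM, and it uses a standard iterative sign-gradient update under an L-infinity budget. It is not the paper's Replace-then-Perturb procedure or its Contrastive-Adv loss. The function name `targeted_linf_attack`, the weights, the class labels, and the values of `eps`, `alpha`, and `steps` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def targeted_linf_attack(x, W, b, target, eps=0.2, alpha=0.05, steps=10):
    """Push a toy linear classifier toward `target` within an L-inf ball.

    Hypothetical helper for illustration only; not the paper's method.
    """
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        # Gradient of the cross-entropy toward the target class w.r.t. the input.
        grad = W.T @ (p - onehot)
        # Descend on the target loss (targeted attack), one signed step at a time.
        x_adv = x_adv - alpha * np.sign(grad)
        # Project back into the eps-ball around x, then into the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


# Toy setup (assumed): class 0 plays the role of "apple", class 1 of "baseball".
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.6, 0.4])

x_adv = targeted_linf_attack(x, W, b, target=1)
clean_pred = int(np.argmax(W @ x + b))
adv_pred = int(np.argmax(W @ x_adv + b))
print("clean prediction:", clean_pred)
print("adversarial prediction:", adv_pred)
print("max perturbation:", float(np.max(np.abs(x_adv - x))))
```

A perturbation of this kind only optimizes for the target label. Nothing in the objective constrains the rest of the input, which is the gap the abstract points to when it discusses follow-up questions that require visual reasoning.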

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Focus