Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

arXiv:2402.00626·cs.CV·February 14, 2025·2 cites

TL;DR

This paper demonstrates that large vision-language models are vulnerable to novel self-generated typographic attacks that significantly impair their classification accuracy, raising concerns about misinformation risks.

Contribution

It introduces a new experimental setup and two innovative self-generated attack methods that exploit LVLMs' reasoning abilities to deceive them more effectively.

Findings

01. Attacks reduce classification accuracy by up to 60%

02. Self-generated attacks are effective across multiple LVLMs

03. Proposes a new framework for testing model robustness against typographic deception

Abstract

Typographic attacks, adding misleading text to images, can deceive vision-language models (LVLMs). The susceptibility of recent large LVLMs like GPT4-V to such attacks is understudied, raising concerns about amplified misinformation in personal assistant applications. Previous attacks use simple strategies, such as random misleading words, which don't fully exploit LVLMs' language reasoning abilities. We introduce an experimental setup for testing typographic attacks on LVLMs and propose two novel self-generated attacks: (1) Class-based attacks, where the model identifies a similar class to deceive itself, and (2) Reasoned attacks, where an advanced LVLM suggests an attack combining a deceiving class and description. Our experiments show these attacks significantly reduce classification performance by up to 60% and are effective across different models, including InstructBLIP and…
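The class-based attack described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names (`build_class_based_prompt`, `pick_deceiving_class`, `accuracy`), and the `ask_model` callable are all hypothetical, and the step that renders the chosen class name onto the image is left out.

```python
from typing import Callable, Sequence


def build_class_based_prompt(true_class: str, candidates: Sequence[str]) -> str:
    """Build a prompt asking which other class most resembles true_class.

    The wording is an assumption for illustration, not the paper's prompt.
    """
    options = ", ".join(c for c in candidates if c != true_class)
    return (
        f"The image shows a {true_class}. "
        f"Which one of these classes is most similar to it: {options}? "
        "Answer with the class name only."
    )


def pick_deceiving_class(
    true_class: str,
    candidates: Sequence[str],
    ask_model: Callable[[str], str],
) -> str:
    """Return the class the model itself names as most similar.

    ask_model is a hypothetical callable wrapping whatever LVLM is under test.
    """
    others = [c for c in candidates if c != true_class]
    if not others:
        raise ValueError("need at least one class other than the true class")
    reply = ask_model(build_class_based_prompt(true_class, candidates))
    normalized = reply.strip().lower()
    for c in others:
        if c.lower() == normalized:
            return c
    # Assumed fallback when the reply matches no candidate: use the first
    # remaining class, which is no longer a self-generated choice.
    return others[0]


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

In use, the returned class name would be written onto the image as misleading text, and the drop in classification performance would be the difference between `accuracy` on clean images and on attacked images for the same labels.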

Code & Models

Repositories

mqraitem/self-gen-typo-attack
PyTorch · Official

Taxonomy

Topics: Advanced Malware Detection Techniques

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training