Typographic Attacks in a Multi-Image Setting

Xiaomeng Wang; Zhengyu Zhao; Martha Larson

arXiv:2502.08193 · cs.CR · February 13, 2025

TL;DR

This paper explores multi-image typographic attacks on large vision-language models, proposing new strategies that improve attack success rates and stealthiness in multi-image scenarios, with demonstrated transferability across models.

Contribution

Introduces a multi-image setting for typographic attacks, developing strategies that leverage text-image similarity to enhance attack success and stealth in vision-language models.

Findings

01

Using text-image similarity improves attack success rates by 21% over random, non-specific methods (CLIP on ImageNet).

02

Non-repeating multi-image attacks are stealthier and transfer across models.

03

Proposes two novel attack strategies for multi-image scenarios.

Abstract

Large Vision-Language Models (LVLMs) are susceptible to typographic attacks, which are misclassifications caused by an attack text that is added to an image. In this paper, we introduce a multi-image setting for studying typographic attacks, broadening the current emphasis of the literature on attacking individual images. Specifically, our focus is on attacking image sets without repeating the attack query. Such non-repeating attacks are stealthier, as they are more likely to evade a gatekeeper than attacks that repeat the same attack text. We introduce two attack strategies for the multi-image setting, leveraging the difficulty of the target image, the strength of the attack text, and text-image similarity. Our text-image similarity approach improves attack success rates by 21% over random, non-specific methods on the CLIP model using ImageNet while maintaining stealth in a multi-image…
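The text-image similarity strategy mentioned in the abstract can be illustrated with a small sketch of non-repeating assignment: every image in the set gets a different attack text, chosen by embedding similarity. This is an assumption-laden illustration, not the authors' published procedure. It assumes image and attack-text embeddings were computed beforehand (for instance with CLIP's image and text encoders) and are passed in as plain lists of floats. The function names `cosine` and `assign_non_repeating` are hypothetical, and the greedy matching is a stand-in for whatever selection rule the paper actually uses.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_non_repeating(
    image_embs: Dict[str, Sequence[float]],
    text_embs: Dict[str, Sequence[float]],
) -> Dict[str, str]:
    """Hypothetical greedy matching: pair each image with a distinct attack text,
    taking the most similar remaining (image, text) pair first."""
    if len(text_embs) < len(image_embs):
        raise ValueError("need at least as many attack texts as images")

    # Score every (image, text) pair and sort from most to least similar.
    pairs = sorted(
        (
            (cosine(ie, te), img, txt)
            for img, ie in image_embs.items()
            for txt, te in text_embs.items()
        ),
        reverse=True,
    )

    assignment: Dict[str, str] = {}
    used_texts = set()
    for _sim, img, txt in pairs:
        # Skip images already attacked and texts already used (no repetition).
        if img in assignment or txt in used_texts:
            continue
        assignment[img] = txt
        used_texts.add(txt)
        if len(assignment) == len(image_embs):
            break
    return assignment


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative.
    images = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0]}
    texts = {"dog": [0.9, 0.1], "cat": [0.1, 0.9], "car": [0.5, 0.5]}
    print(assign_non_repeating(images, texts))
```

A greedy pass is easy to follow but not guaranteed to maximize total similarity. An exact one-to-one assignment could instead be computed with the Hungarian algorithm (for example SciPy's `linear_sum_assignment` with `maximize=True`).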

Code & Models

Repositories

xiaomengwang-ai/typographic-attacks-in-a-multi-image-setting
PyTorch · Official

Videos

Typographic Attacks in a Multi-Image Setting · Underline

Taxonomy

Topics: Digital Media Forensic Detection