Self-interpreting Adversarial Images

Tingwei Zhang; Collin Zhang; John X. Morris; Eugene Bagdasarian; Vitaly Shmatikov

arXiv:2407.08970 · cs.CR · June 16, 2025 · 1 citation

TL;DR

This paper introduces self-interpreting adversarial images: images that carry hidden meta-instructions steering how visual language models answer questions about them, including the style, sentiment, or point of view of the response. The attack poses a new security challenge that calls for defenses.

Contribution

It presents a novel attack method using self-interpreting images as soft prompts to control model outputs and interpretations, expanding prompt injection techniques.

Findings

01. Self-interpreting images can effectively steer the content and style of model responses.

02. The adversarial images look natural and the model's answers remain coherent and plausible, which makes the attack hard to detect.

03. The attack could be misused to spread misinformation and spam.

Abstract

We introduce a new type of indirect, cross-modal injection attacks against visual language models that enable creation of self-interpreting images. These images contain hidden "meta-instructions" that control how models answer users' questions about the image and steer models' outputs to express an adversary-chosen style, sentiment, or point of view. Self-interpreting images act as soft prompts, conditioning the model to satisfy the adversary's (meta-)objective while still producing answers based on the image's visual content. Meta-instructions are thus a stronger form of prompt injection. Adversarial images look natural and the model's answers are coherent and plausible, yet they also follow the adversary-chosen interpretation, e.g., political spin, or even objectives that are not achievable with explicit text instructions. We evaluate the efficacy of self-interpreting images for a…
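The idea of an image acting as a soft prompt can be illustrated with a toy sketch. The truncated abstract above does not state the optimization procedure, so the code below is not the authors' method: it swaps in a generic projected signed-gradient perturbation and replaces the visual language model with a hypothetical logistic scorer (`target_probability`) standing in for "probability that the output follows the adversary's meta-objective". All names and parameters (`perturb`, `eps`, `step`, `iters`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def target_probability(weights, bias, image):
    """Hypothetical stand-in for a model: probability of the adversary's target output."""
    z = sum(w * x for w, x in zip(weights, image)) + bias
    return sigmoid(z)


def perturb(weights, bias, image, eps=0.05, step=0.01, iters=20):
    """Projected signed-gradient ascent on log p(target) inside an L-infinity ball."""
    adv = list(image)
    for _ in range(iters):
        p = target_probability(weights, bias, adv)
        # Gradient of log sigmoid(w.x + b) with respect to x is (1 - p) * w.
        grad = [(1.0 - p) * w for w in weights]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            x = adv[i] + step * sign
            # Project into the eps-ball around the original pixel, then into [0, 1].
            x = max(image[i] - eps, min(image[i] + eps, x))
            adv[i] = max(0.0, min(1.0, x))
    return adv


random.seed(0)
dim = 64  # toy "image" with 64 pixel values in [0, 1]
weights = [random.uniform(-1, 1) for _ in range(dim)]
bias = -2.0
image = [random.uniform(0.2, 0.8) for _ in range(dim)]

adv = perturb(weights, bias, image)
before = target_probability(weights, bias, image)
after = target_probability(weights, bias, adv)
max_change = max(abs(a - b) for a, b in zip(adv, image))
print(f"p(target) before={before:.3f} after={after:.3f} max pixel change={max_change:.3f}")
```

The perturbation stays within `eps` of every original pixel, which is a rough analogue of the adversarial image still looking natural, while the scorer's output shifts toward the adversary's target. A real attack would presumably backpropagate through the model itself rather than a linear scorer.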

Code & Models

Repositories

tingwei-zhang/soft-prompts-go-hard (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling