Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Xiangming Gu; Xiaosen Zheng; Tianyu Pang; Chao Du; Qian Liu; Ye Wang; Jing Jiang; Min Lin

arXiv:2402.08567·cs.CL·June 4, 2024·2 cites

TL;DR

This paper reveals a new severe safety risk called infectious jailbreak in multimodal large language model agents, demonstrating how a single adversarial image can rapidly infect and cause harmful behaviors across up to one million agents in simulated environments.

Contribution

It introduces the concept of infectious jailbreak in multi-agent LLM systems, showing how adversarial images can propagate harmful behaviors exponentially without further intervention.

Findings

01. Feeding an adversarial image to one agent can infect all agents rapidly.

02. Infection spread is exponential in multi-agent environments.

03. A simple principle for potential defense mechanisms is derived.

Abstract

A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jailbreak. It entails the adversary simply jailbreaking a single agent, and without any further intervention from the adversary, (almost) all agents will become infected exponentially fast and exhibit harmful behaviors. To validate the feasibility of infectious jailbreak, we simulate multi-agent environments containing up to one million LLaVA-1.5 agents, and employ randomized pair-wise chat as a proof-of-concept instantiation for multi-agent interaction. Our results show that feeding an…
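The randomized pair-wise chat setting described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the authors' code: it replaces LLaVA-1.5 agents with boolean "infected" flags, and the function name `simulate_pairwise_infection`, the per-chat transmission probability `beta`, and the rule that an infected questioner passes the image to its answerer are all assumptions made for illustration.

```python
import random


def simulate_pairwise_infection(num_agents=1024, rounds=30, beta=1.0, seed=0):
    """Toy model of infection spreading through randomized pair-wise chat.

    Assumptions (not taken from the paper's code): agents are plain boolean
    flags, each round splits agents randomly into questioner/answerer pairs,
    and an infected questioner infects its answerer with probability `beta`.
    Returns the number of infected agents before the first round and after
    each subsequent round.
    """
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True  # the adversary jailbreaks a single agent
    history = [1]

    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        half = num_agents // 2
        questioners = order[:half]
        answerers = order[half:2 * half]

        # Decide all transmissions from the state at the start of the round,
        # so an agent infected this round cannot infect another in the same round.
        newly_infected = [
            a
            for q, a in zip(questioners, answerers)
            if infected[q] and not infected[a] and rng.random() < beta
        ]
        for a in newly_infected:
            infected[a] = True

        history.append(sum(infected))

    return history


if __name__ == "__main__":
    for round_idx, count in enumerate(simulate_pairwise_infection()):
        print(round_idx, count)
```

In this toy model the infected count can at most double per round, since each infected agent takes part in one chat per round. While few agents are infected the count grows roughly geometrically, and growth slows as uninfected partners become scarce.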

Code & Models

Repositories

sail-sg/agent-smith
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Vehicle License Plate Recognition · Artificial Intelligence in Law