The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Zhen Tan; Chengshuai Zhao; Raha Moraffah; Yifan Li; Yu Kong; Tianlong Chen; Huan Liu

arXiv:2402.14859 · cs.CR · June 4, 2024 · 1 citation

TL;DR

This paper uncovers a novel vulnerability in Multimodal Large Language Model (MLLM) societies, where a single malicious agent can covertly influence others to propagate harmful content, posing significant security risks.

Contribution

It introduces the concept of covert injection of malice in MLLM societies through indirect prompt manipulation, highlighting a new threat dimension in AI security.

Findings

01. MLLM agents can be manipulated to generate malicious prompts.

02. Malicious prompts can be transferred and propagated within the society.

03. This vulnerability enables widespread dissemination of harmful content.
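
The second and third findings describe propagation through a society of agents. As a purely illustrative toy model (not the authors' method or code), the society can be treated as a directed graph in which an edge means "this agent's output is fed to that agent as a prompt", and a breadth-first search shows which agents a single compromised operative could eventually reach. The graph, the agent names, and the worst-case assumption that every forwarded prompt succeeds are hypothetical and are not taken from the paper.

```python
from collections import deque


def propagate(society, operative):
    """Return {agent: hop count} for every agent reachable from the operative.

    `society` maps each agent name to the agents that receive its output.
    Worst-case assumption (hypothetical): every forwarded prompt succeeds.
    """
    hops = {operative: 0}
    queue = deque([operative])
    while queue:
        sender = queue.popleft()
        for receiver in society.get(sender, []):
            if receiver not in hops:  # first time this agent is reached
                hops[receiver] = hops[sender] + 1
                queue.append(receiver)
    return hops


# Hypothetical five-agent society; agent_d only sends and never receives.
society = {
    "operative": ["agent_a"],
    "agent_a": ["agent_b", "agent_c"],
    "agent_b": [],
    "agent_c": ["agent_a"],
    "agent_d": ["agent_a"],
}

reached = propagate(society, "operative")
print(reached)
```

In this sketch, only agents downstream of the operative are affected. An agent that merely sends messages into the society, such as `agent_d`, is never reached, which suggests that exposure depends on who consumes whose output rather than on society size alone.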

Abstract

Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, "The Wolf Within", explores a novel vulnerability in MLLM societies: the indirect propagation of malicious content. Unlike the direct generation of harmful output by an MLLM, our research demonstrates how a single MLLM agent can be subtly influenced to generate prompts that, in turn, induce other MLLM agents in the society to output malicious content. Our findings reveal that an MLLM agent, when manipulated to produce specific prompts or instructions, can effectively "infect" other agents within a society of MLLMs. This…

Code & Models

Repositories

chengshuaizhao0/the-wolf-within
PyTorch · Official

Taxonomy

Topics: Advanced Data Storage Technologies · Cryptography and Data Security · Cellular Automata and Applications