A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns

Tianyi Men; Pengfei Cao; Zhuoran Jin; Yubo Chen; Kang Liu; Jun Zhao

arXiv:2410.16155 · cs.CL · June 27, 2025

TL;DR

This paper introduces TMCHT, a large-scale, multi-agent, multi-topology attack evaluation framework, and proposes ARCJ, a method for inducing jailbreaks across multi-agent systems, exposing security vulnerabilities in such environments.

Contribution

The paper presents TMCHT for evaluating multi-agent jailbreak attacks and introduces ARCJ, a novel method to enhance attack effectiveness in complex multi-agent topologies.

Findings

01 ARCJ improves attack success rates significantly.

02 TMCHT reveals vulnerabilities in multi-agent memory systems.

03 The paper proposes ARCJ to address the challenges posed by non-complete graph structures and large-scale systems.

Abstract

As large language models have developed, they have come to be widely used as agents in various fields. A key component of agents is memory, which stores vital information but is susceptible to jailbreak attacks. Existing research mainly focuses on single-agent attacks and shared memory attacks. However, real-world scenarios often involve independent memory. In this paper, we propose the Troublemaker Makes Chaos in Honest Town (TMCHT) task, a large-scale, multi-agent, multi-topology text-based attack evaluation framework. TMCHT involves one attacker agent attempting to mislead an entire society of agents. We identify two major challenges in multi-agent attacks: (1) non-complete graph structure and (2) large-scale systems. We attribute these challenges to a phenomenon we term toxicity disappearing. To address these issues, we propose an Adversarial Replication Contagious Jailbreak (ARCJ) method,…

Taxonomy

Topics: Crime Patterns and Interventions · Crime, Illicit Activities, and Governance

Methods: Softmax · Attention Is All You Need