Evil Geniuses: Delving into the Safety of LLM-based Agents
Yu Tian, Xiao Yang, Jingyuan Zhang, Yinpeng Dong, Hang Su

TL;DR
This paper investigates the safety risks of LLM-based agents by proposing attack strategies that reveal their vulnerabilities, demonstrating that these agents are less robust and more prone to harmful behaviors than standalone LLMs.
Contribution
It introduces Evil Geniuses, an autonomous attack method that tests the safety of LLM-based agents across various roles and attack levels, highlighting their safety challenges.
Findings
LLM-based agents are less robust and more vulnerable to attacks than standalone LLMs.
LLM-based agents can generate more harmful and stealthier content.
The proposed attack methods achieve high success rates in evaluations.
Abstract
Rapid advancements in large language models (LLMs) have revitalized interest in LLM-based agents, which exhibit impressive human-like behaviors and cooperative capabilities in various scenarios. However, these agents also bring some exclusive risks, stemming from the complexity of interaction environments and the usability of tools. This paper delves into the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level. Specifically, we first propose employing a template-based attack strategy on LLM-based agents to measure the influence of agent quantity. In addition, to address interaction environment and role specificity issues, we introduce Evil Geniuses (EG), an effective attack method that autonomously generates prompts related to the original role to examine the impact across various role definitions and attack levels. EG leverages Red-Blue exercises,…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
Methods: Attention Is All You Need · Cosine Annealing · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Residual Connection · Linear Warmup With Cosine Annealing · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing
