Evil Geniuses: Delving into the Safety of LLM-based Agents

Yu Tian; Xiao Yang; Jingyuan Zhang; Yinpeng Dong; Hang Su

arXiv:2311.11855 · cs.CL · February 5, 2024 · 6 cites

TL;DR

This paper investigates the safety risks of LLM-based agents by proposing attack strategies that reveal their vulnerabilities, demonstrating that these agents are less robust and more prone to harmful behaviors than standalone LLMs.

Contribution

The paper introduces Evil Geniuses (EG), an autonomous attack method that generates prompts related to each agent's original role in order to test the safety of LLM-based agents across role definitions and attack levels.

Findings

01

LLM-based agents are less robust to attacks than standalone LLMs.

02

LLM-based agents can generate more harmful and stealthier content.

03

The proposed attack methods achieve high success rates in evaluations.

Abstract

Rapid advancements in large language models (LLMs) have revitalized interest in LLM-based agents, which exhibit impressive human-like behaviors and cooperative capabilities in various scenarios. However, these agents also bring risks of their own, stemming from the complexity of interaction environments and the usability of tools. This paper examines the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level. Specifically, we first employ a template-based attack strategy on LLM-based agents to assess the influence of agent quantity. In addition, to address interaction environment and role specificity issues, we introduce Evil Geniuses (EG), an effective attack method that autonomously generates prompts related to the original role to examine the impact across various role definitions and attack levels. EG leverages Red-Blue exercises,…

Code & Models

Repositories

t1ans1r/evil-geniuses (Official)

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification

Methods: Attention Is All You Need · Cosine Annealing · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Residual Connection · Linear Warmup With Cosine Annealing · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing