Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming

Rui Li; Peiyi Wang; Jingyuan Ma; Di Zhang; Lei Sha; Zhifang Sui

arXiv:2502.16109 · cs.CL · February 25, 2025

TL;DR

This paper introduces RTPE, a scalable prompt evolution framework that automatically generates diverse and high-quality red teaming prompts to evaluate and improve the safety of large language models.

Contribution

RTPE is a novel framework that automates the creation of diverse red teaming prompts, improving scalability and effectiveness over manual methods.

Findings

01. RTPE outperforms existing methods in attack success rate.
02. RTPE generates more diverse prompts.
03. Analysis of 4,800 prompts across 8 LLMs and topics.
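The first two findings rest on two quantities: attack success rate (ASR) and prompt diversity. This page does not say how either is computed, so the sketch below is only an illustration under assumptions. It treats ASR as the fraction of prompts whose responses a judge labeled harmful. It uses mean pairwise Jaccard distance over word sets as a stand-in diversity score. The function names `attack_success_rate`, `jaccard_distance`, and `mean_pairwise_distance` are hypothetical. The paper may use a different judge or a different diversity metric.

```python
from itertools import combinations
from typing import List, Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Fraction of prompts whose responses a judge labeled harmful (True).
    The judge itself (human, classifier, or LLM) is an assumption left open here."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase word sets (a stand-in metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def mean_pairwise_distance(prompts: List[str]) -> float:
    """Average Jaccard distance over all prompt pairs; higher means more diverse."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


print(attack_success_rate([True, False, True, True]))  # 0.75
print(mean_pairwise_distance(["how does x work", "how does x work"]))  # 0.0
```

Word-set Jaccard is cheap and dependency-free but ignores meaning. Embedding-based distances would capture paraphrases that share few words.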

Abstract

Large Language Models (LLMs) have gained increasing attention for their remarkable capacity, alongside concerns about safety arising from their potential to produce harmful content. Red teaming aims to find prompts that could elicit harmful responses from LLMs, and is essential to discover and mitigate safety risks before real-world deployment. However, manual red teaming is both time-consuming and expensive, rendering it unscalable. In this paper, we propose RTPE, a scalable evolution framework to evolve red teaming prompts across both breadth and depth dimensions, facilitating the automatic generation of numerous high-quality and diverse red teaming prompts. Specifically, in-breadth evolving employs a novel enhanced in-context learning method to create a multitude of quality prompts, whereas in-depth evolving applies customized transformation operations to enhance both content and…
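The abstract describes two evolution dimensions. In-breadth evolving uses in-context learning to create many new prompts. In-depth evolving applies transformation operations to existing prompts. The sketch below shows one way such a loop could be wired together, under loudly stated assumptions:

- `generate` stands in for an attacker LLM and is replaced here by a deterministic stub.
- The in-breadth step is plain few-shot prompting. The paper's *enhanced* in-context learning method is not reproduced.
- The two transformations are hypothetical placeholders, not RTPE's actual operations.

All function names are illustrative and are not taken from the authors' code.

```python
import random
from itertools import count
from typing import Callable, List

# Hypothetical interfaces: a generator maps a meta-prompt to one new prompt
# (e.g., a call to an attacker LLM); a transform rewrites one prompt.
Generator = Callable[[str], str]
Transform = Callable[[str], str]


def build_few_shot_prompt(demos: List[str]) -> str:
    """Format sampled pool prompts as in-context demonstrations."""
    lines = ["Example prompts:"]
    lines += [f"- {d}" for d in demos]
    lines.append("Write one new prompt in a similar style:")
    return "\n".join(lines)


def evolve_in_breadth(pool: List[str], generate: Generator, n_new: int,
                      k: int, rng: random.Random) -> List[str]:
    """Plain few-shot ICL: each new prompt is conditioned on k sampled demos.
    RTPE's enhanced ICL method is NOT reproduced here."""
    new_prompts = []
    for _ in range(n_new):
        demos = rng.sample(pool, min(k, len(pool)))
        new_prompts.append(generate(build_few_shot_prompt(demos)))
    return new_prompts


def evolve_in_depth(prompts: List[str], transforms: List[Transform],
                    rng: random.Random) -> List[str]:
    """Apply one randomly chosen (hypothetical) transformation to each prompt."""
    return [rng.choice(transforms)(p) for p in prompts]


def evolve(seeds: List[str], generate: Generator, transforms: List[Transform],
           rounds: int = 2, n_new: int = 3, k: int = 2,
           seed: int = 0) -> List[str]:
    """Alternate breadth and depth steps, growing a deduplicated prompt pool."""
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(rounds):
        candidates = evolve_in_depth(
            evolve_in_breadth(pool, generate, n_new, k, rng), transforms, rng)
        for p in candidates:
            if p not in pool:  # keep only distinct prompts
                pool.append(p)
    return pool


# Usage with a deterministic stub in place of a real attacker LLM.
_ids = count()


def stub_generate(meta_prompt: str) -> str:
    return f"generated prompt {next(_ids)}"


placeholder_transforms: List[Transform] = [
    lambda p: p + " [expanded]",   # placeholder for a content-level operation
    lambda p: p + " [rephrased]",  # placeholder for another transformation
]

result = evolve(["seed prompt A", "seed prompt B"], stub_generate,
                placeholder_transforms)
print(len(result))  # prints 8: 2 seeds + 2 rounds x 3 new prompts
```

The generator and transforms are injected as plain callables, so the loop stays independent of any particular model or API. Newly evolved prompts rejoin the pool and can serve as demonstrations in later rounds.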

Taxonomy

Topics: Complex Systems and Decision Making

Methods: Softmax · Attention Is All You Need