Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Vernon Toh Yan Han; Rishabh Bhardwaj; Soujanya Poria

arXiv:2406.11654 · cs.CL · June 18, 2024 · 1 citation

TL;DR

Ruby Teaming enhances automated red teaming by incorporating a memory cache, significantly increasing attack success rates and diversity of prompts compared to previous methods.

Contribution

It introduces a memory-augmented approach to improve quality and diversity in automated red teaming prompts.

Findings

01

Achieved a 74% attack success rate (ASR), 20% higher than the baseline.

02

Outperformed Rainbow Teaming by 6% on Shannon's Evenness Index (SEI) and 3% on Simpson's Diversity Index (SDI).

03

Demonstrated improved prompt quality and diversity in red teaming tasks.

Abstract

We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity, Ruby Teaming outperforms Rainbow Teaming by 6% and 3% on Shannon's Evenness Index (SEI) and Simpson's Diversity Index (SDI), respectively.
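The two diversity measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definitions: SEI as Shannon entropy divided by the log of the number of categories, and SDI in its Gini-Simpson form, one minus the sum of squared category proportions. The abstract does not state which variants the authors use, and the function names and the example category labels below are hypothetical.

```python
import math
from collections import Counter


def proportions(labels):
    """Return the share of items falling into each observed category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_evenness(labels, num_categories=None):
    """Shannon's Evenness Index: H / ln(S), assuming the textbook definition.

    H is the Shannon entropy of the category proportions and S is the
    number of categories (observed ones unless num_categories is given).
    """
    p = proportions(labels)
    s = num_categories if num_categories is not None else len(p)
    if s <= 1:
        return 0.0  # evenness is undefined for one category; 0.0 by convention here
    entropy = -sum(pi * math.log(pi) for pi in p)
    return entropy / math.log(s)


def simpson_diversity(labels):
    """Simpson's Diversity Index in Gini-Simpson form: 1 - sum(p_i^2)."""
    p = proportions(labels)
    return 1.0 - sum(pi * pi for pi in p)


# Hypothetical category labels for the prompts stored in an archive.
balanced = ["fraud", "violence", "privacy", "self-harm"]
skewed = ["fraud", "fraud", "fraud", "violence"]

print(shannon_evenness(balanced), simpson_diversity(balanced))
print(shannon_evenness(skewed), simpson_diversity(skewed))
```

Under these definitions both indices increase as prompts spread more evenly across categories, so the balanced list scores higher than the skewed one on both.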

Taxonomy

Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications