Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
Raja Sekhar Rao Dheekonda, Will Pearce, Nick Landers

TL;DR
This paper presents an AI red teaming agent that automates workflow creation, significantly reducing the time from weeks to hours, and supports probing diverse AI systems with minimal human effort.
Contribution
The paper introduces an agentic framework that automates red teaming workflows, unifies probing methods for different AI systems, and demonstrates effectiveness through a Llama Scout case study.
Findings
Operator time for red teaming reduced from weeks to hours.
Achieved 85% attack success rate on Meta Llama Scout.
Unified framework supports diverse AI system probing.
Abstract
AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows - assembling attacks, transforms, and scorers. When results fall short, workflows must be rebuilt. As a result, operators spend more time constructing workflows than probing targets for security and safety vulnerabilities. We introduce an AI red teaming agent built on the open-source Dreadnode SDK. The agent creates workflows grounded in 45+ adversarial attacks, 450+ transforms, and 130+ scorers. Operators can probe multi-agent systems, multilingual, and multimodal targets, focusing on what to probe rather than how to implement it. We make three contributions: 1. Agentic interface. Operators…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
