Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

Raja Sekhar Rao Dheekonda; Will Pearce; Nick Landers

arXiv:2605.04019·cs.AI·May 6, 2026

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

Raja Sekhar Rao Dheekonda, Will Pearce, Nick Landers

PDF

TL;DR

This paper presents an AI red teaming agent that automates workflow creation, significantly reducing the time from weeks to hours, and supports probing diverse AI systems with minimal human effort.

Contribution

The paper introduces an agentic framework that automates red teaming workflows, unifies probing methods for different AI systems, and demonstrates effectiveness through a Llama Scout case study.

Findings

01

Operator time for red teaming reduced from weeks to hours.

02

Achieved 85% attack success rate on Meta Llama Scout.

03

Unified framework supports diverse AI system probing.

Abstract

AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows - assembling attacks, transforms, and scorers. When results fall short, workflows must be rebuilt. As a result, operators spend more time constructing workflows than probing targets for security and safety vulnerabilities. We introduce an AI red teaming agent built on the open-source Dreadnode SDK. The agent creates workflows grounded in 45+ adversarial attacks, 450+ transforms, and 130+ scorers. Operators can probe multi-agent systems, multilingual, and multimodal targets, focusing on what to probe rather than how to implement it. We make three contributions: 1. Agentic interface. Operators…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.