DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

Zhaorun Chen; Xun Liu; Haibo Tong; Chengquan Guo; Yuzhou Nie; Jiawei Zhang; Mintong Kang; Chejian Xu; Qichang Liu; Xiaogeng Liu; Tianneng Shi; Chaowei Xiao; Sanmi Koyejo; Percy Liang; Wenbo Guo; Dawn Song; Bo Li

arXiv:2605.04808 · cs.AI · May 7, 2026

TL;DR

The paper introduces DTap, a controllable and interactive red-teaming platform for AI agents spanning multiple domains, together with DTap-Red, an autonomous red-teaming agent, to evaluate and improve agent security.

Contribution

The paper presents the first comprehensive, scalable platform and autonomous red-teaming agent for assessing AI agent security in realistic and diverse environments.

Findings

1. Identifies systematic vulnerabilities in popular AI agents.
2. Creates a large-scale red-teaming dataset with automated validation.
3. Provides insights for developing more secure AI agents.

Abstract

AI agents are increasingly deployed across diverse domains to automate complex workflows by executing long-horizon, high-stakes actions. Because of their capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is inherently challenging, as agents operate in dynamic, untrusted environments involving external tools, heterogeneous data sources, and frequent user interactions. However, realistic, controllable, and reproducible environments for large-scale risk assessment remain largely underexplored. To address this gap, we introduce the DecodingTrust-Agent Platform (DTap), the first controllable and…

Code & Models

Repositories

ai-secure/DecodingTrust-Agent (GitHub)