AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Dongrui Liu; Qihan Ren; Chen Qian; Shuai Shao; Yuejin Xie; Yu Li; Zhonghao Yang; Haoyu Luo; Peng Wang; Qingyu Liu; Binxin Hu; Ling Tang; Jilin Mei; Dadi Guo; Leitao Yuan; Junyao Yang; Guanxu Chen; Qihao Lin; Yi Yu; Bo Zhang; Jiaxuan Guo; Jie Zhang; Wenqi Shao; Huiqi Deng; Zhiheng Xi; Wenjie Wang; Wenxuan Wang; Wen Shen; Zhikai Chen; Haoyu Xie; Jialing Tao; Juntao Dai; Jiaming Ji; Zhongjie Ba; Linfeng Zhang; Yong Liu; Quanshi Zhang; Lei Zhu; Zhihua Wei; Hui Xue; Chaochao Lu; Jing Shao; Xia Hu

arXiv:2601.18491·cs.AI·April 24, 2026

TL;DR

AgentDoG introduces a hierarchical taxonomy, a new safety benchmark, and a diagnostic framework for improving AI agent safety, transparency, and root cause analysis in complex scenarios.

Contribution

The paper presents a novel agentic risk taxonomy, a fine-grained agentic safety benchmark (ATBench), and a diagnostic guardrail framework (AgentDoG) with openly released models.

Findings

1. AgentDoG achieves state-of-the-art safety moderation performance.
2. Models and datasets are openly released for community use.
3. The framework enables root cause diagnosis of unsafe actions.

Abstract

The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured and hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained and contextual monitoring across agent trajectories. More crucially, AgentDoG can diagnose the root causes of unsafe actions and seemingly safe but unreasonable actions, offering provenance…
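
As a rough illustration of the three-dimensional taxonomy described in the abstract, a diagnosis can be pictured as one label per axis (source, failure mode, consequence). The sketch below is not the authors' schema: the class names, the `describe` helper, and every category string are hypothetical placeholders chosen only to show the shape of such a record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskLabel:
    """One diagnosis along the three axes named in the abstract."""
    source: str        # where the risk originates
    failure_mode: str  # how the agent fails
    consequence: str   # what harm results


@dataclass(frozen=True)
class Verdict:
    """Hypothetical guardrail output for a single agent trajectory."""
    safe: bool
    label: Optional[RiskLabel] = None  # only populated for unsafe trajectories


def describe(verdict: Verdict) -> str:
    """Render a verdict as a short human-readable diagnosis string."""
    if verdict.safe or verdict.label is None:
        return "safe"
    lab = verdict.label
    return (
        f"unsafe: where={lab.source}; "
        f"how={lab.failure_mode}; what={lab.consequence}"
    )


# All category strings below are made-up placeholders, not taken from the paper.
example = Verdict(
    safe=False,
    label=RiskLabel(
        source="tool_output",
        failure_mode="followed_injected_instruction",
        consequence="data_leak",
    ),
)
print(describe(example))
```

Because the axes are described as orthogonal, each field is kept independent of the others here, so any combination of source, failure mode, and consequence is representable.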

Code & Models

Repositories

ai45lab/AgentDoG (GitHub)
