AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Dongrui Liu, Qihan Ren, Chen Qian, Shuai Shao, Yuejin Xie, Yu Li, Zhonghao Yang, Haoyu Luo, Peng Wang, Qingyu Liu, Binxin Hu, Ling Tang, Jilin Mei, Dadi Guo, Leitao Yuan, Junyao Yang, Guanxu Chen, Qihao Lin, Yi Yu, Bo Zhang, Jiaxuan Guo, Jie Zhang, Wenqi Shao, Huiqi Deng

TL;DR
AgentDoG introduces a hierarchical taxonomy, a new safety benchmark, and a diagnostic framework for improving AI agent safety, transparency, and root cause analysis in complex scenarios.
Contribution
It presents a novel agentic risk taxonomy, a fine-grained safety benchmark, and a diagnostic guardrail framework with open-source models for enhanced AI safety.
Findings
AgentDoG achieves state-of-the-art safety moderation performance.
Models and datasets are openly released for community use.
The framework enables root cause diagnosis of unsafe actions.
Abstract
The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured and hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained and contextual monitoring across agent trajectories. More crucially, AgentDoG can diagnose the root causes of unsafe actions and seemingly safe but unreasonable actions, offering provenance…
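The three taxonomy axes named in the abstract (source, failure mode, consequence) can be pictured as a small record attached to each unsafe verdict. The following is a minimal sketch under stated assumptions, not the authors' schema: the class names, the `format_diagnosis` helper, and the example category values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskLabel:
    """One label along each of the three axes described in the abstract."""
    source: str        # where the risk originates
    failure_mode: str  # how the agent fails
    consequence: str   # what harm results


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical guardrail verdict for one agent trajectory."""
    safe: bool
    label: Optional[RiskLabel] = None  # expected only for unsafe trajectories


def format_diagnosis(d: Diagnosis) -> str:
    """Render a verdict as a single human-readable line."""
    if d.safe:
        return "safe"
    if d.label is None:
        return "unsafe (undiagnosed)"
    return (
        f"unsafe | source={d.label.source}"
        f" | failure_mode={d.label.failure_mode}"
        f" | consequence={d.label.consequence}"
    )


# Illustrative values only; these are NOT categories taken from the paper.
example = Diagnosis(
    safe=False,
    label=RiskLabel(
        source="tool_output",
        failure_mode="followed_injected_instruction",
        consequence="data_leak",
    ),
)
print(format_diagnosis(example))
```

Keeping the three axes as separate fields mirrors the abstract's claim that they are orthogonal: each can be set or aggregated independently of the other two.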
Code & Models
- 🤗 AI45Research/AgentDoG-Qwen3-4B · model · 453 dl · ♡ 23
- 🤗 AI45Research/AgentDoG-Qwen2.5-7B · model · 19 dl · ♡ 10
- 🤗 AI45Research/AgentDoG-Llama3.1-8B · model · 9 dl · ♡ 11
- 🤗 AI45Research/AgentDoG-FG-Llama3.1-8B · model · 2 dl · ♡ 9
- 🤗 AI45Research/AgentDoG-FG-Qwen3-4B · model · 78 dl · ♡ 9
- 🤗 AI45Research/AgentDoG-FG-Qwen2.5-7B · model · 3 dl · ♡ 8
- 🤗 Prince-1/AgentDoG-Qwen2.5-7B-Onnx · model
- 🤗 Prince-1/AgentDoG-Qwen2.5-7B-RKLLM · model
- 🤗 sudarsonisb/agentic-ai-security-governance-financial · model
- AI45Research/ATBench · dataset · 1.9k dl
- tianyyuu/clawdbot_safety_testing · dataset · 57 dl
- AI45Research/ATBench-Claw · dataset · 596 dl
- AI45Research/ATBench-Codex · dataset · 175 dl
- molmohsen/awesome-ai-agent-papers · dataset · 41 dl
- HeySig/ATBench-Claw · dataset · 136 dl
- AI45Research/AgentDoG1.0-Training-Data · dataset · 52 dl
