GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Zhen Xiang; Linzhi Zheng; Yanjie Li; Junyuan Hong; Qinbin Li; Han Xie; Jiawei Zhang; Zidi Xiong; Chulin Xie; Carl Yang; Dawn Song; Bo Li

arXiv:2406.09187·cs.LG·May 30, 2025

TL;DR

This paper introduces GuardAgent, a guardrail agent for LLM agents that uses knowledge-enabled reasoning to turn safety guard requests into executable guardrail code, reporting over 98% guardrail accuracy on healthcare agents and over 83% on web agents.

Contribution

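We propose GuardAgent, the first guardrail agent that protects target LLM agents by dynamically checking whether their actions satisfy given safety guard requests. It generates and executes guardrail code, using an LLM for reasoning and in-context demonstrations retrieved from a memory module.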

Findings

01. Achieves over 98% guardrail accuracy on healthcare agents.

02. Attains over 83% guardrail accuracy on web agents.

03. Effectively moderates unsafe actions across different agent types.

Abstract

The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security. In this paper, we propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. By performing the code execution, GuardAgent can deterministically follow the safety guard request and safeguard target agents. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing experiences from previous tasks. In addition, we propose two novel benchmarks: EICU-AC benchmark to assess the access control for healthcare agents and Mind2Web-SC benchmark to evaluate…
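The two-step flow described in the abstract (request to task plan, then plan to guardrail code that is executed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names Memory, guard, and stub_llm, the word-overlap retrieval, the "physician" role, and the convention that generated code defines a check function are all assumptions made for illustration, and the LLM is replaced by a hard-coded stub.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical store of records from previous guardrail tasks."""
    records: List[dict] = field(default_factory=list)

    def retrieve(self, request: str, k: int = 1) -> List[dict]:
        # Toy similarity (an assumption): number of shared lowercase words.
        words = set(request.lower().split())
        ranked = sorted(
            self.records,
            key=lambda r: len(words & set(r["request"].lower().split())),
            reverse=True,
        )
        return ranked[:k]


def stub_llm(step: str, prompt: str, demos: List[dict]) -> str:
    """Stand-in for a real LLM call; returns fixed text for each step."""
    if step == "plan":
        return "1. Read the user's role. 2. Deny unless the role is permitted."
    # Hypothetical guardrail code; the role name is invented for this example.
    return (
        "def check(agent_input, agent_output):\n"
        "    return agent_input.get('role') == 'physician'\n"
    )


def guard(request: str, agent_input: dict, agent_output: str,
          llm: Callable[[str, str, List[dict]], str], memory: Memory) -> bool:
    """Return True if the target agent's action passes the guard request."""
    demos = memory.retrieve(request)        # in-context demonstrations
    plan = llm("plan", request, demos)      # step 1: request -> task plan
    code = llm("code", plan, demos)         # step 2: plan -> guardrail code
    scope: dict = {}
    exec(code, scope)                       # a real system would sandbox this
    allowed = bool(scope["check"](agent_input, agent_output))
    # Keep the experience so later tasks can retrieve it as a demonstration.
    memory.records.append({"request": request, "plan": plan, "code": code})
    return allowed


memory = Memory()
request = "Only physicians may read lab results"
print(guard(request, {"role": "physician"}, "lab results", stub_llm, memory))
print(guard(request, {"role": "visitor"}, "lab results", stub_llm, memory))
```

Because the final decision comes from running the generated check function rather than from free-form model text, the same input always yields the same verdict for a given piece of guardrail code, which is the sense in which the abstract calls the guardrail deterministic.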

Taxonomy

Topics: Smart Grid Security and Resilience · Network Security and Intrusion Detection · Access Control and Trust

Methods: Sparse Evolutionary Training