Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong; Yining She; Eunsuk Kang; Christopher S. Timperley; Christian Kästner

arXiv:2604.15579·cs.SE·April 20, 2026

TL;DR

This paper argues that symbolic guardrails are a practical way to give domain-specific AI agents stronger safety and security guarantees without reducing their utility.

Contribution

The paper systematically reviews 80 agent safety and security benchmarks, analyzes which policy requirements symbolic guardrails can guarantee, and evaluates how symbolic guardrails affect safety, security, and agent success.

Findings

01. 85% of benchmarks lack concrete policies

02. 74% of policy requirements can be enforced by symbolic guardrails

03. Symbolic guardrails improve safety and security without utility loss

Abstract

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on $\tau^{2}$-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies, relying…
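To make the idea of a symbolic guardrail concrete, here is a minimal sketch of one possible shape: a deterministic rule checked before every tool call, so a violating call is blocked regardless of what the model generates. This is an illustration only, not the authors' implementation. The names `Rule`, `guarded`, `PolicyViolation`, `issue_refund`, and the 100.0 refund cap are all hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when a tool call is blocked by a guardrail rule."""


@dataclass(frozen=True)
class Rule:
    """A hypothetical symbolic rule: a named, deterministic predicate."""
    name: str
    # Returns True when the call (tool name + arguments) is allowed.
    allows: Callable[[str, Dict[str, Any]], bool]


def guarded(tool_name: str, tool: Callable[..., Any], rules: List[Rule]) -> Callable[..., Any]:
    """Wrap a tool so every rule is checked before the tool runs."""
    def wrapper(**kwargs: Any) -> Any:
        for rule in rules:
            if not rule.allows(tool_name, kwargs):
                # Block the call; the underlying tool is never invoked.
                raise PolicyViolation(f"{rule.name} blocked {tool_name}")
        return tool(**kwargs)
    return wrapper


# Hypothetical tool an agent might call.
def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for {order_id}"


# Hypothetical policy: refunds above 100.0 are never allowed.
REFUND_CAP = Rule(
    name="refund_cap",
    allows=lambda tool, args: tool != "issue_refund" or args["amount"] <= 100.0,
)

safe_refund = guarded("issue_refund", issue_refund, [REFUND_CAP])
```

Under these assumptions, `safe_refund(order_id="A1", amount=25.0)` runs the tool, while `safe_refund(order_id="A1", amount=500.0)` raises `PolicyViolation` before the tool executes. Because the check is ordinary code rather than a learned classifier, its behavior on a given call is fixed, which is the kind of property that separates a guarantee from a reliability improvement.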

Code & Models

Repositories

hyn0027/agent-symbolic-guardrails (GitHub)
