Guardrails as Infrastructure: Policy-First Control for Tool-Orchestrated Workflows
Akshey Sigdel, Rista Baral

TL;DR
This paper introduces Policy-First Tooling, a model-agnostic permission layer that enhances safety and control in tool-using automation systems through explicit policies, risk management, and auditability.
Contribution
It presents a novel policy DSL, a runtime enforcement architecture, and a reproducible benchmark for evaluating safety and utility trade-offs in automated workflows.
Findings
Stricter policies significantly reduce violations.
Task success decreases with stricter policies.
Leakage recall improves with targeted secret output detection.
Abstract
Tool-using automation systems, from scripts and CI bots to agentic assistants, fail in recurring patterns. Common failures include unsafe side effects, invalid arguments, uncontrolled retries, and leakage of sensitive outputs. Many mitigations are model-centric and prompt-dependent, so they are brittle and do not generalize to non-LLM callers. We present Policy-First Tooling, a model-agnostic permission layer that mediates tool invocation through explicit constraints, risk-aware gating, recovery controls, and auditable explanations. The paper contributes a compact policy DSL, a runtime enforcement architecture with actionable rationale and fix hints, and a reproducible benchmark based on trace replay with controlled fault and misuse injection. In 225 controlled runs across five policy packs and three fault profiles, stricter packs improve violation prevention from 0.000 in P0 to 0.681…
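To make the idea of a permission layer concrete, here is a minimal sketch of a gate that checks a tool call before it runs. It is an illustration only and not the paper's DSL or architecture, neither of which appears in this summary. The names `Policy`, `Gate`, `Decision`, `check_call`, and `redact_output`, the integer risk score, and the secret pattern are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Decision:
    allowed: bool
    rationale: str       # human-readable reason, kept for the audit trail
    fix_hint: str = ""   # suggestion the caller can act on when blocked


@dataclass
class Policy:
    # Hypothetical fields; the paper's actual policy DSL is not shown here.
    allowed_tools: set
    max_risk: int        # calls with a higher risk score are blocked
    max_retries: int     # attempt 0 is the first call, so attempts 0..max_retries pass
    secret_pattern: str = r"(?i)api[_-]?key\s*=\s*\S+"  # assumed secret format


@dataclass
class Gate:
    policy: Policy
    audit_log: list = field(default_factory=list)

    def check_call(self, tool: str, risk: int, attempt: int) -> Decision:
        """Decide whether a tool call may proceed and record the decision."""
        if tool not in self.policy.allowed_tools:
            decision = Decision(False, f"tool '{tool}' is not on the allow-list",
                                "use an allowed tool or extend the policy")
        elif risk > self.policy.max_risk:
            decision = Decision(False, f"risk {risk} exceeds limit {self.policy.max_risk}",
                                "reduce the scope of the call or request approval")
        elif attempt > self.policy.max_retries:
            decision = Decision(False, f"attempt {attempt} exceeds retry budget",
                                "stop retrying and surface the error")
        else:
            decision = Decision(True, "all constraints satisfied")
        self.audit_log.append((tool, risk, attempt, decision.allowed, decision.rationale))
        return decision

    def redact_output(self, text: str) -> str:
        """Replace anything matching the assumed secret pattern before returning output."""
        return re.sub(self.policy.secret_pattern, "[REDACTED]", text)
```

Because the gate inspects only the call itself (tool name, risk score, attempt count) and never the caller, the same check would apply to a script, a CI bot, or an LLM agent, which is the sense in which such a layer is model-agnostic.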
Taxonomy
Topics: Security and Verification in Computing · Software System Performance and Reliability · Advanced Software Engineering Methodologies
