Proof-of-Guardrail in AI Agents and What (Not) to Trust from It

Xisen Jin; Michael Duan; Qin Lin; Aaron Chan; Zhenglun Chen; Junyi Du; Xiang Ren

arXiv:2603.05786·cs.CR·March 9, 2026



TL;DR

This paper introduces proof-of-guardrail, a system that gives users cryptographic proof that a specific open-source safety guardrail was executed inside a Trusted Execution Environment (TEE) before a response was generated, while keeping the developer's agent private.

Contribution

It presents a novel cryptographic proof system for verifying AI safety guardrails using TEEs, addressing trust issues in AI safety claims.

Findings

01 Proof-of-guardrail effectively verifies guardrail execution.

02 System incurs acceptable latency overhead.

03 Maintains developer privacy while ensuring guardrail integrity.

Abstract

As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces a threat where safety measures are falsely advertised. To address the threat, we propose proof-of-guardrail, a system that enables developers to provide cryptographic proof that a response is generated after a specific open-source guardrail. To generate proof, the developer runs the agent and guardrail inside a Trusted Execution Environment (TEE), which produces a TEE-signed attestation of guardrail code execution verifiable by any user offline. We implement proof-of-guardrail for OpenClaw agents and evaluate latency overhead and deployment cost. Proof-of-guardrail ensures integrity of guardrail execution while keeping the developer's agent private, but we also highlight a risk of deception about safety, for example, when malicious…
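The attest-then-verify flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. All names here (`TEE_KEY`, `measure`, `tee_attest`, `verify`) are hypothetical, and an HMAC with a shared demo key stands in for the TEE's hardware-backed signature. A real TEE signs with an asymmetric key, so users verify with a public key and never hold a secret. The sketch only shows what a user would check offline: that the attestation is intact, that the attested code hash matches the published open-source guardrail, and that the attestation is bound to the response actually received.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a shared demo key stands in for the TEE's hardware signing key.
# Real attestations use asymmetric signatures checked against a vendor root.
TEE_KEY = b"demo-tee-key"


def measure(code: bytes) -> str:
    """Hash of the guardrail code, playing the role of a TEE code measurement."""
    return hashlib.sha256(code).hexdigest()


def tee_attest(guardrail_code: bytes, response: str) -> dict:
    """Hypothetical TEE side: sign the code measurement together with the response hash."""
    payload = {
        "measurement": measure(guardrail_code),
        "response_hash": hashlib.sha256(response.encode()).hexdigest(),
    }
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(TEE_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(attestation: dict, published_guardrail_code: bytes, response: str) -> bool:
    """Hypothetical user side: offline checks against the open-source guardrail."""
    payload = attestation["payload"]
    body = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(TEE_KEY, body, hashlib.sha256).hexdigest()
    # 1. The attestation must not have been altered after signing.
    if not hmac.compare_digest(expected, attestation["signature"]):
        return False
    # 2. The attested code must be the published guardrail.
    if payload["measurement"] != measure(published_guardrail_code):
        return False
    # 3. The attestation must cover the response the user actually received.
    return payload["response_hash"] == hashlib.sha256(response.encode()).hexdigest()


guardrail = b"def guardrail(text): return 'blocked' not in text"
attestation = tee_attest(guardrail, "hello")
print(verify(attestation, guardrail, "hello"))
```

A check like this establishes only which guardrail code ran and which response it is tied to. It says nothing about whether that guardrail is effective.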


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques