Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Christoph Schnabl; Daniel Hugenroth; Bill Marino; Alastair R. Beresford

arXiv:2506.23706·cs.AI·July 1, 2025

Open Access

TL;DR

This paper introduces Attestable Audits, a method using Trusted Execution Environments to provide verifiable and confidential AI safety benchmarks, addressing trust and privacy issues in AI model evaluation.

Contribution

The paper presents an approach to AI safety benchmarking that uses Trusted Execution Environments to make results verifiable while keeping model IP and benchmark datasets confidential.

Findings

01

A prototype demonstrates feasibility on typical audit benchmarks run against Llama-3.1.

02

Lets users verify that they are interacting with a compliant AI model, even in untrusted settings.

03

Protects sensitive data (model IP and benchmark datasets) during audits, even when model provider and auditor do not trust each other.

Abstract

Benchmarks are important measures to evaluate safety and compliance of AI models at scale. However, they typically do not offer verifiable results and lack confidentiality for model IP and benchmark datasets. We propose Attestable Audits, which run inside Trusted Execution Environments and enable users to verify interaction with a compliant AI model. Our work protects sensitive data even when model provider and auditor do not trust each other. This addresses verification challenges raised in recent AI governance frameworks. We build a prototype demonstrating feasibility on typical audit benchmarks against Llama-3.1.
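The verification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' prototype: the function names (`measure`, `run_attested_audit`, `verify_report`), the report layout, and the key are all hypothetical, and an HMAC with a shared demo key stands in for the hardware-rooted asymmetric signature a real Trusted Execution Environment would produce. The sketch only shows the general pattern of binding a benchmark score to a measurement of the audit code, model, and benchmark, so that a verifier can detect a swapped model or an altered score.

```python
import hashlib
import hmac
import json

# ASSUMPTION: stand-in for a hardware-held attestation key. A real TEE signs
# with an asymmetric key rooted in the hardware vendor, and the verifier only
# needs the public part. HMAC keeps this sketch stdlib-only.
HARDWARE_KEY = b"demo-hardware-key"


def measure(*blobs: bytes) -> str:
    """Combine audit code, model weights, and benchmark into one measurement."""
    digest = hashlib.sha256()
    for blob in blobs:
        # Hash each blob separately so blob boundaries stay unambiguous.
        digest.update(hashlib.sha256(blob).digest())
    return digest.hexdigest()


def run_attested_audit(audit_code: bytes, model_weights: bytes,
                       benchmark: bytes, score: float):
    """Simulate the enclave side: emit a signed report tying score to inputs."""
    report = {
        "measurement": measure(audit_code, model_weights, benchmark),
        "score": score,
    }
    payload = json.dumps(report, sort_keys=True).encode()
    signature = hmac.new(HARDWARE_KEY, payload, hashlib.sha256).hexdigest()
    return report, signature


def verify_report(report: dict, signature: str, expected_measurement: str) -> bool:
    """Verifier side: check the signature and the expected measurement."""
    payload = json.dumps(report, sort_keys=True).encode()
    expected_sig = hmac.new(HARDWARE_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(signature, expected_sig)
    return signature_ok and report["measurement"] == expected_measurement


if __name__ == "__main__":
    report, sig = run_attested_audit(b"audit-code", b"weights", b"benchmark", 0.9)
    expected = measure(b"audit-code", b"weights", b"benchmark")
    print(verify_report(report, sig, expected))
```

In this toy setup, the verifier accepts a report only if it was signed with the demo key and its measurement matches the code, model, and benchmark the verifier expects. Confidentiality of the model and dataset, which the paper also targets, is outside the scope of this sketch.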

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques