Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems

Abhivansh Gupta

arXiv:2512.17259·cs.MA·December 22, 2025

TL;DR

This paper introduces a verifiability-first architecture for autonomous LLM agents, combining cryptographic attestations, lightweight audit agents, and challenge protocols to improve controllability and detect misalignment swiftly.

Contribution

The paper proposes an architecture that integrates cryptographic and symbolic attestations with lightweight verification agents, and introduces a new benchmark suite for measuring detection of misalignment and resilience against it.

Findings

01 Faster detection of misaligned behavior

02 Improved resilience against adversarial prompt injection

03 The OPERA benchmark suite effectively measures verifiability performance

Abstract

As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes critical. Prior benchmarks measured the propensity for misaligned behavior and showed that agent personalities and tool access significantly influence misalignment. Building on these insights, we propose a Verifiability-First architecture that (1) integrates run-time attestations of agent actions using cryptographic and symbolic methods, (2) embeds lightweight Audit Agents that continuously verify intent versus behavior using constrained reasoning, and (3) enforces challenge-response attestation protocols for high-risk operations. We introduce OPERA (Observability, Provable Execution, Red-team, Attestation), a benchmark suite and evaluation protocol designed to measure (i) detectability of misalignment, (ii) time to detection under stealthy…
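
To make items (1) and (3) of the abstract more concrete, the following is a minimal sketch of one way run-time attestation and a challenge-response check could look. It is not the paper's implementation, which this excerpt does not specify. It assumes a shared-secret HMAC-SHA256 tag chained over an append-only action log; the names `AttestationLog`, `audit`, `respond`, and `verify_response` are hypothetical and chosen for illustration only. A real deployment would more likely use asymmetric signatures so that the auditor cannot forge records.

```python
import hashlib
import hmac
import json
import secrets

GENESIS = "0" * 64  # placeholder predecessor tag for the first record


def _digest(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class AttestationLog:
    """Hypothetical append-only log: each record's tag covers the previous tag."""

    def __init__(self, key: bytes):
        self._key = key
        self.records = []

    def attest(self, action: str, args: dict) -> dict:
        prev = self.records[-1]["tag"] if self.records else GENESIS
        body = {"seq": len(self.records), "action": action, "args": args, "prev": prev}
        record = {**body, "tag": _digest(self._key, body)}
        self.records.append(record)
        return record


def audit(records: list, key: bytes) -> bool:
    """Audit-agent check: every tag verifies and the chain is unbroken."""
    prev = GENESIS
    for seq, record in enumerate(records):
        if record["seq"] != seq or record["prev"] != prev:
            return False  # a record was dropped, reordered, or re-linked
        body = {k: record[k] for k in ("seq", "action", "args", "prev")}
        if not hmac.compare_digest(record["tag"], _digest(key, body)):
            return False  # record contents were altered after attestation
        prev = record["tag"]
    return True


def respond(key: bytes, nonce: str, action: str) -> str:
    """Agent side of a challenge-response: bind a fresh nonce to the action."""
    return _digest(key, {"nonce": nonce, "action": action})


def verify_response(key: bytes, nonce: str, action: str, response: str) -> bool:
    """Verifier side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(response, respond(key, nonce, action))
```

A usage sketch under the same assumptions: the agent calls `log.attest("send_email", {...})` for each tool call, the auditor periodically runs `audit(log.records, key)`, and before a high-risk operation the verifier issues `secrets.token_hex(16)` as a nonce and accepts the operation only if `verify_response` returns `True`. Chaining each tag over its predecessor means that editing or deleting an earlier record invalidates the audit, and the fresh nonce prevents a recorded response from being replayed.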

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques