The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime

Phongsakon Mark Konrad; Tim Lukas Adam; Ane Cathrine Holst Merrild; Riccardo Terrenzi; Rebecca De Rosa; Toygar Tanyel; Serkan Ayvaz

arXiv:2605.10601 · cs.AI · May 12, 2026

TL;DR

The paper argues that safe AI deployment should rest on a calibrated verification regime built on domain-specific authorization, monitoring, and accountability, rather than on mechanistic interpretability alone.

Contribution

It introduces the concept of verification coverage as a standard for AI deployment, emphasizing domain-scoped checks over internal model explanations.

Findings

01 A 53-percentage-point gap exists between internal representations and output correction.

02 Only 9.0% of FDA-approved AI device documents include post-market surveillance.

03 Verification should be domain-specific, monitorable, and revocable, not just interpretable.

Abstract

AI deployment in sensitive domains such as health care, credit, employment, and criminal justice is often treated as unsafe to authorize until model internals can be explained. This often leads to an excessive reliance on mechanistic interpretability to address a deployment challenge beyond its intended scope. We argue that the gate should instead be calibrated verification: authorization should be domain-scoped, independently checkable, monitored after release, accountable, contestable, and revocable. The reason is twofold. First, model capability is uneven across nearby tasks, so authorization must attach to a specific use rather than to a model in general. Second, societies have long governed opaque expertise through credentials, monitoring, liability, appeal, and revocation rather than mechanism-level explanation. Recent evidence reinforces this distinction between mechanistic…
