The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
Phongsakon Mark Konrad, Tim Lukas Adam, Ane Cathrine Holst Merrild, Riccardo Terrenzi, Rebecca De Rosa, Toygar Tanyel, Serkan Ayvaz

TL;DR
The paper argues that AI deployment safety should rest on calibrated verification regimes built around domain-specific authorization, monitoring, and accountability, rather than on mechanistic interpretability alone.
Contribution
It introduces the concept of verification coverage as a standard for AI deployment, emphasizing domain-scoped checks over internal model explanations.
Findings
A 53-percentage-point gap exists between internal representations and output correction.
Only 9.0% of FDA-approved AI device documents include post-market surveillance.
Verification should be domain-specific, monitorable, and revocable, not just interpretable.
Abstract
AI deployment in sensitive domains such as health care, credit, employment, and criminal justice is often treated as unsafe to authorize until model internals can be explained. This often leads to an excessive reliance on mechanistic interpretability to address a deployment challenge beyond its intended scope. We argue that the gate should instead be calibrated verification: authorization should be domain-scoped, independently checkable, monitored after release, accountable, contestable, and revocable. The reason is twofold. First, model capability is uneven across nearby tasks, so authorization must attach to a specific use rather than to a model in general. Second, societies have long governed opaque expertise through credentials, monitoring, liability, appeal, and revocation rather than mechanism-level explanation. Recent evidence reinforces this distinction between mechanistic…
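The abstract's central move is to attach authorization to a specific use rather than to a model in general, and to keep it monitored, contestable, and revocable after release. A small data structure makes that concrete. What follows is a minimal, hypothetical sketch, not the authors' method. The names `Authorization`, `Registry`, `max_error_rate`, and `min_samples` are illustrative assumptions, and so is the rule that revokes a grant once the observed error rate exceeds a threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Authorization:
    """Hypothetical grant tied to one model AND one specific use (not the paper's API)."""
    model_id: str
    use: str                     # e.g. "radiology triage"; a nearby use needs its own grant
    max_error_rate: float        # assumed monitoring threshold fixed at authorization time
    min_samples: int             # assumed floor so tiny samples cannot trigger revocation
    outcomes: list[bool] = field(default_factory=list)         # post-release log (True = error)
    appeals: list[tuple[str, str]] = field(default_factory=list)  # contested cases, kept for accountability
    revoked: bool = False

    def observed_error_rate(self) -> float:
        # Fraction of logged outcomes that were errors (0.0 before any are logged).
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


class Registry:
    """Hypothetical registry keyed by (model, use), illustrating domain-scoped authorization."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Authorization] = {}

    def authorize(self, model_id: str, use: str,
                  max_error_rate: float, min_samples: int = 20) -> None:
        self._grants[(model_id, use)] = Authorization(
            model_id, use, max_error_rate, min_samples)

    def is_authorized(self, model_id: str, use: str) -> bool:
        # Domain scoping: a grant for one use implies nothing about any other use.
        grant = self._grants.get((model_id, use))
        return grant is not None and not grant.revoked

    def record_outcome(self, model_id: str, use: str, was_error: bool) -> None:
        # Monitoring: every post-release outcome is logged against the specific grant.
        grant = self._grants[(model_id, use)]
        grant.outcomes.append(was_error)
        # Revocability: once enough evidence exists, exceeding the threshold withdraws the grant.
        if (len(grant.outcomes) >= grant.min_samples
                and grant.observed_error_rate() > grant.max_error_rate):
            grant.revoked = True

    def contest(self, model_id: str, use: str, case_id: str, reason: str) -> None:
        # Contestability: affected parties can file an appeal that stays on the record.
        self._grants[(model_id, use)].appeals.append((case_id, reason))
```

Under these assumptions, a grant for `"radiology triage"` confers nothing on `"dermatology triage"`, and the `min_samples` floor keeps a single early error from revoking a grant on a tiny sample. A fuller regime would presumably also need independent auditors, liability assignment, and statistically grounded monitoring thresholds, none of which this sketch attempts.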
