Mythos and the Unverified Cage: Z3-Based Pre-Deployment Verification for Frontier-Model Sandbox Infrastructure
Dominik Blain

TL;DR
This paper introduces COBALT, a Z3-based formal verification tool for detecting arithmetic vulnerabilities in AI infrastructure, demonstrated on real-world codebases, and proposes a containment framework for frontier-model safety.
Contribution
The paper presents a novel Z3 SMT-based verification engine, COBALT, for pre-deployment detection of arithmetic vulnerabilities in AI infrastructure, and a comprehensive containment framework.
Findings
COBALT detects vulnerability patterns with concrete witnesses and guarantees.
Validated on four production codebases with reproducible results.
The Mythos escape class aligns with Z3-expressible CWE-190 vulnerabilities.
Abstract
The April 2026 Claude Mythos sandbox escape exposed a critical weakness in frontier AI containment: the infrastructure surrounding advanced models remains susceptible to formally characterizable arithmetic vulnerabilities. Anthropic has not publicly characterized the escape vector; some secondary accounts hypothesize a CWE-190 arithmetic vulnerability in sandbox networking code. We treat this as unverified and analyze the vulnerability class rather than the specific escape. This paper presents COBALT, a Z3 SMT-based formal verification engine for identifying CWE-190/191/195 arithmetic vulnerability patterns in C/C++ infrastructure prior to deployment. We distinguish two classes of contribution. Validated: COBALT detects arithmetic vulnerability patterns in production codebases, producing SAT verdicts with concrete witnesses and UNSAT guarantees under explicit safety bounds. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
