Geographic Blind Spots in AI Control Monitors: A Cross-National Audit of Claude Opus 4.6
Jason Hung

TL;DR
This study audits the geographic knowledge gaps of Claude Opus 4.6 and finds unexpectedly higher fabrication rates in Global North contexts, which pose risks for the reliability of AI control across countries.
Contribution
It introduces the AI Control Knowledge Framework (ACKF) and provides the first cross-national audit of AI control monitor geographic knowledge gaps.
Findings
Claude Opus 4.6 has higher fabrication rates for Global North queries.
Global North contexts are more prone to incorrect responses, contrary to initial expectations.
The study highlights a vulnerability that could be exploited to evade AI control detection.
Abstract
Artificial intelligence (AI) control protocols assume that trusted large language model (LLM) monitors reliably assess proposed actions across all deployment contexts. This paper tests that assumption in the geographic dimension. We audit Claude Opus 4.6, the monitor specified in Apart Research's AI Control Hackathon Track 3 benchmark, for systematic gaps in its factual knowledge of the global AI landscape. We develop the AI Control Knowledge Framework (ACKF), a six-dimension thematic scheme, and operationalise it with 17 verified indicators drawn from the Global AI Dataset v2 (GAID v2), which comprises 24,453 indicators across 227 countries and is published on Harvard Dataverse. A five-category response classification scheme distinguishes verifiable fabrication (VF) from honest refusal (HR); logistic regression with country-clustered standard errors combined with difference-in-differences (DiD) estimation…
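As a rough illustration of the kind of quantity a DiD estimate over fabrication rates compares, the following minimal sketch computes a 2x2 difference-in-differences from labelled responses. It is not the paper's implementation: the records are made-up toy values, the region names and the "baseline"/"probe" condition are hypothetical placeholders (the abstract does not state what the DiD contrasts), and only the VF and HR labels come from the abstract. The paper's logistic regression with country-clustered standard errors is not reproduced here.

```python
# Toy records: (region, condition, label). All values are invented for
# illustration; "baseline"/"probe" is a hypothetical second dimension.
records = (
    [("north", "baseline", "VF")] * 1 + [("north", "baseline", "HR")] * 3
    + [("north", "probe", "VF")] * 3 + [("north", "probe", "HR")] * 1
    + [("south", "baseline", "VF")] * 1 + [("south", "baseline", "HR")] * 3
    + [("south", "probe", "VF")] * 2 + [("south", "probe", "HR")] * 2
)


def fabrication_rate(rows):
    """Share of responses labelled as verifiable fabrication (VF)."""
    rows = list(rows)
    if not rows:
        raise ValueError("no records in this cell")
    return sum(1 for _, _, label in rows if label == "VF") / len(rows)


def cell_rate(data, region, condition):
    """Fabrication rate within one region-by-condition cell."""
    return fabrication_rate(
        r for r in data if r[0] == region and r[1] == condition
    )


def diff_in_diff(data, group_a, group_b, cond_0, cond_1):
    """(change in rate for group_a) minus (change in rate for group_b)."""
    change_a = cell_rate(data, group_a, cond_1) - cell_rate(data, group_a, cond_0)
    change_b = cell_rate(data, group_b, cond_1) - cell_rate(data, group_b, cond_0)
    return change_a - change_b


estimate = diff_in_diff(records, "north", "south", "baseline", "probe")
print(f"toy DiD estimate: {estimate:.2f}")
```

A positive value in this toy setup would mean the fabrication rate rose more for the first group than for the second between the two conditions; a regression-based estimate would additionally provide standard errors, which this sketch omits.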
