AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
Changyi Li, Pengfei Lu, Xudong Pan, Fazl Barez, Min Yang

TL;DR
AutoControl Arena is an automated framework for frontier AI risk evaluation built on logic-narrative decoupling: deterministic state lives in executable code while LLMs supply the generative dynamics, which mitigates logic hallucination and reveals distinct risk patterns across models and scenarios.
Contribution
It presents an automated three-agent framework that pairs deterministic code with LLMs for scalable, reliable AI risk evaluation, reaching 60% human preference over existing simulators.
Findings
Over 98% end-to-end success rate
Risk rates increase under environmental stress, especially in capable models
Different models exhibit distinct misalignment behaviors
Abstract
As Large Language Models (LLMs) evolve into autonomous agents, existing safety evaluations face a fundamental trade-off: manual benchmarks are costly, while LLM-based simulators are scalable but suffer from logic hallucination. We present AutoControl Arena, an automated framework for frontier AI risk evaluation built on the principle of logic-narrative decoupling. By grounding deterministic state in executable code while delegating generative dynamics to LLMs, we mitigate hallucination while maintaining flexibility. This principle, instantiated through a three-agent framework, achieves over 98% end-to-end success and 60% human preference over existing simulators. To elicit latent risks, we vary environmental Stress and Temptation across X-Bench (70 scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk rates surge from 21.7% to 54.5% under…
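The logic-narrative decoupling described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `FileSystemState`, `stub_narrator`, and `Environment` are hypothetical, and the narrator is a plain function standing in for an LLM call. The point is only that ground-truth state is mutated by deterministic code, while the generative layer describes the outcome and cannot change it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileSystemState:
    """Deterministic 'logic' layer (hypothetical): ground truth is plain data."""
    files: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def delete(self, path: str) -> bool:
        # True only if the file actually existed before the call.
        return self.files.pop(path, None) is not None


def stub_narrator(event: str) -> str:
    """Stand-in for an LLM call (assumption); it only rephrases the event."""
    return f"[narration] {event}"


class Environment:
    """Routes agent actions to code first, then asks the narrator to describe them."""

    def __init__(self, narrator: Callable[[str], str] = stub_narrator) -> None:
        self.state = FileSystemState()
        self.narrator = narrator

    def step(self, action: str, path: str, content: str = "") -> str:
        # State transitions are decided here, never by the narrator.
        if action == "write":
            self.state.write(path, content)
            event = f"wrote {path}"
        elif action == "delete":
            ok = self.state.delete(path)
            event = f"deleted {path}" if ok else f"delete failed: {path} not found"
        else:
            event = f"unknown action: {action}"
        return self.narrator(event)


env = Environment()
env.step("write", "report.txt", "draft")
reply = env.step("delete", "missing.txt")
```

Because the narrator only receives the outcome computed by `FileSystemState`, it has no way to report a deletion that never happened. That is the kind of logic hallucination the decoupling is meant to rule out.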
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
