AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation

Changyi Li; Pengfei Lu; Xudong Pan; Fazl Barez; Min Yang

arXiv:2603.07427 · cs.AI · March 17, 2026

Open Access

TL;DR

AutoControl Arena is an automated framework for frontier AI risk evaluation built on logic-narrative decoupling: deterministic state is grounded in executable code, while generative dynamics are delegated to LLMs. This mitigates logic hallucination and reveals distinct risk patterns across models and scenarios.

Contribution

The paper presents an automated three-agent framework that combines deterministic code with LLMs for scalable risk evaluation, achieving 60% human preference over existing simulators.

Findings

01. Over 98% end-to-end success rate for the evaluation framework

02. Risk rates increase under environmental stress, especially in capable models

03. Different models exhibit distinct misalignment behaviors

Abstract

As Large Language Models (LLMs) evolve into autonomous agents, existing safety evaluations face a fundamental trade-off: manual benchmarks are costly, while LLM-based simulators are scalable but suffer from logic hallucination. We present AutoControl Arena, an automated framework for frontier AI risk evaluation built on the principle of logic-narrative decoupling. By grounding deterministic state in executable code while delegating generative dynamics to LLMs, we mitigate hallucination while maintaining flexibility. This principle, instantiated through a three-agent framework, achieves over 98% end-to-end success and 60% human preference over existing simulators. To elicit latent risks, we vary environmental Stress and Temptation across X-Bench (70 scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk rates surge from 21.7% to 54.5% under…
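
The logic-narrative decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `EnvState`, `stub_narrator`, and `step` are hypothetical, and the narrator is a plain function standing in for an LLM call. The point it shows is that state changes happen only in deterministic code, while the generative component only phrases the resulting observation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EnvState:
    """Hypothetical deterministic state, held in code rather than in LLM text."""
    files: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)

    def write_file(self, path: str, content: str) -> str:
        self.files[path] = content
        self.log.append(("write", path))
        return f"wrote {len(content)} chars to {path}"

    def read_file(self, path: str) -> str:
        self.log.append(("read", path))
        if path not in self.files:
            return f"error: {path} not found"
        return self.files[path]


def stub_narrator(observation: str) -> str:
    """Stand-in for an LLM call (assumption): it rephrases, never mutates state."""
    return f"[narration] The system reports: {observation}"


def step(state: EnvState, action: str, args: Dict[str, str],
         narrator: Callable[[str], str]) -> str:
    """Run the logic layer first, then pass only its result to the narrative layer."""
    handlers = {"write_file": state.write_file, "read_file": state.read_file}
    if action not in handlers:
        observation = f"error: unknown action {action}"
    else:
        observation = handlers[action](**args)
    return narrator(observation)
```

Because the narrator receives only the observation string, it cannot report a file that the logic layer never created, which is the kind of logic hallucination the abstract attributes to purely LLM-based simulators.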

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI