Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
Asa Cooper Stickland, Jan Michelfeit, Arathi Mani, Charlie Griffin, Ollie Matthews, Tomek Korbak, Rogan Inglis, Oliver Makins, Alan Cooney

TL;DR
This paper explores asynchronous monitoring as a means to detect and prevent sabotage by LLM agents in software engineering environments, using an adversarial game framework and a suite of simulated tasks.
Contribution
The paper introduces an adversarial game approach to monitor development and evaluates an ensemble monitor's effectiveness in a high-stakes deployment scenario.
Findings
The ensemble monitor achieved a 6% false negative rate at a 1% false positive rate.
The authors developed a risk estimation model for sabotage detection.
The authors built a suite of five diverse software engineering environments for realistic testing.
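To make the headline metric concrete, the sketch below shows one common way to compute a false negative rate at a fixed false positive rate from monitor suspicion scores. It is an illustration under stated assumptions, not the paper's implementation: the function names, the mean-based ensemble aggregation, and the toy scores are all hypothetical.

```python
import math
from statistics import mean


def ensemble_score(monitor_scores):
    """Combine per-monitor suspicion scores by averaging (an assumed aggregation rule)."""
    return mean(monitor_scores)


def threshold_at_fpr(benign_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of benign episodes score strictly above it."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign episodes we are allowed to flag; the small epsilon guards
    # against floating-point error in the product.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every benign episode may be flagged
    # Only scores strictly above this value are flagged, so ties never exceed the budget.
    return ranked[allowed]


def false_negative_rate(attack_scores, threshold):
    """Fraction of attack episodes that are NOT flagged (score at or below the threshold)."""
    missed = sum(1 for score in attack_scores if score <= threshold)
    return missed / len(attack_scores)


if __name__ == "__main__":
    # Toy data (hypothetical): 100 benign ensemble scores and 4 attack ensemble scores.
    benign = [i / 100 for i in range(100)]
    attacks = [0.5, 0.99, 1.0, 0.995]
    threshold = threshold_at_fpr(benign, target_fpr=0.01)
    print("threshold:", threshold)
    print("FNR at 1% FPR:", false_negative_rate(attacks, threshold))
```

The threshold is calibrated on benign episodes only, which is why the false positive rate is the fixed quantity and the false negative rate is the measured one.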
Abstract
LLM-based software engineering agents are increasingly used in real-world development tasks, often with access to sensitive data or security-critical codebases. Such agents could intentionally sabotage these codebases if they were misaligned. We investigate asynchronous monitoring, in which a monitoring system reviews agent actions after the fact. Unlike synchronous monitoring, this approach does not impose runtime latency, while still attempting to disrupt attacks before irreversible harm occurs. We treat monitor development as an adversarial game between a blue team (who design monitors) and a red team (who create sabotaging agents). We attempt to set the game rules such that they upper bound the sabotage potential of an agent based on Claude 4.1 Opus. To ground this game in a realistic, high-stakes deployment scenario, we develop a suite of 5 diverse software engineering environments…
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Software Engineering Research
