Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

Jafar Isbarov; Murat Kantarcioglu

arXiv:2602.05066·cs.CR·February 26, 2026

TL;DR

Monitoring-based defenses for AI agents can be bypassed by Agent-as-a-Proxy attacks, in which an indirect prompt injection uses the agent itself as the delivery mechanism and evades both the agent and its monitor. The vulnerability persists even when the monitor is a frontier-scale model.

Contribution

The paper introduces the Agent-as-a-Proxy attack and shows that it succeeds against large-scale monitoring models such as Qwen2.5-72B, exposing a weakness in monitoring-based oversight of AI agents.

Findings

01  Agent-as-a-Proxy attacks bypass monitoring defenses

02  Even large-scale monitors are vulnerable

03  High attack success rate on the AgentDojo benchmark

Abstract

As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent. We demonstrate that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack, where prompt injection attacks treat the agent as a delivery mechanism, bypassing both agent and monitor simultaneously. While prior work on scalable oversight has focused on whether small monitors can supervise large agents, we show that even frontier-scale monitors are vulnerable. Large-scale monitoring models like Qwen2.5-72B can be bypassed by agents with similar capabilities, such as GPT-4o mini and Llama-3.1-70B. On the AgentDojo benchmark, we achieve a high attack success rate against AlignmentCheck and…
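The data flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `ToolCall` and `AgentStep` types, the `build_monitor_input` function, and the trace format are all hypothetical, chosen only to show that a monitor judging an agent's Chain-of-Thought and tool calls reads whatever text the agent reproduces, including text that originated in untrusted tool output. The placeholder string stands in for any externally supplied content.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical record of one tool invocation made by the agent.
    name: str
    arguments: dict


@dataclass
class AgentStep:
    # Hypothetical record of one agent step: reasoning text plus actions.
    chain_of_thought: str
    tool_calls: list = field(default_factory=list)


def build_monitor_input(user_task: str, steps: list) -> str:
    """Assemble the text a (hypothetical) monitor would be asked to judge.

    The monitor sees the user task, each step's reasoning, and each tool call.
    """
    lines = [f"USER TASK: {user_task}"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"STEP {i} REASONING: {step.chain_of_thought}")
        for call in step.tool_calls:
            lines.append(f"STEP {i} ACTION: {call.name}({call.arguments})")
    return "\n".join(lines)


def untrusted_text_reaches_monitor(untrusted: str, monitor_input: str) -> bool:
    """Return True if text from an untrusted source appears in the monitor's input."""
    return untrusted in monitor_input


# Placeholder for content returned by an external tool (assumed untrusted).
untrusted = "<text returned by an external tool>"

# The agent quotes the tool output while reasoning, so it enters the trace.
steps = [
    AgentStep(
        chain_of_thought=f"The document says: {untrusted}",
        tool_calls=[ToolCall("read_file", {"path": "notes.txt"})],
    )
]

monitor_input = build_monitor_input("Summarize notes.txt", steps)
print(monitor_input)
print(untrusted_text_reaches_monitor(untrusted, monitor_input))
```

Under these assumptions, the monitor never reads the tool output directly, yet the same text still reaches it through the agent's own trace. That indirect path is what the abstract means by treating the agent as a delivery mechanism.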

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques