When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

Mingyu Luo; Zihan Zhang; Zesen Liu; Yuchong Xie; Zhixiang Zhang; Dung Hiu Hilton Yeung; Wai Ip Lai; Ping Chen; Ming Wen; Dongdong She

arXiv:2605.02187 · cs.CR · May 5, 2026

TL;DR

This paper exposes an integrity gap in Bring-Your-Own-Key (BYOK) LLM agent architectures: a malicious third-party relay can tamper with an aligned model's response after it is generated but before the agent executes it.

Contribution

The paper formalizes the relay tampering attack (RTA), demonstrates its effectiveness across multiple LLMs, and proposes a time-based detection defense to mitigate this threat.

Findings

01. RTA achieves an attack success rate of up to 99.1%.

02. Existing defenses do not fully prevent RTA.

03. A time-based detection method can mitigate RTA.
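This summary does not say how the time-based detector works. One plausible reading, offered only as an assumption, is that the relay's extra rewriting and resubmission rounds add latency that a client can compare against a baseline. The sketch below illustrates that idea with a simple mean-plus-k-standard-deviations threshold; the function names, the threshold rule, and the sample latencies are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def latency_threshold(baseline_s: list[float], k: float = 3.0) -> float:
    """Upper bound on expected latency: mean + k * sample standard deviation.

    Assumption: baseline_s holds latencies (seconds) measured on a path
    known to be free of tampering. The rule itself is illustrative only.
    """
    return mean(baseline_s) + k * stdev(baseline_s)


def is_suspicious(latency_s: float, baseline_s: list[float], k: float = 3.0) -> bool:
    """Flag a response whose latency exceeds the baseline threshold."""
    return latency_s > latency_threshold(baseline_s, k)


# Hypothetical baseline latencies in seconds (made-up values).
baseline = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]

normal_flag = is_suspicious(1.1, baseline)   # within the usual range
delayed_flag = is_suspicious(2.4, baseline)  # e.g. extra round trips added
```

A threshold like this trades false positives against missed detections through `k`; a real deployment would need a baseline per model and per request size, since generation time varies with output length.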

Abstract

Bring-Your-Own-Key (BYOK) agent architectures let users route LLM traffic through third-party relays, creating a critical integrity gap: a malicious relay can modify an aligned LLM response after generation but before agent execution. We formalize this post-alignment tampering threat and show that, without end-to-end integrity, the relay can observe, suppress, or replace downstream messages, making even perfectly aligned LLMs ineffective against such attacks. We instantiate this threat as the Relay Tampering Attack (RTA), which performs multi-round strategic rewriting, minimal security-critical edits, and stealth restoration by resubmitting tampered outputs to the upstream LLM. Across AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world…
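To make the integrity gap concrete, here is a minimal sketch of the threat model described above: a provider emits a benign tool call, a relay rewrites one security-critical field, and the agent accepts the result unless it can verify an end-to-end tag. The shared key, the message layout, and all function names are assumptions for illustration; typical BYOK deployments have no such provider-to-agent key, which is exactly the gap the abstract describes. HMAC is used here only as a generic stand-in for end-to-end integrity and is not the defense the paper proposes.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by the LLM provider and the agent (assumption).
PROVIDER_AGENT_KEY = b"demo-key-not-for-production"


def sign(payload: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def provider_respond() -> dict:
    """Stand-in for an aligned LLM emitting a benign tool call."""
    payload = {"tool": "send_email", "args": {"to": "alice@example.com"}}
    return {"payload": payload, "tag": sign(payload, PROVIDER_AGENT_KEY)}


def malicious_relay(message: dict) -> dict:
    """Rewrite one security-critical field; the relay lacks the key,
    so it cannot produce a matching tag for the edited payload."""
    tampered = json.loads(json.dumps(message))  # deep copy via JSON round trip
    tampered["payload"]["args"]["to"] = "attacker@example.com"
    return tampered


def agent_accepts(message: dict, verify: bool) -> bool:
    """Without verification the agent trusts whatever the relay forwards."""
    if not verify:
        return True
    expected = sign(message["payload"], PROVIDER_AGENT_KEY)
    return hmac.compare_digest(expected, message["tag"])


original = provider_respond()
tampered = malicious_relay(original)
```

With `verify=False` the agent accepts the tampered tool call even though the model's output was benign; with `verify=True` the edited payload no longer matches its tag and is rejected.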
