BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
Yunhao Feng, Yige Li, Yutao Wu, Yingshui Tan, Yanming Guo, Yifan Ding, Kun Zhai, Xingjun Ma, Yu-Gang Jiang

TL;DR
This paper introduces BackdoorAgent, a comprehensive framework for analyzing backdoor vulnerabilities in LLM-based agents across multiple workflow stages, revealing significant persistence and propagation of triggers.
Contribution
It provides a unified, stage-aware analysis framework and benchmark for backdoor attacks on LLM agents, addressing the fragmentation of prior studies.
Findings
Triggers can persist across multiple steps in agent workflows.
Backdoor triggers propagate at high rates across different stages.
Vulnerabilities exist throughout the agentic workflow.
Abstract
Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for backdoor threats. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. However, existing studies remain fragmented and typically analyze individual attack vectors in isolation, leaving the cross-stage interaction and propagation of backdoor triggers poorly understood from an agent-centric perspective. To fill this gap, we propose BackdoorAgent, a modular and stage-aware framework that provides a unified, agent-centric view of backdoor threats in LLM agents. BackdoorAgent structures the attack surface into three functional stages of agentic workflows, including planning…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Security and Verification in Computing
