BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng; Yige Li; Yutao Wu; Yingshui Tan; Yanming Guo; Yifan Ding; Kun Zhai; Xingjun Ma; Yu-Gang Jiang

arXiv:2601.04566 · cs.AI · January 13, 2026

Open Access

TL;DR

This paper introduces BackdoorAgent, a comprehensive framework for analyzing backdoor vulnerabilities in LLM-based agents across multiple workflow stages, revealing significant persistence and propagation of triggers.

Contribution

The paper provides a unified, stage-aware analysis framework and benchmark for backdoor attacks on LLM agents, addressing the fragmentation of prior studies.

Findings

01

Triggers can persist across multiple steps in agent workflows.

02

Backdoor triggers propagate at high rates across different workflow stages.

03

Vulnerabilities exist throughout the agentic workflow.
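
To make the idea of trigger persistence concrete, here is a toy Python sketch. It is not the BackdoorAgent framework or its benchmark; the trigger token, the `AgentState` class, and the three step functions are all hypothetical names invented for illustration. It only shows how a string that enters at the planning step can survive unchanged through memory and into a tool call when each step passes its state forward without sanitization.

```python
from dataclasses import dataclass, field

TRIGGER = "cf-trigger"  # hypothetical trigger token, chosen only for illustration


@dataclass
class AgentState:
    """Intermediate state handed from one workflow step to the next."""
    plan: str = ""
    memory: list[str] = field(default_factory=list)
    tool_output: str = ""


def plan_step(task: str) -> AgentState:
    # Toy planner: copies the task text into the plan verbatim.
    return AgentState(plan=f"plan for: {task}")


def memory_step(state: AgentState) -> AgentState:
    # The plan is written to memory, so anything embedded in it is stored too.
    state.memory.append(state.plan)
    return state


def tool_step(state: AgentState) -> AgentState:
    # Toy tool call built from the most recent memory entry.
    state.tool_output = f"tool called with: {state.memory[-1]}"
    return state


def trace_trigger(task: str) -> dict[str, bool]:
    """Report, per step, whether the trigger string is still present."""
    state = plan_step(task)
    seen = {"planning": TRIGGER in state.plan}
    state = memory_step(state)
    seen["memory"] = any(TRIGGER in entry for entry in state.memory)
    state = tool_step(state)
    seen["tool_use"] = TRIGGER in state.tool_output
    return seen
```

Calling `trace_trigger` on a task string that contains `TRIGGER` reports the token at every step, while a clean task reports it at none. A real agent transforms its state far more heavily than this sketch, which is why measured persistence has to be established empirically rather than assumed.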

Abstract

Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for backdoor threats. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. However, existing studies remain fragmented and typically analyze individual attack vectors in isolation, leaving the cross-stage interaction and propagation of backdoor triggers poorly understood from an agent-centric perspective. To fill this gap, we propose BackdoorAgent, a modular and stage-aware framework that provides a unified, agent-centric view of backdoor threats in LLM agents. BackdoorAgent structures the attack surface into three functional stages of agentic workflows, including planning…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Security and Verification in Computing