DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

TL;DR
This paper introduces DemonAgent, a novel backdoor attack on LLM-based agents that uses dynamic encryption and multi-fragmentation to evade safety audits with nearly 100% success and zero detection.
Contribution
It presents a new backdoor implantation method combining dynamic encryption and sub-backdoor decomposition to bypass safety audits in LLM agents.
Findings
Achieves near 100% attack success rate.
Maintains 0% detection rate by safety audits.
Highlights vulnerabilities in current safety mechanisms.
Abstract
As LLM-based agents become increasingly prevalent, backdoors can be implanted into agents through user queries or environment feedback, raising critical concerns regarding safety vulnerabilities. However, backdoor attacks are typically detectable by safety audits that analyze the reasoning process of agents. To this end, we propose a novel backdoor implantation strategy called \textbf{Dynamically Encrypted Multi-Backdoor Implantation Attack}. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. Based on these advancements, backdoors are allowed to bypass safety audits significantly. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsModular Robots and Swarm Intelligence
