Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems
Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan, Wang, Qi Alfred Chen, Qi Zhu

TL;DR
This paper introduces a comprehensive framework for backdoor attacks on embodied LLM-based decision systems, revealing significant security vulnerabilities and demonstrating highly effective attack methods across multiple models and tasks.
Contribution
It systematically explores attack surfaces and proposes three novel backdoor attack mechanisms, highlighting vulnerabilities in embodied AI systems.
Findings
Nearly 100% success rate for word and knowledge injection attacks
Scenario manipulation attacks exceed 65% success rate, up to 90%
Attacks are resilient against existing defenses
Abstract
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied artificial intelligence, especially when fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications. However, this fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems. In this work, we propose the first comprehensive framework for Backdoor Attacks against LLM-based Decision-making systems (BALD) in embodied AI, systematically exploring the attack surfaces and trigger mechanisms. Specifically, we propose three distinct attack mechanisms: word injection, scenario manipulation, and knowledge injection, targeting various components in the LLM-based decision-making pipeline. We perform extensive experiments on representative LLMs (GPT-3.5,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Information and Cyber Security
