Jailbreaking Embodied LLMs via Action-level Manipulation
Xinyu Huang, Qiang Yang, Leming Shen, Zijing Ma, Yuanqing Zheng

TL;DR
This paper presents Blindfold, an automated attack framework exploiting embodied LLMs' limited causal reasoning to induce harmful physical actions, revealing critical security vulnerabilities in real-world AI systems.
Contribution
Introduces Blindfold, a novel attack method using surrogate models and action-level manipulation to expose vulnerabilities in embodied LLMs beyond language safety.
Findings
Blindfold achieves attack success rates up to 53% higher than state-of-the-art (SOTA) baselines.
The attack demonstrates significant risks in embodied LLMs' physical interactions.
Results highlight the need for consequence-aware defense mechanisms.
Abstract
Embodied Large Language Models (LLMs) enable AI agents to interact with the physical world through natural language instructions and actions. However, beyond the language-level risks inherent to LLMs themselves, embodied LLMs with real-world actuation introduce a new vulnerability: instructions that appear semantically benign may still lead to dangerous real-world consequences, revealing a fundamental misalignment between linguistic security and physical outcomes. In this paper, we introduce Blindfold, an automated attack framework that leverages the limited causal reasoning capabilities of embodied LLMs in real-world action contexts. Rather than iterative trial-and-error jailbreaking of black-box embodied LLMs, Blindfold adopts an Adversarial Proxy Planning strategy: it compromises a local surrogate LLM to perform action-level manipulations that appear semantically safe but could…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
