Jailbreaking Embodied LLMs via Action-level Manipulation

Xinyu Huang; Qiang Yang; Leming Shen; Zijing Ma; Yuanqing Zheng

arXiv:2603.01414 · cs.RO · March 3, 2026

Open Access

TL;DR

This paper presents Blindfold, an automated attack framework exploiting embodied LLMs' limited causal reasoning to induce harmful physical actions, revealing critical security vulnerabilities in real-world AI systems.

Contribution

Introduces Blindfold, a novel attack method using surrogate models and action-level manipulation to expose vulnerabilities in embodied LLMs beyond language safety.

Findings

01. Blindfold achieves up to 53% higher attack success rates than state-of-the-art (SOTA) methods.

02. The attack demonstrates significant risks in embodied LLMs' physical interactions.

03. Results highlight the need for consequence-aware defense mechanisms.

Abstract

Embodied Large Language Models (LLMs) enable AI agents to interact with the physical world through natural language instructions and actions. However, beyond the language-level risks inherent to LLMs themselves, embodied LLMs with real-world actuation introduce a new vulnerability: instructions that appear semantically benign may still lead to dangerous real-world consequences, revealing a fundamental misalignment between linguistic security and physical outcomes. In this paper, we introduce Blindfold, an automated attack framework that leverages the limited causal reasoning capabilities of embodied LLMs in real-world action contexts. Rather than iterative trial-and-error jailbreaking of black-box embodied LLMs, Blindfold adopts an Adversarial Proxy Planning strategy: it compromises a local surrogate LLM to perform action-level manipulations that appear semantically safe but could…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling