The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks

Chunyang Li; Zifeng Kang; Junwei Zhang; Zhuo Ma; Anda Cheng; Xinghua Li; Jianfeng Ma

arXiv:2511.16347·cs.CR·November 21, 2025

The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks

Chunyang Li, Zifeng Kang, Junwei Zhang, Zhuo Ma, Anda Cheng, Xinghua Li, Jianfeng Ma

PDF

Open Access

TL;DR

This paper introduces the concept of indirect environmental jailbreaks in embodied AI, demonstrating how malicious environmental prompts can compromise vision-language models, and provides automated tools and benchmarks for systematic evaluation.

Contribution

It is the first to study and benchmark indirect jailbreaks in embodied AI, proposing automated frameworks and a comprehensive benchmark for evaluating such attacks.

Findings

01

SHAWSHANK outperforms existing methods in attack success rate

02

All tested VLMs are vulnerable to the proposed attack

03

Current defenses only partially mitigate the jailbreaks

Abstract

The adoption of Vision-Language Models (VLMs) in embodied AI agents, while being effective, brings safety concerns such as jailbreaking. Prior work have explored the possibility of directly jailbreaking the embodied agents through elaborated multi-modal prompts. However, no prior work has studied or even reported indirect jailbreaks in embodied AI, where a black-box attacker induces a jailbreak without issuing direct prompts to the embodied agent. In this paper, we propose, for the first time, indirect environmental jailbreak (IEJ), a novel attack to jailbreak embodied AI via indirect prompt injected into the environment, such as malicious instructions written on a wall. Our key insight is that embodied AI does not ''think twice'' about the instructions provided by the environment -- a blind trust that attackers can exploit to jailbreak the embodied agent. We further design and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)