JailWAM: Jailbreaking World Action Models in Robot Control
Hanqing Liu, Songping Wang, Jiahuan Long, Jiacheng Hou, Jialiang Sun, Chao Li, Yang Yang, Wei Peng, Xu Liu, Tingsong Jiang, Wen Yao, Yao Mu

TL;DR
JailWAM introduces a comprehensive framework to evaluate and exploit vulnerabilities in World Action Models for robot control, highlighting safety risks and proposing defenses.
Contribution
JailWAM is the first dedicated jailbreak attack and evaluation framework for WAMs, and it includes a benchmark for assessing safety under attack.
Findings
Achieves an 84.2% attack success rate on LingBot-VA.
Efficiently exposes physical vulnerabilities in WAMs.
Proposes effective defense mechanisms for safe robot control.
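For readers unfamiliar with the metric, the attack success rate reported above is conventionally the share of attack trials judged successful. The sketch below shows that conventional calculation only; the function name `attack_success_rate`, the boolean outcome encoding, and the toy trial list are assumptions made for illustration, and the paper's exact success criterion (for example, how its Three-Level Safety Classification Framework decides a trial is unsafe) is not given in this summary.

```python
def attack_success_rate(outcomes):
    """Return the percentage of trials flagged as successful attacks.

    `outcomes` is a hypothetical list of booleans, one per attack trial,
    where True means the trial was judged a successful jailbreak.
    """
    if not outcomes:
        raise ValueError("outcomes must contain at least one trial")
    successes = sum(1 for outcome in outcomes if outcome)
    return 100.0 * successes / len(outcomes)


# Toy data, not taken from the paper: 3 of 4 trials succeed.
toy_trials = [True, True, False, True]
print(f"{attack_success_rate(toy_trials):.1f}%")
```

Under this convention, an 84.2% rate means roughly 84 of every 100 attack attempts produced behavior judged unsafe.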
Abstract
The World Action Model (WAM) can jointly predict future world states and actions, exhibiting stronger physical manipulation capabilities compared with traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it will directly threaten personal safety, property security and environmental safety. However, existing research pays extremely limited attention to the critical security gap: the vulnerability of WAM to jailbreak attacks. To fill this gap, we define the Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAM, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables…
