POEX: Towards Policy Executable Jailbreak Attacks Against the LLM-based Robots
Xuancun Lu, Zhengxian Huang, Xinfeng Li, Chi Zhang, Xiaoyu Ji, Wenyuan Xu

TL;DR
This paper investigates the vulnerability of LLM-based robots to jailbreak attacks, introduces POEX, a red-teaming framework for inducing harmful yet executable robot policies, and proposes defenses to improve safety.
Contribution
It presents the first systematic analysis of jailbreak attacks on LLM-based robots, introduces POEX for effective red-teaming, and offers defense strategies to improve security.
Findings
Traditional jailbreaks are ineffective against robot scenarios.
POEX successfully induces harmful executable policies in real-world robots.
Security vulnerabilities are significant and transferable across LLMs.
Abstract
The integration of LLMs into robots has grown rapidly, with LLMs converting natural-language instructions into executable robot policies. However, the inherent vulnerability of LLMs to jailbreak attacks carries critical security risks from the digital domain into the physical world: an attacked LLM-based robot could execute harmful policies and cause physical harm. In this paper, we investigate the feasibility and rationale of jailbreak attacks against LLM-based robots and answer three research questions: (1) How applicable are existing LLM jailbreak attacks to LLM-based robots? (2) If they are not directly applicable, what unique challenges arise? (3) How can such jailbreak attacks be defended against? To this end, we first construct Harmful-RLbench, a benchmark oriented toward "human-object-environment" robot risks, and then conduct a measurement study on LLM-based robot systems. Our findings conclude that…
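This page does not describe POEX's internals or the proposed defenses. As a purely illustrative sketch, assume a hypothetical robot whose LLM-generated policy is a sequence of Python-style primitive calls. Under that assumption, the code below shows one way to separate output a robot could actually execute from output it could not, and to flag executable calls that touch hazardous objects before execution. `ROBOT_API`, `HAZARDOUS_TARGETS`, `Verdict`, and `check_policy` are hypothetical names invented for this example; they are not the paper's method, benchmark, or defense.

```python
import ast
from dataclasses import dataclass

# Hypothetical robot API (assumption): primitive name -> number of positional args.
ROBOT_API = {"move_to": 1, "grasp": 1, "release": 0, "pour": 1}

# Hypothetical hazard list for a pre-execution guard (illustrative only).
HAZARDOUS_TARGETS = {"knife", "human", "stove"}


@dataclass
class Verdict:
    executable: bool  # every statement is a valid call to a known primitive
    flagged: bool     # at least one executable call targets a hazardous object
    reason: str


def check_policy(policy_src: str) -> Verdict:
    """Classify an LLM-generated policy string under the assumed robot API."""
    try:
        tree = ast.parse(policy_src)
    except SyntaxError:
        # Prose (e.g. a refusal or a textual "how-to") does not parse as code.
        return Verdict(False, False, "not parseable as code")
    if not tree.body:
        return Verdict(False, False, "empty policy")

    hazards = []
    for stmt in tree.body:
        # Each statement must be a bare call such as grasp("cup").
        if not (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Name)):
            return Verdict(False, False, "statement is not a primitive call")
        call = stmt.value
        name = call.func.id
        if name not in ROBOT_API:
            return Verdict(False, False, f"unknown primitive: {name}")
        if len(call.args) != ROBOT_API[name] or call.keywords:
            return Verdict(False, False, f"wrong arity for {name}")
        for arg in call.args:
            # Only string literals naming objects/locations are accepted.
            if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
                return Verdict(False, False, f"non-literal argument to {name}")
            if arg.value in HAZARDOUS_TARGETS:
                hazards.append(f"{name}({arg.value!r})")

    # The whole policy is executable; report any hazardous calls found.
    if hazards:
        return Verdict(True, True, "hazardous calls: " + ", ".join(hazards))
    return Verdict(True, False, "ok")


if __name__ == "__main__":
    print(check_policy('move_to("table")\ngrasp("cup")\nrelease()'))
    print(check_policy("Sure, here is a plan to do that."))
    print(check_policy('move_to("stove")\ngrasp("knife")'))
```

Under these assumptions, a prose reply fails to parse and is non-executable, which is one way to picture why a jailbreak that only elicits harmful text may not translate into harmful robot actions, while a well-formed call sequence is the case a pre-execution guard would need to inspect.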
Taxonomy
Topics: Ethics and Social Impacts of AI · Digital and Cyber Forensics
