BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Wenqi Lyu, Zerui Li, Yanyuan Qiao, Qi Wu

TL;DR
This paper demonstrates that multimodal large language models used in vision-and-language navigation are highly vulnerable to jailbreak attacks, which can induce harmful actions and pose safety risks in real-world scenarios.
Contribution
It introduces the first systematic jailbreak attack framework for MLLM-driven navigators, revealing significant security vulnerabilities in embodied AI systems.
Findings
Average attack success rate over 90% in simulation
Attacks can induce harmful actions in physical robots
Vulnerabilities pose safety risks beyond toxic content
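The first finding is stated as an average attack success rate (ASR). As a minimal sketch of how such a figure could be computed, the snippet below takes per-trial outcomes, computes an ASR for each intent category, and then averages across categories. The function name, the category labels, and the choice of a per-category (macro) average are assumptions made for illustration; the summary does not say how the paper aggregates its results, and the toy outcomes are not data from the paper.

```python
from collections import defaultdict


def attack_success_rate(trials):
    """Compute per-category ASR and their unweighted mean.

    `trials` is an iterable of (category, succeeded) pairs, where
    `succeeded` is True if the attack induced the undesired behaviour.
    Hypothetical helper, not the paper's evaluation code.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [successes, total]
    for category, succeeded in trials:
        counts[category][0] += int(succeeded)
        counts[category][1] += 1
    if not counts:
        raise ValueError("no trials given")
    per_category = {c: s / n for c, (s, n) in counts.items()}
    # Assumption: average the category rates with equal weight.
    macro_average = sum(per_category.values()) / len(per_category)
    return per_category, macro_average


# Hypothetical toy outcomes with placeholder category names.
trials = [
    ("category_a", True),
    ("category_a", True),
    ("category_b", True),
    ("category_b", False),
]
per_category, macro_average = attack_success_rate(trials)
```

A macro average weights every category equally regardless of how many queries it contains. Pooling all trials into a single ratio would weight categories by size, and the two figures can differ when categories are unbalanced.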
Abstract
Multimodal large language models (MLLMs) have recently gained attention for their generalization and reasoning capabilities in Vision-and-Language Navigation (VLN) tasks, leading to the rise of MLLM-driven navigators. However, MLLMs are vulnerable to jailbreak attacks, in which crafted prompts bypass safety mechanisms and trigger undesired outputs. In embodied scenarios, such vulnerabilities pose greater risks: unlike text-only models, which generate toxic content, embodied agents may interpret malicious instructions as executable commands, potentially leading to real-world harm. In this paper, we present the first systematic jailbreak attack paradigm targeting MLLM-driven navigators. We propose a three-tiered attack framework and construct malicious queries across four intent categories, concatenated with standard navigation instructions. In the Matterport3D simulator, we evaluate…
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Topic Modeling
