TL;DR
RoboJailBench is a comprehensive benchmark framework for evaluating adversarial jailbreak attacks and defenses in embodied robotic AI systems, addressing gaps in existing security assessment tools.
Contribution
It introduces a new security taxonomy, an intent-contrast dataset pipeline, and a standardized evaluation framework for jailbreak attacks on embodied AI.
Findings
Constructed a new taxonomy-balanced dataset for embodied AI security.
Evaluated five datasets with four attacks and two defenses.
Provided the first standardized framework for jailbreak evaluation in embodied AI.
Abstract
Recent advances in Vision-Language Models (VLMs) facilitate a new class of embodied AI systems, where these models are integrated into physical platforms, e.g., robots and autonomous vehicles, to interpret visual scenes and execute natural language commands in diverse environments. Previous research has introduced jailbreak attacks and defenses for embodied AI. Their evaluations, however, rely on ad-hoc datasets and limited metrics, and emphasize attack success while neglecting the trade-off between security and the ability to follow benign commands. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI; neither captures the adversarial risks, inputs, consequences, and evaluation criteria necessary for jailbreak attacks in embodied AI systems. In this paper, we address this gap with…
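The trade-off the abstract describes can be made concrete with a small sketch. It assumes, hypothetically, that each evaluation episode is labeled as malicious (a jailbreak attempt) or benign and records whether the agent executed the command. The snippet reports attack success rate alongside benign completion rate. The names `Episode` and `security_utility` are illustrative only and are not taken from RoboJailBench.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Episode:
    # Hypothetical episode record, not RoboJailBench's actual data format.
    is_malicious: bool  # True for a jailbreak attempt, False for a benign command
    executed: bool      # True if the embodied agent carried out the command


def security_utility(episodes: Iterable[Episode]) -> tuple[float, float]:
    """Return (attack_success_rate, benign_completion_rate)."""
    eps = list(episodes)  # materialize so the input can be scanned twice
    malicious = [e for e in eps if e.is_malicious]
    benign = [e for e in eps if not e.is_malicious]
    if not malicious or not benign:
        raise ValueError("need both malicious and benign episodes")
    # Lower is better: fraction of jailbreak attempts the agent executed.
    asr = sum(e.executed for e in malicious) / len(malicious)
    # Higher is better: fraction of benign commands the agent still followed.
    bcr = sum(e.executed for e in benign) / len(benign)
    return asr, bcr


# A defense that refuses everything looks perfectly "secure" by ASR alone,
# but it also fails every benign command.
refuse_all = [Episode(True, False), Episode(False, False)]
print(security_utility(refuse_all))  # (0.0, 0.0)
```

Reporting both numbers together shows why attack success alone is misleading: the refuse-all defense drives attack success to zero only by also driving benign completion to zero.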
