RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

Doguhuan Yeke; Yanming Zhou; Leo Y. Lin; Hongyu Cai; Antonio Bianchi; Z. Berkay Celik

arXiv:2605.19328·cs.CR·May 20, 2026

TL;DR

RoboJailBench is a comprehensive benchmark framework for evaluating adversarial jailbreak attacks and defenses in embodied robotic AI systems, addressing current gaps in security assessment tools.

Contribution

It introduces a new security taxonomy, an intent contrast dataset pipeline, and a standardized evaluation framework for embodied AI jailbreak attacks.

Findings

01. Constructed a new taxonomy-balanced dataset for embodied AI security.

02. Evaluated five datasets with four attacks and two defenses.

03. Provided the first standardized framework for jailbreak evaluation in embodied AI.

Abstract

Recent advances in Vision-Language Models (VLMs) facilitate a new class of embodied AI systems, where these models are integrated into physical platforms, e.g., robots and autonomous vehicles, to interpret visual scenes and execute natural language commands in diverse environments. Previous research has introduced jailbreak attacks and defenses for embodied AI. Their evaluations, however, rely on ad hoc datasets and limited metrics, and they emphasize attack success while neglecting the trade-off between security and the ability to follow benign commands. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI; neither captures the adversarial risks, inputs, consequences, and evaluation criteria necessary for jailbreak attacks in embodied AI systems. In this paper, we address this gap with…
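The trade-off the abstract describes, between resisting jailbreaks and still following benign commands, can be illustrated with a minimal sketch. The `Trial` record, the `rate` and `summarize` helpers, and the two metric names are hypothetical and chosen for illustration only; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One command issued to an embodied agent (hypothetical record format)."""
    adversarial: bool  # True if the command is a jailbreak attempt
    executed: bool     # True if the agent carried the command out


def rate(trials, adversarial):
    """Fraction of commands of the given kind that the agent executed."""
    subset = [t for t in trials if t.adversarial == adversarial]
    if not subset:
        raise ValueError("no trials of the requested kind")
    return sum(t.executed for t in subset) / len(subset)


def summarize(trials):
    """Report both sides of the trade-off (illustrative metric names)."""
    return {
        # Lower is better: jailbreak commands that were carried out.
        "attack_success_rate": rate(trials, adversarial=True),
        # Higher is better: benign commands that were still carried out.
        "benign_completion_rate": rate(trials, adversarial=False),
    }
```

Reporting the two numbers together matters because a defense that refuses every command would drive the first to zero while also driving the second to zero, which a success-only evaluation would not reveal.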


Code & Models

Repositories

https://purseclab.github.io/benchmark-for-robotics-security
