ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models

Lei Zhang; Ju Dong; Kaixin Bai; Minheng Ni; Zoltan-Csaba Marton; Zhaopeng Chen; Jianwei Zhang

arXiv:2512.04308 · cs.RO · December 5, 2025

Open Access

TL;DR

ResponsibleRobotBench is a comprehensive benchmark designed to evaluate and advance responsible robotic manipulation, emphasizing safety, risk mitigation, and reasoning across diverse tasks and modalities in simulation and real-world settings.

Contribution

The paper introduces ResponsibleRobotBench, a novel benchmark with diverse tasks, a multimodal evaluation framework, and standardized metrics to promote trustworthy and responsible robotic manipulation.

Findings

1. The benchmark covers 23 multi-stage, risk-aware tasks.
2. The framework supports multimodal perception, reasoning, and physical execution.
3. The benchmark enables analysis of safety and generalization in robotic agents.

Abstract

Recent advances in large multimodal models have enabled new opportunities in embodied AI, particularly in robotic manipulation. These models have shown strong potential in generalization and reasoning, but achieving reliable and responsible robotic behavior in real-world settings remains an open challenge. In high-stakes environments, robotic agents must go beyond basic task execution to perform risk-aware reasoning, moral decision-making, and physically grounded planning. We introduce ResponsibleRobotBench, a systematic benchmark designed to evaluate and accelerate progress in responsible robotic manipulation from simulation to the real world. This benchmark consists of 23 multi-stage tasks spanning diverse risk types, including electrical, chemical, and human-related hazards, and varying levels of physical and planning complexity. These tasks require agents to detect and mitigate risks,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Explainable Artificial Intelligence (XAI)