ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models
Lei Zhang, Ju Dong, Kaixin Bai, Minheng Ni, Zoltan-Csaba Marton, Zhaopeng Chen, Jianwei Zhang

TL;DR
ResponsibleRobotBench is a comprehensive benchmark designed to evaluate and advance responsible robotic manipulation, emphasizing safety, risk mitigation, and reasoning across diverse tasks and modalities in simulation and real-world settings.
Contribution
The paper introduces ResponsibleRobotBench, a novel benchmark with diverse tasks, a multimodal evaluation framework, and standardized metrics to promote trustworthy and responsible robotic manipulation.
Findings
Benchmark covers 23 multi-stage risk-aware tasks.
Framework supports multimodal perception, reasoning, and physical execution.
Enables analysis of safety and generalization in robotic agents.
Abstract
Recent advances in large multimodal models have enabled new opportunities in embodied AI, particularly in robotic manipulation. These models have shown strong potential in generalization and reasoning, but achieving reliable and responsible robotic behavior in real-world settings remains an open challenge. In high-stakes environments, robotic agents must go beyond basic task execution to perform risk-aware reasoning, moral decision-making, and physically grounded planning. We introduce ResponsibleRobotBench, a systematic benchmark designed to evaluate and accelerate progress in responsible robotic manipulation from simulation to the real world. This benchmark consists of 23 multi-stage tasks spanning diverse risk types, including electrical, chemical, and human-related hazards, and varying levels of physical and planning complexity. These tasks require agents to detect and mitigate risks,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Explainable Artificial Intelligence (XAI)
