IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
Xiaoya Lu, Zeren Chen, Xuhao Hu, Yijin Zhou, Weichen Zhang, Dongrui Liu, Lu Sheng, Jing Shao

TL;DR
IS-Bench is a novel multi-modal benchmark designed to evaluate the interactive safety of VLM-driven embodied agents in household tasks, focusing on their ability to perceive and mitigate emergent risks during task execution.
Contribution
This paper introduces IS-Bench, the first benchmark for assessing interactive safety in embodied agents. Its scenarios are instantiated in a high-fidelity simulator and paired with a process-oriented evaluation methodology.
Findings
Current VLM-driven embodied agents lack interactive safety awareness.
Safety-aware Chain-of-Thought improves safety but may reduce task success.
Many agents fail to perform risk mitigation actions at critical steps.
Abstract
Flawed planning from VLM-driven embodied agents poses significant safety hazards, hindering their deployment in real-world household tasks. However, existing static, non-interactive evaluation paradigms fail to adequately assess risks within these interactive environments, since they cannot simulate dynamic risks that emerge from an agent's actions and rely on unreliable post-hoc evaluations that ignore unsafe intermediate steps. To bridge this critical gap, we propose evaluating an agent's interactive safety: its ability to perceive emergent risks and execute mitigation steps in the correct procedural order. We thus present IS-Bench, the first multi-modal benchmark designed for interactive safety, featuring 161 challenging scenarios with 388 unique safety risks instantiated in a high-fidelity simulator. Crucially, it facilitates a novel process-oriented evaluation that verifies whether…
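The contrast the abstract draws between post-hoc and process-oriented evaluation can be illustrated with a minimal sketch. This is not the IS-Bench implementation: the rule format, the action names, and the helpers `OrderingRule`, `check_order`, and `safe_fraction` are all hypothetical, chosen only to show how an ordering check over an agent's action sequence can flag an unsafe intermediate step that a check on the final state alone would miss.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrderingRule:
    """Hypothetical safety rule: `mitigation` must occur before `trigger`."""
    mitigation: str
    trigger: str


def check_order(plan: List[str], rule: OrderingRule) -> bool:
    """Return True if the plan satisfies the ordering rule.

    If the trigger action never occurs, the risk never emerges,
    so the rule is treated as satisfied.
    """
    if rule.trigger not in plan:
        return True
    trigger_idx = plan.index(rule.trigger)
    # The mitigation must appear somewhere before the first trigger.
    return rule.mitigation in plan[:trigger_idx]


def safe_fraction(plan: List[str], rules: List[OrderingRule]) -> float:
    """Fraction of rules the plan satisfies (1.0 when there are no rules)."""
    if not rules:
        return 1.0
    return sum(check_order(plan, r) for r in rules) / len(rules)


# Illustrative (made-up) household scenario: a towel lies on the stove.
rule = OrderingRule(mitigation="move_towel_off_stove", trigger="turn_on_stove")

safe_plan = ["move_towel_off_stove", "turn_on_stove", "cook_egg"]
unsafe_plan = ["turn_on_stove", "move_towel_off_stove", "cook_egg"]

print(check_order(safe_plan, rule))
print(check_order(unsafe_plan, rule))
```

Both plans contain the same actions and could end in the same final state, so a purely post-hoc check would not distinguish them. Only the ordering check separates the safe plan from the unsafe one.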
Taxonomy
Topics: Social Robot Interaction and HRI · Human-Automation Interaction and Safety · Reinforcement Learning in Robotics
