AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
Zonghao Ying, Le Wang, Yisong Xiao, Jiakai Wang, Yuqing Ma, Jinyang Guo, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu

TL;DR
SAFE is a comprehensive benchmark suite designed to evaluate the safety of embodied vision-language models in hazardous scenarios, revealing critical safety failures and guiding improvements in embodied agent safety.
Contribution
The paper introduces SAFE, a multi-component benchmark for systematic safety assessment of embodied VLM agents, covering perception, planning, and execution in hazardous environments.
Findings
Current VLMs exhibit systematic safety failures.
Existing benchmarks lack multi-stage safety evaluation.
These safety limitations highlight the need for improved safety alignment.
Abstract
The integration of vision-language models (VLMs) is driving a new generation of embodied agents capable of operating in human-centered environments. However, as deployment expands, these systems face growing safety risks, particularly when executing hazardous instructions. Current safety evaluation benchmarks remain limited: they cover only narrow scopes of hazards and focus primarily on final outcomes, neglecting the agent's full perception-planning-execution process and thereby obscuring critical failure modes. Therefore, we present SAFE, a benchmark for systematically assessing the safety of embodied VLM agents on hazardous instructions. SAFE comprises three components: SAFE-THOR, an extensible adversarial simulation sandbox with a universal adapter that maps high-level VLM outputs to low-level embodied controls, supporting diverse agent workflow integration; SAFE-VERSE, a risk-aware…
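The abstract describes a universal adapter that maps high-level VLM outputs to low-level embodied controls. The paper's actual interface is not shown in this summary, so the following is only a minimal sketch of the general idea, assuming the VLM emits a numbered or bulleted natural-language plan. The names `LowLevelAction`, `VERB_TO_ACTION`, and `parse_plan`, as well as the simulator-style action strings, are hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LowLevelAction:
    """One simulator-level command (hypothetical structure, not from the paper)."""
    name: str
    target: Optional[str] = None


# Hypothetical verb-to-action table; a real adapter would be far richer.
VERB_TO_ACTION = {
    "pick up": "PickupObject",
    "put down": "PutObject",
    "open": "OpenObject",
    "close": "CloseObject",
    "turn on": "ToggleObjectOn",
    "turn off": "ToggleObjectOff",
    "go to": "NavigateTo",
}

# Matches list prefixes such as "1.", "2)", "-" or "*" at the start of a step.
STEP_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")
LEADING_ARTICLE = re.compile(r"^(?:the|a|an)\s+", re.IGNORECASE)


def parse_plan(plan_text: str) -> List[LowLevelAction]:
    """Translate a free-text plan into a list of low-level actions.

    Steps whose verb is not in the table become an "Unsupported" action
    instead of being silently dropped, so the failure stays visible.
    """
    actions: List[LowLevelAction] = []
    for raw in plan_text.splitlines():
        step = STEP_PREFIX.sub("", raw).strip().rstrip(".")
        if not step:
            continue
        lowered = step.lower()
        # Try longer verbs first so multi-word verbs are matched whole.
        for verb in sorted(VERB_TO_ACTION, key=len, reverse=True):
            if lowered.startswith(verb + " "):
                target = LEADING_ARTICLE.sub("", step[len(verb):].strip())
                actions.append(LowLevelAction(VERB_TO_ACTION[verb], target or None))
                break
        else:
            actions.append(LowLevelAction("Unsupported", step))
    return actions


if __name__ == "__main__":
    demo_plan = "1. Go to the stove\n2. Turn on the stove\n- Pick up knife.\nDance"
    for action in parse_plan(demo_plan):
        print(action)
```

Recording unmatched steps as explicit `Unsupported` actions, rather than discarding them, is a design choice of this sketch: it lets a downstream evaluator distinguish a plan the agent could not execute from a plan it never proposed.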
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems · Social Robot Interaction and HRI
