AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

Zonghao Ying; Le Wang; Yisong Xiao; Jiakai Wang; Yuqing Ma; Jinyang Guo; Zhenfei Yin; Mingchuan Zhang; Aishan Liu; Xianglong Liu

arXiv:2506.14697 · cs.CR · October 21, 2025

TL;DR

SAFE is a comprehensive benchmark suite designed to evaluate the safety of embodied vision-language models in hazardous scenarios, revealing critical safety failures and guiding improvements in embodied agent safety.

Contribution

The paper introduces SAFE, a multi-component benchmark for systematic safety assessment of embodied VLM agents, covering perception, planning, and execution in hazardous environments.

Findings

01. Current VLMs exhibit systematic safety failures.

02. Existing benchmarks lack multi-stage safety evaluation.

03. These safety limitations highlight the need for improved safety alignment.

Abstract

The integration of vision-language models (VLMs) is driving a new generation of embodied agents capable of operating in human-centered environments. However, as deployment expands, these systems face growing safety risks, particularly when executing hazardous instructions. Current safety evaluation benchmarks remain limited: they cover only narrow scopes of hazards and focus primarily on final outcomes, neglecting the agent's full perception-planning-execution process and thereby obscuring critical failure modes. Therefore, we present SAFE, a benchmark for systematically assessing the safety of embodied VLM agents on hazardous instructions. SAFE comprises three components: SAFE-THOR, an extensible adversarial simulation sandbox with a universal adapter that maps high-level VLM outputs to low-level embodied controls, supporting diverse agent workflow integration; SAFE-VERSE, a risk-aware…
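The abstract says only that SAFE-THOR's universal adapter "maps high-level VLM outputs to low-level embodied controls." The sketch below shows what such a mapping layer might look like in principle. Everything in it is an illustrative assumption rather than the paper's implementation: the one-step-per-line `verb(Object)` text format, the hypothetical verb vocabulary in `ACTION_TABLE`, and the `LowLevelCommand` type. The command names are modeled loosely on AI2-THOR interaction actions, and no simulator is called.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary: high-level verbs a VLM planner might emit, mapped to
# low-level command names modeled loosely on AI2-THOR interaction actions.
ACTION_TABLE = {
    "pickup": "PickupObject",
    "put": "PutObject",
    "open": "OpenObject",
    "close": "CloseObject",
    "toggle_on": "ToggleObjectOn",
    "toggle_off": "ToggleObjectOff",
    "slice": "SliceObject",
}

# Assumed plan format: one step per line, written as verb(ObjectType).
_STEP = re.compile(r"^\s*([a-z_]+)\s*\(\s*([A-Za-z]+)\s*\)\s*$")


@dataclass(frozen=True)
class LowLevelCommand:
    """A hypothetical low-level control targeting an object type."""
    action: str
    object_type: str


def adapt(vlm_output: str) -> list[LowLevelCommand]:
    """Translate a VLM's textual plan into a list of low-level commands.

    Blank lines are skipped. Malformed steps or unknown verbs raise
    ValueError rather than being guessed at.
    """
    commands = []
    for line in vlm_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines between steps
        match = _STEP.match(line)
        if match is None:
            raise ValueError(f"unparseable step: {line!r}")
        verb, obj = match.groups()
        if verb not in ACTION_TABLE:
            raise ValueError(f"unsupported action: {verb!r}")
        commands.append(LowLevelCommand(ACTION_TABLE[verb], obj))
    return commands


if __name__ == "__main__":
    plan = "pickup(Knife)\nopen(Microwave)\nput(Microwave)"
    for cmd in adapt(plan):
        print(cmd.action, cmd.object_type)
```

Rejecting unknown verbs, rather than silently dropping them, keeps every executed control traceable to a specific planner output. That traceability is what an evaluation of the full perception-planning-execution process would need in order to attribute a failure to the right stage.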

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems · Social Robot Interaction and HRI