RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic

Le Wang; Zonghao Ying; Xiao Yang; Quanchen Zou; Zhenfei Yin; Tianlin Li; Jian Yang; Yaodong Yang; Aishan Liu; Xianglong Liu

arXiv:2512.21220 · cs.AI · December 29, 2025

TL;DR

RoboSafe introduces an executable safety logic framework for embodied agents that proactively detects and prevents hazardous behaviors by reasoning over recent and future trajectories, significantly improving safety without sacrificing task performance.

Contribution

The paper presents RoboSafe, a hybrid-reasoning runtime safeguard for embodied agents that combines backward and forward reasoning modules and expresses its safety checks as executable predicate-based logic.
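To make "executable predicate-based logic" concrete, here is a minimal sketch of a runtime guard that checks a proposed action against safety predicates. It is an illustration only, not the authors' implementation: the `Step` type, the two example predicates, the `RULES` table, and the `guard` function are all hypothetical names chosen for this example, and the listing does not specify how RoboSafe actually encodes or evaluates its predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """One agent action (hypothetical representation)."""
    action: str
    target: str


# A predicate sees the recent trajectory and the proposed next step.
# It returns True when it detects a hazard.
Predicate = Callable[[List[Step], Step], bool]


def device_left_on(history: List[Step], nxt: Step) -> bool:
    """Backward-looking check: the agent wants to finish while a
    device it switched on earlier has not been switched off."""
    still_on = set()
    for step in history:
        if step.action == "turn_on":
            still_on.add(step.target)
        elif step.action == "turn_off":
            still_on.discard(step.target)
    return nxt.action == "done" and bool(still_on)


def sharp_handover(history: List[Step], nxt: Step) -> bool:
    """Forward-looking check: the proposed step is hazardous by itself."""
    return nxt.action == "hand_over" and nxt.target == "knife"


# Hypothetical rule table; names are for reporting only.
RULES: Dict[str, Predicate] = {
    "device_left_on": device_left_on,
    "sharp_handover": sharp_handover,
}


def guard(history: List[Step], nxt: Step) -> Optional[str]:
    """Return the name of the first violated rule, or None if the step may run."""
    for name, predicate in RULES.items():
        if predicate(history, nxt):
            return name
    return None


history = [Step("turn_on", "stove"), Step("pick_up", "pan")]
blocked = guard(history, Step("done", ""))
allowed = guard(history, Step("turn_off", "stove"))
```

In this sketch, `blocked` names the violated rule because the stove is still on when the agent tries to finish, while `allowed` is `None` because switching the stove off violates neither rule. Keeping each rule as a plain function is one simple way to make safety logic directly executable and auditable; a real system would need richer state than a list of action names.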

Findings

01  Reduces hazardous actions by 36.8% compared to baselines
02  Maintains near-original task performance
03  Shown to be effective on physical robotic arms

Abstract

Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI