HomeGuard: VLM-based Embodied Safeguard for Identifying Contextual Risk in Household Task
Xiaoya Lu, Yijin Zhou, Zeren Chen, Ruocheng Wang, Bingrui Sima, Enshen Zhou, Lu Sheng, Dongrui Liu, Jing Shao

TL;DR
HomeGuard introduces a novel architecture-agnostic safeguard using Context-Guided Chain-of-Thought to improve safety in embodied agents by accurately identifying contextual risks in household tasks.
Contribution
The paper proposes a new safeguard mechanism with CG-CoT that enhances risk detection and grounding in VLM-based embodied agents, supported by a curated dataset and reinforcement fine-tuning.
Findings
Risk match rates improved by over 30%
Enhanced safety and reduced oversafety in hazard detection
Generated visual anchors enable explicit collision avoidance
Abstract
Vision-Language Models (VLMs) empower embodied agents to execute complex instructions, yet they remain vulnerable to contextual safety risks where benign commands become hazardous due to subtle environmental states. Existing safeguards often prove inadequate. Rule-based methods lack scalability in object-dense scenes, whereas model-based approaches relying on prompt engineering suffer from unfocused perception, resulting in missed risks or hallucinations. To address this, we propose an architecture-agnostic safeguard featuring Context-Guided Chain-of-Thought (CG-CoT). This mechanism decomposes risk assessment into active perception that sequentially anchors attention to interaction targets and relevant spatial neighborhoods, followed by semantic judgment based on this visual evidence. We support this approach with a curated grounding dataset and a two-stage training strategy utilizing…
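To make the CG-CoT decomposition concrete, here is a minimal sketch under heavy assumptions. It is not the authors' implementation. HomeGuard performs each step with a VLM that reasons over the image. Here, "active perception" is replaced by simple geometry on labeled bounding boxes: anchor the interaction target, then expand its box by a margin to define the spatial neighborhood. "Semantic judgment" is replaced by a lookup table. The names `Box`, `active_perception`, `semantic_judgment`, `cg_cot_assess`, `HAZARD_PAIRS` and the `margin` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in normalized image coordinates (hypothetical format)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def expand(self, margin: float) -> "Box":
        # Grow the box on every side to cover its spatial neighborhood.
        return Box(self.label, self.x0 - margin, self.y0 - margin,
                   self.x1 + margin, self.y1 + margin)

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0 or
                    self.y1 <= other.y0 or other.y1 <= self.y0)


# Hypothetical (target, neighbor) hazard table; in HomeGuard this judgment is
# made by the VLM reasoning over the visual evidence, not by a lookup.
HAZARD_PAIRS = {("stove", "towel"), ("outlet", "water_glass")}


def active_perception(scene: list[Box], target_label: str, margin: float):
    # Step 1: anchor attention on the interaction target.
    target = next(b for b in scene if b.label == target_label)
    # Step 2: anchor the relevant spatial neighborhood around the target.
    region = target.expand(margin)
    neighbors = [b for b in scene if b is not target and b.overlaps(region)]
    return target, neighbors


def semantic_judgment(target: Box, neighbors: list[Box], hazards: set):
    # Step 3: judge risk using only the anchored visual evidence.
    risks = [(target.label, n.label) for n in neighbors
             if (target.label, n.label) in hazards]
    return ("unsafe", risks) if risks else ("safe", [])


def cg_cot_assess(scene: list[Box], target_label: str, margin: float = 0.1):
    # Perception first, judgment second, mirroring the CG-CoT ordering.
    target, neighbors = active_perception(scene, target_label, margin)
    return semantic_judgment(target, neighbors, HAZARD_PAIRS)


scene = [
    Box("stove", 0.40, 0.40, 0.60, 0.60),
    Box("towel", 0.62, 0.45, 0.75, 0.55),
    Box("cup", 0.00, 0.00, 0.10, 0.10),
]
print(cg_cot_assess(scene, "stove"))  # ('unsafe', [('stove', 'towel')])
print(cg_cot_assess(scene, "cup"))    # ('safe', [])
```

The ordering is the point of the sketch. The judgment step only sees objects that survived the perception step, so a distant object (the cup, when the stove is the target) cannot trigger a false alarm. An object inside the neighborhood (the towel) is never skipped. This mirrors the abstract's argument against unfocused perception.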
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Social Robot Interaction and HRI
