SPOC: Safety-Aware Planning Under Partial Observability And Physical Constraints
Hyungmin Kim, Hobeom Jeon, Dohyung Kim, Minsu Jang, Jeahong Kim

TL;DR
SPOC is a benchmark for evaluating safety-aware embodied task planning under partial observability and physical constraints, with the aim of making embodied AI systems safer.
Contribution
We introduce SPOC, a novel benchmark integrating safety constraints and partial observability for evaluating embodied task planning with large language models.
Findings
Current LLMs struggle with safety-aware planning, particularly under implicit constraints.
SPOC enables rigorous safety assessment across diverse household hazards.
The benchmark facilitates the development of safer embodied AI systems.
Abstract
Embodied task planning with large language models faces safety challenges in real-world environments, where partial observability and physical constraints must be respected. Existing benchmarks often overlook these critical factors, limiting their ability to evaluate both feasibility and safety. We introduce SPOC, a benchmark for safety-aware embodied task planning, which integrates strict partial observability, physical constraints, step-by-step planning, and goal-condition-based evaluation. Covering diverse household hazards such as fire, fluid, injury, object damage, and pollution, SPOC enables rigorous assessment through both state-based and constraint-based online metrics. Experiments with state-of-the-art LLMs reveal that current models struggle to ensure safety-aware planning, particularly under implicit constraints. Code and dataset are available at https://github.com/khm159/SPOC
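As an illustration only, the idea of pairing goal-condition-based evaluation with a per-step constraint check could be sketched as below. This is not SPOC's actual code, metric definition, or API: the names `Episode`, `apply_action`, and `evaluate`, the dictionary state encoding, and the stove/towel fire scenario are all hypothetical assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical symbolic state: object name -> status string.
State = Dict[str, str]
# Hypothetical action: set one object's status.
Action = Tuple[str, str]


@dataclass
class Episode:
    goal: State                                  # conditions the final state must satisfy
    constraints: List[Callable[[State], bool]]   # each returns True while the state is safe


def apply_action(state: State, action: Action) -> State:
    """Return a new state with one object's status updated."""
    obj, status = action
    new_state = dict(state)
    new_state[obj] = status
    return new_state


def evaluate(initial: State, plan: List[Action], episode: Episode) -> Dict[str, object]:
    """Execute the plan step by step, checking every constraint after each step."""
    state = dict(initial)
    violations = 0
    for action in plan:
        state = apply_action(state, action)
        # "Online" check: a violation counts even if it is undone later.
        if not all(check(state) for check in episode.constraints):
            violations += 1
    goal_met = all(state.get(k) == v for k, v in episode.goal.items())
    return {
        "goal_met": goal_met,
        "violations": violations,
        "safe_success": goal_met and violations == 0,
    }


# Assumed fire hazard: the stove must not be on while a towel lies on it.
def no_towel_on_hot_stove(s: State) -> bool:
    return not (s.get("stove") == "on" and s.get("towel") == "on_stove")


episode = Episode(goal={"stove": "on", "pot": "on_stove"},
                  constraints=[no_towel_on_hot_stove])
initial = {"stove": "off", "towel": "on_stove", "pot": "on_counter"}

unsafe_plan = [("pot", "on_stove"), ("stove", "on")]
safe_plan = [("towel", "on_counter"), ("pot", "on_stove"), ("stove", "on")]

unsafe_result = evaluate(initial, unsafe_plan, episode)
safe_result = evaluate(initial, safe_plan, episode)
```

In this toy setup both plans reach the goal state, but only the plan that first moves the towel passes the constraint check. The constraint is also implicit in the sense that the goal never mentions the towel, which is the kind of case the abstract reports as hardest for current models.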
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning
