SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
Chengyue Huang, Khang Vo Huynh, Sebastian Elbaum, Zsolt Kira, Lu Feng

TL;DR
SafeManip is a benchmark that evaluates temporal safety in robotic manipulation using property templates and LTLf, revealing safety issues even in successful task executions.
Contribution
It introduces a property-driven, reusable safety evaluation framework that generalizes across tasks and detects temporal safety violations in robotic manipulation.
Findings
Many models behave unsafely despite task success.
Higher task success does not always mean safer execution.
Complex tasks reveal more safety violations.
Abstract
Robotic manipulation is typically evaluated by task success, but successful completion does not guarantee safe execution. Many safety failures are temporal: a robot may touch a clean surface after contamination or release an object before it is fully inside an enclosure. We introduce SafeManip, a property-driven benchmark to explicitly evaluate temporal safety properties in robotic manipulation, moving beyond prior evaluations that largely focus on task completion or per-state constraint violations. SafeManip defines reusable safety templates over finite executions using Linear Temporal Logic over finite traces (LTLf). It maps observed rollouts to symbolic predicate traces and evaluates them with LTLf-based monitors. Its property suite covers eight manipulation safety categories: collision and contact safety, grasp stability, release stability, cross-contamination, action onset,…
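To make the idea of monitoring a symbolic predicate trace concrete, here is a minimal Python sketch of two finite-trace checks modeled on the abstract's examples (touching a clean surface after contamination, and releasing an object before it is inside an enclosure). It is an illustration under stated assumptions, not SafeManip's implementation: the predicate names (`contaminated`, `touch_clean`, `release`, `inside`), the helper functions, and the dictionary trace encoding are all hypothetical, and the two LTLf patterns are one plausible reading of those examples.

```python
from typing import Dict, List

# One step of a rollout, abstracted to named boolean predicates (hypothetical encoding).
Step = Dict[str, bool]
Trace = List[Step]


def never_after(trigger: str, forbidden: str, trace: Trace) -> bool:
    """Check G(trigger -> G !forbidden) over a finite trace.

    Once `trigger` has held at some step, `forbidden` must not hold at
    that step or at any later step.
    """
    triggered = False
    for step in trace:
        triggered = triggered or step.get(trigger, False)
        if triggered and step.get(forbidden, False):
            return False
    return True


def only_when(action: str, condition: str, trace: Trace) -> bool:
    """Check G(action -> condition) over a finite trace.

    Every step where `action` holds must also satisfy `condition`.
    """
    return all(
        step.get(condition, False)
        for step in trace
        if step.get(action, False)
    )


# Hypothetical rollout: the task could still succeed, yet one property is violated.
trace: Trace = [
    {"touch_clean": True},                  # touching before contamination is fine
    {"contaminated": True},                 # gripper becomes contaminated
    {"touch_clean": True, "release": True, "inside": True},
]

contamination_safe = never_after("contaminated", "touch_clean", trace)
release_safe = only_when("release", "inside", trace)
print(contamination_safe, release_safe)
```

In this trace the contamination check fails because the clean surface is touched after the gripper is contaminated, while the release check passes because the object is inside the enclosure when released. Both checks depend on the order of events across steps, which is what a per-state constraint check would miss.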
