SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

Chengyue Huang; Khang Vo Huynh; Sebastian Elbaum; Zsolt Kira; Lu Feng

arXiv:2605.12386 · cs.RO · May 13, 2026

TL;DR

SafeManip is a benchmark that evaluates temporal safety in robotic manipulation using property templates and LTLf, revealing safety issues even in successful task executions.

Contribution

It introduces a property-driven, reusable safety evaluation framework that generalizes across tasks and detects temporal safety violations in robotic manipulation.

Findings

1. Many models behave unsafely despite task success.
2. Higher task success does not always mean safer execution.
3. Complex tasks reveal more safety violations.

Abstract

Robotic manipulation is typically evaluated by task success, but successful completion does not guarantee safe execution. Many safety failures are temporal: a robot may touch a clean surface after contamination or release an object before it is fully inside an enclosure. We introduce SafeManip, a property-driven benchmark to explicitly evaluate temporal safety properties in robotic manipulation, moving beyond prior evaluations that largely focus on task completion or per-state constraint violations. SafeManip defines reusable safety templates over finite executions using Linear Temporal Logic over finite traces (LTLf). It maps observed rollouts to symbolic predicate traces and evaluates them with LTLf-based monitors. Its property suite covers eight manipulation safety categories: collision and contact safety, grasp stability, release stability, cross-contamination, action onset,…
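The abstract's two examples (touching a clean surface after contamination, and releasing an object before it is fully inside an enclosure) can be illustrated with a minimal sketch of a finite-trace monitor. This is not SafeManip's implementation or API: the combinators, the predicate names (`contaminated`, `touch_clean`, `release`, `inside`), and the traces are all hypothetical, and the code assumes each rollout has already been mapped to a non-empty list of sets of true predicates.

```python
from typing import Callable, List, Set

# Assumption: a rollout is already abstracted into a finite, non-empty
# sequence of steps, each step being the set of predicates true at that step.
Trace = List[Set[str]]
Formula = Callable[[Trace, int], bool]


def atom(name: str) -> Formula:
    """Predicate `name` holds at step i."""
    return lambda t, i: name in t[i]


def neg(f: Formula) -> Formula:
    return lambda t, i: not f(t, i)


def implies(f: Formula, g: Formula) -> Formula:
    return lambda t, i: (not f(t, i)) or g(t, i)


def always(f: Formula) -> Formula:
    """G f over finite traces: f holds at every step from i to the end."""
    return lambda t, i: all(f(t, j) for j in range(i, len(t)))


def weak_until(f: Formula, g: Formula) -> Formula:
    """f W g: f holds at every step before the first step where g holds,
    or f holds to the end of the trace if g never holds."""
    def check(t: Trace, i: int) -> bool:
        for j in range(i, len(t)):
            if g(t, j):
                return True
            if not f(t, j):
                return False
        return True
    return check


def satisfies(trace: Trace, formula: Formula) -> bool:
    """Evaluate the formula at the first step of the trace."""
    if not trace:
        raise ValueError("LTLf is interpreted over non-empty finite traces")
    return formula(trace, 0)


# Hypothetical property: once contaminated, never touch a clean surface.
no_touch_after_contamination = always(
    implies(atom("contaminated"), always(neg(atom("touch_clean"))))
)

# Hypothetical property: do not release before the object is inside.
no_early_release = weak_until(neg(atom("release")), atom("inside"))

# Illustrative traces (made up for this sketch, not benchmark data).
touch_then_contaminate = [{"touch_clean"}, {"contaminated"}, {"contaminated"}]
contaminate_then_touch = [set(), {"contaminated"}, {"contaminated", "touch_clean"}]
release_inside = [set(), {"inside"}, {"inside", "release"}]
release_outside = [set(), {"release"}, {"inside"}]

print(satisfies(touch_then_contaminate, no_touch_after_contamination))
print(satisfies(contaminate_then_touch, no_touch_after_contamination))
print(satisfies(release_inside, no_early_release))
print(satisfies(release_outside, no_early_release))
```

Both violating traces contain exactly the same kinds of events as their safe counterparts; only the order differs. That ordering is what a per-state constraint check cannot see and what a temporal property captures.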
