# SPGrasp: Spatiotemporal Prompt-driven Grasp Synthesis in Dynamic Scenes

**Authors:** Yunpeng Mei, Hongjie Cao, Yinqiu Xia, Wei Xiao, Zhaohan Feng, Gang Wang, Jie Chen

arXiv: 2508.20547 · 2025-09-03

## TL;DR

SPGrasp extends SAMv2 to prompt-driven grasp synthesis on video streams. It combines user prompts with spatiotemporal context to reach end-to-end latency as low as 59 ms while keeping grasps temporally consistent for moving objects.

## Contribution

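SPGrasp integrates user prompts with spatiotemporal context in a SAMv2-based framework, enabling real-time, interactive grasp synthesis in dynamic scenes. On GraspNet-1Billion under continuous tracking, it cuts per-frame latency by 58.5% relative to the prior promptable method RoG-SAM while keeping accuracy competitive.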

## Key findings

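- Achieves instance-level grasp accuracy of 90.6% on OCID and 93.8% on Jacquard.
- Reaches 92.0% accuracy at 73.1 ms per frame on GraspNet-1Billion under continuous tracking, a 58.5% latency reduction compared to RoG-SAM.
- Reports end-to-end latency as low as 59 ms.
- Demonstrates a 94.8% success rate in real-world interactive grasping experiments with 13 moving objects.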

## Abstract

Real-time interactive grasp synthesis for dynamic objects remains challenging as existing methods fail to achieve low-latency inference while maintaining promptability. To bridge this gap, we propose SPGrasp (spatiotemporal prompt-driven dynamic grasp synthesis), a novel framework extending segment anything model v2 (SAMv2) for video stream grasp estimation. Our core innovation integrates user prompts with spatiotemporal context, enabling real-time interaction with end-to-end latency as low as 59 ms while ensuring temporal consistency for dynamic objects. In benchmark evaluations, SPGrasp achieves instance-level grasp accuracies of 90.6% on OCID and 93.8% on Jacquard. On the challenging GraspNet-1Billion dataset under continuous tracking, SPGrasp achieves 92.0% accuracy with 73.1 ms per-frame latency, representing a 58.5% reduction compared to the prior state-of-the-art promptable method RoG-SAM while maintaining competitive accuracy. Real-world experiments involving 13 moving objects demonstrate a 94.8% success rate in interactive grasping scenarios. These results confirm SPGrasp effectively resolves the latency-interactivity trade-off in dynamic grasp synthesis.
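The latency figures above can be put in context with some quick arithmetic. The sketch below is not from the paper: the helper names `implied_baseline_ms` and `frames_per_second` are hypothetical, the baseline latency is inferred from the reported 58.5% reduction rather than quoted from the source, and the frame-rate conversion assumes frames are processed strictly one after another with no pipelining.

```python
def implied_baseline_ms(latency_ms: float, reduction: float) -> float:
    """Baseline latency implied by a relative reduction.

    If new = baseline * (1 - reduction), then baseline = new / (1 - reduction).
    `reduction` is a fraction, e.g. 0.585 for 58.5%.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return latency_ms / (1.0 - reduction)


def frames_per_second(latency_ms: float) -> float:
    """Throughput if each frame takes `latency_ms` and frames run sequentially.

    Assumption: no batching or pipelining, so throughput = 1 / latency.
    """
    if latency_ms <= 0.0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Figures reported in the abstract.
    per_frame_ms = 73.1   # GraspNet-1Billion, continuous tracking
    reduction = 0.585     # relative to RoG-SAM
    best_case_ms = 59.0   # lowest reported end-to-end latency

    # Inferred value, not a number stated in the source.
    baseline = implied_baseline_ms(per_frame_ms, reduction)
    print(f"Implied baseline latency: {baseline:.1f} ms")
    print(f"Throughput at 73.1 ms:  {frames_per_second(per_frame_ms):.1f} fps")
    print(f"Throughput at 59 ms:    {frames_per_second(best_case_ms):.1f} fps")
```

Under these assumptions, the reported reduction corresponds to a baseline of roughly 176 ms per frame, and 73.1 ms per frame corresponds to roughly 13 to 14 frames per second. The 59 ms figure is a separate end-to-end measurement, so the two throughput values are not directly comparable.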

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20547/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20547/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/2508.20547/full.md

---
Source: https://tomesphere.com/paper/2508.20547