CLASP: Closed-loop Asynchronous Spatial Perception for Open-vocabulary Desktop Object Grasping

Yiran Ling; Wenxuan Li; Siying Dong; Yize Zhang; Xiaoyao Huang; Jing Jiang; Ruonan Li; and Jie Liu

arXiv:2604.11320 · cs.RO · April 14, 2026

TL;DR

CLASP is a closed-loop framework for robotic desktop object grasping that integrates multimodal perception, logical reasoning, and state-reflective feedback to improve success rates and robustness in dynamic environments.

Contribution

The paper introduces CLASP, a new asynchronous closed-loop system with hierarchical perception and error correction, advancing open-vocabulary robotic grasping in complex settings.
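The closed-loop, error-correcting part of this idea can be illustrated with a small Python sketch. It reflects our own assumptions, not the paper's design. After each attempt the loop re-observes the scene, checks whether the goal state holds, and passes any failure reason back into the next plan, so it does not replay a fixed open-loop trajectory. `closed_loop_grasp`, `plan`, `execute`, `observe`, and `goal_reached` are hypothetical stand-ins for the reasoning model, the robot controller, and the perception stack. The toy simulator exists only to make the loop runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Attempt:
    plan: str
    succeeded: bool
    feedback: Optional[str]  # failure reason reported by the state check, None on success


@dataclass
class ClosedLoopResult:
    success: bool
    attempts: list[Attempt] = field(default_factory=list)


def closed_loop_grasp(
    plan: Callable[[dict, Optional[str]], str],       # hypothetical: (observation, last feedback) -> plan
    execute: Callable[[str], None],                   # hypothetical: send the plan to the robot
    observe: Callable[[], dict],                      # hypothetical: return a fresh scene state
    goal_reached: Callable[[dict], Optional[str]],    # None if the goal holds, else a failure reason
    max_attempts: int = 3,
) -> ClosedLoopResult:
    result = ClosedLoopResult(success=False)
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        p = plan(observe(), feedback)        # re-plan from the current state, not the initial one
        execute(p)
        feedback = goal_reached(observe())   # verify against a new observation after acting
        result.attempts.append(Attempt(p, feedback is None, feedback))
        if feedback is None:
            result.success = True
            break
    return result


# Toy simulator (hypothetical): a narrow grasp slips, a wide grasp holds the object.
state = {"held": False}

def toy_plan(obs: dict, feedback: Optional[str]) -> str:
    return "grasp(width=wide)" if feedback == "slipped" else "grasp(width=narrow)"

def toy_execute(p: str) -> None:
    state["held"] = p == "grasp(width=wide)"

def toy_observe() -> dict:
    return dict(state)

def toy_check(obs: dict) -> Optional[str]:
    return None if obs["held"] else "slipped"


result = closed_loop_grasp(toy_plan, toy_execute, toy_observe, toy_check)
print(result.success, [a.plan for a in result.attempts])
# prints: True ['grasp(width=narrow)', 'grasp(width=wide)']
```

Re-observing before verification is what separates this loop from open-loop execution. The asynchronous scheduling of perception and control described in the paper is not modeled here.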

Findings

01. Achieves an 87.0% success rate on grasping tasks.

02. Demonstrates strong generalization across diverse objects.

03. Effectively bridges the sim-to-real gap.

Abstract

Robotic grasping of desktop objects is widely used in intelligent manufacturing, logistics, and agriculture. Although vision-language models (VLMs) show strong potential for robotic manipulation, their deployment in low-level grasping faces key challenges: scarce high-quality multimodal demonstrations, spatial hallucination caused by weak geometric grounding, and the fragility of open-loop execution in dynamic environments. To address these challenges, we propose Closed-Loop Asynchronous Spatial Perception (CLASP), a novel asynchronous closed-loop framework that integrates multimodal perception, logical reasoning, and state-reflective feedback. First, we design a Dual-Pathway Hierarchical Perception module that decouples high-level semantic intent from geometric grounding. This design guides the inference model toward outputting definite action tuples, reducing spatial hallucination. Second,…
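The Dual-Pathway Hierarchical Perception module can be illustrated with a minimal Python sketch of separating semantic intent from geometric grounding. The sketch is our own assumption, not the authors' implementation. It assumes that the semantic pathway (for example, a VLM) emits only a discrete action tuple naming a verb and a target label. A separate geometric pathway supplies detected objects with 3D positions. The tuple is grounded by lookup, so the language model never produces metric coordinates itself. The names `ActionTuple`, `Detection`, `parse_action_tuple`, `ground`, and `ALLOWED_VERBS`, along with the `verb(target)` text format, are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionTuple:
    # Hypothetical discrete output of the semantic pathway: what to do, and to which object.
    verb: str
    target: str


@dataclass(frozen=True)
class Detection:
    # Hypothetical output of the geometric pathway: a label, a 3D position (metres), a confidence.
    label: str
    xyz: tuple[float, float, float]
    score: float


ALLOWED_VERBS = {"grasp", "place", "push"}  # assumed closed action vocabulary

# Assumed text format emitted by the reasoning model, e.g. "grasp(red mug)".
_TUPLE_RE = re.compile(r"^\s*([a-z_]+)\s*\(\s*([^()]+?)\s*\)\s*$")


def parse_action_tuple(text: str) -> Optional[ActionTuple]:
    """Accept only well-formed tuples with a known verb; reject free-form output."""
    m = _TUPLE_RE.match(text.lower())
    if m is None or m.group(1) not in ALLOWED_VERBS:
        return None
    return ActionTuple(verb=m.group(1), target=m.group(2))


def ground(action: ActionTuple, detections: list[Detection],
           min_score: float = 0.5) -> Optional[tuple[str, tuple[float, float, float]]]:
    """Resolve the symbolic target to coordinates taken from perception, never from the VLM."""
    candidates = [d for d in detections
                  if d.label == action.target and d.score >= min_score]
    if not candidates:
        return None  # target not observed: report failure instead of inventing a position
    best = max(candidates, key=lambda d: d.score)
    return action.verb, best.xyz


scene = [Detection("red mug", (0.42, -0.10, 0.05), 0.91),
         Detection("blue pen", (0.30, 0.12, 0.01), 0.88)]
print(ground(parse_action_tuple("grasp(red mug)"), scene))    # ('grasp', (0.42, -0.1, 0.05))
print(ground(parse_action_tuple("grasp(green cup)"), scene))  # None
print(parse_action_tuple("move to (0.3, 0.2)"))               # None: not a valid tuple
```

Because coordinates are kept out of the language model's output, a misread scene surfaces as an explicit `None` that a closed-loop controller can act on. Without that separation, the same error would appear as a plausible but wrong pose.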
