Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation
Zebin Jiang, Tianle Jin, Xiangtong Yao, Alois Knoll, Hu Cao

TL;DR
This paper introduces a hierarchical, language-guided grasp detection method for robots that improves semantic understanding and grasp accuracy in complex environments by integrating CLIP embeddings and a dynamic convolution head.
Contribution
It proposes a novel coarse-to-fine learning framework with a CLIP-based fusion pipeline and a language-conditioned convolution head for improved language-guided grasping.
Findings
Outperforms existing methods on the OCID-VLG and Grasp-Anything++ datasets.
Demonstrates strong generalization to unseen objects and language queries.
Shows practical effectiveness on real robotic platforms.
Abstract
Grasping is one of the most fundamental and challenging capabilities in robotic manipulation, especially in unstructured, cluttered, and semantically diverse environments. Recent research has increasingly explored language-guided manipulation, where robots not only perceive the scene but also interpret task-relevant natural language instructions. However, existing language-conditioned grasping methods typically rely on shallow fusion strategies, leading to limited semantic grounding and weak alignment between linguistic intent and visual grasp reasoning. In this work, we propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation. LGGD leverages CLIP-based visual and textual embeddings within a hierarchical cross-modal fusion pipeline, progressively injecting linguistic cues into the visual feature reconstruction process. This design…
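The summary above mentions a language-conditioned (dynamic) convolution head but does not describe its internals. As a rough illustration of the general idea only, the sketch below predicts the weights of a 1x1 convolution from a text embedding and applies them to a visual feature map, so the same image features yield different score maps for different instructions. It is not the authors' implementation: the function names, the dimensions, and the random "learned" projection are all hypothetical, and plain Python lists stand in for CLIP features.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Random weights standing in for a learned linear layer (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def language_conditioned_conv1x1(features, text_embedding, projection):
    """Apply a 1x1 convolution whose kernel is predicted from the text.

    features:       C x H x W nested lists (visual feature map)
    text_embedding: length-D list (stand-in for a sentence embedding)
    projection:     (C + 1) x D matrix mapping text to C kernel weights + 1 bias
    Returns an H x W map of scores.
    """
    channels = len(features)
    params = linear(projection, text_embedding)
    kernel, bias = params[:channels], params[channels]
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(kernel[c] * features[c][y][x] for c in range(channels)) + bias
         for x in range(width)]
        for y in range(height)
    ]


# Usage: one feature map, two different (random) text embeddings.
C, H, W, D = 4, 3, 5, 8  # illustrative sizes, not taken from the paper
rng = random.Random(1)
features = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
text_a = [rng.random() for _ in range(D)]
text_b = [rng.random() for _ in range(D)]
projection = make_linear(D, C + 1)

map_a = language_conditioned_conv1x1(features, text_a, projection)
map_b = language_conditioned_conv1x1(features, text_b, projection)
```

Because the kernel is a function of the instruction, `map_a` and `map_b` differ even though the visual features are identical, which is the property a language-conditioned head relies on. A real system would learn the projection end to end and likely use larger kernels and several output maps.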
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Motor Control and Adaptation
