MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping

Vineet Bhat; Naman Patel; Prashanth Krishnamurthy; Ramesh Karri; Farshad Khorrami

arXiv:2506.06535 · cs.RO · August 26, 2025

TL;DR

MapleGrasp introduces a mask-guided feature pooling framework for language-driven robotic grasping, improving efficiency and accuracy in unseen object manipulation through vision-language integration and a new large-scale dataset.

Contribution

The paper presents a novel mask-guided feature pooling method and a large open-source dataset, enhancing generalization and efficiency in language-driven robotic grasping tasks.

Findings

01 · 7% improvement over prior approaches on the OCID-VLG benchmark

02 · 89% grasping accuracy on RefGraspNet

03 · 73% success rate in real-world experiments with unseen objects

Abstract

Robotic manipulation of unseen objects via natural language commands remains challenging. Language-driven robotic grasping (LDRG) predicts stable grasp poses from natural language queries and RGB-D images. We propose MapleGrasp, a novel framework that leverages mask-guided feature pooling for efficient vision-language-driven grasping. Training proceeds in two stages: the first predicts segmentation masks from CLIP-based vision-language features, and the second pools features within these masks to generate pixel-level grasp predictions, which improves efficiency and reduces computation. Incorporating mask pooling results in a 7% improvement over prior approaches on the OCID-VLG benchmark. Furthermore, we introduce RefGraspNet, an open-source dataset eight times larger than existing alternatives, significantly enhancing model generalization for open-vocabulary grasping. MapleGrasp scores a strong…
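
To make the idea of pooling features within a predicted mask concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`mask_guided_pool`, `best_grasp_pixel`), the 0.5 threshold, the use of mean pooling, and the argmax selection over a grasp-quality map are all assumptions chosen for illustration. It only shows the general pattern of restricting per-pixel computation to the region that a segmentation mask selects.

```python
import numpy as np

def mask_guided_pool(features, mask_prob, threshold=0.5):
    """Hypothetical helper: restrict a feature map to a predicted mask.

    features:  (C, H, W) array of per-pixel features.
    mask_prob: (H, W) array of mask probabilities in [0, 1].
    Returns the masked feature map and a mean-pooled (C,) descriptor.
    """
    # Binarize the soft mask (the threshold value is an assumption).
    mask = (mask_prob >= threshold).astype(features.dtype)
    # Zero out every feature that lies outside the mask.
    masked = features * mask[None, :, :]
    area = mask.sum()
    if area == 0:
        # Empty mask: return a zero descriptor instead of dividing by zero.
        pooled = np.zeros(features.shape[0], dtype=features.dtype)
    else:
        # Average the features over the masked pixels only.
        pooled = masked.sum(axis=(1, 2)) / area
    return masked, pooled

def best_grasp_pixel(quality_map, mask_prob, threshold=0.5):
    """Hypothetical helper: pick the highest-quality pixel inside the mask.

    quality_map: (H, W) array of per-pixel grasp-quality scores.
    Returns (row, col), or None when the mask is empty.
    """
    mask = mask_prob >= threshold
    if not mask.any():
        return None
    # Pixels outside the mask can never win the argmax.
    scores = np.where(mask, quality_map, -np.inf)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)
```

In this sketch, any grasp score outside the referred object's mask is ignored, which is one simple way that a mask can narrow the region a grasp predictor has to consider.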

Taxonomy

Topics: Robot Manipulation and Learning · Motor Control and Adaptation · Multimodal Machine Learning Applications