Language-driven Grasp Detection with Mask-guided Attention
Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

TL;DR
This paper presents a transformer-based framework that integrates visual data, segmentation masks, and natural language instructions to improve robotic grasp detection, especially under occlusions. The authors report that it outperforms recent baselines and works in real-world robotic experiments.
Contribution
The paper introduces a new language-driven grasp detection framework using mask-guided attention and transformer mechanisms, advancing the integration of natural language in robotic grasping.
Findings
Outperforms recent baselines with a 10% success rate increase
Effective in real-world robotic experiments
Significantly improves grasp detection accuracy under occlusions
Abstract
Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains a challenging and largely unexplored task. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention by utilizing the transformer attention mechanism with semantic segmentation features. Our approach integrates visual data, segmentation mask features, and natural language instructions, significantly improving grasp detection accuracy. Our work introduces a new framework for language-driven grasp detection, paving the way for language-driven robotic applications. Intensive experiments show that our method outperforms other recent baselines by a clear margin, with a 10.0% success score…
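Illustration
The abstract does not spell out the architecture, so the following is only a minimal sketch of one way a segmentation mask could guide transformer-style attention, not the authors' implementation. It assumes text-token features act as queries over visual patch features, and that a binary mask adds a fixed bias to the attention logits of patches inside the segmented object. The function name mask_guided_attention, the mask_bias parameter, and all shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_attention(text_feats, visual_feats, seg_mask, mask_bias=2.0):
    """Hypothetical mask-guided cross-attention (illustrative assumption).

    text_feats:   (M, d) language token features, used as queries
    visual_feats: (N, d) visual patch features, used as keys and values
    seg_mask:     (N,)   1.0 for patches inside the segmented object, else 0.0
    mask_bias:    scalar added to the logits of masked patches
    """
    d = text_feats.shape[-1]
    # Scaled dot-product logits between each token and each patch: (M, N).
    logits = text_feats @ visual_feats.T / np.sqrt(d)
    # Softly steer attention toward patches covered by the mask.
    logits = logits + mask_bias * seg_mask[None, :]
    weights = softmax(logits, axis=-1)
    # Each token receives a mask-weighted summary of the visual features.
    return weights @ visual_feats, weights

# Toy inputs: 4 text tokens, 6 visual patches, feature size 8.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 8))
visual_feats = rng.normal(size=(6, 8))
seg_mask = np.array([0, 0, 1, 1, 0, 0], dtype=float)

fused, weights = mask_guided_attention(text_feats, visual_feats, seg_mask)
# Same call with the bias disabled, for comparison with plain attention.
_, plain_weights = mask_guided_attention(
    text_feats, visual_feats, seg_mask, mask_bias=0.0
)
```

With a positive mask_bias, each token places more attention mass on the masked patches than plain scaled dot-product attention does. A soft bias is used here rather than hard masking so that context outside the object still contributes, which is one plausible design when the target is partly occluded.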
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Teaching and Learning Programming
Methods: Softmax · Attention Is All You Need
