GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
Huy Hoang Nguyen, An Vuong, Anh Nguyen, Ian Reid, Minh Nhat Vu

TL;DR
GraspMamba is a Mamba-based framework that hierarchically fuses visual and textual features to improve language-driven robotic grasp detection, especially in cluttered scenes, achieving faster inference and higher accuracy than recent methods.
Contribution
It introduces the first Mamba-based grasp detection model that extracts multi-scale vision and language features for enhanced multimodal fusion.
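As a rough illustration of what "Mamba-based" refers to, here is a minimal, unoptimized sketch of the selective state-space recurrence from Mamba (Linear-Time Sequence Modeling with Selective State Spaces). It covers a single input channel with a diagonal state matrix and is not GraspMamba's code. It assumes the input-dependent step sizes `delta` and projections `B` and `C` are already computed; in Mamba they come from learned projections of the input. It discretizes A with a zero-order hold and uses the simplified B̄ = ΔB. The function name `selective_scan` and its argument layout are hypothetical.

```python
import math
from typing import List


def selective_scan(
    x: List[float],
    delta: List[float],
    A: List[float],
    B: List[List[float]],
    C: List[List[float]],
) -> List[float]:
    """Selective SSM recurrence for one channel with a diagonal N-dim state.

    x, delta: length-L sequences (delta >= 0 are the per-step sizes).
    A: N diagonal entries (negative for a decaying state).
    B, C: L x N input-dependent projections (hypothetical precomputed inputs).
    """
    h = [0.0] * len(A)  # hidden state, starts at zero
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        for n in range(len(A)):
            a_bar = math.exp(delta[t] * A[n])   # zero-order-hold discretization of A
            b_bar = delta[t] * B[t][n]          # simplified discretization B_bar = delta * B
            h[n] = a_bar * h[n] + b_bar * x[t]  # h_t = A_bar * h_{t-1} + B_bar * x_t
            y_t += C[t][n] * h[n]               # y_t = C_t * h_t
        ys.append(y_t)
    return ys


if __name__ == "__main__":
    half = math.log(0.5)  # A entry that halves the state per unit step
    ys = selective_scan(
        x=[1.0, 5.0, 0.0],
        delta=[1.0, 0.0, 1.0],  # delta = 0 at t = 1: keep the state, ignore x[1]
        A=[half],
        B=[[1.0], [1.0], [1.0]],
        C=[[1.0], [1.0], [1.0]],
    )
    print(ys)  # approximately [1.0, 1.0, 0.5]
```

Because `delta`, `B` and `C` vary per step, the model can decide which inputs to write into its state. A step size near zero holds the state and skips the token. The loop visits each position once, so cost grows linearly with sequence length. Vision backbones typically apply such scans to flattened image patches.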
Findings
Outperforms recent methods in accuracy
Achieves rapid inference speed
Validated through real-world robotic experiments
Abstract
Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Extensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic…
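The abstract does not say how visual and textual features are combined at each scale. The sketch below shows one hypothetical way to realize hierarchical, language-conditioned fusion and is not the paper's method. Each level of a visual feature pyramid is gated per location by the sigmoid of its dot product with a text embedding of the same channel width. Levels are then merged top-down (coarse to fine) using nearest-neighbour 2x upsampling and addition, in the spirit of a feature pyramid network. Plain Python lists stand in for tensors, and all names (`text_gate`, `upsample2x`, `hierarchical_fuse`) are illustrative.

```python
import math
from typing import List

Vector = List[float]             # one C-dimensional feature
FeatureMap = List[List[Vector]]  # H x W grid of features


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def text_gate(fmap: FeatureMap, text: Vector) -> FeatureMap:
    """Scale each spatial feature by sigmoid(<feature, text>) (hypothetical gating)."""
    gated = []
    for row in fmap:
        new_row = []
        for feat in row:
            score = sum(f * t for f, t in zip(feat, text))  # text-visual similarity
            g = sigmoid(score)
            new_row.append([g * f for f in feat])
        gated.append(new_row)
    return gated


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: repeat every row and every column twice."""
    out = []
    for row in fmap:
        wide = [list(feat) for feat in row for _ in range(2)]
        out.append(wide)
        out.append([list(feat) for feat in wide])
    return out


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps of identical shape."""
    return [
        [[x + y for x, y in zip(fa, fb)] for fa, fb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def hierarchical_fuse(pyramid: List[FeatureMap], text: Vector) -> FeatureMap:
    """pyramid[0] is the finest level; each next level has half the height and width."""
    fused = text_gate(pyramid[-1], text)      # start at the coarsest level
    for level in reversed(pyramid[:-1]):      # walk back toward the finest level
        fused = add_maps(text_gate(level, text), upsample2x(fused))
    return fused


if __name__ == "__main__":
    fine = [[[1.0, 2.0], [1.0, 2.0]], [[1.0, 2.0], [1.0, 2.0]]]  # 2x2 grid, C = 2
    coarse = [[[4.0, 4.0]]]                                     # 1x1 grid, C = 2
    out = hierarchical_fuse([fine, coarse], text=[0.0, 0.0])
    print(out)  # every entry is [2.5, 3.0]
```

With a zero text embedding every gate equals 0.5, so the example only exercises the merge path. A text vector aligned with some features would raise their gates and suppress the others at every scale. Coarse levels add global context, and fine levels preserve the localization that grasp prediction needs.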
Taxonomy
Topics: Online Learning and Analytics · Teaching and Learning Programming · Multimodal Machine Learning Applications
Methods: Hierarchical Feature Fusion · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
