HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments
Songsong Xiong, Hamidreza Kasaei

TL;DR
HMT-Grasp introduces a hybrid Mamba-Transformer model that combines local and global visual features to improve robot grasp detection for isolated, cluttered, and stacked objects.
Contribution
The paper presents a hybrid Mamba-Transformer architecture that integrates Vision Mamba with parallel convolutional-transformer blocks, so that visual grasp detection draws on both global and local information, and reports better results than existing methods.
Findings
Outperforms state-of-the-art methods on the Cornell, Jacquard, and OCID-Grasp datasets.
Demonstrates superior performance in both simulated and real-world robotic experiments.
Enhances adaptability and precision in complex cluttered environments.
Abstract
Robot grasping, whether handling isolated objects, cluttered items, or stacked objects, plays a critical role in industrial and service applications. However, current visual grasp detection methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) often struggle to adapt to diverse scenarios, as they tend to emphasize either local or global features exclusively, neglecting complementary cues. In this paper, we propose a novel hybrid Mamba-Transformer approach to address these challenges. Our method improves robotic visual grasping by effectively capturing both global and local information through the integration of Vision Mamba and parallel convolutional-transformer blocks. This hybrid architecture significantly improves adaptability, precision, and flexibility across various robotic tasks. To ensure a fair evaluation, we conducted extensive experiments on the…
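The idea of running a local (convolutional) branch and a global (attention) branch in parallel and then fusing them can be illustrated with a small toy example. The sketch below is not the authors' implementation: the function names, the fixed smoothing kernel, the identity query/key/value projections, and the additive fusion with a residual term are all assumptions chosen for illustration, and the Vision Mamba component is left out entirely. It uses only the Python standard library and treats the input as a short sequence of feature vectors.

```python
import math


def local_branch(tokens, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical local branch: depthwise 1D convolution over
    neighbouring tokens with zero padding at the borders."""
    n, half, dim = len(tokens), len(kernel) // 2, len(tokens[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half  # index of the neighbour covered by this tap
            if 0 <= j < n:
                for c in range(dim):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


def global_branch(tokens):
    """Hypothetical global branch: single-head self-attention with
    identity Q/K/V projections, so every token attends to all tokens."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(dim)])
    return out


def parallel_block(tokens):
    """Assumed fusion: residual input + local output + global output."""
    loc = local_branch(tokens)
    glo = global_branch(tokens)
    return [[t + l + g for t, l, g in zip(tok, lrow, grow)]
            for tok, lrow, grow in zip(tokens, loc, glo)]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in parallel_block(features):
        print([round(v, 3) for v in row])
```

In this toy, the local branch only sees a token's immediate neighbours, while the global branch mixes information from every position. Summing the two gives each output token access to both kinds of context, which is the complementary behaviour the abstract describes.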
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Locomotion and Control · Robotics and Automated Systems
