6-DoF Robotic Grasping with Transformer
Zhenjie Zhao, Hang Yu, Hang Wu, Xuebo Zhang

TL;DR
This paper introduces a transformer-based approach for 6-DoF robotic grasping that effectively encodes 3D spatial information, improving success and declutter rates over existing methods.
Contribution
It extends transformer models to 6-DoF grasping by serializing 3D voxel data and integrating skip-connections, enhancing grasp detection in complex scenes.
Findings
Surpasses existing methods by about 5% in grasp success rate
Demonstrates improved declutter rates
Shows strong generalization and efficiency
Abstract
Robotic grasping aims to detect graspable points and their corresponding gripper configurations in a particular scene, and is fundamental for robot manipulation. Existing research has demonstrated the potential of using a transformer model for robotic grasping, which can efficiently learn both global and local features. However, such methods are still limited to grasp detection on a 2D plane. In this paper, we extend a transformer model to 6-Degree-of-Freedom (6-DoF) robotic grasping, which makes it more flexible and suitable for tasks that concern safety. The key designs of our method are a serialization module, which turns a 3D voxelized space into a sequence of feature tokens that a transformer model can consume, and skip-connections, which merge multiscale features effectively. In particular, our method takes a Truncated Signed Distance Function (TSDF) as input. After serializing…
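The serialization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 40×40×40 grid size, and the patch size of 8 are assumptions chosen for illustration, and the learned projection that would normally map each flattened patch to a feature token is omitted. The sketch only shows how a voxel grid can be cut into non-overlapping 3D patches and laid out as a sequence, and how that sequence can be folded back into a grid.

```python
import numpy as np


def serialize_voxels(tsdf: np.ndarray, patch: int) -> np.ndarray:
    """Cut a (D, H, W) voxel grid into non-overlapping patch^3 blocks.

    Returns an array of shape (num_tokens, patch**3), one row per block.
    Hypothetical helper for illustration only.
    """
    d, h, w = tsdf.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("grid size must be divisible by patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # Split every axis into (block index, offset inside block).
    blocks = tsdf.reshape(gd, patch, gh, patch, gw, patch)
    # Bring the three block indices to the front, offsets to the back.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(gd * gh * gw, patch ** 3)


def deserialize_tokens(tokens: np.ndarray, grid_shape, patch: int) -> np.ndarray:
    """Inverse of serialize_voxels: fold the token sequence back into a grid."""
    d, h, w = grid_shape
    gd, gh, gw = d // patch, h // patch, w // patch
    blocks = tokens.reshape(gd, gh, gw, patch, patch, patch)
    # Interleave block indices and offsets again, then merge each pair.
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5)
    return blocks.reshape(d, h, w)


# Toy input standing in for a TSDF volume (assumed size, random values).
rng = np.random.default_rng(0)
tsdf = rng.uniform(-1.0, 1.0, size=(40, 40, 40)).astype(np.float32)

tokens = serialize_voxels(tsdf, patch=8)
restored = deserialize_tokens(tokens, tsdf.shape, patch=8)
```

In this layout each row of `tokens` is one local 3D neighbourhood, so a transformer operating on the rows can attend across the whole volume, and the inverse mapping is what lets per-token outputs be arranged back into a volumetric prediction.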
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
