6-DoF Robotic Grasping with Transformer

Zhenjie Zhao; Hang Yu; Hang Wu; Xuebo Zhang

arXiv:2301.12476 · cs.RO · January 31, 2023 · 1 citation

TL;DR

This paper introduces a transformer-based approach to 6-DoF robotic grasping that encodes 3D spatial information and improves grasp success and declutter rates over existing methods.

Contribution

The method extends transformer models to 6-DoF grasping by serializing 3D voxel data into token sequences and using skip-connections to merge multiscale features, which improves grasp detection in complex scenes.
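The summary does not say exactly how the serialization module splits the voxel grid, so the following is only a minimal sketch under an assumption: the grid is cut into non-overlapping cubic patches (Vision Transformer style patch embedding, extended to 3D) and each patch is flattened into one token, with patches visited in raster order. The function name `serialize_voxels` and the `patch` parameter are hypothetical; a real model would also project each token to the embedding width and add positional encodings.

```python
from typing import List

# Voxel grid indexed as grid[x][y][z] (hypothetical layout).
Grid = List[List[List[float]]]


def serialize_voxels(grid: Grid, patch: int) -> List[List[float]]:
    """Turn a D x H x W voxel grid into a sequence of patch tokens.

    Hypothetical sketch: the grid is split into non-overlapping
    patch x patch x patch blocks, blocks are visited in raster order
    (x, then y, then z), and each block is flattened into one token.
    """
    d, h, w = len(grid), len(grid[0]), len(grid[0][0])
    if d % patch or h % patch or w % patch:
        raise ValueError("grid dimensions must be divisible by the patch size")

    tokens: List[List[float]] = []
    for bx in range(0, d, patch):
        for by in range(0, h, patch):
            for bz in range(0, w, patch):
                # Flatten the block in the same x, y, z order.
                token = [
                    grid[bx + i][by + j][bz + k]
                    for i in range(patch)
                    for j in range(patch)
                    for k in range(patch)
                ]
                tokens.append(token)
    return tokens
```

For a 4×4×4 grid with `patch=2`, this yields 8 tokens of 8 values each. Raster order is just one simple choice: self-attention without positional information is permutation-equivariant, so the spatial layout must be carried by positional encodings rather than by the order alone.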

Findings

1. Surpasses existing methods by about 5% in success rate
2. Demonstrates improved declutter rates
3. Shows strong generalization and efficiency
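The findings cite success and declutter rates without defining them here. Assuming the usual clutter-removal definitions (as in VGN-style benchmarks: success rate is successful grasps over grasp attempts, and declutter rate is the average fraction of objects removed per round), the two metrics can be computed as in this sketch. The `Round` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Round:
    """One clutter-removal round (hypothetical record layout)."""
    num_objects: int  # objects in the scene at the start; assumed > 0
    attempts: int     # grasps executed during the round
    successes: int    # grasps that removed an object


def success_rate(rounds: Sequence[Round]) -> float:
    """Successful grasps divided by all grasp attempts, pooled over rounds."""
    attempts = sum(r.attempts for r in rounds)
    if attempts == 0:
        return 0.0
    return sum(r.successes for r in rounds) / attempts


def declutter_rate(rounds: Sequence[Round]) -> float:
    """Mean over rounds of the fraction of objects removed.

    Assumes each successful grasp removes exactly one object.
    """
    if not rounds:
        return 0.0
    return sum(r.successes / r.num_objects for r in rounds) / len(rounds)
```

For two rounds with (5 objects, 6 attempts, 4 successes) and (4 objects, 4 attempts, 4 successes), the pooled success rate is 8/10 = 0.8 and the declutter rate is (0.8 + 1.0)/2 = 0.9.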

Abstract

Robotic grasping aims to detect graspable points and their corresponding gripper configurations in a particular scene, and is fundamental for robot manipulation. Existing research works have demonstrated the potential of using a transformer model for robotic grasping, which can efficiently learn both global and local features. However, such methods are still limited in grasp detection on a 2D plane. In this paper, we extend a transformer model for 6-Degree-of-Freedom (6-DoF) robotic grasping, which makes it more flexible and suitable for tasks that concern safety. The key designs of our method are a serialization module that turns a 3D voxelized space into a sequence of feature tokens that a transformer model can consume and skip-connections that merge multiscale features effectively. In particular, our method takes a Truncated Signed Distance Function (TSDF) as input. After serializing…
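The abstract states only that the input is a TSDF. As a hedged illustration of what such an input looks like, the sketch below samples a normalized truncated signed distance field of a synthetic sphere on a voxel grid. In real pipelines the TSDF is usually fused from depth images; `sphere_tsdf`, the grid size, and the truncation distance are all hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def sphere_tsdf(n: int, voxel_size: float, center: Point,
                radius: float, trunc: float) -> List[List[List[float]]]:
    """Sample a normalized TSDF of a sphere on an n x n x n grid.

    Each voxel stores the signed distance from its center to the sphere
    surface (positive outside, negative inside), divided by the truncation
    distance and clamped to [-1, 1]. Indexed as grid[x][y][z].
    """
    grid = []
    for x in range(n):
        plane = []
        for y in range(n):
            row = []
            for z in range(n):
                # World-space coordinates of the voxel center.
                p = ((x + 0.5) * voxel_size,
                     (y + 0.5) * voxel_size,
                     (z + 0.5) * voxel_size)
                sdf = math.dist(p, center) - radius
                # Truncate: values beyond +/- trunc saturate at +/- 1.
                row.append(max(-1.0, min(1.0, sdf / trunc)))
            plane.append(row)
        grid.append(plane)
    return grid
```

Only voxels within `trunc` of the surface carry graded values; everything farther away saturates at ±1, which keeps the input focused on geometry near object surfaces. Sign conventions vary between pipelines, so this positive-outside choice is an assumption.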

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition