Robotic grasp detection based on Transformer

Mingshuai Dong; Xiuli Yu

arXiv:2205.15112·cs.RO·May 31, 2022

Open Access

TL;DR

This paper introduces a Transformer-based encoder-decoder model for robotic grasp detection in cluttered scenes. It combines the Transformer's global context extraction with the inductive bias of a convolutional decoder and reports 98.1% accuracy on the Cornell Grasp dataset.

Contribution

The paper proposes a novel encoder-decoder grasp detection model that integrates Transformer and convolutional networks to improve performance in cluttered environments.

Findings

01. Outperforms existing methods in overlapping object scenes

02. Achieves 98.1% accuracy on the Cornell Grasp dataset

03. Demonstrates effectiveness of combining Transformer with CNNs
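
This page does not say how the Cornell accuracy figure is computed. Grasp accuracy on the Cornell Grasp dataset is conventionally reported with the rectangle metric: a predicted grasp counts as correct when its rectangle overlaps a ground-truth rectangle with IoU above 25% and the two orientations differ by less than 30°. The sketch below assumes that convention. The function names and the `(cx, cy, w, h, theta)` grasp tuple are hypothetical choices for illustration, not taken from the paper.

```python
import math

def rect_corners(cx, cy, w, h, theta):
    """Corners of a w-by-h rectangle centred at (cx, cy), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def polygon_area(pts):
    """Shoelace formula; degenerate polygons have zero area."""
    if len(pts) < 3:
        return 0.0
    pairs = zip(pts, pts[1:] + pts[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))

def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` against a convex `clipper`
    whose corners are ordered as produced by rect_corners."""
    output = list(subject)
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        current, output = output, []

        def side(p):
            # Non-negative when p lies on the inner side of edge a -> b.
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(current, current[1:] + current[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # edge p -> q crosses the clip line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def grasp_iou(pred, truth):
    """IoU of two rotated grasp rectangles given as (cx, cy, w, h, theta)."""
    inter = polygon_area(clip_convex(rect_corners(*pred), rect_corners(*truth)))
    union = pred[2] * pred[3] + truth[2] * truth[3] - inter
    return inter / union if union > 0 else 0.0

def rectangle_metric(pred, truth, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """True if pred matches truth under the assumed rectangle-metric thresholds."""
    diff = abs(pred[4] - truth[4]) % math.pi  # a parallel gripper is symmetric under 180°
    diff = min(diff, math.pi - diff)
    return grasp_iou(pred, truth) > iou_thresh and diff < angle_thresh
```

For example, `rectangle_metric((0, 0, 4, 2, 0.0), (2, 0, 4, 2, 0.0))` compares two axis-aligned rectangles that share half their area. Under this convention, dataset accuracy would be the fraction of test images whose top predicted grasp matches at least one ground-truth rectangle.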

Abstract

Grasp detection in cluttered environments remains a major challenge for robots. The Transformer mechanism has recently been applied successfully to visual tasks, and its strong ability to extract global context information offers a feasible way to improve robotic grasp detection in cluttered scenes. However, the original Transformer model has a weak inductive bias and therefore requires training on large-scale datasets, which are difficult to obtain for grasp detection. In this paper, we propose a grasp detection model based on an encoder-decoder structure. The encoder uses a Transformer network to extract global context information. The decoder uses a fully convolutional neural network to improve the inductive bias capability of the model and combines the features extracted by the encoder to predict the final grasp configuration. Experiments on the VMRD dataset…
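
The split between a globally attending encoder and a locally operating convolutional decoder can be illustrated with a toy sketch. This is not the authors' model: it uses a single attention head with identity projections and no learned weights, a fixed 3x3 averaging kernel in place of a trained fully convolutional decoder, and hypothetical function names throughout. It only shows that every attention output depends on all patch tokens, while each convolution output depends on a small neighbourhood.

```python
import math

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy, no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores against every token: global context.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def conv3x3_mean(grid):
    """3x3 averaging convolution with zero padding: each output sees only local neighbours."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += grid[y][x]
            out[i][j] = acc / 9.0
    return out

def toy_grasp_quality_map(patch_features, height, width):
    """Encoder (attention over all patches) followed by a decoder (local convolution).

    patch_features: list of height*width feature vectors, one per image patch.
    Returns a height x width map of values in (0, 1), a stand-in for a
    per-location grasp quality score.
    """
    if len(patch_features) != height * width:
        raise ValueError("expected one feature vector per grid cell")
    encoded = self_attention(patch_features)
    scalars = [sum(tok) / len(tok) for tok in encoded]  # collapse channels
    grid = [scalars[r * width:(r + 1) * width] for r in range(height)]
    smoothed = conv3x3_mean(grid)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in smoothed]
```

A call such as `toy_grasp_quality_map([[float((r + c) % 3), 1.0] for r in range(4) for c in range(4)], 4, 4)` returns a 4x4 map. A real model along the lines of the abstract would replace both stages with trained layers and would predict a full grasp configuration rather than a single score per cell.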

Taxonomy

Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Human Pose and Action Recognition