When Transformer Meets Robotic Grasping: Exploits Context for Efficient Grasp Detection

Shaochen Wang, Zhangli Zhou, and Zhen Kan

arXiv:2202.11911 · cs.RO · September 14, 2022

TL;DR

This paper introduces TF-Grasp, a transformer-based architecture for robotic grasp detection that effectively captures local and global context, achieving state-of-the-art accuracy and demonstrating real-world applicability.

Contribution

The paper proposes a novel transformer architecture with local and cross window attention for improved grasp detection in cluttered environments.
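The two attention patterns named in the contribution can be illustrated with a small sketch. It assumes a Swin-Transformer-style scheme in which "cross window" attention is realised by cyclically shifting the feature map by half a window before partitioning it, so that the new windows straddle the old window borders. The function names, the identity query/key/value projections, and the omission of the attention mask that Swin applies to wrapped-around regions are simplifications for illustration, not details taken from the paper or the wangshaosun/grasp-transformer repository.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # fmap[row][col] is a C-dimensional feature vector


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention; Q = K = V = tokens (no learned projections)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[ch] for w, v in zip(weights, tokens)) for ch in range(dim)])
    return out


def roll(fmap: FeatureMap, shift: int) -> FeatureMap:
    """Cyclic shift of rows and columns, like torch.roll(x, (shift, shift), (0, 1))."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[(r - shift) % h][(c - shift) % w] for c in range(w)] for r in range(h)]


def local_window_attention(fmap: FeatureMap, window: int) -> FeatureMap:
    """Self-attention restricted to each non-overlapping window x window patch."""
    h, w = len(fmap), len(fmap[0])
    assert h % window == 0 and w % window == 0, "H and W must be multiples of window"
    out: FeatureMap = [[[] for _ in range(w)] for _ in range(h)]
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            coords = [(r, c) for r in range(r0, r0 + window) for c in range(c0, c0 + window)]
            attended = attend([fmap[r][c] for r, c in coords])
            for (r, c), vec in zip(coords, attended):
                out[r][c] = vec
    return out


def shifted_window_attention(fmap: FeatureMap, window: int) -> FeatureMap:
    """Shift by half a window, attend locally, shift back: windows now cross old borders."""
    shift = window // 2
    attended = local_window_attention(roll(fmap, -shift), window)
    return roll(attended, shift)
```

With a 4x4 map and `window=2`, pixels (1, 1) and (2, 2) never interact under `local_window_attention`, but they share a window under `shifted_window_attention`. Alternating the two passes (in Swin-style blocks, each followed by an MLP with residual connections) is what lets information propagate beyond a single window.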

Findings

1. Achieves 97.99% accuracy on the Cornell dataset.
2. Attains 94.6% accuracy on the Jacquard dataset.
3. Demonstrates successful real-world grasping with a robotic arm.
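Accuracy figures on Cornell and Jacquard are conventionally computed with the rectangle metric: a predicted grasp rectangle counts as correct if its orientation is within 30° of some ground-truth rectangle and their Jaccard index (IoU) exceeds 25%. This page does not spell out the evaluation protocol, so the sketch below assumes that standard metric. Rectangles are hypothetical `(x, y, theta, w, h)` tuples with `theta` in radians, the helper names are illustrative, and the overlap is computed with Sutherland-Hodgman polygon clipping, one of several ways to intersect rotated rectangles.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float, float]  # (x, y, theta, w, h), theta in radians


def corners(rect: Rect) -> List[Point]:
    """Corners of a rotated rectangle in counter-clockwise order."""
    x, y, theta, w, h = rect
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]


def area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` against a convex counter-clockwise `clipper`."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            prev, cur = inp[j - 1], inp[j]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
        if not output:
            break
    return output


def iou(r1: Rect, r2: Rect) -> float:
    p1, p2 = corners(r1), corners(r2)
    overlap = clip(p1, p2)
    inter = area(overlap) if len(overlap) >= 3 else 0.0
    union = area(p1) + area(p2) - inter
    return inter / union if union > 0 else 0.0


def angle_diff(t1: float, t2: float) -> float:
    """A grasp rotated by 180 degrees is the same grasp, so compare modulo pi."""
    d = abs(t1 - t2) % math.pi
    return min(d, math.pi - d)


def is_correct(pred: Rect, ground_truths: List[Rect],
               iou_thresh: float = 0.25, angle_thresh: float = math.radians(30)) -> bool:
    """Rectangle metric: matches any ground truth on both angle and IoU."""
    return any(angle_diff(pred[2], gt[2]) < angle_thresh and iou(pred, gt) > iou_thresh
               for gt in ground_truths)
```

For example, two axis-aligned 2x2 rectangles whose centres are 1 unit apart overlap in a 1x2 strip, giving IoU = 2 / 6 = 1/3, so the prediction passes the 25% threshold. Dataset accuracy is then the fraction of test images whose top-ranked prediction passes `is_correct`.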

Abstract

In this paper, we present a transformer-based architecture, namely TF-Grasp, for robotic grasp detection. The TF-Grasp framework has two elaborate designs that make it well suited to visual grasping tasks. The first key design is that we adopt local window attention to capture local contextual information and detailed features of graspable objects. We then apply cross window attention to model long-range dependencies between distant pixels. Object knowledge, environmental configuration, and relationships between different visual entities are aggregated for subsequent grasp detection. The second key design is that we build a hierarchical encoder-decoder architecture with skip-connections, delivering shallow features from the encoder to the decoder to enable multi-scale feature fusion. Due to the powerful attention mechanism, TF-Grasp can simultaneously obtain the…
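The second design, a hierarchical encoder-decoder with skip-connections, can be illustrated with a toy single-channel version. This sketch assumes 2x2 average pooling for downsampling, nearest-neighbour upsampling, and fusion by element-wise addition; these are stand-ins, since the page does not describe the model's actual downsampling, upsampling, or fusion operators. The `stage` callable is a hypothetical placeholder for the transformer blocks at each scale. The encoder stores its feature map at every resolution, and the decoder adds each stored map back after upsampling, so shallow, high-resolution detail reaches the output.

```python
from typing import Callable, List

Grid = List[List[float]]  # single-channel H x W feature map


def downsample(x: Grid) -> Grid:
    """2x2 average pooling: halves height and width (stand-in for patch merging)."""
    return [
        [(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(a: Grid, b: Grid) -> Grid:
    """Element-wise addition of two equally sized maps (assumed fusion rule)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def encoder_decoder(x: Grid, depth: int,
                    stage: Callable[[Grid], Grid] = lambda g: g) -> Grid:
    """U-shaped pass; H and W must be divisible by 2 ** depth."""
    skips: List[Grid] = []
    h = x
    for _ in range(depth):
        h = stage(h)        # features at the current (shallower) scale
        skips.append(h)     # saved for the matching decoder level
        h = downsample(h)
    h = stage(h)            # bottleneck at the coarsest scale
    for skip in reversed(skips):
        h = upsample(h)
        h = fuse(h, skip)   # skip-connection: inject encoder detail
        h = stage(h)
    return h
```

With an 8x8 input of ones, `depth=2`, and the identity `stage`, each decoder level adds one copy of the stored encoder map, so every output value is 3.0. Without the skip-connections the decoder could only recover what survived the coarsest 2x2 bottleneck.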

Code & Models

Repositories

wangshaosun/grasp-transformer (PyTorch, official)

Taxonomy

Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications