GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping

Chao Tang; Dehao Huang; Wenqi Ge; Weiyu Liu; Hong Zhang

arXiv:2307.13204 · cs.RO · September 21, 2023 · 1 citation

Open Access

TL;DR

GraspGPT is a large language model (LLM) based framework for task-oriented grasping (TOG) that draws on open-ended semantic knowledge from the LLM to generalize zero-shot to novel concepts, outperforming existing TOG methods in held-out settings on the LA-TaskGrasp dataset.

Contribution

This work presents GraspGPT, the first LLM-based task-oriented grasping (TOG) framework, which uses open-ended semantic knowledge to generalize zero-shot to novel objects and tasks.

Findings

1. Outperforms existing TOG methods in held-out settings of the LA-TaskGrasp dataset
2. Achieves zero-shot generalization to unseen concepts
3. Validated in real-robot experiments

Abstract

Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, this semantic knowledge is typically constructed from closed-world concept sets, which restricts generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts outside the training set.…
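
To make the idea of a "held-out setting" concrete, here is a minimal sketch of how such a split could be built, assuming that a concept is either an object class or a task. The `Sample` record, the `held_out_split` helper, and the toy data are hypothetical illustrations. They are not taken from LA-TaskGrasp or from the authors' code.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical record: one candidate grasp labeled for one task."""
    object_class: str  # e.g. "mug"
    task: str          # e.g. "pour"
    grasp_id: int      # index of a candidate grasp on the object
    label: bool        # True if the grasp suits the task


def held_out_split(samples, field, held_out):
    """Split samples so concepts in `held_out` appear only in the test set.

    `field` names the concept type to hold out ("object_class" or "task").
    """
    train = [s for s in samples if getattr(s, field) not in held_out]
    test = [s for s in samples if getattr(s, field) in held_out]
    return train, test


# Toy data, invented for illustration only.
samples = [
    Sample("mug", "pour", 0, True),
    Sample("mug", "scoop", 1, False),
    Sample("spatula", "flip", 0, True),
    Sample("ladle", "scoop", 2, True),
]

# Held-out object class: "ladle" never appears during training.
train_cls, test_cls = held_out_split(samples, "object_class", {"ladle"})

# Held-out task: "scoop" never appears during training.
train_task, test_task = held_out_split(samples, "task", {"scoop"})
```

A model evaluated on `test_cls` or `test_task` has to handle a concept it never saw in training, which is the kind of zero-shot generalization the abstract describes.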

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications