GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping
Chao Tang, Dehao Huang, Wenqi Ge, Weiyu Liu, Hong Zhang

TL;DR
GraspGPT introduces a large language model-based framework for task-oriented grasping that leverages open-ended semantic knowledge to enable zero-shot generalization to unseen concepts, outperforming existing methods.
Contribution
This work presents GraspGPT, the first LLM-based task-oriented grasping (TOG) framework that utilizes open-ended semantic knowledge for zero-shot grasping of novel objects and tasks.
Findings
Outperforms existing TOG methods on the held-out settings of the LA-TaskGrasp dataset
Achieves zero-shot generalization to unseen concepts
Validated in real-robot experiments
Abstract
Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, this semantic knowledge is typically constructed from closed-world concept sets, which restricts generalization to novel concepts outside the predefined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts outside the training set.…
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications
