PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

Dingkun Guo, Yuqi Xiang, Shuqi Zhao, Xinghao Zhu, Masayoshi Tomizuka, Mingyu Ding, Wei Zhan

arXiv:2402.16836 · cs.RO · February 27, 2024 · 1 citation

Open Access

TL;DR

PhyGrasp is a multimodal large model that integrates natural language and 3D point cloud data to improve robotic grasping, especially in complex, long-tailed scenarios, by infusing physics-informed reasoning.

Contribution

The paper introduces PhyGrasp, a multimodal model that combines language and 3D point-cloud inputs, together with a new dataset, to improve grasp generalization and physical reasoning in robotics.

Findings

1. Achieves state-of-the-art success rates in simulation and on real robots.
2. Improves grasp success rate by about 10% in long-tailed cases.
3. Effectively interprets human instructions for grasping tasks.

Abstract

Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact with objects. Despite substantial progress, its generalizability to counter-intuitive or long-tailed scenarios, such as objects with uncommon materials or shapes, remains a challenge. In contrast, humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before. This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds, seamlessly integrated through a bridge module. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With…
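
The abstract says the two modalities are "integrated through a bridge module" but does not describe its internals here. The sketch below is one minimal, hypothetical way such a bridge could work, assuming per-point features from a 3D encoder and a single sentence embedding from a language model: project the language embedding into the point-feature space, attach it to every point, and score each point's graspability. The names `bridge_fuse`, `w_lang`, and `w_out`, the concatenation-based fusion, and the random weights are all illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def bridge_fuse(point_feats, lang_embed, w_lang, w_out):
    """Hypothetical bridge (not PhyGrasp's actual module): fuse one language
    embedding with per-point shape features and score every point."""
    # point_feats: (N, d_p) per-point features, assumed to come from a 3D encoder
    # lang_embed:  (d_l,) sentence embedding, assumed to come from a language model
    lang_proj = np.tanh(lang_embed @ w_lang)                 # (d_p,) language mapped into point-feature space
    tiled = np.broadcast_to(lang_proj, point_feats.shape)    # (N, d_p) same language vector attached to each point
    fused = np.concatenate([point_feats, tiled], axis=1)     # (N, 2 * d_p) shape + physics context per point
    logits = fused @ w_out                                   # (N,) one scalar per point
    return 1.0 / (1.0 + np.exp(-logits))                    # sigmoid -> per-point graspability in (0, 1)

# Usage with random stand-in features and weights (illustration only).
rng = np.random.default_rng(0)
N, d_p, d_l = 1024, 64, 128
point_feats = rng.standard_normal((N, d_p))
lang_embed = rng.standard_normal(d_l)
w_lang = 0.1 * rng.standard_normal((d_l, d_p))
w_out = 0.1 * rng.standard_normal(2 * d_p)

scores = bridge_fuse(point_feats, lang_embed, w_lang, w_out)
best = int(np.argmax(scores))   # index of the highest-scoring candidate contact point
```

Broadcasting one language vector to every point lets the same physical description (for example, "fragile glass" versus "heavy metal") shift the scores of all parts of the object at once, while the point features keep the per-part geometry.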

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: ALIGN