PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo, Yuqi Xiang, Shuqi Zhao, Xinghao Zhu, Masayoshi Tomizuka, Mingyu Ding, Wei Zhan

TL;DR
PhyGrasp is a multimodal large model that integrates natural language and 3D point cloud data to improve robotic grasping, especially in complex, long-tailed scenarios, by infusing physics-informed reasoning.
Contribution
It introduces PhyGrasp, a multimodal model that combines language and 3D point-cloud inputs, together with a new dataset, to improve grasping generalization and physical reasoning in robotics.
Findings
Achieves state-of-the-art success rates both in simulation and on real robots.
Improves grasp success by about 10% in long-tailed cases.
Effectively interprets human instructions for grasping tasks.
Abstract
Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact with objects. Despite substantial progress, its generalizability to counter-intuitive or long-tailed scenarios, such as objects with uncommon materials or shapes, remains a challenge. In contrast, humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before. This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds, seamlessly integrated through a bridge module. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With…
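The abstract says only that the two modalities are "seamlessly integrated through a bridge module" and does not describe how. As a purely illustrative sketch, assuming a simple design in which one sentence-level language embedding is concatenated onto every per-point feature and a linear head scores each point's graspability, the fusion step could look like the code below. The function names, feature sizes, and weights are hypothetical; this is not PhyGrasp's actual bridge module.

```python
import math
from typing import List, Sequence


def bridge_fuse(lang_emb: Sequence[float],
                point_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical bridge: append the same language embedding to every point's feature."""
    return [list(p) + list(lang_emb) for p in point_feats]


def grasp_scores(fused: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 bias: float = 0.0) -> List[float]:
    """Hypothetical linear head + sigmoid: one graspability score in (0, 1) per point."""
    scores = []
    for f in fused:
        z = sum(w * x for w, x in zip(weights, f)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def best_point(scores: Sequence[float]) -> int:
    """Index of the point with the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    lang = [1.0, 0.0]                              # toy language embedding (assumed)
    points = [[0.2, 0.1], [0.9, 0.4], [0.1, 0.8]]  # toy per-point 3D features (assumed)
    weights = [2.0, -1.0, 0.5, 0.5]                # toy head weights (assumed)
    scores = grasp_scores(bridge_fuse(lang, points), weights)
    print(best_point(scores))  # -> 1
```

Concatenation is used only because it is the simplest way to let a single instruction condition every point; a real bridge module could instead use attention or learned projections.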
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: ALIGN
