Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations
Chao Tang, Jiacheng Xu, Haofei Lu, Bolin Zou, Wenlong Dong, Hong Zhang, and Danica Kragic

TL;DR
This paper introduces GraspDreamer, a zero-shot functional grasping method for robots that imitates human demonstrations synthesized by visual generative models, improving data efficiency and generalization without extensive data collection.
Contribution
GraspDreamer uses visual generative models to synthesize human demonstrations, enabling functional grasping in open-world environments without labor-intensive data collection.
Findings
Outperforms previous methods in data efficiency and generalization
Validated on public benchmarks and real robots
Supports extension to downstream manipulation tasks
Abstract
Building generalist robots capable of performing functional grasping in everyday, open-world environments remains a significant challenge due to the vast diversity of objects and tasks. Existing methods are either constrained to narrow object/task sets or rely on prohibitively large-scale data collection to capture real-world variability. In this work, we present an alternative approach, GraspDreamer, which leverages human demonstrations synthesized by visual generative models (VGMs), such as video generation models, to enable zero-shot functional grasping without labor-intensive data collection. The key idea is that VGMs pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world, and these priors can be combined with embodiment-specific action optimization to enable functional grasping with minimal effort. Extensive…
