Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

Chao Tang; Jiacheng Xu; Haofei Lu; Bolin Zou; Wenlong Dong; Hong Zhang; and Danica Kragic

arXiv:2604.07517 · cs.RO · April 10, 2026

TL;DR

This paper introduces GraspDreamer, a zero-shot robotic functional grasping method that imitates human demonstrations synthesized by visual generative models, achieving data efficiency and generalization without extensive data collection.

Contribution

It leverages visual generative models to synthesize human demonstrations and combines them with embodiment-specific action optimization, enabling functional grasping in open-world environments with minimal data.

Findings

1. Outperforms previous methods in data efficiency and generalization
2. Validated on public benchmarks and real robots
3. Supports extension to downstream manipulation tasks

Abstract

Building generalist robots capable of performing functional grasping in everyday, open-world environments remains a significant challenge due to the vast diversity of objects and tasks. Existing methods are either constrained to narrow object/task sets or rely on prohibitively large-scale data collection to capture real-world variability. In this work, we present GraspDreamer, an alternative approach that leverages human demonstrations synthesized by visual generative models (VGMs), such as video generation models, to enable zero-shot functional grasping without labor-intensive data collection. The key idea is that VGMs pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world, which can be combined with embodiment-specific action optimization to enable functional grasping with minimal effort. Extensive…
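The abstract pairs VGM-generated human demonstrations with embodiment-specific action optimization but does not detail that step here. As a purely illustrative sketch, and not the GraspDreamer method, the snippet below maps a human pinch onto a parallel-jaw gripper. It assumes 3D thumb-tip, index-tip, and wrist positions have already been recovered from a generated demonstration (e.g., by a hand pose estimator). The function name and the 8 cm gripper stroke are hypothetical. For a parallel-jaw gripper with fingers at c ± (w/2)·a, the squared distance to the two fingertips is minimized in closed form by the midpoint, the tip-to-tip direction, and the tip-to-tip distance, so no iterative solver is needed in this toy case.

```python
import numpy as np

MAX_OPENING_M = 0.08  # hypothetical gripper stroke in meters (assumption)


def retarget_pinch_to_parallel_jaw(thumb_tip, index_tip, wrist, max_opening=MAX_OPENING_M):
    """Hypothetical retargeting of a human pinch to a parallel-jaw grasp.

    Returns (center, rotation, width): the grasp center, a 3x3 rotation whose
    columns are the gripper x, y (closing) and z (approach) axes, and the
    required opening width.
    """
    thumb_tip, index_tip, wrist = (np.asarray(p, dtype=float) for p in (thumb_tip, index_tip, wrist))

    # Closed-form optimum: fingers placed exactly on the two fingertips.
    center = 0.5 * (thumb_tip + index_tip)
    closing = index_tip - thumb_tip
    width = float(np.linalg.norm(closing))
    if width < 1e-9:
        raise ValueError("thumb and index tips coincide; closing axis undefined")
    if width > max_opening:
        raise ValueError(f"required opening {width:.3f} m exceeds gripper stroke {max_opening:.3f} m")
    y_axis = closing / width

    # Approach direction: wrist toward grasp center, made orthogonal to the closing axis.
    approach = center - wrist
    approach = approach - np.dot(approach, y_axis) * y_axis
    norm = np.linalg.norm(approach)
    if norm < 1e-9:
        raise ValueError("approach direction is parallel to the closing axis")
    z_axis = approach / norm

    # x = y cross z gives a right-handed frame (determinant +1).
    x_axis = np.cross(y_axis, z_axis)
    rotation = np.column_stack([x_axis, y_axis, z_axis])
    return center, rotation, width


if __name__ == "__main__":
    c, R, w = retarget_pinch_to_parallel_jaw([0.0, -0.02, 0.1], [0.0, 0.02, 0.1], [0.0, 0.0, 0.0])
    print(c, w)
    print(R)
```

The hard width check stands in for the embodiment-specific constraints a real system would have to respect; an actual pipeline would also need collision and kinematic-feasibility terms, which this sketch omits.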
