DreamGen: Unlocking Generalization in Robot Learning through Video World Models

Joel Jang; Seonghyeon Ye; Zongyu Lin; Jiannan Xiang; Johan Bjorck; Yu Fang; Fengyuan Hu; Spencer Huang; Kaushil Kundalia; Yen-Chen Lin; Loic Magne; Ajay Mandlekar; Avnish Narayan; You Liang Tan; Guanzhi Wang; Jing Wang; Qi Wang; Yinzhen Xu; Xiaohui Zeng; Kaiyuan Zheng; Ruijie Zheng; Ming-Yu Liu; Luke Zettlemoyer; Dieter Fox; Jan Kautz; Scott Reed; Yuke Zhu; Linxi Fan

arXiv:2505.12705 · cs.RO · June 19, 2025

TL;DR

DreamGen is a pipeline that uses video world models to generate synthetic robot data, so that policies can generalize across behaviors and environments with minimal real data.

Contribution

We propose DreamGen, a 4-stage pipeline that combines image-to-video generative models with pseudo-action recovery to improve robot policy generalization across diverse tasks and environments.

Findings

01 A humanoid robot performed 22 new behaviors in both seen and unseen environments.

02 Scores on the video generation benchmark correlate strongly with downstream policy success.

03 This generalization required teleoperation data from only a single pick-and-place task in one environment.

Abstract

We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories - synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of familiar or novel tasks in diverse environments. Since these models generate only videos, we recover pseudo-action sequences using either a latent action model or an inverse-dynamics model (IDM). Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment. To evaluate the pipeline systematically,…
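
The pseudo-action recovery step mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' inverse-dynamics model: it assumes a made-up world in which each "frame" is just a 2-D end-effector position and the dynamics are `next_frame = frame + action`, so the inverse dynamics reduce to a difference between consecutive frames. The names `Frame`, `Action`, `NeuralTrajectory`, `toy_idm`, and `label_video` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Toy stand-ins (assumptions): a "frame" is a 2-D position, not an image,
# and an "action" is a 2-D displacement.
Frame = Tuple[float, float]
Action = Tuple[float, float]


@dataclass
class NeuralTrajectory:
    """A generated video paired with recovered pseudo-actions."""
    frames: List[Frame]
    pseudo_actions: List[Action]


def toy_idm(frame: Frame, next_frame: Frame) -> Action:
    """Hypothetical inverse-dynamics model for the toy dynamics
    next_frame = frame + action: the action is the frame difference."""
    return (next_frame[0] - frame[0], next_frame[1] - frame[1])


def label_video(
    frames: Sequence[Frame],
    idm: Callable[[Frame, Frame], Action],
) -> NeuralTrajectory:
    """Infer one pseudo-action per consecutive frame pair."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to infer an action")
    actions = [idm(a, b) for a, b in zip(frames, frames[1:])]
    # The last frame has no successor, so it carries no action label.
    return NeuralTrajectory(frames=list(frames[:-1]), pseudo_actions=actions)


# A made-up "generated video" of four frames.
generated_video = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
trajectory = label_video(generated_video, toy_idm)
```

In a real setting the frames would be images and the model that maps frame pairs to actions would be learned; the sketch only shows the shape of the labeling step, in which an action-free video becomes a sequence of (frame, pseudo-action) pairs that a policy can train on.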

Code & Models

Repositories

nvidia/gr00t-dreams (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics