Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Karthik Dharmarajan; Wenlong Huang; Jiajun Wu; Li Fei-Fei; Ruohan Zhang

arXiv:2512.24766·cs.RO·January 1, 2026

TL;DR

Dream2Flow leverages generative video models and 3D object flow to enable zero-shot robotic manipulation across diverse object categories by translating video synthesis into actionable control commands.

Contribution

It introduces a framework that uses 3D object flow as an intermediate representation connecting video generation to robotic control, which helps overcome the embodiment gap.

Findings

01. Effective manipulation of various object types, including rigid, articulated, and deformable objects.

02. Zero-shot guidance from pre-trained video models to robotic control.

03. Successful real-world and simulation experiments demonstrating the approach.

Abstract

Generative video modeling has emerged as a compelling tool to zero-shot reason about plausible physical interactions for open-world manipulation. Yet, it remains a challenge to translate such human-led motions into the low-level actions demanded by robotic systems. We observe that given an initial image and task instruction, these models excel at synthesizing sensible object motions. Thus, we introduce Dream2Flow, a framework that bridges video generation and robotic control through 3D object flow as an intermediate representation. Our method reconstructs 3D object motions from generated videos and formulates manipulation as object trajectory tracking. By separating the state changes from the actuators that realize those changes, Dream2Flow overcomes the embodiment gap and enables zero-shot guidance from pre-trained video models to manipulate objects of diverse categories, including…
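To make "manipulation as object trajectory tracking" concrete, here is a minimal sketch, not the authors' implementation. It assumes the 3D object flow is stored as a list of timesteps, each holding the 3D positions of the same tracked object points. The names `tracking_cost` and `centroid_waypoints` are hypothetical. The mean point-wise distance is only one plausible tracking objective, chosen for illustration.

```python
from math import dist

# Assumed layout (hypothetical): flow[t][i] is the (x, y, z) position
# of tracked object point i at timestep t.


def tracking_cost(predicted, target):
    """Mean Euclidean distance between predicted and target object points."""
    if len(predicted) != len(target):
        raise ValueError("flows must have the same number of timesteps")
    total, count = 0.0, 0
    for pred_pts, tgt_pts in zip(predicted, target):
        if len(pred_pts) != len(tgt_pts):
            raise ValueError("each timestep must track the same points")
        for p, q in zip(pred_pts, tgt_pts):
            total += dist(p, q)
            count += 1
    return total / count if count else 0.0


def centroid_waypoints(target):
    """Per-timestep centroid of the tracked points (translation-only summary)."""
    waypoints = []
    for pts in target:
        n = len(pts)
        waypoints.append(tuple(sum(p[k] for p in pts) / n for k in range(3)))
    return waypoints


# Toy target flow: two points moving 0.1 m along x in one step.
target_flow = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (2.1, 0.0, 0.0)],
]
# A candidate object motion that is offset from the target by (3, 4, 0).
candidate = [[(x + 3.0, y + 4.0, z) for (x, y, z) in pts] for pts in target_flow]

cost_same = tracking_cost(target_flow, target_flow)
cost_offset = tracking_cost(candidate, target_flow)
waypoints = centroid_waypoints(target_flow)
```

A planner or controller could score candidate robot actions by the object motion they are predicted to induce and prefer those with lower cost. The cost depends only on object points, which reflects the abstract's idea of separating state changes from the actuators that realize them.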

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis