V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors
Songjia He, Zixuan Chen, Hongyu Ding, Dian Shao, Jieqi Shi, Chenxu Li, Jing Huo, and Yang Gao

TL;DR
V-Dreamer introduces an automated framework that generates diverse, simulation-ready manipulation environments and trajectories from natural language, enabling scalable robot training and effective sim-to-real transfer.
Contribution
The paper presents a generative pipeline that combines large language models, 3D generative models, and video generation models used as motion priors to create diverse, physically grounded simulation environments and executable trajectories from natural language instructions.
Findings
Policies trained on synthesized data generalize to unseen objects.
The approach achieves effective sim-to-real transfer in manipulation tasks.
Generated environments support high visual diversity and physical fidelity.
Abstract
Training generalist robots demands large-scale, diverse manipulation data, yet real-world collection is prohibitively expensive, and existing simulators are often constrained by fixed asset libraries and manual heuristics. To bridge this gap, we present V-Dreamer, a fully automated framework that generates open-vocabulary, simulation-ready manipulation environments and executable expert trajectories directly from natural language instructions. V-Dreamer employs a novel generative pipeline that constructs physically grounded 3D scenes using large language models and 3D generative models, validated by geometric constraints to ensure stable, collision-free layouts. Crucially, for behavior synthesis, we leverage video generation models as rich motion priors. These visual predictions are then mapped into executable robot trajectories via a robust Sim-to-Gen visual-kinematic alignment module…
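The abstract says generated scenes are "validated by geometric constraints to ensure stable, collision-free layouts" but does not spell out the checks. As a purely illustrative sketch, and not the authors' implementation, one simple form such a validation could take is an axis-aligned bounding box (AABB) overlap test combined with a check that each object rests on the support surface. The names `Box`, `boxes_overlap`, `rests_on_surface`, and `validate_layout`, the tolerance values, and the example objects are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; a hypothetical stand-in for a generated 3D asset."""
    name: str
    min_xyz: tuple  # (x, y, z) of the lower corner, in metres
    max_xyz: tuple  # (x, y, z) of the upper corner, in metres


def boxes_overlap(a, b, margin=0.0):
    """Two AABBs intersect only if their intervals overlap on every axis."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] + margin and b.min_xyz[i] < a.max_xyz[i] + margin
        for i in range(3)
    )


def rests_on_surface(box, surface_height, tol=1e-3):
    """Crude stability proxy (assumption): the box bottom touches the support surface."""
    return abs(box.min_xyz[2] - surface_height) <= tol


def validate_layout(boxes, surface_height=0.0, margin=0.0):
    """Return a list of violations; an empty list means the layout passes both checks."""
    problems = []
    for box in boxes:
        if not rests_on_surface(box, surface_height):
            problems.append(f"{box.name} is not resting on the support surface")
    for a, b in combinations(boxes, 2):
        if boxes_overlap(a, b, margin):
            problems.append(f"{a.name} collides with {b.name}")
    return problems


# Hypothetical tabletop scene: the mug and plate intersect, the bowl is clear.
mug = Box("mug", (0.0, 0.0, 0.0), (0.1, 0.1, 0.12))
plate = Box("plate", (0.05, 0.05, 0.0), (0.25, 0.25, 0.02))
bowl = Box("bowl", (0.4, 0.0, 0.0), (0.55, 0.15, 0.08))
lid = Box("lid", (0.7, 0.0, 0.05), (0.8, 0.1, 0.07))  # floating above the table

for issue in validate_layout([mug, plate, bowl, lid]):
    print(issue)
```

A layout that returns violations could be rejected and resampled. A real pipeline would more plausibly use mesh-level collision queries and a physics settle step; the AABB test is only a conservative approximation chosen to keep the sketch short.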
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Multimodal Machine Learning Applications
