Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

Xuanchi Ren; Yifan Lu; Tianshi Cao; Ruiyuan Gao; Shengyu Huang; Amirmojtaba Sabour; Tianchang Shen; Tobias Pfaff; Jay Zhangjie Wu; Runjian Chen; Seung Wook Kim; Jun Gao; Laura Leal-Taixe; Mike Chen; Sanja Fidler; Huan Ling

arXiv:2506.09042 · cs.CV · June 19, 2025



TL;DR

Cosmos-Drive-Dreams introduces a synthetic data generation pipeline powered by specialized world foundation models to create diverse, high-fidelity driving scenarios, improving autonomous vehicle training and testing by addressing data scarcity and edge cases.

Contribution

The paper presents a novel scalable synthetic data generation pipeline using NVIDIA's Cosmos world foundation models tailored for the driving domain, enabling controllable, multi-view, and spatiotemporally consistent driving video synthesis.

Findings

01 Generated data improves model generalization in downstream tasks.

02 Synthetic data helps mitigate long-tail distribution issues.

03 Pipeline and models are open-sourced for community use.

Abstract

Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in training and testing an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain and capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with…


Code & Models

Repositories

nv-tlabs/cosmos-drive-dreams (PyTorch, official)



Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications