DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Guosheng Zhao; Xiaofeng Wang; Zheng Zhu; Xinze Chen; Guan Huang; Xiaoyi Bao; Xingang Wang

arXiv:2403.06845·cs.CV·April 12, 2024·3 cites



TL;DR

DriveDreamer-2 introduces a novel LLM-enhanced world model for generating diverse, customized driving videos that improve autonomous driving training and outperform existing methods in quality.

Contribution

It is the first to integrate LLMs with world models for user-defined, multi-view driving video generation, enhancing customization and video quality.

Findings

01. Generates diverse, user-defined driving videos with high realism.
02. Improves training for perception tasks like 3D detection and tracking.
03. Achieves superior FID and FVD scores compared to state-of-the-art methods.
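
FID and FVD both compare two sets of feature vectors with the Fréchet distance between Gaussians fitted to each set: d² = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below computes only that distance, on random vectors standing in for network features. The function name `frechet_distance` and the toy data are assumptions for illustration, and this is not the paper's evaluation code; real FID and FVD use features from pretrained image and video networks.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    return mean_term + float(np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Toy stand-in for "real" and "generated" features (illustrative only).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shifted = real + np.array([3.0, 4.0, 0.0, 0.0])  # same covariance, shifted mean

same_score = frechet_distance(real, real)        # close to 0
shift_score = frechet_distance(real, shifted)    # close to 3^2 + 4^2 = 25
```

Lower is better for both metrics: identical feature distributions score near zero, and a pure mean shift contributes its squared length.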

Abstract

World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, significant challenges still exist in generating customized driving videos. In this paper, we propose DriveDreamer-2, which builds upon the framework of DriveDreamer and incorporates a Large Language Model (LLM) to generate user-defined driving videos. Specifically, an LLM interface first converts a user's query into agent trajectories. Subsequently, an HDMap that adheres to traffic regulations is generated based on the trajectories. Finally, we propose the Unified Multi-View Model to enhance temporal and spatial coherence in the generated driving videos. DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner.…
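
The three stages named in the abstract (query to trajectories, trajectories to HDMap, HDMap and trajectories to multi-view video) can be sketched as a simple data flow. Everything below is a hypothetical stand-in: the function names, the `Scene` type, the camera names, and the rule-based bodies are assumptions for illustration. No LLM or video model is called, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]            # (x, y) in metres on the ground plane
Trajectories = Dict[str, List[Point]]  # agent id -> waypoints

# Hypothetical camera names; the abstract only says "multi-view".
VIEWS = ["front", "left", "right"]


@dataclass
class Scene:
    trajectories: Trajectories
    hdmap: List[List[Point]]           # lane-boundary polylines
    video: Dict[str, List[str]]        # view name -> per-frame placeholder


def query_to_trajectories(query: str, steps: int = 8) -> Trajectories:
    """Stand-in for the LLM interface: a keyword rule instead of a model call."""
    ego = [(2.0 * t, 0.0) for t in range(steps)]
    if "cut" in query.lower():
        # Neighbour starts one lane over (y = 3.5) and merges into the ego lane.
        other = [(10.0 + 2.0 * t, 3.5 * (1 - t / (steps - 1))) for t in range(steps)]
    else:
        other = [(10.0 + 2.0 * t, 3.5) for t in range(steps)]
    return {"ego": ego, "agent_0": other}


def trajectories_to_hdmap(trajs: Trajectories, lane_width: float = 3.5) -> List[List[Point]]:
    """Stand-in for HDMap generation: straight lane boundaries enclosing all waypoints."""
    xs = [p[0] for traj in trajs.values() for p in traj]
    ys = [p[1] for traj in trajs.values() for p in traj]
    n_boundaries = round((max(ys) - min(ys)) / lane_width) + 2
    y0 = min(ys) - lane_width / 2
    return [
        [(min(xs), y0 + i * lane_width), (max(xs), y0 + i * lane_width)]
        for i in range(n_boundaries)
    ]


def render_multiview(trajs: Trajectories, hdmap: List[List[Point]]) -> Dict[str, List[str]]:
    """Stand-in for the video model: one placeholder frame per view and time step."""
    steps = len(trajs["ego"])
    return {view: [f"{view}:frame{t}" for t in range(steps)] for view in VIEWS}


def generate(query: str) -> Scene:
    trajs = query_to_trajectories(query)
    hdmap = trajectories_to_hdmap(trajs)
    return Scene(trajs, hdmap, render_multiview(trajs, hdmap))


scene = generate("vehicles abruptly cut in")
```

The point of the sketch is the ordering: the map is derived from the trajectories rather than the other way round, and the video stage consumes both as conditions.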


Code & Models

Repositories

f1yfisher/drivedreamer2 (PyTorch)


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Human Motion and Animation