DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

Jianbiao Mei; Tao Hu; Xuemeng Yang; Licheng Wen; Yu Yang; Tiantian Wei; Yukai Ma; Min Dou; Botian Shi; Yong Liu

arXiv:2409.04003 · cs.CV · May 30, 2025

TL;DR

DreamForge is a diffusion-based autoregressive model that generates long, realistic driving-scene videos with 3D control, improved foreground modeling, and motion awareness.

Contribution

DreamForge introduces perspective guidance, object-wise position encoding, and motion-aware temporal attention to improve long-term video generation for driving scenes.

Findings

01. Generates videos of over 200 frames with higher quality than baselines.

02. Enhances foreground and lane modeling through perspective guidance.

03. Achieves effective long-term video generation using an autoregressive paradigm.

Abstract

Recent advances in diffusion models have improved controllable streetscape generation and supported downstream perception and planning tasks. However, challenges remain in accurately modeling driving scenes and generating long videos. To alleviate these issues, we propose DreamForge, an advanced diffusion-based autoregressive video generation model tailored for 3D-controllable long-term generation. To enhance lane and foreground generation, we introduce perspective guidance and integrate object-wise position encoding to incorporate local 3D correlation and improve foreground object modeling. We also propose motion-aware temporal attention to capture motion cues and appearance changes in videos. By leveraging motion frames and an autoregressive generation paradigm, we can generate long videos (over 200 frames) using a model trained on short sequences, achieving…
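The autoregressive idea in the abstract (a short-clip model extended to long videos by conditioning each new clip on previously generated "motion frames") can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `autoregressive_rollout` and `toy_clip_model`, the clip length of 7, and the use of 2 motion frames are all hypothetical choices made for illustration, and the toy model simply counts upward instead of running a diffusion sampler.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a "frame" is an int, and a "condition" is whatever
# per-frame control signal (for example 3D boxes or a road layout) a real model takes.
Frame = int
Condition = int
ClipModel = Callable[[Sequence[Frame], Sequence[Condition]], List[Frame]]


def autoregressive_rollout(
    clip_model: ClipModel,
    initial_frames: Sequence[Frame],
    conditions: Sequence[Condition],
    clip_len: int,
    num_motion_frames: int,
) -> List[Frame]:
    """Generate one new frame per condition, clip_len frames at a time.

    Each clip is conditioned on the last num_motion_frames frames produced
    so far, so a model that only handles short clips can be chained.
    """
    if clip_len <= 0 or num_motion_frames <= 0:
        raise ValueError("clip_len and num_motion_frames must be positive")
    if len(initial_frames) < num_motion_frames:
        raise ValueError("need at least num_motion_frames initial frames")

    video = list(initial_frames)
    n_init = len(video)
    for start in range(0, len(conditions), clip_len):
        motion_frames = video[-num_motion_frames:]         # most recent context
        clip_conditions = conditions[start:start + clip_len]
        video.extend(clip_model(motion_frames, clip_conditions))
    return video[n_init:]  # only the newly generated frames


def toy_clip_model(
    motion_frames: Sequence[Frame], clip_conditions: Sequence[Condition]
) -> List[Frame]:
    # Toy placeholder for a diffusion sampler: continue counting from the
    # last motion frame, producing one frame per condition.
    last = motion_frames[-1]
    return [last + i + 1 for i in range(len(clip_conditions))]


frames = autoregressive_rollout(
    toy_clip_model,
    initial_frames=[0, 1],
    conditions=list(range(200)),
    clip_len=7,
    num_motion_frames=2,
)
```

With these toy settings, 200 frames are produced in 29 chained clips, the last one shorter than the rest. In a real system the quality of long rollouts would depend on how well the motion frames carry appearance and motion information across clip boundaries, which is the part the paper addresses with motion-aware temporal attention.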

Code & Models

Repositories

PJLab-ADG/DriveArena
jax · Official

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques

Methods: Softmax · Attention Is All You Need · Diffusion