DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
Jianbiao Mei, Tao Hu, Xuemeng Yang, Licheng Wen, Yu Yang, Tiantian Wei, Yukai Ma, Min Dou, Botian Shi, Yong Liu

TL;DR
DreamForge is a diffusion-based autoregressive model that generates long, realistic driving scene videos with 3D control, improved foreground modeling, and motion awareness, advancing the realism and length of generated driving videos.
Contribution
It introduces perspective guidance, object-wise position encoding, and motion-aware temporal attention for improved long-term driving scene video generation.
Findings
Generated videos over 200 frames with higher quality than baselines.
Enhanced foreground and lane modeling through perspective guidance.
Effective long-term video generation using autoregressive paradigm.
Abstract
Recent advances in diffusion models have improved controllable streetscape generation and supported downstream perception and planning tasks. However, challenges remain in accurately modeling driving scenes and generating long videos. To alleviate these issues, we propose DreamForge, an advanced diffusion-based autoregressive video generation model tailored for 3D-controllable long-term generation. To enhance the lane and foreground generation, we introduce perspective guidance and integrate object-wise position encoding to incorporate local 3D correlation and improve foreground object modeling. We also propose motion-aware temporal attention to capture motion cues and appearance changes in videos. By leveraging motion frames and an autoregressive generation paradigm,we can autoregressively generate long videos (over 200 frames) using a model trained in short sequences, achieving…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
MethodsSoftmax · Attention Is All You Need · Diffusion
