Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

TL;DR
Panacea is a novel method for generating high-quality, panoramic, and controllable driving videos that enhance autonomous vehicle training datasets by ensuring coherence and alignment with annotations.
Contribution
It introduces a new approach combining 4D attention, a two-stage pipeline, and ControlNet for controllable, coherent panoramic video generation in driving scenarios.
Findings
Effective generation of diverse, annotated driving videos.
Maintains temporal and cross-view consistency.
Improves autonomous driving perception models.
Abstract
The field of autonomous driving increasingly demands high-quality annotated training data. In this paper, we propose Panacea, an innovative approach to generate panoramic and controllable videos in driving scenarios, capable of yielding an unlimited number of diverse, annotated samples pivotal for autonomous driving advancements. Panacea addresses two critical challenges: 'Consistency' and 'Controllability.' Consistency ensures temporal and cross-view coherence, while Controllability ensures the alignment of generated content with corresponding annotations. Our approach integrates a novel 4D attention and a two-stage generation pipeline to maintain coherence, supplemented by the ControlNet framework for meticulous control by the Bird's-Eye-View (BEV) layouts. Extensive qualitative and quantitative evaluations of Panacea on the nuScenes dataset prove its effectiveness in generating…
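The abstract names a "4D attention" for temporal and cross-view coherence but does not spell out its form here. As a rough illustration only, the sketch below assumes the attention over a multi-view video tensor (views, frames, height, width) is factorized into three steps: spatial attention within each frame, attention across views, and attention across frames. The function names, the factorization, and the omission of learned query/key/value projections are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention over the second-to-last axis.
    # tokens: (..., N, C). No learned projections (simplifying assumption).
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def factorized_4d_attention(x):
    # Hypothetical factorized attention; x has shape (V, T, H, W, C):
    # views, frames, height, width, channels.
    V, T, H, W, C = x.shape

    # 1) Spatial attention inside every (view, frame) image.
    s = x.reshape(V, T, H * W, C)
    x = x + self_attention(s).reshape(V, T, H, W, C)

    # 2) Cross-view attention: each pixel attends over the V views.
    v = np.transpose(x, (1, 2, 3, 0, 4))            # (T, H, W, V, C)
    x = x + np.transpose(self_attention(v), (3, 0, 1, 2, 4))

    # 3) Cross-frame attention: each pixel attends over the T frames.
    t = np.transpose(x, (0, 2, 3, 1, 4))            # (V, H, W, T, C)
    x = x + np.transpose(self_attention(t), (0, 3, 1, 2, 4))
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((6, 4, 8, 8, 16))   # e.g. 6 cameras, 4 frames
    out = factorized_4d_attention(feats)
    print(out.shape)
```

Factorizing along one axis at a time keeps each attention matrix small (H·W, V, or T tokens) instead of attending over all V·T·H·W tokens jointly, which is the usual motivation for this kind of decomposition.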
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Multimodal Machine Learning Applications
