Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

TL;DR
This paper introduces Delphi, a diffusion-based long video generation method that produces highly consistent videos up to 40 frames long, significantly improving autonomous driving planning performance with minimal additional data.
Contribution
Delphi is a novel long video generation approach with enhanced spatial and temporal consistency, enabling effective data augmentation for end-to-end autonomous driving models.
Findings
Delphi generates videos of up to 40 frames without loss of consistency, about five times longer than state-of-the-art methods.
Generating additional data equal to only 4% of the training dataset size improves end-to-end planning performance by 25%.
The method surpasses the previous state of the art in long video generation quality.
Abstract
Using generative models to synthesize new data has become a de facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the planning performance of end-to-end autonomous driving models, as the generated videos are usually shorter than 8 frames and the spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across multiple views to increase spatial consistency, and a feature-aligned module to achieve both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency, which is about 5 times longer than state-of-the-art methods. Instead of randomly generating new…
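The abstract does not spell out how noise is shared across camera views, so the following is only an illustrative sketch of one common way to correlate diffusion noise between views, not the authors' implementation. The function name `sample_multiview_noise` and the `shared_weight` parameter are hypothetical. Each view receives a mix of one shared Gaussian sample and its own independent sample, weighted so that every element keeps unit variance.

```python
import numpy as np


def sample_multiview_noise(num_views, shape, shared_weight=0.5, seed=0):
    """Draw unit-variance Gaussian noise for several camera views.

    Hypothetical sketch: `shared_weight` (an assumption, not from the paper)
    is the fraction of each view's noise variance that comes from a single
    sample shared by all views. The remainder is view-specific.
    """
    if not 0.0 <= shared_weight <= 1.0:
        raise ValueError("shared_weight must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    # One noise map common to every view.
    shared = rng.standard_normal(shape)
    # One independent noise map per view.
    independent = rng.standard_normal((num_views, *shape))
    # Variances add: shared_weight + (1 - shared_weight) = 1 per element.
    return (np.sqrt(shared_weight) * shared
            + np.sqrt(1.0 - shared_weight) * independent)


if __name__ == "__main__":
    noise = sample_multiview_noise(num_views=6, shape=(64, 64), shared_weight=0.5)
    # Correlation between two views should be close to shared_weight.
    corr = np.corrcoef(noise[0].ravel(), noise[1].ravel())[0, 1]
    print(noise.shape, round(float(corr), 2))
```

Under this mixing rule, the expected correlation between any two views equals `shared_weight`, so a value of 1.0 gives every view identical noise and 0.0 gives fully independent noise. How Delphi actually balances or structures the shared component is not stated in the truncated abstract.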
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
