HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
Zehuan Wu, Jingcheng Ni, Xiaodong Wang, Yuxin Guo, Rui Chen, Lewei Lu, Jifeng Dai, Yuwen Xiong

TL;DR
HoloDrive is a novel framework that jointly generates 2D camera images and 3D LiDAR point clouds for autonomous driving, leveraging multi-modal data to improve generation quality.
Contribution
The paper introduces a multi-modal generative framework with BEV-to-Camera and Camera-to-BEV transforms and a depth prediction branch, enabling joint 2D-3D scene generation for autonomous driving.
Findings
Significant performance improvements over state-of-the-art methods on generation metrics.
Effective joint 2D-3D generation using multi-modal data.
Enhanced future scene prediction with temporal structure.
Abstract
Generative models have significantly improved generation and prediction quality for either camera images or LiDAR point clouds in autonomous driving. However, a real-world autonomous driving system uses multiple input modalities, usually cameras and LiDARs, which carry complementary information for generation. Existing generation methods ignore this property, so their results cover only 2D or only 3D information. To fill this gap in 2D-3D multi-modal joint generation for autonomous driving, we propose HoloDrive, a framework that jointly generates camera images and LiDAR point clouds. We employ BEV-to-Camera and Camera-to-BEV transform modules between heterogeneous generative models, and introduce a depth prediction branch in the 2D generative model to disambiguate the un-projecting from image space…
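The abstract gives no implementation details, so the following is only a minimal sketch of the generic pinhole geometry that view transforms of this kind rely on. It assumes a single camera with intrinsics (fx, fy, cx, cy), a camera frame with x right, y down and z forward, and a BEV grid laid on the camera's x-z plane; the function names and all values are hypothetical and are not HoloDrive's code. It illustrates why a depth estimate is needed: a pixel alone fixes only a ray, and different depths along that ray land in different BEV cells.

```python
"""Generic pinhole camera <-> BEV geometry sketch (hypothetical, not HoloDrive's code).

Assumed conventions: camera frame with x right, y down, z forward;
the BEV grid lies on the camera's x-z plane (height axis y is dropped).
"""
from typing import Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)
Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(point: Point3, K: Intrinsics) -> Pixel:
    """3D camera-frame point -> pixel (u, v); the depth z is lost here."""
    fx, fy, cx, cy = K
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def unproject(pixel: Pixel, depth: float, K: Intrinsics) -> Point3:
    """Pixel (u, v) plus a depth value -> 3D camera-frame point.

    Without `depth` the pixel only defines a ray: every point along
    that ray projects to the same (u, v).
    """
    fx, fy, cx, cy = K
    u, v = pixel
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def to_bev_cell(point: Point3, cell_size: float,
                x_min: float, z_min: float) -> Tuple[int, int]:
    """Drop the height axis and quantize (x, z) into a BEV grid index."""
    x, _, z = point
    return (int((x - x_min) // cell_size), int((z - z_min) // cell_size))
```

For example, with K = (1000.0, 1000.0, 800.0, 450.0), the pixel (1000.0, 550.0) un-projects to (2.0, 1.0, 10.0) at depth 10 and to (4.0, 2.0, 20.0) at depth 20. Both points project back to the same pixel, yet with 0.5 m cells and x_min = -50.0, z_min = 0.0 they fall in BEV cells (104, 20) and (108, 40), which is the ambiguity a predicted depth resolves.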
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Remote Sensing and LiDAR Applications · 3D Modeling in Geospatial Applications
