Veila: Panoramic LiDAR Generation from a Monocular RGB Image
Youquan Liu, Lingdong Kong, Weidong Yang, Ao Liang, Jianxiong Gao, Yang Wu, Xiang Xu, Xin Li, Linfeng Li, Runnan Chen, Ben Fei

TL;DR
Veila is a novel diffusion-based framework that generates realistic panoramic LiDAR data from monocular RGB images, addressing challenges in spatial control, alignment, and structural coherence for autonomous driving applications.
Contribution
Veila introduces a conditional diffusion model for RGB-to-LiDAR data synthesis that combines adaptive conditioning, robust cross-modal alignment, and global structural coherence.
Findings
Achieves state-of-the-art fidelity and consistency in LiDAR generation
Improves downstream LiDAR semantic segmentation performance
Demonstrates effectiveness on multiple benchmark datasets
Abstract
Realistic and controllable panoramic LiDAR data generation is critical for scalable 3D perception in autonomous driving and robotics. Existing methods either perform unconditional generation with poor controllability or adopt text-guided synthesis, which lacks fine-grained spatial control. Leveraging a monocular RGB image as a spatial control signal offers a scalable and low-cost alternative, but it remains an open problem and faces three core challenges: (i) semantic and depth cues from RGB vary spatially, complicating reliable conditional generation; (ii) modality gaps between RGB appearance and LiDAR geometry amplify alignment errors under noisy diffusion; and (iii) maintaining structural coherence between monocular RGB and panoramic LiDAR is challenging, particularly in regions where the image and the LiDAR do not overlap. To address these challenges, we propose Veila, a novel…
