TL;DR
This paper introduces a self-supervised multimodal NeRF framework for autonomous driving that models static and dynamic scenes from LiDAR and camera data. It requires no 3D labels and reports the best performance among the compared baselines on KITTI-360.
Contribution
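The proposed framework (NVSF) jointly learns an implicit neural representation of a space- and time-varying scene for both LiDAR and camera. Unlike existing multimodal dynamic NeRFs, it is self-supervised. It also adds heuristic-based image pixel sampling for faster convergence and a Double Gradient based mask to preserve local features of LiDAR points.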
Findings
Outperforms the baseline models on the KITTI-360 dataset in both the LiDAR and camera domains
Eliminates the need for 3D labels in training
Targets faster convergence with heuristic-based image pixel sampling
Abstract
In this paper, we propose a Neural Radiance Fields (NeRF) based framework, referred to as the Novel View Synthesis Framework (NVSF). It jointly learns an implicit neural representation of a space- and time-varying scene for both LiDAR and camera. We test it on a real-world autonomous driving scenario containing both static and dynamic scenes. Compared to existing multimodal dynamic NeRFs, our framework is self-supervised, thus eliminating the need for 3D labels. For efficient training and faster convergence, we introduce heuristic-based image pixel sampling to focus on pixels with rich information. To preserve the local features of LiDAR points, a Double Gradient based mask is employed. Extensive experiments on the KITTI-360 dataset show that, compared to the baseline models, our framework achieves the best performance in both the LiDAR and camera domains. Code of the model is available at…
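The abstract does not spell out the pixel-sampling heuristic, so the following is only an illustrative sketch of one plausible reading: treat image-gradient magnitude as a proxy for "rich information" and draw training pixels with probability proportional to it. The function names, the `uniform_mix` floor weight, and the choice of gradient magnitude are all assumptions made for illustration, not the authors' method.

```python
import random


def gradient_magnitude(image):
    """Central-difference gradient magnitude of a 2D grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbour indices so border pixels fall back to one-sided differences.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def sample_pixels(image, n_samples, uniform_mix=0.1, seed=0):
    """Sample (row, col) pixels with probability proportional to gradient magnitude.

    uniform_mix is a hypothetical knob: the share of probability mass spread
    uniformly so that flat regions are still visited occasionally.
    Sampling is with replacement.
    """
    h, w = len(image), len(image[0])
    mag = gradient_magnitude(image)
    coords = [(y, x) for y in range(h) for x in range(w)]
    total = sum(sum(row) for row in mag)
    if total == 0:
        # Completely flat image: fall back to uniform sampling.
        weights = [1.0] * (h * w)
    else:
        weights = [
            (1.0 - uniform_mix) * mag[y][x] / total + uniform_mix / (h * w)
            for (y, x) in coords
        ]
    rng = random.Random(seed)
    return rng.choices(coords, weights=weights, k=n_samples)


# Toy image: dark left half, bright right half, so gradients concentrate on the edge.
toy_image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
toy_samples = sample_pixels(toy_image, n_samples=1000)
```

On the toy image most samples land in the two columns next to the vertical edge, while the uniform floor keeps a small fraction in the flat regions. In a NeRF training loop, the sampled coordinates would select which camera rays to render per step.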
