TL;DR
VR-Drive is an end-to-end autonomous driving framework that enhances viewpoint robustness by integrating 3D scene reconstruction, view synthesis, and temporal memory, enabling better planning under diverse camera angles.
Contribution
VR-Drive introduces a feed-forward approach that combines a viewpoint-mixed memory bank with knowledge distillation to improve viewpoint generalization in E2E autonomous driving.
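The summary does not say how the viewpoint-mixed memory bank is built. The following is only a minimal sketch of the general idea, assuming a fixed-capacity FIFO queue that stores per-frame features tagged with the viewpoint they came from (original or synthesized), and a read step that applies softmax dot-product attention over all stored entries regardless of viewpoint. The class name, the `viewpoint_id` tags, and the attention read are hypothetical illustrations, not VR-Drive's actual design.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryEntry:
    timestamp: int
    viewpoint_id: str  # hypothetical tag, e.g. "original" or "synthesized"
    feature: List[float]


class ViewpointMixedMemoryBank:
    """Hypothetical FIFO memory bank that mixes features from several viewpoints."""

    def __init__(self, capacity: int):
        # deque with maxlen silently drops the oldest entry when full
        self.entries = deque(maxlen=capacity)

    def write(self, timestamp: int, viewpoint_id: str, feature: List[float]) -> None:
        self.entries.append(MemoryEntry(timestamp, viewpoint_id, list(feature)))

    def read(self, query: List[float]) -> List[float]:
        """Scaled dot-product attention over every stored entry, ignoring viewpoint tags."""
        if not self.entries:
            return [0.0] * len(query)
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * f for q, f in zip(query, e.feature)) for e in self.entries]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        fused = [0.0] * len(query)
        for w, e in zip(weights, self.entries):
            for i, f in enumerate(e.feature):
                fused[i] += (w / total) * f
        return fused


# Usage: features from the original and a synthesized viewpoint share one memory.
bank = ViewpointMixedMemoryBank(capacity=4)
bank.write(0, "original", [1.0, 0.0])
bank.write(0, "synthesized", [0.0, 1.0])
fused = bank.read([1.0, 0.0])  # weighted toward the entry that matches the query
```

Because the read ignores the viewpoint tag, a query from one camera setup can attend to history recorded under another, which is one plausible reading of "temporal interaction across multiple viewpoints."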
Findings
Improves planning accuracy under viewpoint shifts
Supports online training with sparse views without extra annotations
Outperforms existing methods on a new viewpoint robustness benchmark
Abstract
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent…
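Online training-time augmentation needs a new target camera pose to render from. The abstract does not specify how VR-Drive samples these poses. A minimal sketch under stated assumptions: the camera-to-ego extrinsic is a rotation `R` plus translation `t` in an ego frame with x forward, y left, and z up, and a new viewpoint is drawn by shifting the camera height and applying small pitch and yaw rotations. The function names and the sampling ranges are hypothetical.

```python
import math
import random
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rot_z(deg: float) -> Mat3:
    """Rotation about the ego z-axis (yaw)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg: float) -> Mat3:
    """Rotation about the ego y-axis (pitch)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def perturb_extrinsic(R: Mat3, t: Vec3, dz: float, pitch_deg: float, yaw_deg: float) -> Tuple[Mat3, Vec3]:
    """New camera-to-ego pose: orientation rotated in the ego frame, position shifted by dz along z."""
    delta = matmul3(rot_z(yaw_deg), rot_y(pitch_deg))
    return matmul3(delta, R), [t[0], t[1], t[2] + dz]


def sample_viewpoint(rng: random.Random, R: Mat3, t: Vec3,
                     max_dz: float = 0.5, max_pitch: float = 5.0,
                     max_yaw: float = 5.0) -> Tuple[Mat3, Vec3]:
    """Draw one random target viewpoint (ranges are illustrative assumptions)."""
    return perturb_extrinsic(
        R, t,
        dz=rng.uniform(-max_dz, max_dz),
        pitch_deg=rng.uniform(-max_pitch, max_pitch),
        yaw_deg=rng.uniform(-max_yaw, max_yaw),
    )


# Usage: a front camera 1.6 m above the ground, identity orientation for simplicity.
R0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t0 = [1.5, 0.0, 1.6]
R_new, t_new = sample_viewpoint(random.Random(0), R0, t0)
```

Applying the rotation on the left keeps the perturbation expressed in the ego frame, so "raise the camera and tilt it slightly" means the same thing for every camera on the rig.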