Towards Viewpoint-Robust End-to-End Autonomous Driving with 3D Foundation Model Priors
Hiroki Hashimoto, Hiromichi Goto, Hiroyuki Sugai, Hiroshi Kera, Kazuhiko Kawamoto

TL;DR
This paper proposes a viewpoint-robust autonomous driving method using 3D foundation model priors, improving trajectory planning under camera viewpoint changes without data augmentation.
Contribution
The paper introduces an augmentation-free approach that injects geometric priors from a 3D foundation model into an end-to-end autonomous driving model, making it more robust to camera viewpoint variations.
Findings
Performance degradation is reduced under most viewpoint perturbation conditions on the VR-Drive benchmark.
Improvements are clearest under pitch and height perturbations.
Gains under longitudinal translation are smaller, indicating a need for more viewpoint-agnostic integration of the geometric priors.
Abstract
Robust trajectory planning under camera viewpoint changes is important for scalable end-to-end autonomous driving. However, existing models often depend heavily on the camera viewpoints seen during training. We investigate an augmentation-free approach that leverages geometric priors from a 3D foundation model. The method injects per-pixel 3D positions derived from depth estimates as positional embeddings and fuses intermediate geometric features through cross-attention. Experiments on the VR-Drive camera viewpoint perturbation benchmark show reduced performance degradation under most perturbation conditions, with clear improvements under pitch and height perturbations. Gains under longitudinal translation are smaller, suggesting that more viewpoint-agnostic integration is needed for robustness to camera viewpoint changes.
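To make the two mechanisms named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) to turn a depth map into per-pixel 3D positions, a hypothetical linear projection `W_pos` to turn those positions into positional embeddings, and a plain single-head scaled dot-product cross-attention to fuse a set of geometric features. All names, shapes, and random inputs are illustrative assumptions; the paper's actual embedding and fusion design may differ.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to (H, W, 3) camera-frame points.

    Assumes a pinhole camera model; the intrinsics are illustrative inputs.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: (N, D); keys and values: (M, D). Returns an (N, D) array.
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
H, W, D = 4, 6, 8

# Toy depth map: every pixel is 2 units from the camera.
depth = np.full((H, W), 2.0)
points = unproject_depth(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)

# Hypothetical linear map from 3D positions to D-dimensional embeddings.
W_pos = rng.normal(size=(3, D))
pos_embed = points.reshape(-1, 3) @ W_pos

# Stand-ins for image tokens and for intermediate geometric features.
image_feats = rng.normal(size=(H * W, D))
geo_feats = rng.normal(size=(10, D))

# Add 3D positional embeddings, then let image tokens attend to geometry.
tokens = image_feats + pos_embed
fused = cross_attention(tokens, geo_feats, geo_feats)
```

In this sketch the positional embedding depends on where each pixel lies in 3D rather than on its 2D image coordinates, which is the intuition behind using geometric priors when the camera viewpoint changes.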
