Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations
Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

TL;DR
This paper investigates whether self-supervised visual representations improve zero-shot cross-city generalization in end-to-end autonomous driving, and finds that self-supervised pretraining substantially reduces the transfer gap when a model is evaluated in a city it was not trained on.
Contribution
The paper shows that self-supervised visual representations improve zero-shot cross-city transfer in autonomous driving, outperforming the supervised ImageNet-pretrained backbones that end-to-end driving models typically use.
Findings
Self-supervised pretraining reduces transfer-gap inflation from 9.77x to 1.20x in open-loop evaluation.
Self-supervised models improve closed-loop PDMS by up to 4% across different cities.
Traditional supervised backbones show severe performance drops when transferring between cities with different driving conventions.
Abstract
End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real domain shifts when generalizing to new locations. In this work we investigate zero-shot cross-city generalization in end-to-end trajectory planning and ask whether self-supervised visual representations improve transfer across cities. We conduct a comprehensive study by integrating self-supervised backbones (I-JEPA, DINOv2, and MAE) into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a…
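The strict geographic split described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the `city` field, the city labels, and the `split_by_city` helper are all hypothetical and chosen only to show the idea that no city may contribute data to both the training and the evaluation set.

```python
# Minimal sketch of a strict geographic (city-level) split.
# All names and records here are hypothetical, for illustration only.

SCENES = [
    {"scene_id": "scene-001", "city": "city_a"},
    {"scene_id": "scene-002", "city": "city_a"},
    {"scene_id": "scene-003", "city": "city_b"},
    {"scene_id": "scene-004", "city": "city_b"},
    {"scene_id": "scene-005", "city": "city_c"},
]


def split_by_city(scenes, train_cities, test_cities):
    """Return (train, test) scene lists with no city shared between them."""
    train_cities = set(train_cities)
    test_cities = set(test_cities)

    # A zero-shot split is only meaningful if the city sets are disjoint.
    overlap = train_cities & test_cities
    if overlap:
        raise ValueError(f"cities appear in both splits: {sorted(overlap)}")

    train = [s for s in scenes if s["city"] in train_cities]
    test = [s for s in scenes if s["city"] in test_cities]
    return train, test


train_scenes, test_scenes = split_by_city(
    SCENES, train_cities=["city_a"], test_cities=["city_b"]
)
```

Scenes from cities that appear in neither list (here `city_c`) are simply left out, so the evaluation set contains only locations the model has never seen during training.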
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Traffic Prediction and Management Techniques
